Introduction
Installation
Usage
Generalized MyGV Input Format
Link External Programs or Web Services
References
License
MyGV is an application to visualize (potentially genome-scale) gene structure annotation and prediction. The program was designed in particular to display output of our spliced alignment algorithm GeneSeqer; however, output of other programs such as GENSCAN or GeneMark.hmm can also be displayed after transformation to a generalized MyGV input format. Gene structure predictions can be compared with existing gene structure annotation in GenBank format.
This program is written in JAVA and should work on any platform supporting JAVA.
Please direct all communications related to this software to:
Volker Brendel
Department of Zoology & Genetics
Iowa State University
Ames, IA 50011-3260
U.S.A.
Phone: (515) 294-9884; Fax: (515) 294-6755
email: vbrendel@iastate.edu
If you are reading this, you have uncompressed the MyGV package and should see the following files and directories:
README.html | This file |
MyGV.jar | Jar package of MyGV class files and images |
MyGVConfig.xml | Configuration file for MyGV (XML format) |
demo/ | Directory containing data for demonstration |
scripts/ | Directory containing perl scripts for demonstration |
doc/ | Directory containing help.html and related documentation files |
public/ | Directory containing html2txt.c and JAXP_1.1, required by MyGV, downloaded from the public domain for the convenience of the user |
Basic Installation
This program is written in JAVA; to run it on your machine, you must have the Java2 runtime environment installed. JAVA version "1.2.2" or higher is recommended, which can be downloaded from http://java.sun.com/j2se/index.html.
To make the MyGV interface more flexible, the preference file of MyGV is written in XML format, which users can edit to change the MyGV menu or other settings. This feature requires the JAXP (Java API for XML Processing) package, which can be downloaded from http://java.sun.com/xml/download.html (see also ./public).
Detailed instructions on how to install JAVA and JAXP are posted at the download site and in the accompanying documentation.
MyGV.jar must either (1) be included in the user's CLASSPATH variable or (2) be supplied at runtime via the JAVA "-classpath" option:
(1) Let $MyGV represent the MyGV installation directory. Then, for example:
MS-WIN users, add the following line to autoexec.bat:
set CLASSPATH=$MyGV\MyGV1.0\MyGV.jar;%CLASSPATH%
Bash shell Unix users, add the following line to .bash_profile:
export CLASSPATH=$CLASSPATH:$MyGV/MyGV1.0/MyGV.jar
Then, type "java MyGV" to run MyGV.
(2) Type "java -classpath $CLASSPATH:$MyGV/MyGV.jar MyGV" to run MyGV.
Advanced Installation (not required)
This optional step demonstrates how to link external gene identification software or web services into MyGV. Perl and three other components are needed:
(1) html2txt (html to text file converter)
Many programs are available to convert HTML files to plain text, for example, html2txt.c. Please name the executable file "html2txt" and be sure to include it in your default path (see also the ./public directory).
(2) POST
This is a Perl script that is included in the Perl module libwww. You may choose to install Bundle::LWP which includes all libwww-perl related modules.
(3) GENSCAN
GENSCAN is an ab initio gene structure prediction program developed by Chris Burge in the research group of Samuel Karlin, Department of Mathematics, Stanford University, which can be downloaded from http://genes.mit.edu/GENSCAN.html.
After installation, do not forget to update your PATH environment variable to include the directories where the three components are located, or move the binaries to a directory on your default path.
java MyGV [-h|--help] [-t] [-a from] [-b to] [-d seqfile[{seqid}]]
[-g gsqfile[{seqid}]] [-i other_file(s)]
where
-h/--help : Show this information.
-t : terse (for GeneSeqer output, show AGS only).
-a from : Analyze genomic sequence from position 'from'.
-b to : Analyze genomic sequence up to position 'to'.
-d seqfile[{seqid}] : Specify the sequence file [GenBank or FASTA format]. If the sequence file contains multiple entries, select the first entry [default] or the entry specified by the optional argument seqid (immediately following seqfile, enclosed in {}).
-g gsqfile[{seqid}] : Specify GeneSeqer output file(s). If the GeneSeqer output file contains output from multiple sequence entries, select the first entry [default] or the entry specified by the optional argument seqid (immediately following gsqfile, enclosed in {}).
-i other_file(s) : Specify other input file(s) in Generalized MyGV Input Format.
Examples:
java MyGV -d ./demo/U89959 -g ./demo/U89959.gsq
java MyGV -a 50001 -b 75000 -d ./demo/ATFCA5 -g ./demo/ATFCA5.gsq -i ./demo/ATFCA5.gsn ./demo/ATFCA5.glm ./demo/ATFCA5.gm
Here, U89959 and ATFCA5 are GenBank sequence files; U89959.gsq and ATFCA5.gsq are GeneSeqer output files; and the other files are files in the Generalized MyGV Input Format (GIIF), converted from the output of GENSCAN, GlimmerM and GeneMark.hmm, respectively.
Please see ./doc/help.html for instructions on how to navigate in MyGV and ./demo/README for other examples.
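The optional {seqid} suffix accepted by the -d and -g options can be illustrated with a short sketch. This is not MyGV's own parser (MyGV is written in JAVA); it is a minimal Python illustration of the documented "file{seqid}" syntax, and the function name and the entry ID "chr1" are hypothetical.

```python
import re

# Matches "path" or "path{seqid}", as described for the -d and -g options.
_ARG_PATTERN = re.compile(r"([^{}]+)(?:\{([^{}]+)\})?")


def split_file_and_seqid(arg):
    """Split 'path{seqid}' into (path, seqid); seqid is None when omitted."""
    match = _ARG_PATTERN.fullmatch(arg)
    if match is None:
        raise ValueError(f"malformed argument: {arg!r}")
    return match.group(1), match.group(2)


# No suffix: the first entry in the file would be used (the documented default).
print(split_file_and_seqid("./demo/U89959"))
# With a suffix: the entry named by the (hypothetical) ID "chr1" is selected.
print(split_file_and_seqid("./demo/ATFCA5{chr1}"))
```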
We have defined a generalized MyGV input format (GIIF) to allow display of output from programs other than our own GeneSeqer. Prior to import into MyGV, the output of an external gene prediction program has to be converted to GIIF. A GIIF file consists of a head (the key=value lines at the top; ten in the example below) and a body (separated from the head by an empty line and enclosed by the key words "BEGIN" and "END"), as in the following example:
Example of the Generalized MyGV Input Format (text following "#" is explanatory annotation, not part of the file)
File=gsn2.out #input file name
SeqID=AC006932 #genomic sequence ID
GenBank=N #Is the genomic sequence in GenBank format? [Y/N/?]
SeqShift=0 #shift of sequence position
Length=89479 #length of genomic sequence
Source=GENSCAN #name of gene structure prediction program
Label=GSN #label to identify the gene structure prediction program
SourceID=1 #priority: [1-4]
HasIntronScore=Y #Does the output contain scores for splice sites? [Y/N]
HasExonScore=Y #Does the output contain scores for exons? [Y/N]

BEGIN #begin of GIIF body
***GSN 1*** #start of gene structure annotation
1)Structure: ( .Begin. ..End.. .Do. .Ac. Score ) #Part1: Coordinates and scores: begin and end position of each exon, scores for donor and acceptor sites if HasIntronScore=Y, exon score if HasExonScore=Y.
Exon 1: 240 370 0.11 0.00 0.775
Exon 2: 525 789 0.92 0.38 0.616
Exon 3: 861 987 0.89 0.91 0.613
Exon 4: 1038 1162 0.23 0.38 0.607
2)Information: #Part2: Original gene structure prediction output
P1.01 Intr + 240 370 131 2 2 11 -21 171 0.775 2.07 BeO PART
P1.02 Intr + 525 789 265 0 1 92 38 165 0.616 13.49 BeO PART
P1.03 Intr + 861 987 127 2 1 89 91 70 0.613 11.73 BeO PART
P1.04 Term + 1038 1162 125 1 2 23 38 112 0.607 2.27 beO OVER
P1.05 PlyA + 1217 1222 6 -3.64
>AC006932|GENSCAN_predicted_peptide_1|215_aa
ELIEELEVYLLFYDRSGYGASDSNTKRSLESEVEDIAELADQLELSGVAFVAPVVNYRWP
SLPKKLIKKDYRTGIIKWGLRISKYAPGLLHWWIIQKLFASTSSVLESNPVYFNSHDIEV
LKRKTGFPMLTKEKLRERNVFDTLRDDFMVCFGQWDFEPADLSISTKSYIHIWHETTFDQ
LPRNPPRRTSDRTLRWHLRYDSTCTIAQGRTTKAV
***GSN 2***
...
...
***GSN 9***
1)Structure: ( .Begin. ..End.. .Do. .Ac. Score )
Exon 1: 39634 39844 0.55 0.65 0.807
Exon 2: 39967 40107 0.60 0.92 0.904
Exon 3: 40608 40729 0.00 0.48 0.307
Exon 4: 40817 40960 0.46 0.89 0.973
Exon 5: 41273 41863 0.85 0.42 0.942
2)Information:
P9.00 Prom + 39535 39574 40 -11.54
P9.01 Init + 39634 39844 211 0 1 55 65 141 0.807 12.49 BEO EXAC
P9.02 Intr + 39967 40107 141 2 0 60 92 80 0.904 10.00 BEO EXAC
P9.03 Intr + 40608 40729 122 1 2 -18 48 108 0.307 0.49 BEO EXAC
P9.04 Intr + 40817 40960 144 1 0 46 89 87 0.973 9.16 BEO EXAC
P9.05 Term + 41273 41863 591 1 0 85 42 388 0.942 32.44 BEO EXAC
P9.06 PlyA + 42563 42568 6 1.05
>AC006932|GENSCAN_predicted_peptide_9|402_aa
MEKVREIVREGIRVGNEDPRRIIHAFKVGLALVLVSSFYYYQPFGPFTDYFGINAMWAVM
TVVVVFEFSVGATLGKGLNRGVATLVAGGLGIGAHQLARLSGATVEPILLVMLVFVQDFG
DEYFEAREKGDYKVVEKRKKNLERYKSVLDSKSDEEALANYAEWEPPHGQFRFRHPWKQY
VAVGALLRQCAYRIDALNSYINSDFQIPVDIKKKLETPLRRMSSESGNSMKEMSISLKQM
IKSSSSDIHVSNSQAACKSLSTLLKSGILNDVEPLQMISLMTTVSMLIDIVNLTEKISES
VHELASAARFKNKMRPTVLYEKSDSGSIGRAMPIDSHEDHHVVTVLHDVDNDRSNNVDDS
RGGSSQDSCHHVAIKIVDDNSNHEKHEDGEIHVHTLSNGHLQ
END #End of GIIF body
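To make the head/body layout concrete, the following Python sketch reads the key=value head and the "Exon" lines of the body from a shortened copy of the example. It is an illustration only, not the parser MyGV uses; the function name and the returned data layout are assumptions, and the scores are returned as an uninterpreted list because their meaning depends on HasIntronScore and HasExonScore.

```python
SAMPLE_GIIF = """\
File=gsn2.out
SeqID=AC006932
GenBank=N
SeqShift=0
Length=89479
Source=GENSCAN
Label=GSN
SourceID=1
HasIntronScore=Y
HasExonScore=Y

BEGIN
***GSN 1***
1)Structure: ( .Begin. ..End.. .Do. .Ac. Score )
Exon 1: 240 370 0.11 0.00 0.775
Exon 2: 525 789 0.92 0.38 0.616
2)Information:
P1.01 Intr + 240 370 131 2 2 11 -21 171 0.775 2.07 BeO PART
END
"""


def parse_giif(text):
    """Return (head, exons) from GIIF text (hypothetical helper).

    head  : dict of the key=value lines before the first empty line
    exons : list of (gene_label, begin, end, scores) tuples
    """
    head_text, _, body_text = text.partition("\n\n")
    head = dict(
        line.split("=", 1) for line in head_text.splitlines() if "=" in line
    )
    exons = []
    gene_label = None
    for line in body_text.splitlines():
        line = line.strip()
        if line.startswith("***") and line.endswith("***"):
            # A line such as "***GSN 1***" starts a new gene structure.
            gene_label = line.strip("*").strip()
        elif line.startswith("Exon"):
            fields = line.split(":", 1)[1].split()
            begin, end = int(fields[0]), int(fields[1])
            scores = [float(value) for value in fields[2:]]
            exons.append((gene_label, begin, end, scores))
    return head, exons


head, exons = parse_giif(SAMPLE_GIIF)
print(head["Source"], head["Length"])
for exon in exons:
    print(exon)
```

Everything between "2)Information:" and the next gene label is ignored by this sketch, since that part is the original program output carried along unchanged.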
Generally, output from any gene identification software should be easily convertible into GIIF.
As examples, we include three Perl scripts (cvrtGM.pl, cvrtGLM.pl, and cvrtGSN.pl) for conversion of output from GeneMark.hmm, GlimmerM, and GENSCAN, respectively.
For example,
./scripts/cvrtGSN.pl -o ./demo/gsn.giif ./demo/gsn.out
Note that because of sequence length limitations for some gene identification programs, long
genomic sequences may have to be segmented to be analyzed. The conversion scripts provide the
option to specify a sequence shift to adjust predicted positions; for example, if the external
program was run on a sequence segment starting at position 573001 in a GenBank file, then the
conversion script should be called with the argument "-s 573000" to compare the predicted gene
structure with the GenBank annotation.
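The arithmetic behind the sequence shift can be written out as a small Python sketch. It only illustrates the coordinate adjustment described above; how the conversion scripts apply the shift internally is not shown here, and the function names and the exon positions 240..370 are illustrative assumptions.

```python
def segment_shift(segment_start):
    """Shift for a segment whose first base is at segment_start (1-based)."""
    return segment_start - 1


def to_full_sequence_coordinates(begin, end, seq_shift):
    """Map positions predicted on the segment back onto the full sequence."""
    return begin + seq_shift, end + seq_shift


shift = segment_shift(573001)
print(shift)  # 573000, the value passed as "-s 573000"
# An exon predicted at 240..370 on the segment lies at 573240..573370
# in the coordinates of the full GenBank sequence.
print(to_full_sequence_coordinates(240, 370, shift))  # (573240, 573370)
```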
Through the Generalized MyGV Input Format interface, users can pre-run gene prediction programs locally or obtain results from remote web servers, convert these results into GIIF, and then import the GIIF files into MyGV. For convenience, MyGV also allows users to directly link external programs or web services.
To implement this function, users have to change the MyGVConfig.xml to add a new GP_MenuItem to the "Gene Prediction" menu in the MyGV display. When users select the menu item "Run_XXX" in MyGV, by default, MyGV will call ./scripts/Run_XXX. The script receives from MyGV three lines via standard input, specifying the starting sequence position for analysis, the organism, and the sequence to be analyzed, respectively. In turn, the script returns to MyGV GIIF-formatted output from the chosen program or service (the script is invoked as a thread running in the background).
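The standard-input protocol described above can be sketched as follows. The scripts shipped with MyGV are separate programs and are not reproduced here; this Python sketch only shows the three-line request being read and a reply being written. All function names are hypothetical, and the placeholder predictor emits an abbreviated head with an empty body rather than a complete GIIF file. Deriving SeqShift as the start position minus one is an assumption.

```python
import io


def read_request(stream):
    """Read the three request lines: start position, organism, sequence."""
    start = int(stream.readline().strip())
    organism = stream.readline().strip()
    sequence = stream.readline().strip()
    return start, organism, sequence


def run_script(stream_in, stream_out, predict):
    """Read one request and write whatever the predictor returns."""
    start, organism, sequence = read_request(stream_in)
    stream_out.write(predict(start, organism, sequence))


def placeholder_predict(start, organism, sequence):
    # Hypothetical stand-in for a real gene finder or web service call.
    # The head is abbreviated and the body is empty; SeqShift = start - 1
    # is an assumption, not documented behaviour.
    return f"SeqShift={start - 1}\nLength={len(sequence)}\n\nBEGIN\nEND\n"


# Simulate the three lines a linked script would receive from MyGV.
request = io.StringIO("50001\nArabidopsis\nACGTACGTAC\n")
reply = io.StringIO()
run_script(request, reply, placeholder_predict)
print(reply.getvalue())
```

A real script would pass sys.stdin and sys.stdout in place of the two StringIO objects and replace the placeholder with a call to the chosen program or service.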
This distribution of MyGV includes the scripts
./scripts/Run_GENSCAN
and
./scripts/Run_GeneMark.hmm
as examples for a locally installed program and a linked web service, respectively.
Advanced Installation of MyGV is required for this
functionality.
Users who wish to install other programs should follow the above examples and edit the
following part of MyGVConfig.xml appropriately:
...
<Menu name="Gene Prediction" mnemonic="g">
  <GP_MenuItem name="Run_GENSCAN" mnemonic="s">
    <ref>Local GENSCAN</ref>
    <Length_Limit>-1</Length_Limit>
    <Organism>Vertebrate</Organism>
    <Organism>Arabidopsis</Organism>
    <Organism>Maize</Organism>
  </GP_MenuItem>
  <GP_MenuItem name="Run_GeneMark.hmm" mnemonic="m">
    <ref>http://genemark.biology.gatech.edu/GeneMark/hum.cgi</ref>
    <Length_Limit>100000</Length_Limit>
    <Organism>Human</Organism>
    <Organism>C.elegans(worm)</Organism>
    <Organism>Drosophila</Organism>
    <Organism>Arabidopsis</Organism>
    <Organism>C.reinhardtii(green algae)</Organism>
    <Organism>Chicken</Organism>
    <Organism>Rice</Organism>
  </GP_MenuItem>
</Menu>
...
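After editing the menu definition, it can be useful to check that the new entry is well-formed XML and carries the expected values. The Python sketch below parses a shortened copy of the "Gene Prediction" menu with the standard library. It treats the Menu element as a standalone document, which is an assumption, since the surrounding parts of MyGVConfig.xml are elided above; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the menu fragment (organism lists abbreviated).
SAMPLE_MENU = """\
<Menu name="Gene Prediction" mnemonic="g">
  <GP_MenuItem name="Run_GENSCAN" mnemonic="s">
    <ref>Local GENSCAN</ref>
    <Length_Limit>-1</Length_Limit>
    <Organism>Vertebrate</Organism>
    <Organism>Arabidopsis</Organism>
    <Organism>Maize</Organism>
  </GP_MenuItem>
  <GP_MenuItem name="Run_GeneMark.hmm" mnemonic="m">
    <ref>http://genemark.biology.gatech.edu/GeneMark/hum.cgi</ref>
    <Length_Limit>100000</Length_Limit>
    <Organism>Human</Organism>
    <Organism>Rice</Organism>
  </GP_MenuItem>
</Menu>
"""


def list_gp_menu_items(xml_text):
    """Return one dict per GP_MenuItem element found in xml_text."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    items = []
    for item in root.iter("GP_MenuItem"):
        items.append(
            {
                "name": item.get("name"),
                "ref": item.findtext("ref"),
                "length_limit": int(item.findtext("Length_Limit")),
                "organisms": [org.text for org in item.findall("Organism")],
            }
        )
    return items


for entry in list_gp_menu_items(SAMPLE_MENU):
    print(entry["name"], entry["length_limit"], entry["organisms"])
```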
(1) Zhu, W. and Brendel, V. (2002) Gene structure identification with MyGV
using cDNA evidence and protein homologs to improve ab initio predictions.
Bioinformatics, in press.
(2) Brendel, V. and Zhu, W. (2001) Computational modeling of gene structure
in Arabidopsis thaliana. Plant Molecular Biology, in press.
License: Required. Please see the file MyGV.LICENSE.