
 


Table of Contents

Introduction
Installation
Usage
Generalized MyGV Input Format
External Program or Web Service Links
References
License

 


Introduction

MyGV is an application to visualize (potentially genome-scale) gene structure annotation and prediction. The program was designed in particular to display the output of our spliced alignment program GeneSeqer; however, output of other programs such as GENSCAN or GeneMark.hmm can also be displayed after conversion to a generalized MyGV input format. Gene structure predictions can be compared with existing gene structure annotation in GenBank format.

This program is written in JAVA and should work on any platform supporting JAVA.

Please direct all communications related to this software to:

Volker Brendel
Department of Zoology & Genetics
Iowa State University
Ames, IA 50011-3260
U.S.A.

Phone:      (515) 294-9884; Fax: (515) 294-6755
email:        vbrendel@iastate.edu

 


Installation

If you are reading this, you have uncompressed the MyGV package and should see the following files and directories:

README.html      This file
MyGV.jar         Jar package of MyGV class files and images
MyGVConfig.xml   Configuration file for MyGV (XML format)
demo/            Directory containing data for demonstration
scripts/         Directory containing Perl scripts for demonstration
doc/             Directory containing help.html and related documentation files
public/          Directory containing html2txt.c and JAXP_1.1, which are required by MyGV and were downloaded from the public domain for the convenience of the user

 

Basic Installation

To run MyGV on your machine, you must have the Java 2 runtime environment installed. JAVA version 1.2.2 or higher is recommended; it can be downloaded from http://java.sun.com/j2se/index.html.

To make the MyGV interface more flexible, the MyGV preference file (MyGVConfig.xml) is written in XML format; users can edit it to change the MyGV menu or other settings. This feature requires the JAXP (Java API for XML Processing) package, which can be downloaded from http://java.sun.com/xml/download.html (see also ./public).

Detailed instructions on how to install JAVA and JAXP are posted at the download site and in the accompanying documentation.

MyGV.jar must either (1) be included in the user's CLASSPATH variable or (2) be supplied at runtime via the JAVA "-classpath" option:

(1) Let $MyGV represent the MyGV installation directory, then, for example:
MS-WIN users, add the following line to autoexec.bat:
set CLASSPATH=$MyGV\MyGV1.0\MyGV.jar;%CLASSPATH%

Bash shell Unix users,  add the following line to .bash_profile:
export CLASSPATH=$CLASSPATH:$MyGV/MyGV1.0/MyGV.jar
Then, type "java MyGV" to run MyGV.


(2) Type "java -classpath $CLASSPATH:$MyGV/MyGV.jar MyGV" to run MyGV.

 

Advanced Installation (not required)

This optional step demonstrates how to link external gene identification software or web services into MyGV.  Perl and three other components are needed:

(1) html2txt (html to text file converter)
Many programs are available to convert HTML to plain text, for example html2txt.c. Please name the executable file "html2txt" and be sure to include it in your default path (see also the ./public directory).

(2) POST
This is a Perl script included in the libwww-perl distribution. You may choose to install Bundle::LWP, which includes all libwww-perl related modules.

(3) GENSCAN
GENSCAN is an ab initio gene structure prediction program developed by Chris Burge in the research group of Samuel Karlin, Department of Mathematics, Stanford University. It can be downloaded from http://genes.mit.edu/GENSCAN.html.

After installation, do not forget to update your PATH environment variable to include the directories where the three components are located, or move the binaries to a directory on your default path.

 


Usage

Usage:

java MyGV [-h|--help] [-t] [-a from] [-b to] [-d seqfile[{seqid}]] [-g gsqfile[{seqid}]] [-i other_file(s)]

where

  -h/--help           :   Show this information.
  -t                  :   Terse (for GeneSeqer output, show AGS only).
  -a from             :   Analyze genomic sequence from position 'from'.
  -b to               :   Analyze genomic sequence up to position 'to'.
  -d seqfile[{seqid}] :   Specify the sequence file [GenBank or FASTA format].
                           If the sequence file contains multiple entries,
                           select the first entry [default] or the entry
                           specified by the optional argument seqid
                           (immediately following seqfile, enclosed in {}).
  -g gsqfile[{seqid}] :   Specify GeneSeqer output file(s).
                           If the GeneSeqer output file contains output from
                           multiple sequence entries, select the first entry
                           [default] or the entry specified by the optional
                           argument seqid (immediately following gsqfile,
                           enclosed in {}).
  -i other_file(s)    :   Specify other input file(s) in Generalized MyGV Input
                           Format.

 

Examples:

java MyGV -d ./demo/U89959 -g ./demo/U89959.gsq

java MyGV -a 50001 -b 75000 -d ./demo/ATFCA5 -g ./demo/ATFCA5.gsq -i ./demo/ATFCA5.gsn ./demo/ATFCA5.glm ./demo/ATFCA5.gm

Here, U89959 and ATFCA5 are GenBank sequence files; U89959.gsq and ATFCA5.gsq are GeneSeqer output files; and ATFCA5.gsn, ATFCA5.glm, and ATFCA5.gm are files in Generalized MyGV Input Format (GIIF) converted from the output of GENSCAN, GlimmerM, and GeneMark.hmm, respectively.

Please see ./doc/help.html for instructions on how to navigate in MyGV and ./demo/README for other examples.
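
The command lines above can also be assembled programmatically, for instance when MyGV is launched from a wrapper script. The following Python sketch is not part of the MyGV distribution. The helper name build_mygv_command is hypothetical, it uses only the options documented above, and it assumes that MyGV.jar is already on the CLASSPATH.

```python
def build_mygv_command(seqfile, gsqfile=None, seqid=None,
                       start=None, end=None, other_files=(), terse=False):
    """Return the argument list for a 'java MyGV' call (hypothetical helper)."""
    def with_id(path):
        # Append the optional {seqid} selector directly after the file name.
        return f"{path}{{{seqid}}}" if seqid else path

    cmd = ["java", "MyGV"]
    if terse:
        cmd.append("-t")
    if start is not None:
        cmd += ["-a", str(start)]
    if end is not None:
        cmd += ["-b", str(end)]
    cmd += ["-d", with_id(seqfile)]
    if gsqfile:
        cmd += ["-g", with_id(gsqfile)]
    if other_files:
        cmd += ["-i", *other_files]
    return cmd
```

The returned list can be handed to subprocess.run(); passing the arguments as a list rather than as one shell string keeps the braces of the {seqid} selector away from the shell.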

 


 

Generalized MyGV Input Format (GIIF)

We have defined a generalized MyGV input format to allow display of output from programs other than our own GeneSeqer. Prior to import into MyGV, the output of an external gene prediction program has to be converted to GIIF. A GIIF file is divided into a head (the top ten lines) and a body (separated from the head by an empty line and enclosed by the keywords "BEGIN" and "END"), as in the following example:

Example of the Generalized MyGV Input Format
File=gsn2.out
SeqID=AC006932
GenBank=N
SeqShift=0
Length=89479
Source=GENSCAN
Label=GSN
SourceID=1
HasIntronScore=Y
HasExonScore=Y

BEGIN
***GSN   1***
  1)Structure:   ( .Begin.  ..End..   .Do.   .Ac.    Score )
    Exon   1:           240       370   0.11   0.00   0.775
    Exon   2:           525       789   0.92   0.38   0.616
    Exon   3:           861       987   0.89   0.91   0.613
    Exon   4:          1038     1162    0.23   0.38   0.607
  2)Information:
    P1.01 Intr +    240    370  131   2  2   11  -21   171 0.775   2.07 BeO PART
    P1.02 Intr +    525    789  265   0  1   92   38   165 0.616  13.49 BeO PART
    P1.03 Intr +    861    987  127   2  1   89   91    70 0.613  11.73 BeO PART
    P1.04 Term +   1038   1162  125  1   2   23   38   112 0.607   2.27 beO OVER
    P1.05 PlyA +   1217   1222    6                               -3.64
    >AC006932|GENSCAN_predicted_peptide_1|215_aa
    ELIEELEVYLLFYDRSGYGASDSNTKRSLESEVEDIAELADQLELSGVAFVAPVVNYRWP
    SLPKKLIKKDYRTGIIKWGLRISKYAPGLLHWWIIQKLFASTSSVLESNPVYFNSHDIEV
    LKRKTGFPMLTKEKLRERNVFDTLRDDFMVCFGQWDFEPADLSISTKSYIHIWHETTFDQ
    LPRNPPRRTSDRTLRWHLRYDSTCTIAQGRTTKAV
***GSN   2***
  ...
  ...
***GSN   9***
  1)Structure:   ( .Begin.  ..End..   .Do.   .Ac.    Score )
    Exon   1:        39634     39844   0.55   0.65   0.807
    Exon   2:        39967     40107   0.60   0.92   0.904
    Exon   3:        40608     40729   0.00   0.48   0.307
    Exon   4:        40817     40960   0.46   0.89   0.973
    Exon   5:        41273     41863   0.85   0.42   0.942
  2)Information:
    P9.00 Prom +  39535  39574   40                              -11.54
    P9.01 Init +  39634  39844  211  0  1    55   65   141 0.807  12.49 BEO EXAC
    P9.02 Intr +  39967  40107  141  2  0    60   92    80 0.904  10.00 BEO EXAC
    P9.03 Intr +  40608  40729  122  1  2   -18   48   108 0.307   0.49 BEO EXAC
    P9.04 Intr +  40817  40960  144  1  0    46   89    87 0.973   9.16 BEO EXAC
    P9.05 Term +  41273  41863  591  1  0    85   42   388 0.942  32.44 BEO EXAC
    P9.06 PlyA +  42563  42568    6                                1.05
    >AC006932|GENSCAN_predicted_peptide_9|402_aa
    MEKVREIVREGIRVGNEDPRRIIHAFKVGLALVLVSSFYYYQPFGPFTDYFGINAMWAVM
    TVVVVFEFSVGATLGKGLNRGVATLVAGGLGIGAHQLARLSGATVEPILLVMLVFVQDFG
    DEYFEAREKGDYKVVEKRKKNLERYKSVLDSKSDEEALANYAEWEPPHGQFRFRHPWKQY
    VAVGALLRQCAYRIDALNSYINSDFQIPVDIKKKLETPLRRMSSESGNSMKEMSISLKQM
    IKSSSSDIHVSNSQAACKSLSTLLKSGILNDVEPLQMISLMTTVSMLIDIVNLTEKISES
    VHELASAARFKNKMRPTVLYEKSDSGSIGRAMPIDSHEDHHVVTVLHDVDNDRSNNVDDS
    RGGSSQDSCHHVAIKIVDDNSNHEKHEDGEIHVHTLSNGHLQ
END

Explanation of the head lines:

File            input file name
SeqID           genomic sequence ID
GenBank         Is the genomic sequence in GenBank format? [Y/N/?]
SeqShift        shift of sequence position
Length          length of genomic sequence
Source          name of gene structure prediction program
Label           label to identify the gene structure prediction program
SourceID        priority: [1-4]
HasIntronScore  Does the output contain scores for splice sites? [Y/N]
HasExonScore    Does the output contain scores for exons? [Y/N]

Explanation of the body:

BEGIN           begin of GIIF body
***GSN   1***   start of gene structure annotation
1)Structure:    Part 1: Coordinates and scores
                begin and end position of each exon,
                scores for donor and acceptor sites if HasIntronScore=Y,
                exon score if HasExonScore=Y.
2)Information:  Part 2: Original gene structure prediction output
END             end of GIIF body
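
To make the head/body layout concrete, here is a minimal Python sketch that reads GIIF text back into a dictionary of head values and a list of exon coordinates per gene. It is not the parser used by MyGV; the function names parse_giif and exons_per_gene are hypothetical, and the sketch assumes the layout is exactly as in the example above ("key=value" head lines, "***Label n***" gene markers, and "Exon n:" lines in Part 1).

```python
def parse_giif(text):
    """Split GIIF text into a head dict and the list of body lines."""
    header, body = {}, []
    in_body = False
    for line in text.splitlines():
        stripped = line.strip()
        if not in_body:
            if stripped == "BEGIN":
                in_body = True
            elif "=" in stripped:
                key, value = stripped.split("=", 1)
                header[key] = value
        elif stripped == "END":
            break
        else:
            body.append(line)
    return header, body


def exons_per_gene(body):
    """Collect (begin, end) exon coordinates for each '***Label n***' block."""
    genes = {}
    current = None
    for line in body:
        stripped = line.strip()
        if stripped.startswith("***") and stripped.endswith("***"):
            # Normalize '***GSN   1***' to the key 'GSN 1'.
            current = " ".join(stripped.strip("*").split())
            genes[current] = []
        elif stripped.startswith("Exon") and ":" in stripped and current:
            fields = stripped.split(":", 1)[1].split()
            genes[current].append((int(fields[0]), int(fields[1])))
    return genes
```

Only the "Exon n:" lines of Part 1 are interpreted; the lines of Part 2 (the original program output) are carried along in the body but otherwise ignored.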
 

Generally, output from any gene identification software should be easily convertible into GIIF. As examples, we include three Perl scripts (cvrtGM.pl, cvrtGLM.pl, and cvrtGSN.pl) for conversion of output from GeneMark.hmm, GlimmerM, and GENSCAN, respectively.

For example,

./scripts/cvrtGSN.pl -o ./demo/gsn.giif ./demo/gsn.out
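
For readers who want to write a converter for another program, the following Python sketch shows what such a conversion has to produce. It does not reproduce the supplied Perl scripts; the function name format_giif is hypothetical, the column widths are inferred from the example above, and the sketch assumes HasIntronScore=Y and HasExonScore=Y. Whether MyGV depends on the exact spacing is not documented here, so treat the widths as an assumption.

```python
HEADER_KEYS = ["File", "SeqID", "GenBank", "SeqShift", "Length", "Source",
               "Label", "SourceID", "HasIntronScore", "HasExonScore"]


def format_giif(header, genes):
    """Render head values and per-gene exon lists as GIIF-like text.

    genes is a list of (exons, raw_lines) pairs; each exon is a tuple
    (begin, end, donor_score, acceptor_score, exon_score), and raw_lines
    holds the original program output for that gene (Part 2).
    """
    lines = [f"{key}={header[key]}" for key in HEADER_KEYS]
    lines += ["", "BEGIN"]
    for number, (exons, raw_lines) in enumerate(genes, start=1):
        lines.append(f"***{header['Label']} {number:3d}***")
        lines.append("  1)Structure:   ( .Begin.  ..End..   .Do.   .Ac.    Score )")
        for i, (begin, end, donor, acceptor, score) in enumerate(exons, start=1):
            # Widths chosen to match the example lines (an assumption).
            lines.append(f"    Exon {i:3d}: {begin:13d} {end:9d}"
                         f" {donor:6.2f} {acceptor:6.2f} {score:7.3f}")
        lines.append("  2)Information:")
        lines += [f"    {raw}" for raw in raw_lines]
    lines.append("END")
    return "\n".join(lines) + "\n"
```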

Note that because of sequence length limitations for some gene identification programs, long genomic sequences may have to be segmented to be analyzed. The conversion scripts provide the option to specify a sequence shift to adjust predicted positions; for example, if the external program was run on a sequence segment starting at position 573001 in a GenBank file, then the conversion script should be called with the argument "-s 573000" to compare the predicted gene structure with the GenBank annotation. 
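
The arithmetic behind the sequence shift is simple but easy to get wrong by one. The Python sketch below spells it out; the function names are hypothetical and the positions are assumed to be 1-based, as in GenBank files.

```python
def sequence_shift(segment_start):
    """Shift to pass via "-s" for a segment starting at segment_start."""
    return segment_start - 1


def to_genbank_position(local_position, segment_start):
    """Map a 1-based position on the segment to the full-sequence coordinate."""
    return local_position + sequence_shift(segment_start)
```

For a segment starting at position 573001, sequence_shift(573001) gives 573000, and an exon predicted at local position 240 corresponds to position 573240 in the GenBank file.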

For this first release of MyGV, defining GIIF seemed the easiest approach. However, we realize that different standards are being explored, including GFF (General Feature Format), proposed by the Sanger Centre, and XML, and we anticipate that future releases of MyGV may accept GFF input or an XML DTD. Feedback welcome!


External Program or Web Service Links

Through the Generalized MyGV Input Format interface, users can pre-run gene prediction programs locally or obtain results from remote web servers, convert these results into GIIF, and then import the GIIF files into MyGV. For convenience, MyGV also allows users to directly link external programs or web services.

To implement this function, users have to edit MyGVConfig.xml to add a new GP_MenuItem to the "Gene Prediction" menu in the MyGV display. When a user selects the menu item "Run_XXX" in MyGV, MyGV will by default call ./scripts/Run_XXX. The script receives three lines from MyGV via standard input, specifying the starting sequence position for analysis, the organism, and the sequence to be analyzed, respectively. In turn, the script returns GIIF-formatted output from the chosen program or service to MyGV (the script is invoked as a thread running in the background).
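
The exchange between MyGV and a Run_XXX script can be sketched as follows. This is a Python illustration of the three-line protocol described above, not a copy of the scripts shipped in ./scripts. The names handle and demo are hypothetical, the starting position is assumed to arrive as an integer, and the predictor used in demo() is a stand-in that does not produce real GIIF.

```python
import io


def handle(in_stream, out_stream, predict):
    """Read MyGV's three-line request and write the predictor's reply.

    predict is a caller-supplied function (hypothetical) that takes
    (start, organism, sequence) and returns GIIF-formatted text.
    """
    start = int(in_stream.readline().strip())   # assumed to be an integer
    organism = in_stream.readline().strip()
    sequence = in_stream.readline().strip()
    out_stream.write(predict(start, organism, sequence))
    out_stream.flush()


def demo():
    """Exercise handle() with in-memory streams instead of a real MyGV call."""
    request = io.StringIO("50001\nArabidopsis\nACGTACGT\n")
    reply = io.StringIO()
    # Stand-in predictor: echoes what it received instead of emitting GIIF.
    handle(request, reply,
           lambda start, organism, seq: f"{organism}:{start}:{len(seq)}\n")
    return reply.getvalue()
```

In a real script, the call would be handle(sys.stdin, sys.stdout, predictor), where predictor runs the external program or queries the web service and converts the result to GIIF.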

This distribution of MyGV includes the scripts ./scripts/Run_GENSCAN and ./scripts/Run_GeneMark.hmm as examples for a locally installed program and a linked web service, respectively. Advanced Installation of MyGV is required for this functionality. Users who wish to link other programs should follow the above examples and edit the following part of MyGVConfig.xml appropriately:

        ...
        <Menu name="Gene Prediction" mnemonic="g">
            <GP_MenuItem name="Run_GENSCAN" mnemonic="s">
                <ref>Local GENSCAN</ref>
                <Length_Limit>-1</Length_Limit>
                <Organism>Vertebrate</Organism>
                <Organism>Arabidopsis</Organism>
                <Organism>Maize</Organism>
            </GP_MenuItem>
            <GP_MenuItem name="Run_GeneMark.hmm" mnemonic="m">
                <ref>http://genemark.biology.gatech.edu/GeneMark/hum.cgi</ref>
                <Length_Limit>100000</Length_Limit>
                <Organism>Human</Organism>
                <Organism>C.elegans(worm)</Organism>
                <Organism>Drosophila</Organism>
                <Organism>Arabidopsis</Organism>
                <Organism>C.reinhardtii(green algae)</Organism>
                <Organism>Chicken</Organism>
                <Organism>Rice</Organism>
            </GP_MenuItem>
        </Menu> 
        ...
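
After editing the menu section, it can help to confirm that it is still well-formed XML and contains the intended entries. The Python sketch below parses a copy of the first GP_MenuItem shown above using the standard library; it is an independent check, not the JAXP-based code MyGV itself uses, and the function name list_menu_items is hypothetical.

```python
import xml.etree.ElementTree as ET

MENU_XML = """
<Menu name="Gene Prediction" mnemonic="g">
    <GP_MenuItem name="Run_GENSCAN" mnemonic="s">
        <ref>Local GENSCAN</ref>
        <Length_Limit>-1</Length_Limit>
        <Organism>Vertebrate</Organism>
        <Organism>Arabidopsis</Organism>
        <Organism>Maize</Organism>
    </GP_MenuItem>
</Menu>
"""


def list_menu_items(menu_xml):
    """Return (name, ref, length_limit, organisms) for each GP_MenuItem."""
    # fromstring raises xml.etree.ElementTree.ParseError on malformed XML.
    menu = ET.fromstring(menu_xml.strip())
    items = []
    for item in menu.iter("GP_MenuItem"):
        items.append((
            item.get("name"),
            item.findtext("ref"),
            int(item.findtext("Length_Limit")),
            [org.text for org in item.findall("Organism")],
        ))
    return items
```

Calling list_menu_items(MENU_XML) returns one tuple for Run_GENSCAN with its three organisms; a malformed fragment raises a ParseError instead.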

References

(1) Zhu, W. and Brendel, V. (2002) Gene structure identification with MyGV using cDNA evidence and protein homologs to improve ab initio predictions. Bioinformatics, in press.

(2) Brendel, V. and Zhu, W. (2001) Computational modeling of gene structure in Arabidopsis thaliana. Plant Molecular Biology, in press.


License

A license is required. Please see the file MyGV.LICENSE.

 


Wei Zhu and Volker Brendel.
Copyright © [Iowa State University]. All rights reserved.
Revised:  December 4, 2001.