1) What has been done
Steps 1-5 were performed automatically, then subjected to an individual evaluation by the primary annotators (hence "semi-automatic"). No alteration of the original (automatic) annotation was possible at this stage, only an assessment of the given choice (good / problem).

2) What is to be done

3) How to do the work

Auto-annotated genes (see above) were used to build an Annotation Database.

Chromosome maps display the actual organization of all ORFs in all 6 frames on every chromosome.

- Jump directly to any ORF's genomic environment:
- Access to chromosome maps is also possible from any search result page.

K. lactis annotation guidelines

This page is a short "user manual" for the K. lactis annotation database.

This gave a set of 30727 ORFs (excluding mitochondrial DNA), which is the starting material for this work.

Basically, 5 criteria were used to perform this auto-annotation:

Automatic annotation parameters were adjusted over several successive steps, and finally some manual corrections were made.
            
However, it is very important to keep several facts in mind:
- First, the annotation was not performed on the completely finished genome, so a few regions (mostly sub-telomeric regions, which were missing) were overlooked at the time of this work.
- Second, very little attention has been given to specific features such as introns, real versus potential start codons, etc.
- Third, and most important, no functional annotation has been attempted at this stage.
            
            
            
Most effort is needed on functional annotation, which is currently missing and should be a cornerstone of the genome analysis. Several specific tools have been designed to help annotators in this regard (see below), as this should constitute the most important part of the work.
Several aspects of annotation have not yet been fully incorporated into the database, including information about RNAs, introns, promoters, and non-transcribed elements in general. This should be done gradually in the near future, perhaps after I get a little more feedback on these features and the best way to deal with them.
Annotators are expected to give an assessment of each gene's current annotation (if it exists):
- In case of a problem, they should submit their own annotation after a careful evaluation of all data pertaining to that gene (see below); otherwise they can leave the current one untouched.
- If an ORF has been wrongly given the status of a gene (there are such cases), it can be deleted.
- If an ORF is obviously a gene and was overlooked in the automatic annotation stage, it can be added, and the annotator must provide an annotation.
- Any gene that can be identified with good confidence (i.e. the annotation says it is probably an already known gene because the similarity is good) should be classified in a functional category according to 2 different schemes:
            
            
            
               
- MIPS (CYGD): this classification is specifically designed for YEAST. Its main advantage is that it is based on the considerable knowledge of yeast genetics and biology. Its main disadvantage is that it is too specific for people working outside the fungal field.
- KOG: this is a functional classification of genes based on orthology groups in completely sequenced genomes within large domains: prokaryotes (archaea + bacteria) or eukaryotes. Its main advantages are that it is widely accepted among various biological communities, and that a big effort is made to predict the function of so far hypothetical and/or poorly characterized genes. Its main disadvantage is that it does not deal specifically with the needs of fungal biologists.

Whatever the respective pros and cons of each system, annotators are requested to assign a functional annotation in BOTH classification schemes. Although the functional categories of the two systems are different, and sometimes do not obviously overlap, the functional assignment in one system should contradict the other as little as possible.
               
               b) submit your annotation after careful evaluation
               
               
            
            
Every ORF (all 30727 of them) from the K. lactis genomic sequence has been matched against several public databases (trEMBL, SWISSPROT, YEAST, C. albicans, Pfam, KOG, self-match), and all results have been compiled in a database called "Matches in Public databases".
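The way these matches were compiled is not described here. As a minimal sketch, assuming each search produced tabular BLAST output (the 12-column layout of BLAST+ `-outfmt 6`), the best hit per ORF for one database could be collected like this. The helper `best_hits` and the subject IDs are hypothetical; this is not the code behind "Matches in Public databases".

```python
import csv
import io

# Column order of BLAST+ tabular output (-outfmt 6), assumed here.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text, database):
    """Return {orf_name: best hit} for one database search (hypothetical helper).

    The best hit is the one with the lowest E-value; ties are broken by the
    higher bit score.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(FIELDS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        hit["database"] = database
        current = best.get(hit["qseqid"])
        if current is None or (hit["evalue"], -hit["bitscore"]) < (
                current["evalue"], -current["bitscore"]):
            best[hit["qseqid"]] = hit
    return best


# Made-up example: one ORF with two hits in a SWISSPROT search.
sample = (
    "r_klactIV0097\tHIT_A\t85.0\t300\t45\t0\t1\t300\t1\t300\t1e-120\t500\n"
    "r_klactIV0097\tHIT_B\t40.0\t250\t150\t2\t10\t260\t5\t255\t1e-20\t90\n"
)
print(best_hits(sample, "SWISSPROT")["r_klactIV0097"]["sseqid"])  # HIT_A
```

Running the same function once per database (SWISSPROT, YEAST, C. albicans, ...) and merging the dictionaries by ORF name would give one row of best matches per ORF.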
            
            
               
This is a standard query form which allows searching with multiple criteria. Its use is rather straightforward for anyone who has already tried to retrieve valuable information from internet databases.
Results are displayed so as to give access to the maximum amount of information for each ORF (gene maps and sequences, BLAST and other tool results, direct links to public resources).
This database allows one to work in a "transversal" manner, as opposed to working on chromosome maps (see below), which strictly depends on the spatial arrangement of genes on the genome.
               
               
This database will contain all newly added annotations, as well as those already defined in the automatic annotation stage.
Its content is dynamically updated as soon as you add, delete or modify an existing annotation (see below).
The database is designed to allow multiple-criteria requests on annotated data, to ease genome analysis once the annotation is finished.
This information is also important for giving accurate feedback on the ongoing work of other collaborators on other chromosomes, for example.
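To illustrate what a multiple-criteria request means, here is a minimal sketch using Python's built-in `sqlite3` module. The table `annotation`, its columns and the function `find_annotations` are hypothetical; the real Annotation Database schema is not documented on this page. Each optional criterion adds one `AND` clause, and values are passed as query parameters rather than pasted into the SQL string.

```python
import sqlite3


def find_annotations(conn, chromosome=None, kog_category=None, text=None):
    """Combine optional criteria into one parameterized query (hypothetical schema)."""
    clauses, params = [], []
    if chromosome is not None:
        clauses.append("chromosome = ?")
        params.append(chromosome)
    if kog_category is not None:
        clauses.append("kog_category = ?")
        params.append(kog_category)
    if text is not None:
        clauses.append("description LIKE ?")
        params.append(f"%{text}%")
    sql = "SELECT element FROM annotation"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY element"
    return [row[0] for row in conn.execute(sql, params)]


# Made-up content for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE annotation (element TEXT, chromosome TEXT, "
             "description TEXT, kog_category TEXT)")
conn.executemany("INSERT INTO annotation VALUES (?, ?, ?, ?)", [
    ("orf_A", "IV", "Hypothetical protein", None),
    ("orf_B", "IV", "Similar to YEAST kinase", "T"),
    ("orf_C", "II", "Similar to YEAST kinase", "T"),
])
print(find_annotations(conn, chromosome="IV", text="kinase"))  # ['orf_B']
```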
Chromosome maps are the main workbench for the annotation process.
Understanding the main parts of their design is essential for good and efficient usage. Here follows a brief outline, but every user should practice a little before actually "diving" into annotation.
You have several ways to access chromosome maps:

- Open a map by pushing a 'K. lactis : Map & Annotation' button found on the annotation entry page.

This is the actual 6-frame ORF map; it starts at position 1 on the given chromosome sequence. It is a dynamic map, in which each rectangle displays the location of an ORF on the sequence (direct strand above the blue line, reverse strand below it). Boxed rectangles represent ORFs that have been annotated, so they represent real "GENES". The others remain as shadows, and will never become genes unless someone adds an annotation for them.
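The map itself is built from a six-frame ORF set. As a minimal sketch of how such a set can be derived, assuming the standard stop codons (TAA, TAG, TGA), an ATG start, and an arbitrary 100-codon minimum, each ORF below runs from the first ATG after a stop to the next in-frame stop, i.e. the longest possible open reading frame. The function `six_frame_orfs` is hypothetical and is not necessarily how the 30727 K. lactis ORFs were called.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def six_frame_orfs(seq, min_codons=100):
    """Return sorted (strand, start, end) tuples for ORFs in all 6 frames.

    Coordinates are 1-based, inclusive, on the direct strand, and include the
    stop codon. ORFs without a stop before the sequence end are ignored.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # index of the first ATG since the last stop
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif codon in STOPS:
                    if start is not None and (i + 3 - start) // 3 >= min_codons:
                        if strand == "+":
                            orfs.append(("+", start + 1, i + 3))
                        else:
                            # map reverse-strand indices back to the direct strand
                            orfs.append(("-", n - (i + 3) + 1, n - start))
                    start = None
    return sorted(orfs, key=lambda orf: orf[1])


print(six_frame_orfs("CCATGAAATAA", min_codons=3))  # [('+', 3, 11)]
```

Because every in-frame ATG after the first one is ignored, the reported start is the most upstream one; this is exactly why the real start codon sometimes lies further downstream.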
Passing your pointer over an ORF on the map selects that particular element. The selected element, which appears in the field labeled "Element Info", can then be the subject of any action performed in the CONTROLS part of the window.
The CONTROLS part is a "control panel" where every action actually takes place.
Please note that you can type (or paste) any valid ORF name in the "Element Info" field to select it. This works only for ORFs belonging to the chromosome whose map is currently displayed.
SetUp: the yellow part is the "SetUp" of the map: you can move left or right by the offset indicated in the "Offset" field, zoom in or out, change the size of the map (for small-screen users), select a specific range to display by entering start / end coordinates in the corresponding fields, and change the coloring scheme of the map. This feature is explained briefly in the first RESULT page when a map is started.
Get Info: the blue part is where you request specific information about the selected element.
Pushing any of the reddish buttons (*) will call up the results for the selected element. In this example, "r_klactIV0097" is the selected element, you have just pushed the "Hits in sprot+trEMBL" button, and the BLAST result is displayed in the RESULT part (see below).
(*): Hits in SWISSPROT+trEMBL; Hits in SWISSPROT; Hits in KOG; Hits in HmmPfam; Paralogs; Hits in Yeast; Hits in C. albicans; CYGD:MIPS function_category. You can also directly BLAST the selected element's sequence against the NCBI BLAST databases (Run-psi-Blast). Other functions are explained below.
Decide: the green part is the annotation submission form.
Once you have decided to add, delete or modify an existing annotation (see below for guidelines on this issue), you must use this form to enter the relevant information in the different fields. See below for more details.
The RESULT part is where all results are displayed. Note that you can have results displayed in a separate window instead, by selecting "Display results in New Window", a toggle button located in the CONTROLS part of the window.
                  
                  
                  
                  
            
            First, a few rules to keep in mind while
            annotating:
            
- K. lactis is very close (phylogenetically speaking, at least) to S. cerevisiae. The semi-automatic annotation relies very heavily on K. lactis / S. cerevisiae gene-to-gene comparison. This sets a very strong framework for K. lactis annotation, as S. cerevisiae is itself very thoroughly annotated.
This means that it is not necessary to duplicate the automatic annotation when genes from both species are very close to each other. In this (most frequent) case, you need only add a functional annotation to the existing one: see below.
- Automatic annotation is very convenient, but it is in no way smart!
We have certainly overlooked some obvious errors while checking the complete annotation from this stage, so these errors have to be fixed. More complex situations (multidomain proteins, introns, no clear-cut similarity evidence, etc.) are probably (very) poorly annotated at present. These cases can be estimated at about 20% of all K. lactis genes, and this is where a major effort should be concentrated. Here again, functional annotation should be attempted.
- A common source of error for beginner annotators is to believe that genes can overlap (either on the same strand or on opposite strands). This kind of configuration is extremely unlikely, especially for overlaps on opposite strands. Here is an example of two "genes" overlapping: this was wrongly generated by automatic annotation. What can sometimes be seen on chromosome maps is that 2 good gene candidates (in fact, ORFs) overlap by a small amount. This is mostly because the second "gene" does not begin at the real start codon. Remember that ORFs are defined as the longest possible open reading frame. Another, more frequent situation occurs when a K. lactis gene is obviously longer than all its orthologs. Selecting the right start codon is a significant part of the job. A specific tool for finding alternative start codons has been designed for this purpose.
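Since overlapping genes are almost always a sign of a wrong start codon or a spurious gene, it can help to list them systematically. A minimal sketch, assuming gene coordinates are available as hypothetical `Gene` records (1-based, inclusive): genes are sorted by start, and each is compared only with the following genes that start before it ends. The output is a list of pairs to review, not a decision to delete anything.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    strand: str  # "+" or "-"
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


def find_overlaps(genes):
    """List (gene_a, gene_b, overlap_bp, same_strand) for every overlapping pair."""
    ordered = sorted(genes, key=lambda g: g.start)
    overlaps = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start > a.end:
                break  # sorted by start: no later gene can overlap a
            overlap = min(a.end, b.end) - b.start + 1
            overlaps.append((a.name, b.name, overlap, a.strand == b.strand))
    return overlaps


# Made-up coordinates: g1 and g2 overlap by 51 bp on the same strand.
genes = [Gene("g1", "+", 100, 1000), Gene("g2", "+", 950, 2000),
         Gene("g3", "-", 3000, 4000)]
print(find_overlaps(genes))  # [('g1', 'g2', 51, True)]
```

A small same-strand overlap like the one above points to checking the start codon of the second gene; an opposite-strand overlap is even more suspicious.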
            
You are now ready to start annotating:

Start working on the chromosome that you have chosen. Open the chromosome map (which starts at position 1) and start travelling along the sequence. Select a coloring scheme (see above) that fits your needs. The maps open by default with a "SWISSPROT+trEMBL" hits coloring scheme, which is based on the most complete similarity database. Boxed ORFs are genes already annotated in the automatic annotation stage. Push the "New annotations" button to see all annotations pertaining to any particular element.
Select an element and examine the results from hits in sequence databases by pushing the appropriate button (see CONTROLS above). You can also retrieve the ORF's sequence and use it with your own tools: just click on the ORF in the map.
Check the coding probability of the selected ORF by using the "HMM prediction" coloring scheme. The different coloring schemes will help you focus on meaningful ORFs (those with a significant hit in a particular database).
Don't waste your time on small, box-less white or blue ORFs! They are probably only "shadows".
Don't waste your time on small, box-less colored (red, blue or green) ORFs that overlap completely with a longer gene (boxed ORF). They are probably only "shadows".
Pay attention to differences between gene lengths in BLAST results: they are often a useful guide to finding the correct start codon.
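The length comparison can be made concrete by listing every in-frame ATG of an ORF together with the protein length it would give. A minimal sketch, assuming `orf_seq` runs from the ORF's first ATG through its stop codon; both functions are hypothetical and are not the alternative start codon tool mentioned above. Picking the length closest to an ortholog is only a heuristic, to be weighed against the alignments themselves.

```python
def alternative_starts(orf_seq):
    """Return (offset, protein_length) for each in-frame ATG of an ORF.

    orf_seq runs from the first ATG through the stop codon; offsets are
    0-based nucleotide positions, and lengths exclude the stop codon.
    """
    orf_seq = orf_seq.upper()
    codons = len(orf_seq) // 3
    starts = []
    for k in range(codons - 1):  # the last codon is the stop
        if orf_seq[3 * k:3 * k + 3] == "ATG":
            starts.append((3 * k, codons - 1 - k))
    return starts


def closest_to_ortholog(orf_seq, ortholog_length):
    """Pick the in-frame ATG whose protein length is closest to an ortholog's."""
    return min(alternative_starts(orf_seq),
               key=lambda start: abs(start[1] - ortholog_length))


orf = "ATGAAAATGCCCTAA"  # made-up ORF: ATG AAA ATG CCC TAA
print(alternative_starts(orf))      # [(0, 4), (6, 2)]
print(closest_to_ortholog(orf, 2))  # (6, 2)
```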
            
Start defining the functional annotation once you have reached good confidence in what the BLAST alignments say.
There are 3 main tools to help you define the functional annotation:
            
- Pfam: a database of protein family alignments and HMMs. It is most useful for finding common protein domains and families.
            
- KOG: this is a highly structured and categorized sequence database. A good hit with one of its many "clusters" of sequences gives the functional classification of the query sequence. The interpretation is not completely straightforward and depends on the quality of the alignment, so a "Guess" is made about the functional pathway, the category and the KOG number, together with a confidence level (this is based on a few simple rules: see this page). Be aware, however, that the authoritative assignment of proteins to COG clusters is made with the COGNITOR program by E. Koonin's group at NCBI.
            
- MIPS-CYGD: this tool gives the functional classification of the best S. cerevisiae (YEAST) hit for the selected element. It is extremely important to understand that if the best YEAST hit is very weak or the alignment is problematic, then this information is not relevant at all. To adopt a YEAST gene's functional annotation for a K. lactis gene, the similarity between the two genes must be good (this is where human decision takes place).
The MIPS classification of yeast genes is very detailed, and spans several levels (functions, pathways, cellular localization, phenotypes, EC number, etc.). For this work, we will limit K. lactis functional annotation to a single level of functional annotation.
            
Submit your annotation:
Your analysis is now done, and you want to submit your annotation for a given ORF or gene (an already annotated ORF).
The annotation submission form is in the green part of the CONTROLS part.
This rather simple form requires only 2 items to be filled in to allow submission: the "Element Info" field must contain a valid element name, and the "Description" field must contain some text (this is the definition that appears for each match in a BLAST result file, as well as the definition line in a FASTA file). All other fields are optional and left to your own judgment and wisdom. If you wish to add a functional annotation, you must do it by selecting the relevant categories in the "KOG" and "MIPS" pop-up menus. As you select a category from the "KOG" or "MIPS" pop-up menus, the contents of the "KOG number" and "MIPS sub-categories" menus are respectively updated to reflect the main menu choice: see an example here.
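The two mandatory items can be expressed as a simple check. A minimal sketch, assuming a hypothetical `Submission` record whose fields mirror the form, and a set of element names for the displayed chromosome (the only names "Element Info" accepts). Following the Delete rule given further down, a Delete request needs no description. This is not the form's actual server-side code.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """Hypothetical mirror of the submission form fields."""
    element_info: str
    description: str
    action: str = "Add"        # "Add" or "Delete"
    kog_category: str = ""     # optional
    mips_category: str = ""    # optional
    comment: str = ""          # optional


def validate(sub, known_elements):
    """Return a list of problems; an empty list means the form may be submitted."""
    problems = []
    if sub.element_info.strip() not in known_elements:
        problems.append("Element Info must contain a valid element name")
    if sub.action == "Add" and not sub.description.strip():
        problems.append("Description must not be empty")
    return problems


chromosome_iv = {"r_klactIV0097"}  # made-up set of valid names
print(validate(Submission("r_klactIV0097", "Hypothetical protein"), chromosome_iv))  # []
```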
            
Once you have filled in all necessary fields in the submission form, initiate the submission by pressing the "Submit" button. You now have a last opportunity to check your annotation carefully before final confirmation ("Confirm" button).
As soon as your submission has been acknowledged, the content of the Annotation Database is updated, as is the chromosome map itself (but you must press the "Apply Change" button in the CONTROLS part; just pressing "reload" in your browser window won't work).
            
You are, of course, invited to review your submission carefully before final confirmation. But in the (very unlikely) case that you made a mistake, you can always delete an existing annotation by selecting "Delete" instead of the default "Add" annotation behavior. You will be given an extra warning when attempting to delete an existing annotation. In any case, all annotation additions and deletions are recorded in the Annotation Database, so no information is lost. New actions performed on any element are simply stacked on top of previous information, so everyone can trace back the annotation history of any given element.
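The "stacked" history can be pictured as an append-only log per element. A minimal sketch, assuming that the current annotation is simply the most recent action (and that a final Delete means no annotation); the class `AnnotationHistory` is hypothetical and not the database's actual storage.

```python
from datetime import datetime, timezone


class AnnotationHistory:
    """Append-only record of annotation actions per element (illustrative)."""

    def __init__(self):
        self._events = {}  # element name -> list of (timestamp, action, description)

    def record(self, element, action, description=""):
        # Never overwrite: every Add or Delete is stacked on top of earlier ones.
        event = (datetime.now(timezone.utc), action, description)
        self._events.setdefault(element, []).append(event)

    def history(self, element):
        """Full trace of actions, oldest first."""
        return list(self._events.get(element, []))

    def current(self, element):
        """Latest description, or None if the last action was a Delete."""
        events = self._events.get(element)
        if not events or events[-1][1] == "Delete":
            return None
        return events[-1][2]


log = AnnotationHistory()
log.record("r_klactIV0097", "Add", "Hypothetical protein")
log.record("r_klactIV0097", "Delete")
print(log.current("r_klactIV0097"), len(log.history("r_klactIV0097")))  # None 2
```

Because nothing is ever removed from the list, a mistaken Delete can be undone simply by recording a new Add.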
            
4) Getting started
Every group of annotators should work on a specific chromosome, as decided in agreement with Monique.

- Go to the annotation entry page, which contains links to all chromosome maps. Click on the chromosome you intend to work on.
            
- The map opens at the beginning of the chromosome, and ORFs are colored according to the strength of their match in the SWISSPROT+trEMBL database. Notice that most Red and Green ORFs are already boxed with a black line: these are auto-annotated genes.
First focus your attention on this category; then, once you are more familiar with the database, you will be able to address more complex problems.
            
- Select the first boxed ORF in the map (pass the mouse pointer over it). Be careful when moving your pointer over the map: if you pass over an ORF other than the first one, you will change the selected ORF. Check that the "Element info" field in the 'CONTROLS' part of the window contains the correct ORF name.
            
- Gather any information you need about the selected element (ORF): push any of the 'Reddish' buttons found in the blue part of 'CONTROLS'. Any annotation that exists for the selected ORF can be found by pushing the "New Annotation" button. As stated above, auto-annotation has been performed based on a close comparison between K. lactis and S. cerevisiae (YEAST). If you find that the current "DESCRIPTION" is not satisfactory, you may have to change it.
            
- Make up your mind with all available results: all the database searches can be displayed using the 'Reddish' buttons. If you want a really up-to-date BLAST result, push "Run PSI-Blast"; you will be taken to the NCBI PSI-BLAST page, with the query field already filled in with the selected ORF's sequence. If you are not very familiar with the interpretation of BLAST and/or pattern results, try improving your confidence by looking at many other genes, by asking a more expert colleague, or by searching the internet for information on annotation skills. Unfortunately, it is not possible here to write a course on "the art of annotation", but every biologist today must have performed at least an occasional BLAST search in public databases.
As a rule of thumb, if the best hit for a BLAST search has an expect value GREATER than 1e-10, then the similarity should be considered POOR, i.e. in most cases the gene should be annotated as "Hypothetical protein". But even in this case, check the Pfam domain results ("Hits in HmmPfam"). If there is a significant match with a particular domain, you may end up with "Hypothetical protein, containing xxx domain". You may also choose to put "Hypothetical protein" in "DESCRIPTION" and "Containing xxx domain" in the "COMMENT" field.
Pay attention to the lengths and alignment extent of matching sequences. If both sequences are about the same size and aligned over their entire length, then everything is fine. If one sequence is shorter than the other, or they don't align on the same part (head to tail, just in the middle or over a portion of the sequence, one sequence aligned with 2 different ones at different locations, etc.), then you must try to interpret the situation, and knowing the biology greatly helps in this case. Many different explanations are of course possible, including errors in the sequence! The result of this analysis will be YOUR NEW ANNOTATION.
            
- Determine whether functional annotation can be done for the selected ORF.
Obviously, no functional annotation is possible for "orphan" genes, that is, if the gene is at best described as "Hypothetical protein" or "weakly similar to some other gene". In any case, carefully review the results of the KOG and Pfam database searches ("Hits in KOG" and "Hits in HmmPfam"). If the similarity with YEAST is high enough, use the "CYGD:MIPS fun_cat" button, as explained above. This will give you the functional information for the YEAST gene, which in turn can be "captured" and used for annotating K. lactis.
            
- Submit your annotation:

  - If the existing annotation seems to be OK, and no functional annotation is possible, then just go to the next ORF (no need to change anything).
  - If the existing annotation seems to be OK AND you can propose a functional annotation, then copy the current annotation into the "DESCRIPTION" field in the green part of "CONTROLS" (or, if your annotation differs from the original one, type your new ANNOTATION there instead). Select the correct function category in BOTH the KOG and MIPS systems, if possible (see The annotation submission form above). If the gene is an enzyme and you have found the right EC number, type it in the "EC :" field in 'CONTROLS'. Any other relevant information resulting from your analysis can be entered either in one of the preexisting fields ("Start Codon", "Gene Name", "EC :", etc.) or, if it does not fit any of these labels, in the "COMMENT" field, where you can type anything you want (typical entries in the "COMMENT" field could be: "This gene is split into two separate sub-units in all other Ascomycota" or "Probable sequence error, 5' end of the gene is missing").
  - If you believe an ORF should not be considered a gene, select "Delete" instead of "Add" in the small "Annotation" pop-up menu (in 'CONTROLS'). You don't have to type any further information in that case.
  - Press "SUBMIT" (in 'CONTROLS') and carefully review the information you have just typed in. If something is not correct, just fix the wrong item in the 'CONTROLS' part and do "SUBMIT" again.

  - If everything is correct, press "Confirm"; you're done with this ORF. Your annotation has been incorporated into the genome annotation database. You can check this by pressing the "New Annotations" button, or by querying the selected gene on the annotation database search page (see above).
            
- Select the next boxed ORF on the map and start over.

- When you have reached the end of the first 100 kb of your chromosome, press the "Right" button in the yellow part of 'CONTROLS' to move to the next 100 kb of the chromosome. The map will slide by the amount given in the "Move offset:" field, in the 3' direction ("Right" button) or in the 5' direction ("Left" button).
- Start over on this region as above.
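The two checks described in "Make up your mind" above, the 1e-10 rule of thumb and the comparison of lengths and alignment extent, can be sketched as follows. The `Hit` fields, the function names and the 0.8 coverage threshold are assumptions for illustration; "about the same size and aligned over their entire length" is a judgment call that the threshold only approximates.

```python
from typing import NamedTuple

POOR_EVALUE = 1e-10  # rule of thumb from the guidelines above


class Hit(NamedTuple):
    """Best BLAST hit for one ORF (illustrative fields, 1-based coordinates)."""
    evalue: float
    qstart: int
    qend: int
    qlen: int     # full length of the K. lactis query
    sstart: int
    send: int
    slen: int     # full length of the database subject


def classify_hit(hit, min_coverage=0.8):
    """Return "poor", "full-length" or "partial"; min_coverage is an assumption."""
    if hit.evalue > POOR_EVALUE:
        return "poor"
    q_cov = (hit.qend - hit.qstart + 1) / hit.qlen
    # abs() tolerates subject coordinates reported in reverse order
    s_cov = (abs(hit.send - hit.sstart) + 1) / hit.slen
    if q_cov >= min_coverage and s_cov >= min_coverage:
        return "full-length"
    return "partial"  # needs interpretation by the annotator


def default_description(pfam_domain=None):
    """DESCRIPTION suggested for a "poor" hit, as recommended above."""
    if pfam_domain:
        return f"Hypothetical protein, containing {pfam_domain} domain"
    return "Hypothetical protein"


print(classify_hit(Hit(1e-5, 1, 300, 300, 1, 300, 300)))   # poor
print(classify_hit(Hit(1e-50, 1, 300, 300, 1, 300, 310)))  # full-length
print(classify_hit(Hit(1e-50, 1, 150, 300, 1, 150, 600)))  # partial
```

A "partial" result does not say what went wrong (wrong start codon, multidomain protein, sequence error); it only marks the cases where the biology has to be interpreted by hand.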
            
            
            
            
            
            
Question: What about introns?
Introns are currently being analyzed separately by the Lyon and Orsay groups, and will be displayed in the maps and the annotation database in the near future. If you think you have identified one or more introns during a gene analysis, just enter the number of introns in the field labeled "Intron", and put all other information in the "COMMENT" field.
            
Question: What about RNAs and other non-coding elements?
As above, all non-coding RNA species are being handled separately by specialized programs and/or expert colleagues in the field, and will be incorporated into the database later. This topic is not a top priority at the moment, and you should not spend your time on this problem (in this annotation framework). If you do wish to contribute to RNA or other genome features, please contact us directly; we will be pleased to benefit from your expertise.