International Summer School

   From Genome to Life:

    Structural, Functional and Evolutionary approaches

 


FICHANT Gwennaele

CNRS-IBSM-LCB 31 Chemin Joseph Aiguier, Marseille, 13402 Cedex 20, France

title: Identification, assembly, and classification of integrated biological systems in completely sequenced genomes: examples of ABC transporters.

Abstract: Systematic genome sequencing projects generate large volumes of data whose complexity demands methods for knowledge integration and data mining.

The first two levels of genome annotation concern the identification of individual objects (genes or proteins) and the prediction of their putative function through sequence comparisons. However, annotation algorithms do not take into account the numerous relations between objects, including
- physical interactions between proteins involved in the same biological process,
- functional relations that comprise regulatory networks or metabolic pathways, and
- phylogenetic relations highlighting functional constraints and permitting reconstruction of evolutionary routes.

We must consider these relations to gain a systemic view of the organism under study. So far, the main public databases that compile genome annotation have focused on the first level: identifying individual objects. Within those databases, researchers have worked to integrate some functional annotations for individual objects that would permit the representation and simulation of functional relations among them, such as metabolic pathways or regulatory networks.

We focus here on the declarative representation of the first relation type: interactions between proteins involved in the same biological process, making up an integrated system.

The first annotation step of a complete genome identifies the different system partners, but few methodological developments have addressed their assembly into a functional system.

Indeed, classical annotation software does not include the knowledge required for assembling such systems. Identifying the different system partners, assembling them, and classifying the systems into subfamilies requires a coordinated strategy that combines various bioinformatic methods. We will present the nearly automatic strategies we developed for reconstructing biological systems. Identifying the different partners and classifying them into subfamilies relies on integrating the results of various classical sequence analysis approaches, such as similarity searches, profile searches, and motif identification. Assembling the partners into an integrated system requires additional knowledge, such as the close localization on the chromosome of the genes encoding the partners, or the partners' membership in compatible evolutionary subfamilies.
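The assembly criteria mentioned above (chromosomal proximity and compatible subfamilies) can be illustrated with a minimal sketch. It is not the strategy implemented in our work: the `Partner` record, the role labels, the `max_gap` parameter, and the reduction of "compatible subfamilies" to simple equality are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partner:
    gene: str        # gene identifier
    position: int    # hypothetical gene rank along the chromosome
    role: str        # illustrative partner label, e.g. "NBD", "MSD", "SBP"
    subfamily: str   # hypothetical subfamily assigned by an earlier step


def assemble_systems(partners, max_gap=2):
    """Group partners whose genes are close together and share a subfamily.

    Assumption: two partners are 'compatible' when their subfamily labels
    are equal, and 'close' when their gene ranks differ by at most max_gap.
    """
    systems = []
    for p in sorted(partners, key=lambda x: x.position):
        last = systems[-1] if systems else None
        if (last is not None
                and p.position - last[-1].position <= max_gap
                and p.subfamily == last[-1].subfamily):
            last.append(p)       # extend the current candidate system
        else:
            systems.append([p])  # start a new candidate system
    return systems


partners = [
    Partner("a", 10, "NBD", "X"),
    Partner("b", 11, "MSD", "X"),
    Partner("c", 12, "SBP", "Y"),
    Partner("d", 40, "NBD", "X"),
]
for system in assemble_systems(partners):
    print([p.gene for p in system])
```

In this toy input, genes a and b are adjacent and share a subfamily, so they form one candidate system; c is adjacent but belongs to another subfamily, and d is too far away, so each is left on its own.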

The strategies developed must use rules (method layout) and parameters updated with the incoming data (dataflow control). The analysis mechanism stores the data obtained, then reuses them to reevaluate the methods' parameters and thereby launch a more accurate analysis of new data sets.
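The store-then-reevaluate loop described above can be pictured with a toy example. The class name `AdaptiveSearch`, the use of a single score threshold as the parameter, and the update rule (half the mean of all stored scores) are hypothetical choices for illustration only, not the rules used in our strategies.

```python
from statistics import mean


class AdaptiveSearch:
    """Toy analysis step whose cutoff is re-estimated from stored results."""

    def __init__(self, threshold):
        self.threshold = threshold  # hypothetical score cutoff (the parameter)
        self.accepted = []          # scores stored from all previous analyses

    def analyse(self, scores):
        # Apply the current parameter to the incoming data set.
        hits = [s for s in scores if s >= self.threshold]
        # Store the results obtained.
        self.accepted.extend(hits)
        # Reevaluate the parameter from everything stored so far.
        # Assumed rule: half the mean of the accepted scores.
        if self.accepted:
            self.threshold = 0.5 * mean(self.accepted)
        return hits


search = AdaptiveSearch(threshold=50.0)
first = search.analyse([40, 60, 100])   # analysed with the initial cutoff
second = search.analyse([45, 30])       # analysed with the updated cutoff
print(first, second, search.threshold)
```

The point of the sketch is only the data flow: each new data set is analysed with parameters derived from the results already stored.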

To illustrate an integrated system, we chose the ABC transporters, or traffic ATPases, because they are found in the three major kingdoms of life (Bacteria, Archaea, and Eukaryota) and are involved in many physiological processes. Most ABC transporters mediate the active uptake or efflux of specific molecules across biological membranes, handling a wide variety of compounds that differ in nature and size (oligosaccharides, amino acids, peptides, antibiotics, metallic cations, and so on). They are encoded by large families of paralogous genes and can be arranged in a comprehensive classification that correlates well with substrate specificity.

We will present ABCdb, a database devoted to the ABC transporters and administered with the ACeDB system, which stores the data obtained through the identification strategy. We will also present the most recent developments, which use the knowledge representation system AROM; it appears more suitable than a database for representing complex relationships among data.

ADDITIONAL DATA


URLs and papers corresponding to my talk.
                  
The database on ABC transporter repertoires in completely sequenced
bacterial genomes is available at:
http://ir2lcb.cnrs-mrs.fr/ABCdb
Our paper describing the strategies for the identification and reconstruction of integrated biological systems, published in Computers and Chemistry, has been selected for inclusion in the fourth issue (Issue 4, 1 July 2002) of "Proteomics Select - The Virtual Journal of Proteomics" and can be downloaded from the web site:
http://www.proteomicsvj.com
Published papers:

Quentin, Y., Fichant, G. and Denizot, F. (1999). Inventory, assembly and analysis of Bacillus subtilis ABC transport systems. J. Mol. Biol. 287, 467-484.

Quentin, Y. and Fichant, G. (2000). ABCdb: an ABC Transporter Database. J. Mol. Microbiol. Biotechnol. 2, 501-504.

Capponi, C., Chabalier, J., Quentin, Y. and Fichant, G. (2001). A Knowledge Base for Integrated Biological Systems. IEEE Intelligent Systems 16(6), 52-60.

Quentin, Y., Chabalier, J. and Fichant, G. (2002). Strategies for the identification, the assembly and the classification of integrated biological systems in completely sequenced genomes. Computers and Chemistry 26, 447-457.