CNRS-IBSM-LCB 31 Chemin
Joseph Aiguier, Marseille, 13402 Cedex 20,
France
title:
Identification, assembly, and classification of
integrated biological systems in completely
sequenced genomes: examples of ABC
transporters.
Abstract
Systematic genome sequencing projects generate large
volumes of data whose complexity requires the
development of methods that allow knowledge
integration and data mining.
The first two
levels of genome annotation concern the
identification of individual objects (genes or
proteins) and the prediction of their putative
function through sequence comparisons. However,
annotation algorithms do not take into account the
numerous relations between objects, including
- physical interactions between proteins involved
  in the same biological process,
- functional relations that comprise regulatory
  networks or metabolic pathways, and
- phylogenetic relations highlighting functional
  constraints and permitting reconstruction of
  evolutionary routes.
We must consider
these relations to gain a systemic view of the
organism under study. So far, the main public
databases dedicated to compiling information about
genome annotation focus on the first level:
identifying individual objects. Within those
databases, researchers have worked to integrate
some functional annotations for individual objects
that would permit the representation and simulation
of functional relations among them, such as
metabolic pathways or regulation
networks.
We focus here on
the declarative representation of the first
relation type: interactions between proteins
involved in the same biological process, making up
an integrated system.
The first
annotation step of a complete genome identifies the
different system partners, but few methodological
developments address their assembly into a
functional system.
Indeed,
classical annotation software does not include the
knowledge required for assembling such systems.
Identifying the different system partners, their
assembly, and the systems' classification in
subfamilies requires a coordinated strategy that
combines various bioinformatic methods. We will
present the nearly automatic strategies we
developed for reconstructing biological systems.
Identifying the different partners and classifying
them in subfamilies relies on integrating the results
of various classical sequence analysis approaches
such as similarity or profile searches or motif
identifications. Their assembly into an integrated
system requires additional knowledge, such as the
close chromosomal localization of the genes
encoding the partners or the partners' membership
in compatible evolutionary subfamilies.
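The assembly criteria named above (close chromosomal localization and
compatible subfamilies) can be illustrated with a minimal Python sketch.
It is not the strategy actually implemented: the Partner record, the
max_gap value, and the simplification that "compatible" means "identical
subfamily label" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partner:
    """Hypothetical record for one identified system partner."""
    gene: str
    subfamily: str  # evolutionary subfamily label (assumed)
    start: int      # gene start coordinate on the chromosome
    end: int        # gene end coordinate on the chromosome


def assemble_systems(partners, max_gap=300):
    """Group partners whose genes lie close together and share a subfamily.

    Assumption: two partners are "compatible" only when their subfamily
    labels are equal; max_gap (in base pairs) is an arbitrary value.
    """
    systems = []
    for partner in sorted(partners, key=lambda p: p.start):
        previous = systems[-1][-1] if systems else None
        if (
            previous is not None
            and partner.start - previous.end <= max_gap
            and partner.subfamily == previous.subfamily
        ):
            # Close on the chromosome and compatible: same system.
            systems[-1].append(partner)
        else:
            # Too distant or incompatible: start a new candidate system.
            systems.append([partner])
    return systems
```

A real compatibility rule would presumably be a table of subfamily
pairs rather than strict equality; equality simply keeps the sketch
short.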
The strategies
developed must use rules (method layout) and
parameters updated with the incoming data (dataflow
control). The analysis mechanism stores the data
obtained, then reuses them to reevaluate the
methods' parameters and thereby launch more
accurate analyses of new data sets.
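The store-then-reevaluate loop described above can be sketched in Python
as follows. This is only an illustration under stated assumptions: the
AdaptiveThreshold class, the use of a single score cutoff as the
"parameter", and the update rule (mean minus k standard deviations of
the stored scores) are hypothetical and are not taken from the strategy
itself.

```python
from statistics import mean, pstdev


class AdaptiveThreshold:
    """Hypothetical score cutoff that is reevaluated from stored results."""

    def __init__(self, initial_threshold, k=2.0):
        self.threshold = initial_threshold
        self.k = k                 # assumed tolerance, in standard deviations
        self.accepted_scores = []  # data stored from earlier analyses

    def analyse(self, scores):
        """Accept scores above the cutoff, store them, then update the cutoff."""
        accepted = [s for s in scores if s >= self.threshold]
        self.accepted_scores.extend(accepted)
        if len(self.accepted_scores) >= 2:
            # Reevaluate the parameter from all data stored so far.
            self.threshold = (
                mean(self.accepted_scores)
                - self.k * pstdev(self.accepted_scores)
            )
        return accepted
```

Each call to analyse() on a new data set therefore runs with a cutoff
derived from the results of all previous calls.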
To illustrate an
integrated system, we chose the ABC transporters,
or traffic ATPases, because they occur in the
three major kingdoms of life (Bacteria, Archaea, and
Eukaryota) and are involved in many physiological
processes. Most ABC transporters mediate the active
uptake or efflux of specific molecules across
biological membranes, handling a wide variety of
compounds that differ in nature and size
(oligosaccharides, amino acids, peptides,
antibiotics, metallic cations, and so on). They are
encoded by large families of paralogous genes and
can be arranged in a comprehensive classification
well correlated with specificity of transport for
the substrate.
We will present
ABCdb, a database devoted to the ABC transporters
and administered with the ACeDB system, which
stores the data obtained through the identification
strategy. We will also present the most recent
developments using the knowledge representation
system AROM, which appears more suitable than a
database for representing complex relationships
among data.