MRC Laboratory of Molecular Biology
(University of Cambridge), Structural Studies, Hills Road, Cambridge
CB2 2QH, UK
Title: Protein Families and Regulatory Patterns in the Transcription Factors of E. coli
M. Madan Babu* & Sarah A. Teichmann
Keywords: Transcription Factors, Regulatory Network and Domain Combinations
In all organisms, there are
cascades of transcription factors (TFs) that regulate each other in
order to amplify or diversify the effect of a signal on gene
regulation. This is true even in simpler organisms such as the
prokaryote E. coli, where the most complex known cascade involves
four levels of TFs. To gain insight into the evolution and
organisation of the transcription factor regulatory network in E.
coli, we used the information available in RegulonDB (Salgado et al.,
2001) on TFs and their regulated genes. To determine the domain
architecture and family membership of E. coli TFs, we used the
SUPERFAMILY database (Gough et al., 2001) of structural assignments
to predicted proteins from genomes, and extracted all proteins with
DNA-binding domains (DBDs). We could thus determine the domain
combinations of the TFs and study the network from an evolutionary
perspective to identify patterns of related TFs and related genes
controlled by them. In E. coli, there are 267 TFs belonging to
thirteen DBD families of known structure as identified in
SUPERFAMILY. About two thirds of the 267 proteins have the same
domain architecture as at least one other protein, and have thus
arisen as a consequence of duplication of a complete gene. (Bashton
and Chothia, 2002 and Apic et al., 2001 show that proteins with identical
domain architecture are highly likely to be direct duplicates.) On
average, each TF is made up of two domains, and the domains
neighbouring the DBD are small molecule binding domains (SMBDs) in
95 of the 240 multidomain proteins. Considering the position of
the DBD in the sequential domain architecture, there are twelve
domain combinations where it is N-terminal to the SMBD, and
six where it is C-terminal. Overall, there are 38 domain
combinations where the DBD is at the C-terminus, one where it is in
the middle, and 25 where it is at the N-terminus. Of the identified
267 transcription factors, we have experimental information that
there are 28 activators, 23 repressors and 26 dual regulators. As
previously reported by Perez-Rueda and Collado-Vides (2000), all 23
repressors have their DBD at the N-terminus, whereas the activators
mostly have their DBD at the C-terminus, with 9 exceptions where the
DBD is at the N-terminus. For the dual regulators, the DBD occurs
almost equally often at the N-terminus and the C-terminus. Information on
regulated genes is available for 92 TFs from RegulonDB. We classified
the 92 TFs into seven functional classes according to what they
respond to and what they control: Carbon compound metabolism (food),
Environmental sensors (extracellular environment), Redox status
sensors (respiration), Ion transport, Antibiotic resistance,
Structural proteins and Enhancers. Within each of these classes,
one to three proteins control over 50 genes (global regulators),
while the other TFs control fewer than 20 genes (fine
tuners). Of the 92 TFs with known regulated genes, 44 have TFs amongst
their regulated genes. Considering these TF regulatory networks,
which are cascades with two to four levels, we see that the effect of
global regulators is further enhanced by TFs that are regulated by
them. The whole network has four major regulatory hubs that represent
global regulators controlling over 50 genes: CRP, FNR, HimA and ArcA.
These also control the most TFs (between 32 and 36). Thus the
effective number of genes regulated by each TF increases with the
number of TFs it regulates. Taking this into account, the trend of
global regulators and fine tuners is further strengthened, as there
is generally a good correlation between the rank in the direct and
the indirect number of genes regulated. In terms of the number of
levels in the transcription factor regulatory network, there are 26
two-, 8 three-, and 2 four-level cascades of TFs. In addition, over
half of the 92 TFs negatively regulate themselves, as previously
reported (Thieffry et al., 1998). There are only four cases of
positive autoregulation (with direct binding site information), and
these are active only in combination with an inducer or co-activator
molecule, so effectively these are regulated by an external switch.
As mentioned above, two thirds of the TFs have identical domain
architecture to another E. coli TF, and are thus the product of gene
duplication. We investigated whether the same is true for their
promoter regions by checking whether duplicated TFs were regulated by
the same or at least homologous TFs. There are only three examples of
TFs that are controlled by TFs with homologous DBDs. Therefore, at
least within the small set of TF genes, each TF has evolved its
control region independently of its duplicates for optimal
regulation.
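The network quantities discussed above (direct versus indirect numbers of regulated genes, the number of levels in a cascade, and autoregulation) can be illustrated with a minimal sketch. It assumes the network is held as a mapping from each TF to the set of genes it directly regulates. The gene names, the toy data and the function names are hypothetical placeholders, not RegulonDB content and not the procedure used in this study.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical toy network: regulator -> genes it directly regulates.
# All names are placeholders for illustration only.
network: Dict[str, Set[str]] = {
    "tfA": {"tfB", "g1", "g2"},
    "tfB": {"tfB", "tfC", "g3"},  # tfB also regulates itself
    "tfC": {"g4"},
}


def reachable_genes(tf: str, net: Dict[str, Set[str]]) -> Set[str]:
    """Genes regulated by tf directly or through downstream TFs."""
    seen: Set[str] = set()
    stack = [tf]
    while stack:
        current = stack.pop()
        for target in net.get(current, ()):
            if target not in seen:
                seen.add(target)
                if target in net:  # the target is itself a TF
                    stack.append(target)
    return seen


def cascade_levels(tf: str, net: Dict[str, Set[str]],
                   path: FrozenSet[str] = frozenset()) -> int:
    """Number of TF levels in the longest cascade starting at tf.

    TFs already on the current path are skipped, so autoregulation
    and feedback loops do not inflate the count.
    """
    path = path | {tf}
    downstream = [t for t in net.get(tf, ()) if t in net and t not in path]
    return 1 + max((cascade_levels(t, net, path) for t in downstream),
                   default=0)


def is_autoregulated(tf: str, net: Dict[str, Set[str]]) -> bool:
    """True if tf appears among its own direct targets."""
    return tf in net.get(tf, ())


direct_count = len(network["tfA"])
indirect_count = len(reachable_genes("tfA", network))
levels = cascade_levels("tfA", network)
```

In this toy example, tfA has fewer direct targets than targets reachable through tfB and tfC, which mirrors the way regulated TFs extend the reach of a global regulator. The sketch ignores the sign of regulation (activation or repression), which a fuller analysis would need to record.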
References:
[1] Apic G, Gough J and Teichmann SA. 2001. Domain combinations in archaeal, eubacterial and eukaryotic proteomes. Journal of Molecular Biology 310:311-325.
[2] Bashton M and Chothia C. 2002. The Geometry of Domain Combination in Proteins. Journal of Molecular Biology 315:927-939.
[3] Gough J, Karplus K, Hughey R and Chothia C. 2001. Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. Journal of Molecular Biology 313:903-919.
[4] Perez-Rueda E and Collado-Vides J. 2000. The repertoire of DNA-binding transcriptional regulators in Escherichia coli K-12. Nucleic Acids Research 28:1838-1847.
[5] Salgado H, Santos-Zavaleta A, Gama-Castro S, Millan-Zarate D, Diaz-Peredo E, Sanchez-Solano F, Perez-Rueda E, Bonavides-Martinez C and Collado-Vides J. 2001. RegulonDB (version 3.2): transcriptional regulation and operon organization in Escherichia coli K-12. Nucleic Acids Research 29:72-74.
[6] Thieffry D, Huerta AM, Perez-Rueda E and Collado-Vides J. 1998. From specific gene regulation to genomic networks: a global analysis of transcriptional regulation in Escherichia coli. BioEssays 20:433-440.