International Summer School

   From Genome to Life:

    Structural, Functional and Evolutionary approaches

 


MOHAN Madan - Babu

MRC Laboratory of Molecular Biology (University of Cambridge), Structural Studies, Hills Road, Cambridge CB2 2QH, UK

title: Protein Families and Regulatory Patterns in the Transcription Factors of E. coli

M. Madan Babu* & Sarah A. Teichmann

Keywords: Transcription Factors, Regulatory Network and Domain Combinations

In all organisms, cascades of transcription factors (TFs) regulate one another to amplify or diversify the effect of a signal on gene regulation. This is true even in simpler organisms such as the prokaryote E. coli, where the most complex known cascade involves four levels of TFs. To gain insight into the evolution and organisation of the transcription factor regulatory network in E. coli, we used the information available in RegulonDB (Salgado et al., 2001) on TFs and their regulated genes. To determine the domain architecture and family membership of E. coli TFs, we used the SUPERFAMILY database (Gough et al., 2001) of structural assignments to predicted proteins from genomes, and extracted all proteins with DNA-binding domains (DBDs). This allowed us to determine the domain combinations of the TFs and to study the network from an evolutionary perspective, identifying patterns of related TFs and of related genes controlled by them.
In E. coli, there are 267 TFs belonging to thirteen DBD families of known structure as identified in SUPERFAMILY. About two thirds of the 267 proteins have the same domain architecture as at least one other protein, and have thus arisen as a consequence of duplication of a complete gene. (Bashton and Chothia, 2002, and Apic et al., 2001, show that proteins with identical domain architecture are highly likely to be direct duplicates.) On average, each TF is made up of two domains, and in 95 of the 240 multidomain proteins the domain neighbouring the DBD is a small-molecule-binding domain (SMBD). Considering the position of the DBD in the sequential domain architecture, there are twelve domain combinations in which it is N-terminal to the SMBD and six in which it is C-terminal. Overall, there are 38 domain combinations where the DBD is at the C-terminus, one where it is in the middle, and 25 where it is at the N-terminus.
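The grouping of TFs by domain architecture can be illustrated with a minimal Python sketch. The TF names, domain labels and the `DBD` prefix convention below are hypothetical toy data chosen for illustration, not SUPERFAMILY assignments, and the functions are not the procedure used in this work.

```python
from collections import defaultdict

# Hypothetical toy data: TF name -> domain families ordered from N- to C-terminus.
architectures = {
    "tfA": ("DBD_fam1", "SMBD_fam1"),
    "tfB": ("DBD_fam1", "SMBD_fam1"),
    "tfC": ("SMBD_fam2", "DBD_fam2"),
    "tfD": ("DBD_fam3",),
}


def group_by_architecture(archs):
    """Collect TFs that share an identical ordered domain architecture."""
    groups = defaultdict(list)
    for tf, arch in archs.items():
        groups[arch].append(tf)
    return dict(groups)


def fraction_with_shared_architecture(archs):
    """Fraction of TFs whose architecture occurs in at least one other TF."""
    groups = group_by_architecture(archs)
    shared = sum(len(members) for members in groups.values() if len(members) > 1)
    return shared / len(archs)


def dbd_position(arch, is_dbd=lambda d: d.startswith("DBD")):
    """Classify where the DBD sits in the sequential domain architecture."""
    positions = [i for i, domain in enumerate(arch) if is_dbd(domain)]
    if not positions:
        return "no DBD"
    if len(arch) == 1:
        return "single domain"
    if positions[0] == 0:
        return "N-terminal"
    if positions[-1] == len(arch) - 1:
        return "C-terminal"
    return "middle"


if __name__ == "__main__":
    print(fraction_with_shared_architecture(architectures))
    for tf, arch in architectures.items():
        print(tf, dbd_position(arch))
```

In this toy set, tfA and tfB share an architecture and would be counted as candidate duplicates, while tfC and tfD would not.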
Of the 267 identified transcription factors, experimental information classifies 28 as activators, 23 as repressors and 26 as dual regulators. As previously reported by Perez-Rueda and Collado-Vides (2000), all 23 repressors have their DBD at the N-terminus, whereas the activators mostly have their DBD at the C-terminus, although in 9 activators the DBD is at the N-terminus. For the dual regulators, the DBD occurs almost equally often at the N-terminus and the C-terminus. Information on regulated genes is available for 92 TFs from RegulonDB. We classified the 92 TFs into seven functional classes according to what they respond to and what they control: Carbon compound metabolism (food), Environmental sensors (extracellular environment), Redox status sensors (respiration), Ion transport, Antibiotic resistance, Structural proteins and Enhancers. Within each of these classes, one to three proteins control over 50 genes (global regulators), while the other TFs control fewer than 20 genes (fine tuners). Of the 92 TFs with known regulated genes, 44 have TFs amongst their regulated genes. Considering these TF regulatory networks, which are cascades with two to four levels, we see that the effect of global regulators is further enhanced by the TFs that they regulate. The whole network has four major regulatory hubs that represent global regulators controlling over 50 genes: CRP, FNR, HimA and ArcA. These also control the most TFs (between 32 and 36). Thus the effective number of genes regulated by each TF increases with the number of TFs it regulates. Taking this into account, the trend of global regulators and fine tuners is further strengthened, as there is generally a good correlation between a TF's rank by the number of genes it regulates directly and its rank by the number it regulates indirectly. In terms of the number of levels in the transcription factor regulatory network, there are 26 two-level, 8 three-level and 2 four-level cascades of TFs.
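The distinction between directly and indirectly regulated genes, and the number of levels in a cascade, can be sketched as follows. The network below is hypothetical toy data rather than RegulonDB content, and the traversal is only one plausible way to count indirect targets, under the assumption that a TF's indirect targets are all genes reachable through the TFs it regulates.

```python
# Hypothetical toy network: TF -> set of directly regulated genes (which may be TFs).
network = {
    "tf1": {"tf2", "g1", "g2"},
    "tf2": {"tf3", "g3"},
    "tf3": {"g4", "g5"},
    "tf4": {"tf4", "g6"},  # tf4 regulates itself (autoregulation)
}


def direct_targets(network, tf):
    """Genes directly regulated by tf, excluding tf itself."""
    return set(network.get(tf, ())) - {tf}


def reachable_genes(network, tf):
    """Genes regulated by tf directly or through downstream TFs."""
    seen = set()
    visited_tfs = {tf}
    stack = [tf]
    while stack:
        current = stack.pop()
        for target in network.get(current, ()):
            seen.add(target)
            # Follow the target further only if it is itself a TF not yet visited.
            if target in network and target not in visited_tfs:
                visited_tfs.add(target)
                stack.append(target)
    seen.discard(tf)  # do not count autoregulation as an extra target
    return seen


def cascade_levels(network, tf, _path=frozenset()):
    """Number of TF levels in the longest cascade starting at tf."""
    path = _path | {tf}
    downstream = [t for t in network.get(tf, ()) if t in network and t not in path]
    if not downstream:
        return 1
    return 1 + max(cascade_levels(network, t, path) for t in downstream)


if __name__ == "__main__":
    for tf in sorted(network):
        print(tf, len(direct_targets(network, tf)),
              len(reachable_genes(network, tf)), cascade_levels(network, tf))
```

Ranking TFs first by the direct count and then by the reachable count gives the kind of direct-versus-indirect comparison described above.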
In addition, over half of the 92 TFs negatively regulate themselves, as previously reported (Thieffry et al., 1998). There are only four cases of positive autoregulation (with direct binding site information), and these are active only in combination with an inducer or co-activator molecule, so effectively they are regulated by an external switch. As mentioned above, two thirds of the TFs have a domain architecture identical to that of another E. coli TF, and are thus the product of gene duplication. We investigated whether the same is true for their promoter regions by checking whether duplicated TFs are regulated by the same or at least homologous TFs. There are only three examples of duplicated TFs that are controlled by TFs with homologous DBDs. Therefore, at least within the small set of TF genes, each TF has evolved its control region independently of its duplicates for optimal regulation.
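The comparison of regulators between duplicated TFs can be sketched in the same style. All names, family labels and regulator assignments below are hypothetical, and treating two regulators as homologous when they share a DBD family is an assumption made for this illustration.

```python
# Hypothetical toy data: DBD family of each TF and the TFs regulating each TF gene.
dbd_family = {
    "tfA": "fam1", "tfB": "fam1", "tfC": "fam1",
    "tfX": "fam3", "tfY": "fam3", "tfZ": "fam4",
}
regulators = {
    "tfA": {"tfX"},
    "tfB": {"tfY"},
    "tfC": {"tfZ"},
}


def compare_regulation(tf1, tf2, regulators, dbd_family):
    """Classify how the regulation of two duplicated TFs is related."""
    regs1 = regulators.get(tf1, set())
    regs2 = regulators.get(tf2, set())
    if regs1 & regs2:
        return "same"
    # Assumption: regulators are homologous if they share a DBD family.
    fams1 = {dbd_family[r] for r in regs1 if r in dbd_family}
    fams2 = {dbd_family[r] for r in regs2 if r in dbd_family}
    if fams1 & fams2:
        return "homologous"
    return "independent"


if __name__ == "__main__":
    print(compare_regulation("tfA", "tfB", regulators, dbd_family))
    print(compare_regulation("tfA", "tfC", regulators, dbd_family))
```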

References:

[1] Apic G, Gough J and Teichmann SA. 2001. Domain combinations in archaeal, eubacterial and eukaryotic proteomes. Journal of Molecular Biology 310:311-325.

[2] Bashton M and Chothia C. 2002. The Geometry of Domain Combination in Proteins. Journal of Molecular Biology 315:927-939.

[3] Gough J, Karplus K, Hughey R and Chothia C. 2001. Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. Journal of Molecular Biology 313:903-919.

[4] Perez-Rueda E and Collado-Vides J. 2000. The repertoire of DNA-binding transcriptional regulators in Escherichia coli K-12. Nucleic Acids Research 28:1838-1847.

[5] Salgado H, Santos-Zavaleta A, Gama-Castro S, Millan-Zarate D, Diaz-Peredo E, Sanchez-Solano F, Perez-Rueda E, Bonavides-Martinez C and Collado-Vides J. 2001. RegulonDB (version 3.2): transcriptional regulation and operon organization in Escherichia coli K-12. Nucleic Acids Research 29:72-74.

[6] Thieffry D, Huerta AM, Perez-Rueda E and Collado-Vides J. 1998. From specific gene regulation to genomic networks: a global analysis of transcriptional regulation in Escherichia coli. BioEssays 20:433-440.