Mining Archaeal Proteomes for Eukaryotic Proteins with Novel Functions:
Proteins from Archaea Conserved in Eucarya (PACE)

 

 

PACE proteins were first identified by BLAST similarity searches in public databases during manual annotation of the archaeon Pyrococcus abyssi.
Except for one (PACE 28, which is not present in this organism), all subsequent analyses were performed using the P. abyssi protein sequences.
Gene numbering is according to GenBank.

 

Links to:

Local pages

- PACE proteins distribution among various organisms
- PACE genes in Archaeal genomes
- Experimental MultiGenome Browser Tool

Public Databases and Tools

- BLAST and PSI-BLAST at NCBI (USA)
- COG Database at NCBI (USA)
- ProDom at INRA/CNRS (Toulouse, France)
- PFAM at Sanger Centre in Hinxton (UK)
- PRINTS at University of Manchester (UK)
- Prosite at ExPASy (Swiss Institute of Bioinformatics)
- ClustalW at EBI (EMBL, UK)


| PACE # | Features/Comments (1) | Detectable Sequence Motifs (2) | Detectable 3D Fold in PDB | Gene Environment (3) | Function Prediction (4) |
|---|---|---|---|---|---|
| 1 | Walker-type ATP/GTP binding site; nuclear localization in Spom | + | + | - | unc |
| 2 | Distantly related to plant desiccation-resistance proteins; only present in plants and Archaea | + | - | - | inf |
| 3* | Subdomain of a large periplasm transport protein in Ecol (KspE); only present in yeast and Archaea | - | - | - | unc |
| 4 | Homolog of sudD gene product in Anid, a suppressor of a gene involved in attachment of chromosome to microtubules (bimD6). Serine/threonine kinase (COG); close to TopoI in Aful and TopVI in Pyrococcus; SuperPACE | + | + | + | inf |
| 5 | Homolog of human RIP-1 protein, which is expressed in lung cancer and binds HIV-1 protein rev2, and of Spom KRR1 protein, involved in cell division and sporulation. RNA-binding domain (KH motif); close to PACE 4 | + | - | + | inf |
| 6* | SuperPACE | - | - | + | ope |
| ->7 | SuperPACE | - | - | + | inf (Tr, T) |
| ->8 | Homolog of Scer POP5 protein, subunit of the RNase P and MRP complex, tRNA processing; closely located to translation/transcription proteins in Pyrococcus (Pelota, argtRS, TFIIB, S10, EF1a) and Mjan (EF1a) | + | - | + | inf (T) |
| 9* | Homolog of DPH2 protein involved in diphthamide biosynthesis, DPH2L protein candidate tumor suppressor gene product, and DPH2, diphthamide synthase in COG; SuperPACE | + | - | + | unc |
| 10 | SuperPACE | - | - | - | ope |
| ->11 | Homolog of nucleotidyl transferases; SuperPACE | + | + | - | unc |
| ->12 | Walker-type ATP/GTPase; located between MinD and MCM in Pyrococcus. Possibly related to MinD | + | + | + | inf (R, D) |
| 13* | Homolog of the prefoldin subunit 5 and of human nuclear protein MM-1, which binds to the cMyc oncogene and represses its transcription. Coiled-coil protein in COG. Annotated as homolog of myc binding protein in Aful; closely located to ribosomal proteins LXA (Pyrococcus, Aful), L31, L39 (Pyrococcus, Mthe, Aful), S19 (Aful, Mthe), L18 (Mthe), translation initiation factor IF6 (Pyrococcus) and signal recognition particle, SRP (Mthe, Aful) | + | - | + | inf (T) |
| 14* | Homolog of the yeast ELP3 protein: RNA polymerase II-associated histone acetyltransferase; transferase motifs | + | - | - | unc |
| 15 | Hydrolase of the HD family (COG) | + | - | - | unc |
| ->16* | Walker-type ATP/GTP binding site. Helicase signatures, homolog of human protein that binds to the mu-chain switch region in immunoglobulin loci and to helicase Dna2. Many paralogs in yeast; present in hyperthermophilic bacteria | + | + | - | inf (R) |
| 17 | Homolog of ZPR1 protein in Scer (essential) that interacts with EGF and eEF1a and translocates to the nucleus after treatment with mitogens. Zinc finger protein | + | - | - | unc |
| ->18 | Homolog of SAM-methyltransferases | + | + | - | inf (m) |
| 19 | Homolog of MRA1 protein in Spom, suppressor of Ras1, essential for cell growth. MRA1/C2F family in COG | + | - | - | unc |
| 20 | Ortholog of Pfur ADP-dependent phosphofructokinase; only present in Cele | + | - | - | ope |
| ->21 | Predicted ATPase; SuperPACE, present in hyperthermophilic bacteria | - | + | - | unc |
| 22 | Present in hyperthermophilic bacteria | + | - | - | inf |
| 23 | Present in 5' of ribosomal proteins L3 and L4 in all Archaea | + | - | + | inf (T) |
| 24 | Absent in human, present in yeast, fused to another module in Cele | + | - | + | inf |
| 25 | Homolog of met10 gene product of Ncra. Absence of DNA methylation at nonpermissive temperature in medium with low methionine. Putative SAM-dependent methyltransferase in COG; close to RFC subunits (DNA replication/repair) in Pyrococcus | + | + | + | inf (m) |
| 26 | S1 RNA-binding domain, similarity with translation initiation factor IF2a and polyribonucleotide nucleotidyl transferase; located near RNA pol subunit L in Mthe, Aful, Aper | + | + | + | inf |
| 27 | Only present in Atha | - | - | - | unc |
| ->28* | Closely located to ribosomal proteins L40 in Mjan and Mthe | - | - | - | unc |
| 29* | Homolog of TFAR19 human protein. Overexpression induces apoptosis in tumor cells; closely located to ribosomal proteins in Aful, Aper, Mthe, Mjan (S19 and/or L39, L31), eIF6 in Aper | + | - | + | inf (T) |
| 30 | Closely located to ribosomal proteins (S2, L13, L18 in Pyrococcus, S2, S4, S14, S16, L17 in Mthe) and RNA pol subunits (N, D in Pyrococcus, K, N, B in Mthe); SuperPACE; present in hyperthermophilic bacteria | - | - | + | unc |
| 31 | | - | - | - | unc |
| 32* | Homolog of MCT-1 human oncogene protein: involved in cell cycle regulation. RNA-binding domain; located near ribosomal protein L37 in Aper and Mthe, close to transferases in Pyrococcus | + | - | + | unc |
| 33 | NEW PACE | | | | |
| 34 | NEW PACE | | | | |
| 35 | NEW PACE | | | | |
| 36 | NEW PACE | | | | |

-> = PACEs already under characterization
* = PACEs of potential medical interest
(1) Salient features and comments about each PACE that can be deduced from sequence similarity, identification of sequence motifs and 3D folds, chromosomal environment and gene distribution among species.
Bold red: new annotations proposed
Bold blue: preexisting annotations
SuperPACEs are PACEs present in all organisms considered
Species mnemonics: Aful: Archaeoglobus fulgidus; Anid: Aspergillus nidulans; Atha: Arabidopsis thaliana; Aper: Aeropyrum pernix; Cele: Caenorhabditis elegans; Ecol: Escherichia coli; Mjan: Methanococcus jannaschii; Mthe: Methanobacterium thermoautotrophicum; Ncra: Neurospora crassa; Pfur: Pyrococcus furiosus; Scer: Saccharomyces cerevisiae; Spom: Schizosaccharomyces pombe

(2) As can be found in the ProDom, PFAM, PROSITE or PRINTS databases

(3) Gene environment of a PACE is considered informative (+) when either:
- the PACE's chromosomal environment is conserved in at least 2 different taxa among: Pyrococcus spp., A. fulgidus, A. pernix, M. jannaschii, M. thermoautotrophicum.
- any other noteworthy gene is located near the PACE under consideration in several different taxa.
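
The first criterion of note (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the short taxon labels and the example neighbor sets are hypothetical, and "conserved environment" is approximated here as "the same neighboring gene is found next to the PACE in at least 2 of the five reference taxa". The second criterion relies on expert judgment and is not modeled.

```python
from typing import Dict, Set

# The five reference taxa listed in note (3); the short labels are this sketch's convention.
REFERENCE_TAXA = {"Pyrococcus", "Aful", "Aper", "Mjan", "Mthe"}


def conserved_neighbors(neighbors_by_taxon: Dict[str, Set[str]],
                        min_taxa: int = 2) -> Set[str]:
    """Return the neighbor genes found near the PACE in at least `min_taxa` reference taxa."""
    counts: Dict[str, int] = {}
    for taxon, genes in neighbors_by_taxon.items():
        if taxon not in REFERENCE_TAXA:
            continue  # taxa outside the reference list do not count
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    return {gene for gene, n in counts.items() if n >= min_taxa}


def environment_is_informative(neighbors_by_taxon: Dict[str, Set[str]],
                               min_taxa: int = 2) -> bool:
    """Approximate criterion 1 of note (3): some neighbor is shared by >= `min_taxa` taxa."""
    return bool(conserved_neighbors(neighbors_by_taxon, min_taxa))


# Illustrative neighbor sets only, loosely modeled on a ribosomal-protein neighborhood.
example = {
    "Pyrococcus": {"L31", "L39", "IF6"},
    "Mthe": {"L31", "L39", "S19", "SRP"},
    "Aful": {"S19", "SRP"},
}
print(sorted(conserved_neighbors(example)))
print(environment_is_informative(example))
```

Counting shared neighbors is a deliberately crude proxy; a real analysis would also have to define how close a gene must be to count as a neighbor.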

(4) unc = uncharacterized, inf = informational, ope = operational, (T) = translation, (Tr) = transcription, (R) = replication and repair, (D) = cell division, (m) = methylase
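
For readers who want to process the Function Prediction column programmatically, the codes of note (4) can be expanded with a small parser. This is a minimal sketch, not part of the original analysis: the function name `parse_prediction` is hypothetical, and it assumes codes are written as a class followed by an optional parenthesized, comma-separated list of tags, as in "inf (Tr, T)".

```python
import re
from typing import List, Tuple

# Abbreviations exactly as defined in note (4).
CLASSES = {"unc": "uncharacterized", "inf": "informational", "ope": "operational"}
PROCESSES = {
    "T": "translation",
    "Tr": "transcription",
    "R": "replication and repair",
    "D": "cell division",
    "m": "methylase",
}


def parse_prediction(code: str) -> Tuple[str, List[str]]:
    """Expand a code such as 'inf (Tr, T)' into (class name, list of process names)."""
    match = re.fullmatch(r"\s*(\w+)\s*(?:\(([^)]*)\))?\s*", code)
    if match is None or match.group(1) not in CLASSES:
        raise ValueError(f"unrecognized prediction code: {code!r}")
    tags = [t.strip() for t in (match.group(2) or "").split(",") if t.strip()]
    unknown = [t for t in tags if t not in PROCESSES]
    if unknown:
        raise ValueError(f"unrecognized process tag(s): {unknown}")
    return CLASSES[match.group(1)], [PROCESSES[t] for t in tags]


print(parse_prediction("inf (Tr, T)"))
print(parse_prediction("unc"))
```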

NEW PACEs: (Simonetta Gribaldo)
Methods:
Each PACE was updated by assigning it to a KOG category via KOGNITOR. Eukaryotic homologues in the KOG were checked for recent functional annotations in PubMed.

Novel PACEs were identified by the following approach:

1) selecting all uncharacterized COGs with an Archaea-only phylogenetic profile and checking them against the KOG database via KOGNITOR
2) selecting all uncharacterized COGs with an Archaea + hyperthermophilic Bacteria phylogenetic profile and checking them against the KOG database via KOGNITOR
3) selecting all uncharacterized COGs with a phylogenetic profile of Archaea + hyperthermophilic Bacteria + at most 2 other bacterial taxa
4) selecting all uncharacterized KOGs with an all-eukaryotes phylogenetic profile and checking them against the COG database
5) selecting all uncharacterized KOGs with a profile of all eukaryotes minus Encephalitozoon and checking them against the COG database
6) selecting all uncharacterized KOGs with a profile of all eukaryotes minus Encephalitozoon and S. cerevisiae and checking them against the COG database
7) selecting all uncharacterized KOGs with a profile of all eukaryotes minus Encephalitozoon and the two yeasts and checking them against the COG database
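
The phylogenetic-profile filters in steps 1 to 7 can be sketched as follows. This is an illustration under assumptions, not the procedure actually used: the taxon labels, the group names, the example taxon tables and both function names are hypothetical; "Archaea only" is read as "at least one archaeon and nothing else"; and the cross-check against the other database (via KOGNITOR for steps 1 and 2) is not modeled.

```python
from typing import Dict, Optional, Set

# Hypothetical taxon-to-group table for COG members (labels are this sketch's own).
GROUP_OF: Dict[str, str] = {
    "Paby": "archaea", "Aful": "archaea",
    "Aaeo": "hyperthermophilic_bacteria", "Tmar": "hyperthermophilic_bacteria",
    "Ecol": "other_bacteria", "Bsub": "other_bacteria", "Hinf": "other_bacteria",
    "Scer": "eukaryote",
}

# Assumed eukaryotic taxon set for KOGs; Ecun stands for Encephalitozoon.
ALL_EUKARYOTES: Set[str] = {"Atha", "Cele", "Dmel", "Hsap", "Scer", "Spom", "Ecun"}


def cog_step(taxa: Set[str], group_of: Dict[str, str] = GROUP_OF) -> Optional[int]:
    """Return which of steps 1-3 a COG's profile satisfies, or None."""
    groups = [group_of[t] for t in taxa]
    if "eukaryote" in groups or "archaea" not in groups:
        return None
    has_hyper = "hyperthermophilic_bacteria" in groups
    n_other = groups.count("other_bacteria")
    if n_other == 0:
        return 2 if has_hyper else 1
    if has_hyper and n_other <= 2:
        return 3
    return None


def kog_step(taxa: Set[str], all_euk: Set[str] = ALL_EUKARYOTES) -> Optional[int]:
    """Return which of steps 4-7 a KOG's profile satisfies, or None."""
    allowed_missing = {
        frozenset(): 4,
        frozenset({"Ecun"}): 5,
        frozenset({"Ecun", "Scer"}): 6,
        frozenset({"Ecun", "Scer", "Spom"}): 7,
    }
    return allowed_missing.get(frozenset(all_euk - taxa))


print(cog_step({"Paby", "Aful"}))
print(cog_step({"Paby", "Tmar", "Ecol", "Bsub"}))
print(kog_step(ALL_EUKARYOTES - {"Ecun", "Scer"}))
```

Matching the set of missing taxa exactly, rather than merely counting them, keeps steps 5 to 7 tied to the specific organisms named in the list.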

Next round of analysis:

- Verify genomic context against newly available archaeal genomes to suggest possible interactions and/or functions
- Verify the distribution of PACEs in eukaryotic genomes not included in the KOG database and in EST databases
- Verify sequence homology for each PACE for the evolutionary approach to functional analysis



 


Last revision on Thursday March 31 2005

Laboratoire de Biologie Moléculaire du Gène chez les Extrêmophiles.