Pattern search in Genomes

 

fuzzpro/fuzznuc: EMBOSS PROSITE-style pattern matching in the Complete Genomes DataBase
Genome selection based on phylogeny and taxonomy trees

Fuzzpro and fuzznuc use PROSITE-style patterns to search protein and nucleotide sequences, respectively.
A pattern specifies a (typically short) stretch of sequence to be found. It can describe an exact sequence, or it can allow various ambiguities, matches of variable length and repeated subsections.

fuzzpro: patterns use the standard IUPAC one-letter codes for the amino acids.
The symbol 'x' stands for a position where any amino acid is accepted.
Ambiguities are indicated by listing the acceptable amino acids for a position between square brackets '[ ]'. For example, [ALT] stands for Ala or Leu or Thr.
Ambiguities can also be indicated by listing, between a pair of curly brackets '{ }', the amino acids that are not accepted at a position. For example, {AM} stands for any amino acid except Ala and Met.
Each element in a pattern is separated from its neighbor by a '-' (optional in fuzzpro).
Repetition of an element is indicated by following that element with a number or a numerical range in parentheses. Examples: x(3) corresponds to x-x-x; x(2,4) corresponds to x-x, x-x-x or x-x-x-x.
When a pattern is restricted to the N- or C-terminus of a sequence, it starts with a '<' symbol or ends with a '>' symbol, respectively.
A period ends the pattern (optional in fuzzpro).
Example: [DE](2)HS{P}X(2)PX(2,4)C
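
As a rough illustration of the protein pattern syntax, the following Python sketch translates a PROSITE-style pattern into a regular expression and scans a sequence with it. It is not EMBOSS code: the function names `prosite_to_regex` and `find_matches` are hypothetical, and only the pattern elements listed above are handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style protein pattern into a Python regex string (hypothetical helper)."""
    p = "".join(pattern.split()).upper()  # drop whitespace, ignore case
    if p.endswith("."):                   # the terminating period is optional
        p = p[:-1]
    out = []
    i = 0
    while i < len(p):
        c = p[i]
        if c == "-":                      # element separator, optional in fuzzpro
            i += 1
        elif c == "<":                    # pattern anchored at the N-terminus
            out.append("^")
            i += 1
        elif c == ">":                    # pattern anchored at the C-terminus
            out.append("$")
            i += 1
        elif c == "X":                    # any amino acid
            out.append(".")
            i += 1
        elif c == "[":                    # accepted residues
            j = p.index("]", i)
            out.append("[" + p[i + 1:j] + "]")
            i = j + 1
        elif c == "{":                    # excluded residues
            j = p.index("}", i)
            out.append("[^" + p[i + 1:j] + "]")
            i = j + 1
        elif c == "(":                    # repeat count x(3) or range x(2,4)
            j = p.index(")", i)
            out.append("{" + p[i + 1:j] + "}")
            i = j + 1
        elif c.isalpha():                 # a single amino acid
            out.append(c)
            i += 1
        else:
            raise ValueError(f"unexpected character {c!r} in pattern {pattern!r}")
    return "".join(out)


def find_matches(sequence: str, pattern: str):
    """Return (start, end, matched) tuples using 1-based inclusive coordinates."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.end(), m.group()) for m in regex.finditer(sequence.upper())]


print(find_matches("XXDEHSAKLPAACYY", "[DE](2)HS{P}X(2)PX(2,4)C"))
# -> [(3, 13, 'DEHSAKLPAAC')]
```

The example pattern becomes `[DE]{2}HS[^P].{2}P.{2,4}C`. Note that `re.finditer` reports non-overlapping matches only, so this sketch's hit list may differ from fuzzpro's own report.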

fuzznuc: patterns use the standard IUPAC one-letter codes for the nucleotides.
The symbol 'n' stands for a position where any nucleotide is accepted.
Ambiguities are indicated by listing the acceptable nucleotides for a position between square brackets '[ ]'. For example, [ACG] stands for A or C or G.
Ambiguities can also be indicated by listing, between a pair of curly brackets '{ }', the nucleotides that are not accepted at a position. For example, {AG} stands for any nucleotide except A and G.
Repetition of an element is indicated by following that element with a number or a numerical range in parentheses. Examples: N(3) corresponds to N-N-N; N(2,4) corresponds to N-N, N-N-N or N-N-N-N.
When a pattern is restricted to the 5' or 3' end of a sequence, it starts with a '<' symbol or ends with a '>' symbol, respectively.
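
The nucleotide pattern syntax can be sketched the same way in Python, assuming the standard IUPAC ambiguity codes (R, Y, S, W, K, M, B, D, H, V, N) are accepted as single pattern elements. The helper names `bases_for` and `fuzznuc_to_regex` are hypothetical and this is not how fuzznuc is implemented; the searched sequence is assumed to be upper case and to use T rather than U.

```python
import re

# Standard IUPAC nucleotide codes and the bases each one stands for (U is read as T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def bases_for(codes: str) -> str:
    """Union of the bases denoted by a run of IUPAC codes, e.g. 'RC' -> 'ACG'."""
    return "".join(sorted({b for c in codes for b in IUPAC[c]}))


def fuzznuc_to_regex(pattern: str) -> str:
    """Translate a fuzznuc-style nucleotide pattern into a Python regex string (hypothetical helper)."""
    p = "".join(pattern.split()).upper().replace("-", "")
    if p.endswith("."):
        p = p[:-1]
    out = []
    i = 0
    while i < len(p):
        c = p[i]
        if c == "<":                      # pattern anchored at the 5' end
            out.append("^")
            i += 1
        elif c == ">":                    # pattern anchored at the 3' end
            out.append("$")
            i += 1
        elif c == "[":                    # accepted nucleotides
            j = p.index("]", i)
            out.append("[" + bases_for(p[i + 1:j]) + "]")
            i = j + 1
        elif c == "{":                    # excluded nucleotides -> the remaining ones
            j = p.index("}", i)
            excluded = bases_for(p[i + 1:j])
            out.append("[" + "".join(b for b in "ACGT" if b not in excluded) + "]")
            i = j + 1
        elif c == "(":                    # repeat count N(3) or range N(2,4)
            j = p.index(")", i)
            out.append("{" + p[i + 1:j] + "}")
            i = j + 1
        elif c in IUPAC:                  # single, possibly ambiguous, nucleotide code
            bases = IUPAC[c]
            out.append(bases if len(bases) == 1 else "[" + bases + "]")
            i += 1
        else:
            raise ValueError(f"unexpected character {c!r} in pattern {pattern!r}")
    return "".join(out)


match = re.search(fuzznuc_to_regex("TATAWAW"), "GGTATAAATCC")
print(fuzznuc_to_regex("[ACG]N(2,4){AG}"), match.group() if match else None)
# -> [ACG][ACGT]{2,4}[CT] TATAAAT
```

Translating an excluded set such as {AG} into the remaining bases ([CT]) keeps the regex from matching non-nucleotide characters that a plain `[^AG]` would accept.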




Enter pattern here: (raw text)


Before submission you must select at least one Genome DataBase.
Pattern type: fuzzpro (protein) or fuzznuc (nucleotide)
Database type: sequence type on which to perform the pattern match
Mismatches (fuzzpro/nuc option): number of allowed mismatches
Reverse (fuzzpro/nuc option, DNA only): strand to search; default: direct
Output (fuzzpro/nuc option): report format; available: default [custom summary display], excel, gff, pir, trace, dbmotif, feattable, motif, simple, tagseq
Advanced options: further fuzzpro/nuc options
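
The effect of the mismatch and reverse-strand options can be pictured with a simple sliding-window scan. This is a hypothetical Python sketch, not the fuzzpro/fuzznuc algorithm: it only handles fixed-length patterns made of single IUPAC nucleotide codes (no brackets or repeats), counts a mismatch wherever the sequence base is not allowed at that pattern position, and, when the hypothetical `reverse` flag is set, also scans the reverse complement and reports those hits in direct-strand coordinates.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mismatch_hits(seq: str, pattern: str, max_mismatches: int = 0, reverse: bool = False):
    """Return (strand, start, end, window, mismatches), 1-based inclusive direct-strand coordinates."""
    seq = seq.upper()
    allowed = [set(IUPAC[c]) for c in pattern.upper()]  # one base set per pattern position
    k = len(allowed)
    n = len(seq)
    strands = [("+", seq)]
    if reverse:
        strands.append(("-", reverse_complement(seq)))
    hits = []
    for strand, s in strands:
        for i in range(n - k + 1):
            window = s[i:i + k]
            mismatches = sum(base not in ok for base, ok in zip(window, allowed))
            if mismatches <= max_mismatches:
                # A window at index i of the reverse complement covers direct-strand
                # positions n-i-k+1 .. n-i (1-based).
                start = i + 1 if strand == "+" else n - (i + k) + 1
                hits.append((strand, start, start + k - 1, window, mismatches))
    return hits


print(mismatch_hits("ACGTACGA", "ACGT", max_mismatches=1))
# -> [('+', 1, 4, 'ACGT', 0), ('+', 5, 8, 'ACGA', 1)]
```

Raising `max_mismatches` widens the set of accepted windows quickly for short patterns, which is one reason a small number of allowed mismatches is usually preferable.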

Complete Genomes DataBase selection

(JavaScript must be enabled)
Genome list last revision on Wed Nov 27 2013 16:51:11

Full Phylogenetic Domain selection: select entire phylogenetic domains, or open the genomes window to make a custom selection.

Domains: ALL, BACTERIA, ARCHAEA, EUKARYA, THERMOPHILES
Taxonomy Name selection: type a valid taxonomy name; suggested names appear as you type.

Only full names are matched.
(Names for lineages are from NCBI Taxonomy data)
Taxa Tree selection: browse the taxonomy trees. Click on items to open or close nodes, or to open lineage selections.

(Lineage data for tree construction is from NCBI Taxonomy data)

Y. Zivanovic, CNRS/UPS, 1998-2009