User Manual for the Web View
ProRules are written in the UniRule format, which is used by the UniProt Knowledgebase (UniProtKB)
automated annotation projects to annotate protein records in the UniProtKB format. The
rules can be displayed in a user-friendly Web View, which consists of the following four main sections and associated sub-sections.
N.B. Most ProRules are Domain or Site rules; Protein rules are rare.
General rule information
Accession
This field indicates the accession number of the rule. For the ProRule database, it has the form PRUxxxxx.
Dates
This field is composed of two lines. The first line indicates the rule creation date; the second corresponds to the last rule revision date.
Data Class
The possible values for this field are:
- Domain indicates that the rule is based on a motif (profile or pattern) that detects a domain. The propagated annotation will only concern this domain. Rules of this data class will be referred to hereafter as Domain rules.
- Site indicates that the rule is based on a motif that detects a site. The propagated annotation will only concern this site. Rules of this data class will be referred to hereafter as Site rules.
- Protein indicates that the rule is based on a profile or metamotif that covers the complete protein sequence. In this case the rule enables the complete annotation of a UniProtKB entry. Rules of this data class will be referred to hereafter as Protein rules.
Predictors
The Predictors line(s) indicate the motif identifier(s) used to trigger the application of the rule. The trigger can be:
- A PROSITE pattern or profile
  format: PROSITE; the PROSITE motif accession number; the PROSITE motif entry name
  e.g. PROSITE; PS50292; PEROXIDASE_3
- A PROSITE metamotif
  format: Metamotif; -; the metamotif itself
  e.g. Metamotif; -; PS50021=7,91=PS50021
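
The two Predictors formats above share a three-field, semicolon-separated layout, so they can be read with one small routine. The following is a minimal sketch, not part of the ProRule tooling: the names Predictor and parse_predictor are hypothetical, and it assumes that a "-" in the second field simply means "no accession".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Predictor:
    kind: str                 # "PROSITE" or "Metamotif"
    accession: Optional[str]  # None when the field is "-"
    name: str                 # entry name, or the metamotif expression itself


def parse_predictor(line: str) -> Predictor:
    """Split a 'kind; accession; name' Predictors line into its three fields."""
    fields = [f.strip() for f in line.strip().rstrip(";").split(";")]
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    kind, accession, name = fields
    if kind not in ("PROSITE", "Metamotif"):
        raise ValueError(f"unknown predictor kind: {kind!r}")
    # Assumption: "-" is a placeholder for an absent accession number.
    return Predictor(kind, None if accession == "-" else accession, name)


if __name__ == "__main__":
    print(parse_predictor("PROSITE; PS50292; PEROXIDASE_3"))
    print(parse_predictor("Metamotif; -; PS50021=7,91=PS50021"))
```

The metamotif expression is kept as an opaque string here, since its internal syntax is not described in this section.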
Name and function
These fields are mandatory for Domain and Site rules and optional for Protein rules.
They provide the name and the function of the domain, site or protein concerned.
Propagated annotation
Identifier, protein and gene names, description
The name and the content of this section depend on the type of rule.
For Domain and Site rules this is an optional field. When present, it contains only the part of the description that is common to all rule members, preceded by a plus sign (+).
For Protein rules it corresponds to:
- An Identifier: the mnemonic code for the protein name
- A Description of the protein
- The common Gene Name of the protein, when it exists
Comments
This section contains all applicable comment lines of a UniProtKB entry (see the CC line section of the UniProt Knowledgebase User Manual).
Cross-references
This section can be used:
- To indicate cross-references to domain and family databases within a UniProtKB entry, e.g. HAMAP, PROSITE, Pfam, PRINTS, TIGRFAMs and PIRSF (see the DR line section of the UniProt Knowledgebase User Manual).
  format: Database Name identifier1; identifier2; number of expected hits;
  e.g. Pfam; PF02033; RBFA; 1;
       TIGRFAMs; TIGR00082; rbfA; 1;
       PROSITE; PS01319; RBFA; 1;
- To indicate which other rule(s), if any, must be applied to completely annotate the protein or the domain.
  Two main cases can be distinguished:
  - Triggering of Domain and/or Site rules:
    This concerns rules that annotate a Protein containing domain(s) and/or site(s), as well as any rule that annotates a Domain containing Site(s).
    format: PROSITE identifier1; identifier2; number of expected hits; trigger=accession number of the rule to be triggered;
    e.g. PROSITE PS50035; PLD; 1; trigger=PRU00153;
  - Triggering of other rule(s) to annotate features such as Transmembrane, coiled coil (...):
    This concerns annotation of either a protein or a domain. In this case the format is:
    format: feature name; -; number of expected hits; trigger=yes;
    e.g. General Transmembrane; -; 6-10; trigger=yes;
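
The cross-reference lines above can be read with a single routine that tolerates both layouts shown in the examples (database name followed by a semicolon, or database name and first identifier sharing one field). This is a minimal sketch under stated assumptions, not the UniProt implementation: CrossRef, parse_hits and parse_cross_reference are hypothetical names, "General" is treated as the database name in the Transmembrane example, and the number of expected hits is assumed to be either a single integer or a "min-max" range.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CrossRef:
    database: str
    identifier1: str
    identifier2: Optional[str]      # None when the field is "-"
    expected_hits: Tuple[int, int]  # (min, max); equal for a single number
    trigger: Optional[str]          # rule accession, "yes", or None


def parse_hits(text: str) -> Tuple[int, int]:
    """Turn '1' into (1, 1) and '6-10' into (6, 10)."""
    low, _, high = text.partition("-")
    return int(low), int(high or low)


def parse_cross_reference(line: str) -> CrossRef:
    fields = [f.strip() for f in line.split(";") if f.strip()]
    trigger = None
    if fields and fields[-1].startswith("trigger="):
        trigger = fields.pop()[len("trigger="):]
    if len(fields) == 3:
        # "PROSITE PS50035" layout: database and first identifier share a field.
        database, _, identifier1 = fields[0].partition(" ")
        fields = [database, identifier1.strip()] + fields[1:]
    if len(fields) != 4 or not all(fields):
        raise ValueError(f"cannot parse cross-reference line: {line!r}")
    database, identifier1, identifier2, hits = fields
    return CrossRef(
        database,
        identifier1,
        None if identifier2 == "-" else identifier2,
        parse_hits(hits),
        trigger,
    )


if __name__ == "__main__":
    for example in (
        "Pfam; PF02033; RBFA; 1;",
        "PROSITE PS50035; PLD; 1; trigger=PRU00153;",
        "General Transmembrane; -; 6-10; trigger=yes;",
    ):
        print(parse_cross_reference(example))
```

A line with a trigger value other than "yes" names the rule to be triggered; a caller could look that accession up and apply the corresponding Domain or Site rule.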
Gene Ontology (GO)
This section contains cross-references to the Gene Ontology database (see the GO line section of the UniProt Knowledgebase User Manual and the GO project website).
Keywords
This section contains all applicable keywords of a UniProtKB entry (see the KW line section of the UniProt Knowledgebase User Manual).
Features
This section contains:
- Template feature line(s)
  These define the template for all subsequent Feature lines.
  format: From: template_name
  where template_name is the unique identifier of the motif or metamotif if the trigger is from PROSITE.
  e.g. From: PS50234
- Applicable feature lines that may be applied to UniProtKB entries (e.g. ACT_SITE, METAL; see the FT line section of the UniProt Knowledgebase User Manual).
Conditions may be used in feature lines. They usually correspond to pattern constraints, or to the presence of a specific amino acid.
e.g.
Key       From  To   Description    Condition
DISULFID  60    80   By similarity  C-x*-C
An Optional label can be used to indicate the presence of a feature which is not mandatory in the matched sequences.
e.g.
Key                 From  To   Description          Condition
BINDING (Optional)  153   153  ATP (By similarity)  [RQ]
Multiple FT lines that should be applied either all together or not at all are grouped within an FTGroup, to force the common presence of all sites.
e.g.
Key       From  To   Description                          Condition  FTGroup
ACT_SITE  42    42   Charge relay system (By similarity)  H          1
ACT_SITE  91    91   Charge relay system (By similarity)  D          1
ACT_SITE  186   186  Charge relay system (By similarity)  S          1
This group can then be referenced by a case statement in any other annotation section to be propagated.
e.g.
case <FTGroup:1>
Protein name + (EC 3.4.21.-)
end case
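
The all-or-nothing behaviour of an FTGroup can be illustrated with a short sketch. It is not the UniRule engine and makes several simplifying assumptions: feature positions are taken to be already mapped onto the target sequence (1-based, inclusive), conditions use only the element forms shown above (a single residue, a bracketed set, x, and x*), and the Optional label is not modelled. The names Feature, condition_to_regex, feature_matches and applicable_features are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Feature:
    key: str
    start: int                   # 1-based, inclusive
    end: int                     # 1-based, inclusive
    condition: str               # e.g. "H", "[RQ]", "C-x*-C"
    group: Optional[int] = None  # FTGroup number, if any


def condition_to_regex(condition: str) -> str:
    """Translate the condition forms shown in this manual into a regex."""
    parts = []
    for element in condition.split("-"):
        if element == "x":
            parts.append(".")     # any single residue
        elif element == "x*":
            parts.append(".*")    # any run of residues
        else:
            parts.append(element)  # single residue or [..] set
    return "".join(parts)


def feature_matches(feature: Feature, sequence: str) -> bool:
    segment = sequence[feature.start - 1:feature.end]
    return re.fullmatch(condition_to_regex(feature.condition), segment) is not None


def applicable_features(features: List[Feature], sequence: str) -> List[Feature]:
    """Keep matching features, dropping every member of a group with a failure."""
    failed_groups = {
        f.group
        for f in features
        if f.group is not None and not feature_matches(f, sequence)
    }
    return [
        f
        for f in features
        if feature_matches(f, sequence) and f.group not in failed_groups
    ]


if __name__ == "__main__":
    triad = [
        Feature("ACT_SITE", 2, 2, "H", group=1),
        Feature("ACT_SITE", 5, 5, "D", group=1),
        Feature("ACT_SITE", 8, 8, "S", group=1),
    ]
    # Toy sequences, not real proteins: the second lacks the serine at position 8.
    print(len(applicable_features(triad, "AHAADAASAA")))
    print(len(applicable_features(triad, "AHAADAAAAA")))
```

In this sketch, a case statement such as the one above would be evaluated by checking whether any feature of group 1 survives the filter.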
Additional information
- Size range: For Domain and Site rules, this line contains the size range of the complete domains annotated in UniProtKB. For Protein rules, the minimal and maximal sizes of proteins matching the rule are listed.
- Related Rules: Lists identifiers of rules that are known to be similar in sequence, and which may produce cross-matches. These are particularly useful when two different rules exist for a short and long version of the same protein (as occurs sometimes in Protein rules). Long proteins will match both profiles; under these circumstances the longer family supersedes the shorter family.
- Template(s): For Protein rules only, lists the accession numbers of the entries from which the rule's annotation was inferred. The template entries are usually characterized. "Template: None" indicates that there are no characterization papers on any of the proteins that belong to that family. This is the case for UPFs (Uncharacterized Protein Family), for example.
- Scope: This section indicates the kingdoms covered by the rule.
- Fusion: For Protein rules only, indicates if at least one rule member has been found fused to another protein/domain at its N- or C-terminus. Fusion may be to another protein or to a known/unknown domain.
- Duplicate: For Protein rules only, lists the 5-letter code of the complete proteomes in which more than one protein matches the rule.
- Plasmid encoded: For Protein rules only, indicates the 5-letter code of the organism in which the protein is encoded on a plasmid.
- Repeats: For Domain and Site rules only, indicates the expected number (single number, a range, or unlimited) of repetitions of a domain or site in rule matches.
- Topology: For Domain/Site rules only, specifies the subcellular location(s) in which a Domain or Site may occur.
- Example(s): Optional for Protein rules and mandatory for Domain and Site rules. One or more example entries targeted by the rule are indicated.
- Comments on the rule: This optional section contains additional useful information including: 5-letter codes of organisms with possible wrong starts, divergent paralogs, proteins that are excluded from alignment due to anomalies, etc.
UniProtKB rule member sequences
UniProtKB sets
The count of rule members is indicated for each major kingdom (i.e. Archaea, Bacteria, Eukaryota and Viruses). By clicking on each count you can access a detailed list of:
- the UniProtKB match entries (for Protein rules)
- the UniProtKB/Swiss-Prot match entries (for Domain rules)
Retrieve set of characterized or identified proteins for this family
For Protein rules only.
This section retrieves proteins with evidence at the protein level and/or at the transcript level. For more details about the criteria used to define protein existence, please refer to the document pe_criteria.txt.
Retrieve set of proteins with 3D structure
This section retrieves proteins whose three-dimensional structure (for Protein rules), or part of it (for Protein rules and Domain rules), has been resolved experimentally (for example by X-ray crystallography or NMR spectroscopy) and whose coordinates are available in the Protein Data Bank (PDB) database.