User Manual for the Web View

ProRules are written in the UniRule format, which is used by the UniProt Knowledgebase (UniProtKB) automated annotation projects to annotate protein records in the UniProtKB format. The rules can be displayed in a user-friendly Web View, which consists of the following four main sections and their associated sub-sections.


N.B. Most ProRules are domain or site rules; protein rules are rare.

General rule information

Accession
This field indicates the accession number of the rule. For the ProRule database it has the form PRUxxxxx.
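
As an illustration, the accession form can be checked with a short Python sketch. It assumes that the five "x" placeholders stand for decimal digits, as in the accession PRU00153 used in an example later in this manual; the function name is hypothetical and is not part of any ProRule tool.

```python
import re

# Assumption: each "x" in PRUxxxxx is a decimal digit (e.g. PRU00153).
PRORULE_ACCESSION = re.compile(r"PRU\d{5}")


def is_prorule_accession(text: str) -> bool:
    """Return True if text has the form PRU followed by five digits."""
    return PRORULE_ACCESSION.fullmatch(text) is not None


print(is_prorule_accession("PRU00153"))  # a ProRule-style accession
print(is_prorule_accession("PS50035"))   # a PROSITE identifier, not a rule
```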

Dates
This field is composed of two lines. The first line indicates the rule creation date; the second corresponds to the last rule revision date.

Data Class
The possible values for this field are:

Predictors
The Predictors line(s) indicate the motif identifier(s) used to trigger the application of the rule. The trigger can be:

Name and function
These fields are mandatory for Domain and Site rules and optional for Protein rules.
They provide the name and function of the domain, site or protein, respectively.

Propagated annotation

Identifier, protein and gene names, description
The name and the content of this section depend on the type of rule.

For Domain and Site rules this is an optional field. When present, it contains only the part of the description that is common to all rule members, preceded by a plus (+).
For Protein rules it corresponds to:

Comments
This section contains all applicable comment lines of a UniProtKB entry (see the CC line section of the UniProt Knowledgebase User Manual).

Cross-references
This section can be used:
  1. To indicate cross-references to domain and family databases within a UniProtKB entry, e.g. HAMAP, PROSITE, Pfam, PRINTS, TIGRFAMs and PIRSF (see the DR line section of the UniProt Knowledgebase User Manual).

    format:
     Database Name identifier1; identifier2; number of expected hits;

         e.g.
     Pfam; PF02033; RBFA; 1;
     TIGRFAMs; TIGR00082; rbfA; 1;
     PROSITE; PS01319; RBFA; 1;


  2. To indicate which other rule(s), if any, must be applied to completely annotate the protein or the domain.

    Two main cases can be distinguished:
    • Triggering of Domain and/or Site rules:

      This concerns rules to annotate a Protein containing domain(s) and/or site(s). It also concerns any rule aiming to annotate a Domain which contains Site(s).

      format:
       PROSITE identifier1; identifier2; number of expected hits; trigger=accession number of the rule to be triggered;

           e.g.
       PROSITE PS50035; PLD; 1; trigger=PRU00153;


    • Triggering of other rule(s) to annotate features such as Transmembrane, coiled coil (...):
      This concerns the annotation of either a protein or a domain.

      format:
       feature name; -; number of expected hits; trigger=yes;

           e.g.
       General Transmembrane; -; 6-10; trigger=yes;
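
The cross-reference formats above can be read programmatically. The following Python sketch is one possible reading, not the parser used by UniProtKB or PROSITE: it assumes that fields are separated by semicolons, that an optional last field starts with `trigger=`, and that the field before it holds the number of expected hits as a single number or a `low-high` range. All names in the code (`CrossRef`, `parse_cross_reference`) are hypothetical.

```python
from typing import List, NamedTuple, Optional


class CrossRef(NamedTuple):
    leading: List[str]      # database/feature name and identifiers, as written
    min_hits: int           # lower bound of the expected number of hits
    max_hits: int           # upper bound (equal to min_hits for a single number)
    trigger: Optional[str]  # rule accession, "yes", or None when absent


def parse_cross_reference(line: str) -> CrossRef:
    """Split one cross-reference line into its fields (illustrative only)."""
    fields = [f.strip() for f in line.strip().rstrip(";").split(";")]
    trigger = None
    if fields[-1].startswith("trigger="):
        trigger = fields.pop()[len("trigger="):]
    hits = fields.pop()
    # "1" gives ("1", "", ""); "6-10" gives ("6", "-", "10").
    low, _, high = hits.partition("-")
    min_hits = int(low)
    max_hits = int(high) if high else min_hits
    return CrossRef(fields, min_hits, max_hits, trigger)


for example in (
    "Pfam; PF02033; RBFA; 1;",
    "PROSITE PS50035; PLD; 1; trigger=PRU00153;",
    "General Transmembrane; -; 6-10; trigger=yes;",
):
    print(parse_cross_reference(example))
```

The leading fields are kept exactly as written because the examples above do not all separate the database name from the first identifier in the same way.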


Gene Ontology (GO)
This section contains cross-references to the Gene Ontology database (see the GO line section of the UniProt Knowledgebase User Manual and the GO project website).

Keywords
This section contains all applicable keywords of a UniProtKB entry (see the KW line section of the UniProt Knowledgebase User Manual).

Features
This section contains:
  1. Template feature line(s)
    It defines the template for all the subsequent Feature lines.

    format:
     From: template_name

    where template_name is the unique identifier of the motif or metamotif if the trigger is from PROSITE.
         e.g.
     From: PS50234


  2. Feature lines that may be applied to UniProtKB entries (e.g. ACT_SITE, METAL; see the FT line section of the UniProt Knowledgebase User Manual).


Conditions may be used in feature lines. They usually correspond to pattern constraints, or to the presence of a specific amino acid.

e.g.

Key        From   To   Description     Condition
DISULFID   60     80   By similarity   C-x*-C

An Optional label can be used to indicate a feature that is not mandatory in the matched sequences.

e.g.

Key                  From   To    Description           Condition
BINDING (Optional)   153    153   ATP (By similarity)   [RQ]
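
To make the role of a condition concrete, the Python sketch below checks conditions like those in the two examples above against a sequence. It rests on assumptions that this manual does not state explicitly: elements are separated by hyphens, `x` stands for any single residue, `x*` for any number of residues, a bracketed set such as `[RQ]` for any one of the listed residues, and the condition is tested on the residues from the From position to the To position (1-based, inclusive). The function names are hypothetical.

```python
import re


def condition_to_regex(condition: str) -> str:
    """Translate a condition such as 'C-x*-C' or '[RQ]' into a Python regex."""
    parts = []
    for element in condition.split("-"):
        if element == "x":
            parts.append(".")        # assumed: any single residue
        elif element == "x*":
            parts.append(".*")       # assumed: any number of residues
        elif re.fullmatch(r"[A-Z]|\[[A-Z]+\]", element):
            parts.append(element)    # a residue or a set of allowed residues
        else:
            raise ValueError(f"unsupported condition element: {element!r}")
    return "".join(parts)


def condition_holds(sequence: str, start: int, end: int, condition: str) -> bool:
    """Test the condition on residues start..end (1-based, inclusive)."""
    segment = sequence[start - 1:end]
    return re.fullmatch(condition_to_regex(condition), segment) is not None


toy = "AACDEFCAAR"  # toy sequence, not a real protein
print(condition_holds(toy, 3, 7, "C-x*-C"))   # cysteines at both ends
print(condition_holds(toy, 10, 10, "[RQ]"))   # residue 10 is R
```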

Multiple FT lines that must be applied either all together or not at all are grouped within an FTGroup, which enforces the joint presence of all sites.

e.g.

Key        From   To    Description                           Condition   FTGroup
ACT_SITE   42     42    Charge relay system (By similarity)   H           1
ACT_SITE   91     91    Charge relay system (By similarity)   D           1
ACT_SITE   186    186   Charge relay system (By similarity)   S           1

This group can then be referenced by a case statement in any other annotation section to be propagated.

e.g.

case  <FTGroup:1>

   Protein name     + (EC 3.4.21.-)

end case
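The all-or-nothing behaviour of an FTGroup and its use in a case statement can be sketched in Python as follows. This is an illustration under stated assumptions, not the UniRule implementation: each feature line is reduced to a single position with a set of allowed residues, a group counts as satisfied only if every one of its lines matches, and the `+` in the case statement is read as appending text to the protein name. All class, function and protein names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Set


class FeatureLine(NamedTuple):
    key: str
    position: int         # 1-based residue position (From equals To here)
    allowed: str          # residues accepted by the Condition column
    group: Optional[int]  # FTGroup number, or None if ungrouped


def matches(sequence: str, feature: FeatureLine) -> bool:
    """True if the residue at the feature position satisfies its condition."""
    if feature.position > len(sequence):
        return False
    return sequence[feature.position - 1] in feature.allowed


def satisfied_groups(sequence: str, features: List[FeatureLine]) -> Set[int]:
    """Return the FTGroup numbers whose feature lines all match."""
    outcomes = defaultdict(list)
    for feature in features:
        if feature.group is not None:
            outcomes[feature.group].append(matches(sequence, feature))
    return {group for group, results in outcomes.items() if all(results)}


def propagated_features(sequence: str, features: List[FeatureLine]) -> List[FeatureLine]:
    """Keep grouped lines only when their whole group is satisfied."""
    passed = satisfied_groups(sequence, features)
    return [
        f for f in features
        if f.group in passed or (f.group is None and matches(sequence, f))
    ]


def protein_name(base: str, sequence: str, features: List[FeatureLine]) -> str:
    """Mimic 'case <FTGroup:1> ... end case' by appending the EC number."""
    if 1 in satisfied_groups(sequence, features):
        return base + " (EC 3.4.21.-)"
    return base


def build_sequence(residues: Dict[int, str], length: int = 200) -> str:
    """Hypothetical helper: poly-alanine sequence with chosen residues set."""
    chars = ["A"] * length
    for position, residue in residues.items():
        chars[position - 1] = residue
    return "".join(chars)


CHARGE_RELAY = [
    FeatureLine("ACT_SITE", 42, "H", 1),
    FeatureLine("ACT_SITE", 91, "D", 1),
    FeatureLine("ACT_SITE", 186, "S", 1),
]

complete = build_sequence({42: "H", 91: "D", 186: "S"})
partial = build_sequence({42: "H", 91: "D"})  # serine at 186 is missing

print(protein_name("Example protease", complete, CHARGE_RELAY))
print(len(propagated_features(partial, CHARGE_RELAY)))
```

With the incomplete triad, none of the three ACT_SITE lines is kept and the protein name is left unchanged.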


Additional information

  • Size range: For Domain and Site rules, this line contains the size range of the complete domains annotated in UniProtKB. For Protein rules, the minimal and maximal sizes of proteins matching the rule are listed.

  • Related Rules: Lists identifiers of rules that are known to be similar in sequence, and which may produce cross-matches. These are particularly useful when two different rules exist for a short and long version of the same protein (as occurs sometimes in Protein rules). Long proteins will match both profiles; under these circumstances the longer family supersedes the shorter family.

  • Template(s): For Protein rules only, lists the accession numbers of the entries from which the rule's annotation was inferred. The template entries are usually characterized. "Template: None" indicates that there are no characterization papers on any of the proteins that belong to that family. This is the case for UPFs (Uncharacterized Protein Family), for example.

  • Scope: This section indicates the kingdoms covered by the rule.

  • Fusion: For Protein rules only, indicates if at least one rule member has been found fused to another protein/domain at its N- or C-terminus. Fusion may be to another protein or to a known/unknown domain.

  • Duplicate: For Protein rules only, lists the 5-letter code of the complete proteomes in which more than one protein matches the rule.

  • Plasmid encoded: For Protein rules only, indicates the 5-letter code of the organism in which the protein is encoded on a plasmid.

  • Repeats: For Domain and Site rules only, indicates the expected number (single number, a range, or unlimited) of repetitions of a domain or site in rule matches.

  • Topology: For Domain/Site rules only, specifies the subcellular location(s) in which a Domain or Site may occur.

  • Example(s): Optional for Protein rules and mandatory for Domain and Site rules. One or more example entries targeted by the rule are indicated.

  • Comments on the rule: This optional section contains additional useful information including: 5-letter codes of organisms with possible wrong starts, divergent paralogs, proteins that are excluded from alignment due to anomalies, etc.

UniProtKB rule member sequences

UniProtKB sets

The count of rule members is indicated for each major kingdom (i.e. Archaea, Bacteria, Eukaryota and Viruses). By clicking on a count you can access a detailed list of:
  • the UniProtKB match entries (for Protein rules)
  • the UniProtKB/Swiss-Prot match entries (for Domain rules)

Retrieve set of characterized or identified proteins for this family

For Protein rules only.
This section retrieves proteins with evidence at the protein level and/or at the transcript level. For more details about the criteria used to define Protein existence, please refer to the document pe_criteria.txt.

Retrieve set of proteins with 3D structure

This section retrieves proteins whose three-dimensional structure (for Protein rules) or part of it (for Protein rules and Domain rules) has been resolved experimentally (for example by X-ray crystallography or NMR spectroscopy) and whose coordinates are available in the Protein Data Bank (PDB) database.