User Manual for the Web View
ProRules are written in the UniRule format, which is used by the UniProt
Knowledgebase (UniProtKB) automated annotation projects to annotate protein records in the
format. The rules can be displayed in a
user-friendly Web View which consists of the following four main sections and associated sub-sections.
N.B. Most ProRules are Domain or Site rules; Protein rules are rare.
General rule information
This field indicates the accession number of the rule. It is in the form of
for the ProRule database
This field is composed of two lines. The first line indicates the rule creation date;
the second corresponds to the last rule revision date.
The possible values for this field are:
- Domain indicates that the rule is based on a motif (profile or pattern) that detects a
domain. The propagated annotation concerns only this domain. Rules of this data class are referred to hereafter
as Domain rules.
- Site indicates that the rule is based on a motif that detects a site. The
propagated annotation concerns only this site. Rules of this data class are referred to hereafter
as Site rules.
- Protein indicates that the rule is based
on a profile or metamotif that covers the complete protein sequence. In this case the rule
enables the complete annotation of a UniProtKB entry. Rules of this data class are referred to hereafter as
Protein rules.
indicates the motif identifier(s) used to trigger the application of the
rule. The trigger can be:
A PROSITE pattern or profile
PROSITE; the PROSITE motif accession number; the PROSITE motif entry name
PROSITE; PS50292; PEROXIDASE_3
A PROSITE metamotif
Metamotif; -; the metamotif itself
Metamotif; -; PS50021=7,91=PS50021
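The two trigger layouts above share a three-field, semicolon-separated shape, so they can be read with a single routine. The following is only a minimal sketch: the names Trigger and parse_trigger are hypothetical and not part of any ProRule or UniProt tool, and the sketch assumes that a dash always marks the empty accession field of a metamotif trigger.

```python
from typing import NamedTuple, Optional


class Trigger(NamedTuple):
    kind: str                 # "PROSITE" or "Metamotif"
    accession: Optional[str]  # PROSITE accession; None for a metamotif
    name: str                 # PROSITE entry name, or the metamotif itself


def parse_trigger(line: str) -> Trigger:
    """Split a 'kind; accession; name' trigger line into its three fields."""
    fields = [f.strip() for f in line.strip().rstrip(";").split(";")]
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    kind, accession, name = fields
    if kind not in ("PROSITE", "Metamotif"):
        raise ValueError(f"unknown trigger type: {kind!r}")
    # Assumption: a dash marks the unused accession slot of a metamotif.
    return Trigger(kind, None if accession == "-" else accession, name)


if __name__ == "__main__":
    print(parse_trigger("PROSITE; PS50292; PEROXIDASE_3"))
    print(parse_trigger("Metamotif; -; PS50021=7,91=PS50021"))
```

The metamotif expression is kept as an opaque string here, because its internal syntax is not described in this section.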
Name and function
These fields are mandatory for Domain and Site rules and optional for Protein rules.
They provide the name and the function of the domain, site or protein, respectively.
Identifier, protein and gene names, description
The name and the content of this section depend on the type of rule.
Domain and Site rules
This is an optional field. When present, it contains only the part of the description that is common to
all rule members, preceded by a plus sign (+).
For Protein rules, it corresponds to:
Identifier: the mnemonic code for the protein name
Description of the protein
- The common
Gene Name of the protein, when it exists
This section contains all applicable comment lines of a UniProtKB
entry (see: the
CC line section of the UniProt Knowledgebase User Manual).
This section can be used:
- To indicate cross-references to domain and family databases within a UniProtKB entry; e.g. HAMAP, PROSITE, Pfam, PRINTS, TIGRFAMs and PIRSF (see: the DR line section of the UniProt Knowledgebase User Manual).
Database Name identifier1; identifier2; number of expected hits;
Pfam; PF02033; RBFA; 1;
TIGRFAMs; TIGR00082; rbfA; 1;
PROSITE; PS01319; RBFA; 1;
- To indicate which other rule(s), if any, must be applied to completely annotate the protein or the domain.
Two main cases can be distinguished:
- Triggering of Domain and/or Site rules:
This concerns rules to annotate a Protein containing domain(s) and/or site(s). It also
concerns any rule aiming to annotate a Domain which contains Site(s).
PROSITE identifier1; identifier2; number of expected hits; trigger=accession number of the rule to be triggered;
PROSITE PS50035; PLD; 1; trigger=PRU00153;
- Triggering of other rule(s) to annotate features such
as Transmembrane, coiled coil (...):
This concerns annotation of either a protein or a domain. In this
case the format is:
feature name; -; number of expected hits; trigger=yes;
General Transmembrane; -; 6-10; trigger=yes;
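The cross-reference and trigger lines shown in this section can be read with one small parser. The sketch below is illustrative only: CrossRef, parse_hits and parse_cross_ref are hypothetical names, not an official API. It assumes that the number of expected hits is either a single integer or a "min-max" range, and that the database name may be separated from the first identifier by either a semicolon or a space, since both spellings appear in the examples above.

```python
from typing import NamedTuple, Optional, Tuple


class CrossRef(NamedTuple):
    database: str
    identifier1: str
    identifier2: Optional[str]      # None when the field is "-"
    expected_hits: Tuple[int, int]  # inclusive (min, max)
    trigger: Optional[str]          # rule accession, "yes", or None


def parse_hits(text: str) -> Tuple[int, int]:
    """Read '1' as (1, 1) and '6-10' as (6, 10)."""
    low, sep, high = text.partition("-")
    return (int(low), int(high)) if sep else (int(low), int(low))


def parse_cross_ref(line: str) -> CrossRef:
    fields = [f.strip() for f in line.split(";") if f.strip()]
    if not fields:
        raise ValueError("empty cross-reference line")
    # Some examples separate the database name by a space only.
    if " " in fields[0]:
        fields[0:1] = fields[0].split(None, 1)
    trigger = None
    if fields[-1].startswith("trigger="):
        trigger = fields.pop()[len("trigger="):]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields before the trigger: {line!r}")
    database, id1, id2, hits = fields
    return CrossRef(database, id1, None if id2 == "-" else id2,
                    parse_hits(hits), trigger)


if __name__ == "__main__":
    for example in (
        "Pfam; PF02033; RBFA; 1;",
        "PROSITE PS50035; PLD; 1; trigger=PRU00153;",
        "General Transmembrane; -; 6-10; trigger=yes;",
    ):
        print(parse_cross_ref(example))
```

Storing the expected hits as an inclusive pair keeps the single-number and range cases uniform for any later comparison against an observed hit count.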
Gene Ontology (GO)
This section contains cross-references to the Gene Ontology database (see: the
GO line section of the UniProt Knowledgebase User Manual
and the GO project website).
This section contains all applicable keywords of a UniProtKB
entry (see: the
KW line section of the UniProt Knowledgebase User Manual).
This section contains:
- Template feature line(s)
It defines the template for all the subsequent Feature lines.
where template_name is the unique identifier of the motif or
metamotif if the trigger is from PROSITE.
- Applicable feature lines that may be applied to UniProtKB entries
(e.g. ACT_SITE, METAL, see the
FT line section of the UniProt Knowledgebase User Manual).
Conditions may be used in feature lines. They usually correspond to pattern constraints, or to the
presence of a specific amino acid.
Key       From  To  Description    Condition
DISULFID  60    80  By similarity  C-x*-C
An (Optional) label can be used to indicate the presence of a feature that is not mandatory in
the matched sequences.
Key                 From  To   Description          Condition
BINDING (Optional)  153   153  ATP (By similarity)  [RQ]
Multiple FT lines that should be applied
either all together or not at all are grouped within an
FTGroup, to force the joint presence of all sites.
Key       From  To   Description                          Condition  FTGroup
ACT_SITE  42    42   Charge relay system (By similarity)  H          1
ACT_SITE  91    91   Charge relay system (By similarity)  D          1
ACT_SITE  186   186  Charge relay system (By similarity)  S          1
This group can then be referenced by a
statement in any other annotation section to be propagated.
Protein name + (EC 3.4.21.-)
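The all-or-nothing behaviour of an FTGroup, combined with per-feature conditions, can be sketched as follows. This is a simplified illustration, not the actual annotation pipeline: Feature, condition_holds and applicable_features are hypothetical names, positions are assumed to be already mapped onto the target sequence, and only single-residue conditions such as H or [RQ] are handled (range constraints such as C-x*-C are out of scope).

```python
import re
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class Feature(NamedTuple):
    key: str
    start: int            # 1-based, assumed already mapped to the target
    end: int
    condition: str        # single residue ("H") or residue class ("[RQ]")
    group: Optional[int]  # FTGroup number, or None if ungrouped


def condition_holds(feature: Feature, sequence: str) -> bool:
    """Check the residue(s) at the feature position against its condition."""
    if feature.start < 1 or feature.end > len(sequence):
        return False
    residues = sequence[feature.start - 1:feature.end]
    return re.fullmatch(feature.condition, residues) is not None


def applicable_features(features: List[Feature], sequence: str) -> List[Feature]:
    """Keep features whose condition holds; a group is kept only if all pass."""
    passed = {f: condition_holds(f, sequence) for f in features}
    group_ok: Dict[int, bool] = defaultdict(lambda: True)
    for f in features:
        if f.group is not None:
            group_ok[f.group] = group_ok[f.group] and passed[f]
    return [f for f in features
            if passed[f] and (f.group is None or group_ok[f.group])]


if __name__ == "__main__":
    # Toy sequence and positions, invented purely for illustration.
    triad = [
        Feature("ACT_SITE", 2, 2, "H", 1),
        Feature("ACT_SITE", 4, 4, "D", 1),
        Feature("ACT_SITE", 6, 6, "S", 1),
    ]
    print(len(applicable_features(triad, "AHADAS")))  # all three conditions hold
    print(len(applicable_features(triad, "AHADAA")))  # one fails, group dropped
```

Evaluating every condition first and filtering afterwards keeps the group decision independent of the order in which the feature lines are listed.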
- Size range: For Domain and Site rules, this line contains
the size range of the complete domains annotated in UniProtKB. For Protein rules, the minimal and maximal sizes of
proteins matching the rule are listed.
- Related Rules: Lists identifiers of rules that are known to be similar
in sequence, and which may produce cross-matches. These are particularly useful when two different
rules exist for a short and long version of the same protein (as occurs sometimes in Protein rules).
Long proteins will match both profiles; under these circumstances the longer family supersedes the
shorter one.
- Template(s): For Protein rules only, lists the accession numbers of the
entries from which the rule's annotation was inferred. The template entries are usually characterized.
"Template: None" indicates that there are no characterization papers on any of the proteins that belong
to that family. This is the case for UPFs (Uncharacterized Protein Family), for example.
- Scope: This section indicates the kingdoms covered by the rule.
- Fusion: For Protein rules only, indicates if at least one rule member
has been found fused to another protein/domain at its N- or C-terminus. Fusion may be to another
protein or to a known/unknown domain.
- Duplicate: For Protein rules only, lists the 5-letter codes of the complete proteomes in which more than one
protein matches the rule.
- Plasmid encoded: For Protein rules only, indicates the 5-letter code of the organism in which
the protein is encoded on a plasmid.
- Repeats: For Domain and Site rules only, indicates the
expected number (single number, a range, or unlimited) of repetitions of a domain or site in rule matches.
- Topology: For Domain/Site rules only, specifies the subcellular location(s) in which a Domain or Site may occur.
- Example(s): Optional for Protein rules and mandatory for Domain
and Site rules. One or more example entries targeted by the rule are indicated.
- Comments on the rule: This optional section contains
additional useful information including: 5-letter codes of
organisms with possible wrong starts, divergent paralogs, proteins that are excluded from alignment
due to anomalies, etc.
UniProtKB rule member sequences
The count of rule members is indicated for each major kingdom (i.e.
Bacteria, Eukaryota and Viruses). By clicking on each count you can access
a detailed list of:
- the UniProtKB match entries (for Protein rules)
- the UniProtKB/Swiss-Prot match entries (for Domain rules)
Retrieve set of characterized or identified proteins for this family
For Protein rules only.
This section retrieves proteins with evidence at the protein level and/or at the
transcript level. For more details about the criteria used to define Protein existence,
please refer to the document
Retrieve set of proteins with 3D structure
This section retrieves proteins whose three-dimensional structure (for Protein rules)
or part of it (for Protein rules and Domain rules) has been resolved experimentally
(for example by X-ray crystallography or NMR spectroscopy) and whose coordinates are
available in the Protein Data Bank (PDB) database.