User Manual for the Web View

ProRules are written in the UniRule format, which is used by the UniProt Knowledgebase (UniProtKB) automated annotation projects to annotate protein records in the UniProtKB format. The rules can be displayed in a user-friendly Web View, which consists of the following four main sections and their associated sub-sections:
  • General rule information
  • Propagated annotation
  • Additional information
  • UniProtKB rule member sequences




N.B. Most ProRules are Domain or Site rules; Protein rules are rare.

General rule information

Accession
This field indicates the accession number of the rule. For the ProRule database it has the form PRUxxxxx.
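As an illustration, the accession format can be checked with a short regular expression. This is only a sketch: it assumes that each "x" in PRUxxxxx stands for one decimal digit (as in PRU00153, quoted later in this manual), and the function name is hypothetical.

```python
import re

# Assumption: each "x" in PRUxxxxx is a decimal digit, as in PRU00153.
PRORULE_ACCESSION = re.compile(r"PRU\d{5}")


def is_prorule_accession(text: str) -> bool:
    """Return True if text has the shape of a ProRule accession number."""
    return PRORULE_ACCESSION.fullmatch(text) is not None


print(is_prorule_accession("PRU00153"))  # True
print(is_prorule_accession("PS50035"))   # False (a PROSITE accession)
```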

Dates
This field is composed of two lines. The first line indicates the rule creation date; the second corresponds to the last rule revision date.

Data Class
The possible values for this field are:
  • Domain
    indicates that the rule is based on a motif (profile or pattern) that detects a domain. The propagated annotation concerns only this domain. Rules of this data class are referred to hereafter as Domain rules.
  • Site
    indicates that the rule is based on a motif that detects a site. The propagated annotation concerns only this site. Rules of this data class are referred to hereafter as Site rules.
  • Protein
    indicates that the rule is based on a profile or metamotif that covers the complete protein sequence. In this case the rule enables the complete annotation of a UniProtKB entry. Rules of this data class are referred to hereafter as Protein rules.

Predictors
The Predictors line(s) indicate the motif identifier(s) used to trigger the application of the rule. The trigger can be:
  • A PROSITE pattern or profile

    format:
     PROSITE; the PROSITE motif accession number; the PROSITE motif entry name

         e.g.
     PROSITE; PS50292; PEROXIDASE_3


  • A PROSITE metamotif

    format:
     Metamotif; -; the metamotif itself

         e.g.
     Metamotif; -; PS50021=7,91=PS50021
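The two Predictors formats above share a three-field, semicolon-separated layout, so they can be read with one small parser. The sketch below is illustrative only: the names Predictor and parse_predictor are hypothetical, and it assumes that a "-" in the second field means "no accession number".

```python
from typing import NamedTuple, Optional


class Predictor(NamedTuple):
    kind: str                 # "PROSITE" or "Metamotif"
    accession: Optional[str]  # None when the field holds "-"
    value: str                # entry name, or the metamotif itself


def parse_predictor(line: str) -> Predictor:
    """Split a Predictors line into its three semicolon-separated fields."""
    parts = [p.strip() for p in line.strip().rstrip(";").split(";")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}: {line!r}")
    kind, accession, value = parts
    # Assumption: "-" marks an absent accession number.
    return Predictor(kind, None if accession == "-" else accession, value)


print(parse_predictor("PROSITE; PS50292; PEROXIDASE_3"))
print(parse_predictor("Metamotif; -; PS50021=7,91=PS50021"))
```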

Name and function
These fields are mandatory for Domain and Site rules and optional for Protein rules.
They give the name and the function of the domain, site or protein, respectively.

Propagated annotation

Identifier, protein and gene names, description
The name and the content of this section depend on the type of rule.

For Domain and Site rules this is an optional field. When present, it contains only the part of the description that is common to all rule members, preceded by a plus (+).
For Protein rules it corresponds to:
  • An Identifier: the mnemonic code for the protein name
  • A Description of the protein
  • The common Gene Name of the protein, when it exists

Comments
This section contains all applicable comment lines of a UniProtKB entry (see: the CC line section of the UniProt Knowledgebase User Manual).

Cross-references
This section can be used:
  1. To indicate cross-references to domain and family databases within a UniProtKB entry; e.g. HAMAP, PROSITE, Pfam, PRINTS, TIGRFAMs and PIRSF (see: the DR line section of the UniProt Knowledgebase User Manual).

    format:
     Database Name identifier1; identifier2; number of expected hits;

         e.g.
      Pfam; PF02033; RBFA; 1;

            TIGRFAMs; TIGR00082; rbfA; 1;

            PROSITE; PS01319; RBFA; 1;


  2. To indicate which other rule(s), if any, must be applied to completely annotate the protein or the domain.

    Two main cases can be distinguished:
    • Triggering of Domain and/or Site rules:

      This concerns rules to annotate a Protein containing domain(s) and/or site(s). It also concerns any rule aiming to annotate a Domain which contains Site(s).

      format:
       PROSITE identifier1; identifier2; number of expected hits; trigger=accession number of the rule to be triggered;

           e.g.
       PROSITE PS50035; PLD; 1; trigger=PRU00153;


    • Triggering of other rule(s) to annotate features such as transmembrane regions, coiled coils, etc.:
      This concerns annotation of either a protein or a domain.

      format:
       feature name; -; number of expected hits; trigger=yes;

           e.g.
       General Transmembrane; -; 6-10; trigger=yes;
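The cross-reference lines shown in this section can be read with a small parser that works from the end of the line: an optional trigger= field comes last, preceded by the number of expected hits. This is a sketch under stated assumptions, not a description of the actual ProRule tooling: the names CrossRef and parse_cross_reference are hypothetical, and the leading fields are kept exactly as written because the examples above differ in how the database or feature name is separated from the first identifier.

```python
from typing import List, NamedTuple, Optional, Tuple


class CrossRef(NamedTuple):
    head: List[str]         # leading fields, kept as written
    hits: Tuple[int, int]   # (minimum, maximum) number of expected hits
    trigger: Optional[str]  # rule accession, "yes", or None


def parse_cross_reference(line: str) -> CrossRef:
    """Parse one cross-reference line, reading the fields from the end."""
    fields = [f.strip() for f in line.split(";") if f.strip()]
    trigger = None
    if fields and fields[-1].startswith("trigger="):
        trigger = fields.pop()[len("trigger="):]
    if len(fields) < 2:
        raise ValueError(f"too few fields: {line!r}")
    # Assumption: hits are a single number ("1") or a range ("6-10").
    low, sep, high = fields.pop().partition("-")
    hits = (int(low), int(high) if sep else int(low))
    return CrossRef(fields, hits, trigger)


for example in (
    "Pfam; PF02033; RBFA; 1;",
    "PROSITE PS50035; PLD; 1; trigger=PRU00153;",
    "General Transmembrane; -; 6-10; trigger=yes;",
):
    print(parse_cross_reference(example))
```

Reading from the end keeps the sketch independent of how many leading fields a line carries.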


Gene Ontology (GO)
This section contains cross-references to the Gene Ontology database (see the GO line section of the UniProt Knowledgebase User Manual and the GO project website).

Keywords
This section contains all applicable keywords of a UniProtKB entry (see: the KW line section of the UniProt Knowledgebase User Manual).

Features
This section contains:
  1. Template feature line(s)
    This line defines the template for all subsequent feature lines.

    format:
     From: template_name

    where template_name is the unique identifier of the motif or metamotif if the trigger is from PROSITE.
         e.g.
     From: PS50234


  2. Applicable feature lines that may be applied to UniProtKB entries (e.g. ACT_SITE, METAL, see the FT line section of the UniProt Knowledgebase User Manual).


Conditions may be used in feature lines. They usually correspond to pattern constraints, or to the presence of a specific amino acid.

e.g.

Key       From  To  Description    Condition
DISULFID  60    80  By similarity  C-x*-C

An Optional label can be used to mark a feature that is not mandatory in the matched sequences.

e.g.

Key                 From  To   Description          Condition
BINDING (Optional)  153   153  ATP (By similarity)  [RQ]

Multiple FT lines that must be applied either all together or not at all are grouped within an FTGroup, which forces all the sites to be present together.

e.g.

Key       From  To   Description                          Condition  FTGroup
ACT_SITE  42    42   Charge relay system (By similarity)  H          1
ACT_SITE  91    91   Charge relay system (By similarity)  D          1
ACT_SITE  186   186  Charge relay system (By similarity)  S          1

This group can then be referenced by a case statement in any other annotation section to be propagated.

e.g.

case <FTGroup:1>
   Protein name  + (EC 3.4.21.-)
end case
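The all-or-nothing behaviour of an FTGroup, and a case statement that depends on it, can be sketched as follows. The sketch rests on several assumptions that this manual does not confirm: positions are 1-based and already mapped onto the target sequence, conditions are limited to a single residue ("H") or a residue class ("[RQ]"), and the "+" in the case body is read as "append to the protein name". All names (Feature, condition_holds, satisfied_groups, protein_name) are hypothetical.

```python
from typing import Dict, List, NamedTuple, Set


class Feature(NamedTuple):
    key: str
    start: int      # 1-based; assumed already mapped onto the target sequence
    end: int
    condition: str  # single residue ("H") or residue class ("[RQ]")
    group: int


# The charge relay system example from the table above.
TRIAD = [
    Feature("ACT_SITE", 42, 42, "H", 1),
    Feature("ACT_SITE", 91, 91, "D", 1),
    Feature("ACT_SITE", 186, 186, "S", 1),
]


def condition_holds(seq: str, feat: Feature) -> bool:
    """Check that the residue at feat.start is allowed by the condition."""
    allowed = feat.condition.strip("[]")
    residue = seq[feat.start - 1:feat.start]  # "" if the sequence is too short
    return residue != "" and residue in allowed


def satisfied_groups(seq: str, features: List[Feature]) -> Set[int]:
    """Return the FTGroup numbers whose members all pass their condition."""
    by_group: Dict[int, List[Feature]] = {}
    for feat in features:
        by_group.setdefault(feat.group, []).append(feat)
    return {
        group
        for group, members in by_group.items()
        if all(condition_holds(seq, m) for m in members)
    }


def protein_name(base: str, groups: Set[int]) -> str:
    """Mirror 'case <FTGroup:1> ... end case': extend the name if group 1 holds."""
    return f"{base} (EC 3.4.21.-)" if 1 in groups else base
```

If any one of the three residues is missing, group 1 is not satisfied, none of its feature lines would be applied, and the case body is skipped.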


Additional information

  • Size range: For Domain and Site rules, this line contains the size range of the complete domains annotated in UniProtKB. For Protein rules, the minimal and maximal sizes of proteins matching the rule are listed.

  • Related Rules: Lists identifiers of rules that are known to be similar in sequence, and which may produce cross-matches. These are particularly useful when two different rules exist for a short and long version of the same protein (as occurs sometimes in Protein rules). Long proteins will match both profiles; under these circumstances the longer family supersedes the shorter family.

  • Template(s): For Protein rules only, lists the accession numbers of the entries from which the rule's annotation was inferred. The template entries are usually characterized. "Template: None" indicates that there are no characterization papers on any of the proteins that belong to that family. This is the case for UPFs (Uncharacterized Protein Family), for example.

  • Scope: This section indicates the kingdoms covered by the rule.

  • Fusion: For Protein rules only, indicates if at least one rule member has been found fused to another protein/domain at its N- or C-terminus. Fusion may be to another protein or to a known/unknown domain.

  • Duplicate: For Protein rules only, lists the 5-letter code of the complete proteomes in which more than one protein matches the rule.

  • Plasmid encoded: For Protein rules only, indicates the 5-letter code of the organism in which the protein is encoded on a plasmid.

  • Repeats: For Domain and Site rules only, indicates the expected number (single number, a range, or unlimited) of repetitions of a domain or site in rule matches.

  • Topology: For Domain/Site rules only, specifies the subcellular location(s) in which a Domain or Site may occur.

  • Example(s): Optional for Protein rules and mandatory for Domain and Site rules. One or more example entries targeted by the rule are indicated.

  • Comments on the rule: This optional section contains additional useful information including: 5-letter codes of organisms with possible wrong starts, divergent paralogs, proteins that are excluded from alignment due to anomalies, etc.

UniProtKB rule member sequences

UniProtKB sets

The count of rule members is indicated for each major kingdom (i.e. Archaea, Bacteria, Eukaryota and Viruses). By clicking on each count you can access a detailed list of:
  • the UniProtKB match entries (for Protein rules)
  • the UniProtKB/Swiss-Prot match entries (for Domain rules)

Retrieve set of characterized or identified proteins for this family

For Protein rules only.
This section lets you retrieve proteins with evidence at protein level and/or at transcript level. For more details about the criteria used to define Protein existence, please refer to the document pe_criteria.txt.

Retrieve set of proteins with 3D structure

This section lets you retrieve proteins whose three-dimensional structure (for Protein rules), or part of it (for Protein rules and Domain rules), has been determined experimentally (for example by X-ray crystallography or NMR spectroscopy) and whose coordinates are available in the Protein Data Bank (PDB) database.