PROSITE documentation PDOC50099
Sequence regions enriched in a particular amino acid profiles

PURL: https://purl.expasy.org/prosite/documentation/PDOC50099

Description

Many proteins contain compositionally biased sequence regions, also called low-complexity regions [1]. Typically, such regions are highly enriched in one or a few amino acids. We have included one profile for each of the 20 amino acids to search for regions that are significantly enriched in that amino acid (alanine, cysteine, aspartic acid, glutamic acid, phenylalanine, glycine, histidine, isoleucine, lysine, leucine, methionine, asparagine, proline, glutamine, arginine, serine, threonine, valine, tryptophan and tyrosine-rich region profiles). The behaviour of each profile is controlled by two parameters, the match score and the mismatch score. These parameters were chosen such that the "target frequency" of the corresponding amino acid, computed according to the Karlin-Altschul theory [2], is approximately 35% given the average residue composition of Swiss-Prot (see the table below).

   Amino     Average    Match    Mismatch   Target
   acid      frequency  score    score      frequency
             (%)                            (%)

   Ala (A)   7.55          4       -1       38.5
   Cys (C)   1.69          7       -1       36.8
   Asp (D)   5.30          5       -1       35.1
   Glu (E)   6.32          5       -1       32.4
   Phe (F)   4.07          6       -1       31.9
   Gly (G)   6.84          5       -1       31.2
   His (H)   2.24          7       -1       33.6
   Ile (I)   5.72          5       -1       34.0
   Lys (K)   5.93          5       -1       33.4
   Leu (L)   9.33          4       -1       34.7
   Met (M)   2.35          7       -1       33.1
   Asn (N)   4.52          5       -1       37.4
   Pro (P)   4.92          5       -1       36.2
   Gln (Q)   4.02          6       -1       32.1
   Arg (R)   5.15          5       -1       35.5
   Ser (S)   7.22          4       -1       39.2
   Thr (T)   5.74          5       -1       33.9
   Val (V)   6.52          5       -1       32.0
   Trp (W)   1.25          8       -1       34.9
   Tyr (Y)   3.19          6       -1       35.1
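
The relation between the first three columns and the target frequency can be sketched in a few lines. The sketch below assumes the standard two-score form of the Karlin-Altschul relation: lambda is the positive solution of p*exp(lambda*match) + (1-p)*exp(lambda*mismatch) = 1, and the target frequency is p*exp(lambda*match), where p is the average frequency of the amino acid. The function name `target_frequency` and the bisection solver are illustrative choices, not part of PROSITE.

```python
import math

def target_frequency(p, match, mismatch):
    """Target frequency q = p * exp(lambda * match) for a match/mismatch score pair.

    p is the background frequency of the residue as a fraction (not a percentage).
    Assumes the two-score Karlin-Altschul relation described in the lead-in.
    """
    expected = p * match + (1.0 - p) * mismatch
    if not (0.0 < p < 1.0 and match > 0 and mismatch < 0 and expected < 0):
        raise ValueError("need 0 < p < 1, match > 0, mismatch < 0, negative expected score")

    def f(lam):
        return p * math.exp(lam * match) + (1.0 - p) * math.exp(lam * mismatch) - 1.0

    # f(0) = 0 and f is negative just to the right of 0 (negative expected score),
    # so grow hi until f(hi) > 0 and bisect for the unique positive root.
    lo, hi = 1e-9, 1.0
    while f(hi) <= 0.0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    lam = 0.5 * (lo + hi)
    return p * math.exp(lam * match)

if __name__ == "__main__":
    # Average frequencies and match scores taken from the table above.
    for name, percent, match in [("Ala", 7.55, 4), ("Trp", 1.25, 8)]:
        print(name, round(100.0 * target_frequency(percent / 100.0, match, -1), 1))
```

Under these assumptions the computed values should land close to the Target frequency column, which also illustrates why rarer amino acids need a higher match score to reach a similar target frequency.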

The normalisation parameters given within each profile convert raw scores into per-residue log expectation values. They were derived empirically by fitting an extreme value distribution to the score distribution obtained from a random database. This database conserves the length distribution and global amino acid composition of Swiss-Prot, but not the composition of the individual sequences.
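
As a rough illustration of the kind of random database described above, the sketch below pools all residues, shuffles them, and cuts the pool back into sequences of the original lengths. This keeps the length distribution and the global composition while discarding per-sequence composition. It also scores each sequence with a simple best-segment search under a match/mismatch scheme. Both functions (`shuffled_database`, `best_segment_score`) are hypothetical helpers written for this example; the procedure actually used to build the PROSITE random database and the scoring performed by the profile search software are not specified here and may differ.

```python
import random

def shuffled_database(sequences, seed=0):
    """Return random sequences with the same lengths and the same pooled composition.

    Assumption: pooling and shuffling all residues is one simple way to keep
    global composition while breaking the composition of individual sequences.
    """
    rng = random.Random(seed)
    pool = [residue for seq in sequences for residue in seq]
    rng.shuffle(pool)
    shuffled, start = [], 0
    for seq in sequences:
        shuffled.append("".join(pool[start:start + len(seq)]))
        start += len(seq)
    return shuffled

def best_segment_score(sequence, residue, match, mismatch):
    """Highest score of any contiguous segment (0 if no residue matches)."""
    best = current = 0
    for aa in sequence:
        # Reset to 0 when the running score would go negative.
        current = max(0, current + (match if aa == residue else mismatch))
        best = max(best, current)
    return best

if __name__ == "__main__":
    toy_db = ["MAAAAGAAL", "GGKLPPPPPPPPAA", "MKTWYV"]  # toy input, not real data
    random_db = shuffled_database(toy_db)
    scores = [best_segment_score(s, "A", 4, -1) for s in random_db]
    print(scores)
```

The list of scores obtained from such a random database is the kind of empirical distribution to which an extreme value distribution could then be fitted.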

Note:

These profiles do not characterize biologically defined objects. Because the underlying definition is purely statistical, it is not possible to speak of true or false matches to these profiles, nor is it possible to assign a false negative status to a sequence.

Expert(s) to contact by email:

Bucher P.

Last update:

February 2024 / Text revised.

-------------------------------------------------------------------------------

Technical section

PROSITE methods (with tools and information) covered by this documentation:

ALA_RICH, PS50310; Alanine-rich region profile (MATRIX with a high probability of occurrence!)

Scan UniProtKB (Swiss-Prot and/or TrEMBL) entries against PS50310

ARG_RICH, PS50323; Arginine-rich region profile (MATRIX with a high probability of occurrence!)

Scan UniProtKB (Swiss-Prot and/or TrEMBL) entries against PS50323

ASN_RICH, PS50321; Asparagine-rich region profile (MATRIX with a high probability of occurrence!)

Scan UniProtKB (Swiss-Prot and/or TrEMBL) entries against PS50321

ASP_RICH, PS50312; Aspartic acid-rich region profile (MATRIX with a high probability of occurrence!)

Scan UniProtKB (Swiss-Prot and/or TrEMBL) entries against PS50312

CYS_RICH, PS50311; Cysteine-rich region profile (MATRIX with a high probability of occurrence!)

Scan UniProtKB (Swiss-Prot and/or TrEMBL) entries against PS50311

GLN_RICH, PS50322; Glutamine-rich region profile (MATRIX with a high probability of occurrence!)

Scan UniProtKB (Swiss-Prot and/or TrEMBL) entries against PS50322

GLU_RICH, PS50313; Glutamic acid-rich region profile (MATRIX with a high probability of occurrence!)

Scan UniProtKB (Swiss-Prot and/or TrEMBL) entries against PS50313

GLY_RICH, PS50315; Glycine-rich region profile (MATRIX with a high probability of occurrence!)

Scan UniProtKB (Swiss-Prot and/or TrEMBL) entries against PS50315

HIS_RICH, PS50316; Histidine-rich region profile (MATRIX with a high probability of occurrence!)

Scan UniProtKB (Swiss-Prot and/or TrEMBL) entries against PS50316

ILE_RICH, PS50317; Isoleucine-rich region profile (MATRIX with a high probability of occurrence!)

Scan UniProtKB (Swiss-Prot and/or TrEMBL) entries against PS50317

LEU_RICH, PS50319; Leucine-rich region profile (MATRIX with a high probability of occurrence!)

Scan UniProtKB (Swiss-Prot and/or TrEMBL) entries against PS50319

LYS_RICH, PS50318; Lysine-rich region profile (MATRIX with a high probability of occurrence!)

Scan UniProtKB (Swiss-Prot and/or TrEMBL) entries against PS50318

MET_RICH, PS50320; Methionine-rich region profile (MATRIX with a high probability of occurrence!)

Scan UniProtKB (Swiss-Prot and/or TrEMBL) entries against PS50320

PHE_RICH, PS50314; Phenylalanine-rich region profile (MATRIX with a high probability of occurrence!)

Scan UniProtKB (Swiss-Prot and/or TrEMBL) entries against PS50314

PRO_RICH, PS50099; Proline-rich region profile (MATRIX with a high probability of occurrence!)

Scan UniProtKB (Swiss-Prot and/or TrEMBL) entries against PS50099

SER_RICH, PS50324; Serine-rich region profile (MATRIX with a high probability of occurrence!)

Scan UniProtKB (Swiss-Prot and/or TrEMBL) entries against PS50324

THR_RICH, PS50325; Threonine-rich region profile (MATRIX with a high probability of occurrence!)

Scan UniProtKB (Swiss-Prot and/or TrEMBL) entries against PS50325

TRP_RICH, PS50327; Tryptophan-rich region profile (MATRIX with a high probability of occurrence!)

Scan UniProtKB (Swiss-Prot and/or TrEMBL) entries against PS50327

TYR_RICH, PS50328; Tyrosine-rich region profile (MATRIX with a high probability of occurrence!)

Scan UniProtKB (Swiss-Prot and/or TrEMBL) entries against PS50328

VAL_RICH, PS50326; Valine-rich region profile (MATRIX with a high probability of occurrence!)

Scan UniProtKB (Swiss-Prot and/or TrEMBL) entries against PS50326

References

1	Authors	Wootton J.C., Federhen S.
	Title	Analysis of compositionally biased regions in sequence databases.
	Source	Methods Enzymol. 266:554-571(1996).
	PubMed ID	8743706

2	Authors	Karlin S., Bucher P., Brendel V., Altschul S.F.
	Title	Statistical methods and insights for protein and DNA sequences.
	Source	Annu. Rev. Biophys. Biophys. Chem. 20:175-203(1991).
	PubMed ID	1867715

PROSITE is copyrighted by the SIB Swiss Institute of Bioinformatics and distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND 4.0) License, see prosite_license.html.
