Many proteins contain compositionally biased sequence regions, also called
low-complexity regions [1]. Typically, such regions are highly enriched in
one or a few amino acids. We have included one profile for each of the 20
amino acids in order to search for regions that are significantly enriched
in that amino acid. The behaviour of each profile is controlled by two
parameters, the match score and the mismatch score. These parameters were
chosen such that the "target frequency" of the corresponding amino acid,
computed according to the Karlin-Altschul theory [2], is approximately 35%
given the residue composition of Swiss-Prot (see below).
Amino     Average         Match   Mismatch   Target
acid      frequency (%)   score   score      frequency (%)
----------------------------------------------------------
Ala (A)   7.55            4       -1         38.5
Cys (C)   1.69            7       -1         36.8
Asp (D)   5.30            5       -1         35.1
Glu (E)   6.32            5       -1         32.4
Phe (F)   4.07            6       -1         31.9
Gly (G)   6.84            5       -1         31.2
His (H)   2.24            7       -1         33.6
Ile (I)   5.72            5       -1         34.0
Lys (K)   5.93            5       -1         33.4
Leu (L)   9.33            4       -1         34.7
Met (M)   2.35            7       -1         33.1
Asn (N)   4.52            5       -1         37.4
Pro (P)   4.92            5       -1         36.2
Gln (Q)   4.02            6       -1         32.1
Arg (R)   5.15            5       -1         35.5
Ser (S)   7.22            4       -1         39.2
Thr (T)   5.74            5       -1         33.9
Val (V)   6.52            5       -1         32.0
Trp (W)   1.25            8       -1         34.9
Tyr (Y)   3.19            6       -1         35.1
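
The relation between the scores and the target frequency can be illustrated
with a minimal sketch. It assumes the standard two-score form of the
Karlin-Altschul theory: for a residue with background frequency p, match
score s and mismatch score m, the scale parameter lambda is the positive
root of p*exp(lambda*s) + (1-p)*exp(lambda*m) = 1, and the target frequency
is p*exp(lambda*s). The function names are hypothetical and the bisection
solver is only one of several possible ways to find the root.

```python
import math


def karlin_altschul_lambda(p, match, mismatch=-1.0):
    """Positive root of p*exp(l*match) + (1-p)*exp(l*mismatch) = 1."""
    if match <= 0 or mismatch >= 0:
        raise ValueError("need a positive match and a negative mismatch score")
    if p * match + (1.0 - p) * mismatch >= 0:
        raise ValueError("the expected score per residue must be negative")

    def f(lam):
        return p * math.exp(lam * match) + (1.0 - p) * math.exp(lam * mismatch) - 1.0

    # f is negative just above 0 and positive beyond the root: bracket, then bisect.
    lo, hi = 1e-6, 1.0
    while f(hi) <= 0:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def target_frequency(p, match, mismatch=-1.0):
    """Frequency of the residue expected within high-scoring segments."""
    lam = karlin_altschul_lambda(p, match, mismatch)
    return p * math.exp(lam * match)


# Background frequencies and match scores taken from the table above.
ala = target_frequency(0.0755, 4)
trp = target_frequency(0.0125, 8)
print(f"Ala: {100 * ala:.1f}%  Trp: {100 * trp:.1f}%")
```

Under these assumptions the computed values for Ala and Trp should fall
close to the tabulated target frequencies of 38.5% and 34.9%.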
The normalisation parameters for converting raw scores into per-residue log
expectation values, which are given within the profile, were empirically
derived by fitting an extreme value distribution to the score distribution
obtained from a random database that conserves the length distribution and
global amino acid composition of Swiss-Prot but not the composition of the
individual sequences.
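
The general idea of such a calibration can be sketched as follows. This is
an illustration under simplifying assumptions, not the procedure actually
used for the profiles: sequences have a fixed, hypothetical length of 350
residues instead of following the Swiss-Prot length distribution, residues
are drawn independently from the global composition, the score of a
sequence is taken as its best contiguous segment score, and the extreme
value (Gumbel) parameters are estimated by the method of moments. All
function names are hypothetical.

```python
import math
import random
import statistics


def best_segment_score(seq, residue, match, mismatch=-1):
    """Score of the highest-scoring contiguous segment (Kadane's algorithm)."""
    best = running = 0
    for aa in seq:
        running = max(0, running + (match if aa == residue else mismatch))
        best = max(best, running)
    return best


def fit_gumbel_moments(scores):
    """Method-of-moments estimate of the Gumbel location and scale."""
    scale = statistics.stdev(scores) * math.sqrt(6) / math.pi
    location = statistics.mean(scores) - 0.5772156649 * scale  # Euler-Mascheroni constant
    return location, scale


# Global composition in percent, taken from the table above.
composition = {
    "A": 7.55, "C": 1.69, "D": 5.30, "E": 6.32, "F": 4.07,
    "G": 6.84, "H": 2.24, "I": 5.72, "K": 5.93, "L": 9.33,
    "M": 2.35, "N": 4.52, "P": 4.92, "Q": 4.02, "R": 5.15,
    "S": 7.22, "T": 5.74, "V": 6.52, "W": 1.25, "Y": 3.19,
}
residues = list(composition)
weights = list(composition.values())

rng = random.Random(0)  # fixed seed so the sketch is reproducible
scores = []
for _ in range(500):
    # Assumption: fixed length of 350 residues per random sequence.
    seq = rng.choices(residues, weights=weights, k=350)
    scores.append(best_segment_score(seq, "A", match=4))

location, scale = fit_gumbel_moments(scores)
print(f"Gumbel location: {location:.2f}, scale: {scale:.2f}")
```

Drawing residues independently from the global composition preserves that
composition on average while removing any sequence-specific bias, which is
the property the random database is meant to have.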
Note:
These profiles do not characterize biologically defined objects. As the
underlying definition is purely statistical, it is not possible to speak of
true or false matches to these profiles, nor is it possible to assign a
false negative status to a sequence.
PROSITE is copyrighted by the SIB Swiss Institute of Bioinformatics and
distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives
(CC BY-NC-ND 4.0) License, see prosite_license.html.