{PDOC50099}
{PS50310; ALA_RICH}
{PS50311; CYS_RICH}
{PS50312; ASP_RICH}
{PS50313; GLU_RICH}
{PS50314; PHE_RICH}
{PS50315; GLY_RICH}
{PS50316; HIS_RICH}
{PS50317; ILE_RICH}
{PS50318; LYS_RICH}
{PS50319; LEU_RICH}
{PS50320; MET_RICH}
{PS50321; ASN_RICH}
{PS50099; PRO_RICH}
{PS50322; GLN_RICH}
{PS50323; ARG_RICH}
{PS50324; SER_RICH}
{PS50325; THR_RICH}
{PS50326; VAL_RICH}
{PS50327; TRP_RICH}
{PS50328; TYR_RICH}
{BEGIN}
*****************************************************************
* Sequence regions enriched in a particular amino acid profiles *
*****************************************************************

Many proteins contain compositionally biased sequence regions which are also
called low-complexity regions [1]. Typically, such regions are highly enriched
in one or a few amino acids. We have included profiles specific for each of
the 20 amino acids so as to search for regions that are significantly enriched
in a particular amino acid.

The behaviour of these profiles is controlled by two parameters, the match and
mismatch scores. These parameters were chosen such that the "target
frequencies" of the corresponding amino acids computed according to the
Karlin-Altschul theory [2] approximate 35% for the residue composition of
Swiss-Prot (see below).
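
The relationship between the scores and the target frequencies can be
illustrated with a minimal sketch. It assumes the usual Karlin-Altschul
relations for a two-class scoring scheme (the residue of interest versus all
other residues): lambda is the positive root of
p*exp(lambda*match) + (1-p)*exp(lambda*mismatch) = 1, and the target frequency
is q = p*exp(lambda*match), where p is the average frequency of the residue.
The function name target_frequency and the bisection solver are illustrative
choices, not part of PROSITE or of the profile tools.

```python
import math


def target_frequency(p, match, mismatch):
    """Target frequency of a residue under a match/mismatch scoring scheme.

    p        -- background frequency of the residue (fraction, not percent)
    match    -- score for the residue of interest (positive)
    mismatch -- score for any other residue (negative)

    Hypothetical helper: assumes the two-class Karlin-Altschul model
    described in the lead-in text.
    """
    # The theory requires a negative expected score per residue.
    if p * match + (1.0 - p) * mismatch >= 0:
        raise ValueError("expected score must be negative")

    def f(lam):
        return p * math.exp(lam * match) + (1.0 - p) * math.exp(lam * mismatch) - 1.0

    # f is negative just above zero and grows without bound, so bracket
    # the positive root and refine it by bisection.
    lo, hi = 1e-9, 1.0
    while f(hi) <= 0:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    lam = (lo + hi) / 2.0
    return p * math.exp(lam * match)


if __name__ == "__main__":
    # Average frequencies (in %) and scores for two residues from the table.
    for name, freq_percent, match in [("Ala", 7.55, 4), ("Trp", 1.25, 8)]:
        q = target_frequency(freq_percent / 100.0, match, -1)
        print(f"{name}: target frequency = {100.0 * q:.1f}%")
```

Under these assumptions the computed values for Ala and Trp agree with the
tabulated target frequencies to within rounding. Other rows may differ
slightly depending on the exact residue composition used.
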
   Amino      Average         Match   Mismatch   Target
   acid       frequency (%)   score   score      frequency (%)

   Ala (A)        7.55          4       -1           38.5
   Cys (C)        1.69          7       -1           36.8
   Asp (D)        5.30          5       -1           35.1
   Glu (E)        6.32          5       -1           32.4
   Phe (F)        4.07          6       -1           31.9
   Gly (G)        6.84          5       -1           31.2
   His (H)        2.24          7       -1           33.6
   Ile (I)        5.72          5       -1           34.0
   Lys (K)        5.93          5       -1           33.4
   Leu (L)        9.33          4       -1           34.7
   Met (M)        2.35          7       -1           33.1
   Asn (N)        4.52          5       -1           37.4
   Pro (P)        4.92          5       -1           36.2
   Gln (Q)        4.02          6       -1           32.1
   Arg (R)        5.15          5       -1           35.5
   Ser (S)        7.22          4       -1           39.2
   Thr (T)        5.74          5       -1           33.9
   Val (V)        6.52          5       -1           32.0
   Trp (W)        1.25          8       -1           34.9
   Tyr (Y)        3.19          6       -1           35.1

The normalisation parameters for converting raw scores into per-residue log
expectation values, which are given within the profile, were empirically
derived by fitting an extreme value distribution to the score distribution
obtained from a random database that conserves the length distribution and
global amino acid composition of Swiss-Prot but not the composition of the
individual sequences.

-Note: These profiles do not characterize biologically defined objects. As the
 underlying definition is purely statistical, it is not possible to speak of
 true or false matches to these profiles, neither is it possible to assign a
 false negative status to a sequence.

-Expert(s) to contact by email:
           Bucher P.; Philipp.Bucher@sib.swiss

-Last update: April 2002 / First entry.

[ 1] Wootton J.C., Federhen S.
     "Analysis of compositionally biased regions in sequence databases."
     Methods Enzymol. 266:554-571(1996).
     PubMed=8743706

[ 2] Karlin S., Bucher P., Brendel V., Altschul S.F.
     "Statistical methods and insights for protein and DNA sequences."
     Annu. Rev. Biophys. Biophys. Chem. 20:175-203(1991).
     PubMed=1867715

--------------------------------------------------------------------------------
PROSITE is copyrighted by the SIB Swiss Institute of Bioinformatics and
distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives
(CC BY-NC-ND 4.0) License, see https://prosite.expasy.org/prosite_license.html
--------------------------------------------------------------------------------
{END}