Inteins (for INternal proTEINs) are protein insertion sequences embedded in
host protein sequences. They are removed post-translationally by a
self-catalytic protein splicing process, in which the intein sequence is
precisely excised and the flanking host protein sequences (N- and C-exteins)
are ligated to create a functional protein. Inteins and protein splicing may
be viewed as the protein equivalents of introns and RNA splicing,
respectively. Inteins were initially discovered as translated intervening
sequences that were present in the host gene but absent from homologous
genes. Inteins occur in organisms spanning all three kingdoms of life
(eubacteria, archaea and eukaryotes). Although many inteins are in host
proteins involved in nucleic acid metabolism, several inteins are located in
metabolic enzymes, such as phosphoenolpyruvate synthase, anaerobic
ribonucleoside triphosphate reductase, UDP-glucose dehydrogenase, ClpP
protease/chaperone, vacuolar ATPase proton pump (VMA) and glutamine-fructose
6-phosphate transaminase. Protein splicing can also occur in trans, as in
Synechocystis sp. PCC 6803, where the replicative DNA polymerase catalytic
subunit (DnaE) is generated from two separate precursor
Most inteins are bifunctional proteins mediating both protein splicing and DNA
cleavage. The domain involved in splicing is formed by the two terminal
splicing regions, which are separated by a small linker in mini-inteins or a
homing endonuclease of 200-250 amino acids in larger inteins (see <PDOC50819>)
[1,4]. The N-terminal splicing region spans roughly the first 100 amino
acids and contains the conserved intein blocks A and B, which are similar to
the motifs found in the C-terminal autoprocessing domain of the hedgehog
protein. The C-terminal splicing region is composed of the two conserved
blocks F and G, located in roughly the last 50 amino acids. Although no
single residue is invariant, the Ser and Cys in block A, the His in block B,
and the His, Asn and Ser/Cys/Thr in block G are the most conserved residues in the
splicing motifs. Protein splicing requires neither cofactors nor auxiliary
enzymes and involves a series of four intramolecular reactions in which
several of these most conserved residues are implicated [1,3,E1].
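The layout and net outcome described above can be illustrated with a short
Python sketch. It is only a toy model under stated assumptions: the function
names are hypothetical, the region lengths (100 and 50 residues) are the
approximate figures quoted in the text rather than exact boundaries, and the
four chemical steps of splicing are not modelled, only the resulting excision
and ligation.

```python
# Approximate lengths of the terminal splicing regions, taken from the text.
# They are rough figures, not exact domain boundaries.
N_REGION_LEN = 100
C_REGION_LEN = 50


def split_intein(intein: str) -> tuple[str, str, str]:
    """Split an intein into (N-terminal region, central part, C-terminal region).

    The central part stands for the linker of a mini-intein or the
    endonuclease domain of a larger intein.
    """
    if len(intein) < N_REGION_LEN + C_REGION_LEN:
        raise ValueError("sequence is shorter than the two splicing regions")
    n_region = intein[:N_REGION_LEN]
    central = intein[N_REGION_LEN:len(intein) - C_REGION_LEN]
    c_region = intein[len(intein) - C_REGION_LEN:]
    return n_region, central, c_region


def splice(precursor: str, intein_start: int, intein_end: int) -> tuple[str, str]:
    """Return (mature protein, excised intein).

    Coordinates are 0-based and half-open, as in Python slicing.
    """
    if not 0 <= intein_start < intein_end <= len(precursor):
        raise ValueError("intein coordinates fall outside the precursor")
    n_extein = precursor[:intein_start]
    intein = precursor[intein_start:intein_end]
    c_extein = precursor[intein_end:]
    # The exteins are joined directly; the intein is released intact.
    return n_extein + c_extein, intein
```

For a precursor built as N-extein + intein + C-extein, `splice` returns the
two exteins joined together and the intein on its own, and `split_intein`
then separates the excised intein into its three parts.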
Resolution of the crystal structure of the Mxe GyrA mini-intein (see
<PDB:1AM2>) revealed a flattened 'horseshoe shaped' protein composed primarily
of β-strands forming two homologous subdomains that are related by a pseudo
twofold axis of symmetry. Despite a low level of sequence conservation, the
two subdomains are nearly superimposable, suggesting that they could have
arisen by tandem duplication of a primordial gene. However, the duplicated
sequences do not correspond directly to the two subdomains as the two
subdomains have exchanged homologous loop regions [1,2,5,6].
The first profile we have developed is directed against the N-terminal
splicing region and covers the intein blocks A and B. It starts with the first
N-terminal amino acid of the intein.
The second profile we have developed is directed against the C-terminal
splicing region and covers the intein blocks F and G. It extends to the first
extein residue following the intein.
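Because the first profile starts at the first intein residue and the second
one extends one residue into the C-extein, the intein boundaries can be
derived from a pair of matches. The Python sketch below shows that
arithmetic. It assumes 1-based inclusive match coordinates and full-length
matches that are not truncated at either edge; the `Match` type and the
function name are hypothetical and not part of any PROSITE tool.

```python
from typing import NamedTuple


class Match(NamedTuple):
    """A profile match on a protein, in 1-based inclusive coordinates."""
    start: int
    end: int


def intein_bounds(n_ter: Match, c_ter: Match) -> tuple[int, int]:
    """Infer the first and last intein residue (1-based, inclusive).

    The N-terminal profile starts at the first intein residue.
    The C-terminal profile ends on the first C-extein residue, so the
    last intein residue is one position before the end of that match.
    """
    first = n_ter.start
    last = c_ter.end - 1
    if last < first:
        raise ValueError("C-terminal match ends before the N-terminal match starts")
    return first, last


def extract_intein(protein: str, n_ter: Match, c_ter: Match) -> str:
    """Return the intein sequence delimited by the two profile matches."""
    first, last = intein_bounds(n_ter, c_ter)
    # Convert 1-based inclusive coordinates to a Python slice.
    return protein[first - 1:last]
```

With these conventions, the residue at the end of the C-terminal match is the
first residue of the C-extein and is not part of the extracted intein.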
May 2002 / Text revised, patterns removed and profiles added.
Protein-splicing intein: Genetic mobility, origin, and evolution.
PROSITE is copyright. It is produced by the SIB Swiss Institute of
Bioinformatics. There are no restrictions on its use by non-profit
institutions as long as its content is in no way modified. Usage by and
for commercial entities requires a license agreement. For information
about the licensing scheme send an email to
or see: prosite_license.html.