ScanProsite - user manual

ScanProsite lets you scan proteins for matches against the PROSITE collection of motifs as well as against user-defined patterns.

At the beginning the user has to choose between three options:

Option 1 - Submit PROTEIN sequences to scan them against the PROSITE collection of motifs.
Option 2 - Submit MOTIFS to scan them against a PROTEIN sequence database.
Option 3 - Submit PROTEIN sequences and MOTIFS to scan them against each other.

Quick Scan

The Quick Scan mode of ScanProsite corresponds to a simplified version of 'Option 1 - Submit PROTEIN sequences to scan them against the PROSITE collection of motifs' that is available from the PROSITE homepage.
Enter or paste up to 10 protein sequences in the textarea.
The accepted inputs are:

UniProtKB accessions e.g. P98073 or identifiers e.g. ENTK_HUMAN^*
PDB identifiers e.g. 4DGJ
Sequences in FASTA format

* All UniProtKB/Swiss-Prot accessions/identifiers and all UniProtKB/TrEMBL accessions/identifiers of entries belonging to a reference proteome are accepted.

Your input sequences will be scanned against all PROSITE motifs, including or excluding those with a high probability of occurrence (see the Exclude motifs with a high probability of occurrence option), depending on whether you uncheck (include) or check (exclude) the checkbox below the textarea.
Once the scan has been carried out, the results are displayed in the 'Graphical view' output format.

Main operations

Submit PROTEIN sequences

You can either enter or paste protein sequences in the textarea or submit a protein database.
Enter or paste protein sequences in the textarea; the maximum number of sequences depends on the option you are in (see the limits below).
The accepted inputs are:

UniProtKB accessions e.g. P98073 or identifiers e.g. ENTK_HUMAN^*
PDB identifiers e.g. 4DGJ
Sequences in FASTA format

* All UniProtKB/Swiss-Prot accessions/identifiers and all UniProtKB/TrEMBL accessions/identifiers of entries belonging to a reference proteome are accepted.

If you are in 'Option 1' (scan against all PROSITE motifs), the maximum number of sequences that you can submit is 10, while if you are in 'Option 3' (scan against specified motifs), the maximum number of sequences you can enter is 1'000 if you submit 1 motif and 50 if you submit a combination of motifs.

If you want the scan to be carried out against your own sequence database, either enter a database code or submit a file in FASTA format (max. 16MB). Once your file is uploaded, you will receive a code that you can use for repeated scans on the database you've just submitted. The database will remain on our server for a period of 1 month.

Submit MOTIFS (Enter a MOTIF or a combination of MOTIFS)

Enter a motif or a combination of motifs in the textarea. The supported inputs are:

A PROSITE accession e.g. PS50240 or identifier e.g. TRYPSIN_DOM
Your own pattern e.g. P-x(2)-G-E-S-G(2)-[AS]
A combination of PROSITE accessions/identifiers e.g. PS50240 and PS50068, e.g. PS50240 and not ( PS00134 or PS00135 )
A combination of PROSITE accessions/identifiers and your own pattern e.g. PS50240 and P-x(2)-G-E-S-G(2)-[AS]

You can then modify a few default scanning parameters (scanning options):

Minimal number of hits per matched sequence (only in 'Option 2')
Run the scan at high sensitivity (show weak matches for profiles)
Number of X characters in a scanned sequence that can be matched by a conserved position in a pattern
Match mode

Pattern syntax

The standard IUPAC one letter code for the amino acids is used in PROSITE.
The symbol 'x' is used for a position where any amino acid is accepted.
Ambiguities are indicated by listing the acceptable amino acids for a given position, between square brackets '[ ]'. For example: [ALT] stands for Ala or Leu or Thr.
Ambiguities are also indicated by listing between a pair of curly brackets '{ }' the amino acids that are not accepted at a given position. For example: {AM} stands for any amino acid except Ala and Met.
Each element in a pattern is separated from its neighbor by a '-'.
Repetition of an element of the pattern can be indicated by following that element with a numerical value or, if it is a gap ('x'), by a numerical range between parentheses.
Examples:
- x(3) corresponds to x-x-x
- x(2,4) corresponds to x-x or x-x-x or x-x-x-x
- A(3) corresponds to A-A-A
When a pattern is restricted to either the N- or C-terminal of a sequence, that pattern respectively starts with a '<' symbol or ends with a '>' symbol.
In some rare cases (e.g. PS00267 or PS00539 ), '>' can also occur inside square brackets for the C-terminal element. 'F-[GSTV]-P-R-L-[G>]' is equivalent to 'F-[GSTV]-P-R-L-G' or 'F-[GSTV]-P-R-L>'.

Note:

Ranges can only be used with 'x'; for instance, 'A(2,4)' is not a valid pattern element.
Ranges of 'x' are not accepted at the beginning or at the end of a pattern unless restricted/anchored to the N- or C-terminal of a sequence, respectively; for instance, 'P-x(2)-G-E-S-G(2)-[AS]-x(0,200)' is not accepted but 'P-x(2)-G-E-S-G(2)-[AS]-x(0,200)>' is.

Extended syntax for ScanProsite:

If your pattern does not contain any ambiguous residues, you don't need to specify separation with '-'.
Example: M-A-S-K-E can be written as MASKE.
It means that in such a case you can directly copy/paste peptide sequences into the textfield.
To search for all sequences that do not contain a certain amino acid, e.g. Cys, you can use <{C}*>.

You can use the program PRATT to generate your own pattern.

Pattern	Explanation
[AC]-x-V-x(4)-{ED}	[Ala or Cys]-any-Val-any-any-any-any-{any but Glu or Asp}
<A-x-[ST](2)-x(0,1)-V	Ala-any-[Ser or Thr]-[Ser or Thr]-(any or none)-Val at the N-terminal of the sequence
<{C}*>	No Cys from the N-terminal to the C-terminal i.e. All sequences that do not contain any Cys.
IIRIFHLRNI	Ile-Ile-Arg-Ile-Phe-His-Leu-Arg-Asn-Ile
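
The standard syntax above maps closely onto regular expressions. The following is a minimal sketch of such a translation in Python, not the ScanProsite matching engine: the function name prosite_to_regex is hypothetical, the ScanProsite extensions ('-'-free peptides and '*') are not handled, and the rule about unanchored 'x' ranges at the ends of a pattern is not enforced.

```python
import re

# One pattern element: x, a residue, [..] or {..}, optionally followed by (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z>]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a standard PROSITE pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")      # PA lines end with a period
    anchored_start = pattern.startswith("<")   # restricted to the N-terminal
    anchored_end = pattern.endswith(">")       # restricted to the C-terminal
    body = pattern.lstrip("<").rstrip(">")
    parts = []
    for element in body.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if high is not None and core != "x":
            raise ValueError(f"ranges can only be used with 'x': {element!r}")
        if core == "x":
            atom = "."                          # any amino acid
        elif core.startswith("["):
            residues = core[1:-1]
            letters = residues.replace(">", "")
            if ">" not in residues:
                atom = f"[{letters}]"
            elif letters:
                atom = f"(?:[{letters}]|$)"     # e.g. [G>]: Gly or end of sequence
            else:
                atom = "$"
        elif core.startswith("{"):
            atom = f"[^{core[1:-1]}]"           # any amino acid except the listed ones
        else:
            atom = core
        if low is not None:
            atom += "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(atom)
    return ("^" if anchored_start else "") + "".join(parts) + ("$" if anchored_end else "")
```

For instance, prosite_to_regex("[AC]-x-V-x(4)-{ED}") gives "[AC].V.{4}[^ED]", which can then be used with re.search on a protein sequence.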

Combination of MOTIFS

You can submit multiple motifs at the same time. The upper limit is 8 motifs for a scan against a protein database (Option 2 - Step 1) and 16 for a scan against specified sequences (Option 3 - Step 2).
You can use logical operators: 'and', 'or' and 'not' with parentheses if needed.

Examples of logical expressions
PS50240 PS50068
PS50240 and PS50068
PS50240 and P-x(2)-G-E-S-G(2)-[AS]
PS50240 and not PS50068
PS50240 and ( PS00134 or PS00135 )
PS50240 and not ( PS00134 or PS00135 )

The 'or' is implicit, which means that, for instance, 'PS50240 PS50068' is equivalent to 'PS50240 or PS50068'. If you want to look for sequences matched by both PS50240 and PS50068, you must use 'PS50240 and PS50068'.
(Innermost) parentheses are handled first.
The 'not' is right associative, which means that what's on the right of the 'not' is evaluated before the 'not'.
The 'and' and 'or' are left associative, which means that what's on the left of an 'and' or an 'or' is evaluated before the 'and' or 'or'.
A root 'not' like in 'not PS50240' is not allowed because it would give too many matches.
If you use parentheses, put a space before and after each of them. For instance 'PS50240 and not ( PS00134 or PS00135 )' is correct while 'PS50240 and not (PS00134 or PS00135)' is wrong.
If you use logical operators, all your expressions must be explicit, i.e. you cannot use white spaces standing for 'or'. For instance 'PS50240 and not ( PS00134 or PS00135 )' is correct while 'PS50240 and not ( PS00134 PS00135 )' is wrong.
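
The rules above can be illustrated with a small evaluator. This is only a sketch under stated assumptions, not the ScanProsite parser: the function name matches_expression and the hit_motifs input (the set of motifs that hit one sequence) are hypothetical, 'and' and 'or' are assumed to have equal precedence and to be evaluated from left to right, and parentheses that are not surrounded by spaces are not detected.

```python
def matches_expression(expression: str, hit_motifs: set[str]) -> bool:
    """Evaluate a motif combination against the set of motifs that hit one sequence."""
    tokens = expression.split()
    operators = {"and", "or", "not", "(", ")"}
    if not any(token in operators for token in tokens):
        # No explicit operator: white space stands for an implicit 'or'.
        return any(token in hit_motifs for token in tokens)
    if tokens[0] == "not":
        raise ValueError("a root 'not' is not allowed")
    pos = 0

    def parse_operand() -> bool:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        if token == "not":
            # Right associative: evaluate what follows, then negate it.
            return not parse_operand()
        if token == "(":
            value = parse_chain()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("expected ' ) '; operators must be explicit")
            pos += 1
            return value
        if token in operators:
            raise ValueError(f"unexpected {token!r}")
        return token in hit_motifs

    def parse_chain() -> bool:
        nonlocal pos
        value = parse_operand()
        # Left associative: fold 'and'/'or' from left to right.
        while pos < len(tokens) and tokens[pos] in ("and", "or"):
            operator = tokens[pos]
            pos += 1
            right = parse_operand()
            value = (value and right) if operator == "and" else (value or right)
        return value

    result = parse_chain()
    if pos != len(tokens):
        raise ValueError("operators must be explicit once one is used")
    return result
```

With this sketch, 'PS50240 and not ( PS00134 or PS00135 )' is true for a sequence hit only by PS50240, and 'PS50240 and not ( PS00134 PS00135 )' raises an error because the 'or' is missing.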

Select a PROTEIN sequence database

Select one of these PROTEIN sequence databases:

UniProtKB Swiss-Prot and/or TrEMBL^*
PDB
Your own sequence database
Randomized UniProtKB/Swiss-Prot : reversed or window20

*For UniProtKB/TrEMBL, only entries belonging to reference proteomes are included in the set.

If you want the scan to be carried out against your own sequence database, either enter a database code or submit a file in FASTA format (max. 16MB). Once your file is uploaded, you will receive a code that you can use for repeated scans on the database you've just submitted. The database will remain on our server for a period of 1 month.

Randomized UniProtKB/Swiss-Prot

It is often useful to be able to search a pattern against a random database in order to evaluate its specificity. It is desirable for that database not to be completely random, but comparable to the databases which are to be scanned in terms of amino acid frequency and local compositional bias. ScanProsite can randomize scanned databases on the fly, using one of two methods:

reverse: reverse sequences - created by taking the reverse sequence of each individual entry.
window20: shuffled sequences - created by local shuffling of each individual sequence entry using a window width of 20 residues

The reverse sequences method is generally recommended, but it is not suited to patterns that are strongly enriched in one amino acid, e.g. C-C-C-[LIV], or to palindromic ones, e.g. M-L-L-M.

Note: Scanning a randomized sequence database only makes sense against patterns.

Filters

Filter	Usage	Database application
length >= than	Specifies a minimal length. Must be a positive integer or zero, e.g. 150	UniProtKB (Swiss-Prot and TrEMBL) and PDB
length <= than	Specifies a maximal length. Must be a positive integer, e.g. 500	UniProtKB (Swiss-Prot and TrEMBL) and PDB
Taxonomy	Enter a taxonomical term e.g. 'Homo sapiens', e.g. 'Fungi; Arthropoda' or corresponding NCBI TaxID e.g. 9606, e.g. '4751; 6656' that you can obtain from the NCBI or the UniProt taxonomy databases. Multiple terms must be separated by a semicolon.	UniProtKB (Swiss-Prot and TrEMBL)

Scanning options

	Description	Default value
Exclude motifs with a high probability of occurrence	Does not scan against motifs with a high probability of occurrence.	On
Exclude profiles	Does not scan against profiles. => Scans only against patterns.	Off
Run the scan at high sensitivity	Runs the scan at a low level (shows weak matches). Concerns profiles only.	Off
Minimal number of hits per matched sequence	Defines how many hits there must be in a sequence for the matched sequence to be displayed.	1
Match mode	Defines the match mode for pattern matching. Concerns patterns only.	Greedy, overlaps, no includes

Exclude motifs with a high probability of occurrence

Description	Default value
Does not scan against patterns with a high probability of occurrence. Concerns patterns only.	On

Motifs with a high probability of occurrence are in most cases patterns that are found in many protein sequences. Some of them describe for example commonly found post-translational modifications and some others compositionally biased regions.
While it is generally useful to note their presence, some programs may want, in some cases, to ignore those entries. For this purpose, these entries are indicated with the following qualifier in their CC lines: '/SKIP-FLAG=TRUE;', as in the following entry:


        ID   ASN_GLYCOSYLATION; PATTERN.
        AC   PS00001;
        DT   APR-1990 (CREATED); APR-1990 (DATA UPDATE); APR-1990 (INFO UPDATE).
        DE   N-glycosylation site.
        PA   N-{P}-[ST]-{P}.
        CC   /SITE=1,carbohydrate;
        CC   /SKIP-FLAG=TRUE;
        CC   /VERSION=1;
        PR   PRU00498;
        DO   PDOC00001;
        //

Matches by frequently occurring motifs are displayed under 'hits by patterns/profiles with a high probability of occurrence' if the output format is 'Graphical view'. If the output format is 'Simple view' or 'Text', each motif accession number is tagged with '[occurs frequently]'.
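
A program that wants to ignore such entries can look for the qualifier in the CC lines of an entry. The helper below is a hypothetical sketch that only assumes the entry layout shown above (a two-letter line code followed by the line content).

```python
def has_skip_flag(entry: str) -> bool:
    """Return True if a PROSITE entry carries '/SKIP-FLAG=TRUE' in one of its CC lines."""
    for line in entry.splitlines():
        line = line.strip()
        if line.startswith("CC") and "/SKIP-FLAG=TRUE" in line:
            return True
    return False
```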

Exclude profiles

Description	Default value
Does not scan against profiles. => Scans only against patterns.	Off

Run the scan at high sensitivity

Description	Default value
Runs the scan at a low level (shows weak matches). Concerns profiles only.	Off

PROSITE profiles normally use two cut-off levels, a reliable cut-off (LEVEL=0) and a low confidence cut-off (LEVEL=-1) [ more ].

The scan is run at the low confidence cut-off (LEVEL=-1) and hence shows matches that are below the reliable cut-off (LEVEL=0).
Weak hits are tagged with '[warning: hit with a low confidence level (-1)]' if the output format is 'Graphical view' and '[low confidence]' if the output format is 'Simple view' or 'Text'.

Minimal number of hits per matched sequence

Description	Default value
Defines how many hits there must be in a sequence for the matched sequence to be displayed.	1

Match mode

Three parameters allow you to fine-tune the behaviour of the pattern-matching engine:

parameter	action
greed	extends variable-length pattern elements as far as possible
overlap	allows partially overlapping matches
include	allows matches included within one another (implies overlap)

The default behavior is greedy, allows overlaps but not included matches. This means that two overlapping matches are rejected if one is entirely contained within the other.
For example, consider the sequence "ABACADAEAFA" and the simple pattern "A-x(1,3)-A". The six possible combinations of the switches produce the following results:

greed=1, overlap=1, include=0 (default) : 4 matches

  ABACADAEAFA
  ooooo......
  ..ooooo....
  ....ooooo..
  ......ooooo

greed=1, overlap=1, include=1 : 5 matches

  ABACADAEAFA
  ooooo......
  ..ooooo....
  ....ooooo..
  ......ooooo
  ........ooo

greed=1, overlap=0 : 2 matches

  ABACADAEAFA
  ooooo......
  ......ooooo

greed=0, overlap=1, include=0 or 1 : 5 matches

  ABACADAEAFA
  ooo........
  ..ooo......
  ....ooo....
  ......ooo..
  ........ooo

greed=0, overlap=0 : 3 matches

  ABACADAEAFA
  ooo........
  ....ooo....
  ........ooo
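
The match counts above can be reproduced with a short Python sketch. It is an imitation of the documented behaviour, not the ScanProsite engine: the pattern 'A-x(1,3)-A' is written by hand as a greedy and a lazy regular expression, and the function name scan and its arguments are hypothetical.

```python
import re

GREEDY = r"A.{1,3}A"   # greed=1: x(1,3) is extended as far as possible
LAZY = r"A.{1,3}?A"    # greed=0: x(1,3) is kept as short as possible


def scan(sequence: str, regex: str, overlap: bool = True, include: bool = False) -> list[tuple[int, int]]:
    """Return (start, end) spans, end exclusive, following the three match-mode switches."""
    compiled = re.compile(regex)
    matches: list[tuple[int, int]] = []
    start = 0
    while start < len(sequence):
        match = compiled.match(sequence, start)  # match anchored at 'start'
        if match is None:
            start += 1
            continue
        span = (match.start(), match.end())
        # Starts only increase, so a span is included in an earlier match
        # exactly when it does not end after it.
        included = any(span[1] <= end for _, end in matches)
        if include or not included:
            matches.append(span)
        # Without overlap, resume after the match; otherwise try the next residue.
        start = start + 1 if overlap else match.end()
    return matches
```

For the sequence "ABACADAEAFA", scan(sequence, GREEDY) returns 4 spans, adding include=True returns 5, and overlap=False returns 2; with LAZY the counts are 5 with overlaps and 3 without.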

Output formats

Graphical view

HTML view with a graphical representation of hits on proteins (as downloadable images) and prediction (for certain profiles) of features inside matches.

Graphical view

This Web tool displays, for each hit within a protein sequence, the hit sequence, the score (for hits against a profile), and the PROSITE description and link. In addition, biological features associated with each matched sequence are indicated when they are predicted.
Results are separated into different kinds of hits: hits by 'profiles', 'profiles with a high probability of occurrence', 'patterns', 'patterns with a high probability of occurrence' or 'user-defined patterns'. Inside each of these categories, hits by protein are sorted by their N-terminal position, but multiple hits against a similar motif are grouped together.
In addition, for each matched protein, a graphical view in the form of a downloadable PNG (Portable Network Graphics) image represents all its matches (of the aforementioned type) and detected features. Profile hits are represented as colored shapes with their PROSITE name; pattern hits are shown (separated) as thin colored bars without text.
If a match overlaps with the previous one, it is shown on a different line. However, if the overlap is smaller than 10% of the match size, the match is shown on the same line, with its overlapping start truncated and replaced by a vertical red bar (indicating that there is a small overlap).

Biological features:
For certain profiles, additional biologically meaningful information about residues inside matches is defined. This additional information comes from the mapping of biologically meaningful residues to PROSITE profiles. It is used to make functional/structural predictions of profile matches more accurate (profiles show enhanced sensitivity over patterns but, because of their relaxed stringency, lose functional/structural discriminating power).
If certain conditions expected for the functional and/or structural properties associated with the domain are fulfilled, the properties are shown as 'Predicted features'. For each feature, the UniProtKB feature key, the position/range, the feature description (if any), and the condition that triggered the detection are shown.
Conditions can be a specific amino acid inside the hit, a group of sub-conditions in which all conditions must be true for the group condition to be true, a case between different sub-conditions/groups, etc.
Features associated with conditions that were not fulfilled are shown as 'Absent features' in the same way as for predicted ones except that condition here shows why the feature has not been detected (condition/case not true and/or incomplete group).
On the graphical view, features are shown on top of hits; depending on their type as bridges, horizontal bars, vertical pins.

Graphical view legend

Individual view:
For a scan of more than one sequence against all PROSITE motifs (Option 1), you can click on 'individual view' next to the graphical display so as to see only hits against the protein sequence in question.

View all PROSITE motifs hits on sequence:
For a scan of specific sequences against specific motifs (Option 3), you can click on 'View all PROSITE motifs hits on sequence' in order to see all PROSITE motif matches against the protein in question (except for the ones with a high probability of occurrence, and at a regular level of sensitivity for profile matches).

Match/sequence highlighting:
When hits for only one protein are shown, and if you have a Mozilla based web browser (Mozilla, FireBird/Fox, Netscape 7) you'll be able to see feature residues highlighted (green for predicted features, gray for absent features) on both the match and the full protein sequence (if shown) when you move your mouse cursor over a feature line. In addition if the full sequence of the protein is shown (if you click on 'Individual view' or 'View all PROSITE motifs hits on sequence' or if you submitted only one protein), the match region in the protein sequence will be highlighted in yellow when you move your mouse cursor over that match in the graphical view or the text view.
Highlights are persistent as long as you don't move your cursor over another match/feature (note that left/right margins are immune to cursor moves).

Simple view

Simple HTML view of results without graphical representation of hits and feature prediction.

Text

Text-only view (without any html link).

FASTA

Text-only view in FASTA format. Each hit is shown as a FASTA sequence where the sequence header/name is:
[the matched protein]/[hit start]-[hit stop]/[the matching PROSITE motif]/the score (only for profiles)/the confidence level (if any).
Note: If 'Retrieve complete sequence' is selected, the complete protein sequence replaces the matched sequence and only one hit per matched sequence is represented.

Table

Text view containing for each hit on a sequence:
[the matched protein] [hit start] [hit stop] [the matching PROSITE motif] [the score (only for profiles)] [the confidence level (if any)] [the matched region]
Note: If 'Retrieve complete sequence' is selected, the complete protein sequence replaces the matched sequence and only one hit per matched sequence is represented.

Match list

List of matches (UniProtKB accessions if you submitted UniProtKB accessions or identifiers, PDB identifiers if you submitted PDB identifiers, first space delimited word of the FASTA header if you submitted FASTA sequences).

Miniprofiles

PROSITE pattern hits are validated by automatically generated 'miniprofiles' that assign a status to pattern matches.

Most PROSITE patterns have an associated miniprofile. Miniprofiles are stored in evaluator.dat and their accession number (AC) is the same as the pattern from which they originate except for the replacement of 'PS' by 'MP'. Example: the miniprofile for 'PS00134' is 'MP00134'.
When there's a hit by a given pattern, the sequence is scanned against the pattern's associated miniprofile: if the miniprofile also matches the region matched by the pattern, credit is added to the relevance of the pattern's match.

The table below shows, for each output format, what is displayed when the pattern's hit is also matched or respectively not matched by the pattern's associated miniprofile.

Output format	matched by miniprofile	not matched by miniprofile
Graphical view	confidence level: (0)	confidence level: (-1)
Simple view	confidence level: (0)	confidence level: (-1)
Text view	confidence level: (0)	confidence level: (-1)
FASTA	(0)	(-1)
Table	(0)	(-1)
Matchlist	/	/
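
The naming rule and the confidence levels in the table can be expressed in a few lines. The function names below are hypothetical and the sketch only restates what is described above; it does not read evaluator.dat or run any miniprofile.

```python
def miniprofile_accession(pattern_accession: str) -> str:
    """Derive the miniprofile accession from a pattern accession, e.g. PS00134 -> MP00134."""
    if not (pattern_accession.startswith("PS") and pattern_accession[2:].isdigit()):
        raise ValueError(f"not a PROSITE accession: {pattern_accession!r}")
    return "MP" + pattern_accession[2:]


def confidence_level(matched_by_miniprofile: bool) -> int:
    """Confidence level reported for a pattern hit: 0 if the miniprofile also matches, else -1."""
    return 0 if matched_by_miniprofile else -1
```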

For more information on miniprofiles, please consult " The 20 years of PROSITE ".

Output options

Maximum number of displayed matches

The maximum number of distinct matched proteins that can be shown in the output.
This number is set to 10'000 by default. If you choose 100'000, the results won't be shown in your web browser, as a security measure to prevent too much data being sent to your browser; you will then have to submit an email address for the results to be sent to you by email.

Retrieve complete sequences

Adds the complete protein sequence to the information displayed for each matched protein.
This option limits the choices of output formats to 'Simple view', 'Text', 'FASTA' and 'Table'; it also limits the 'Maximum number of displayed matches' to 1'000.
Note: For the output formats 'FASTA' and 'Table', the complete protein sequence replaces the matched sequence and only one hit per matched sequence is represented.

Email and job title

Returning results by email limits the choice of output format to 'Text', 'FASTA', 'Table' and 'Matchlist'.
If the chosen 'Maximum number of displayed matches' is 100'000, results have to be sent by email and a valid email address is then required. In other situations ScanProsite ignores what you've entered in the email textbox unless it is a valid email address.

Job title: If you've entered a valid email address and you fill in this field, the 'Job title' will appear in the subject of the email you receive for that job.

Programmatic access: REST web service

REST introduction

REST: REpresentational State Transfer

REST originally referred to a collection of architectural principles, but the acronym is now often used to describe any simple web-based interface for programmatic access that uses XML (or YAML, JSON, plain text) over HTTP without the extra abstractions of MEP-based approaches like the SOAP web services protocol.
The 'naked' data, without any envelope, is retrieved as the content of the HTTP query response.
The options for the operation to be performed are part of the HTTP query parameters, the target URL representing the resource being accessed.
The REST philosophy also implies using HTTP 'verbs' (PUT, GET, POST, DELETE) to perform distinct operations (respectively: Create, Read, Update, Delete) on the target resources (URL).
For more information on REST, consult the Wikipedia REST article.

For ScanProsite, as it is a scanning tool, some of the resources are provided by the users (sequences or/and patterns); to minimize the number of required queries / simplify the system, the service doesn't fully follow aforementioned REST principles (that would be e.g. PUTing the user resources on the server first, then GETing the scan results). Instead users directly POST/GET all their data to get the scan results in the response (n.b. direct system; no ticket/job id: do increase connection time-out for complex queries).
Note: in the ScanProsite service, POST is not used to update data, but like GET, just to (pass input data and parameters and) read scan result data.

REST usage for ScanProsite

Make an HTTP GET or POST query to the service; retrieve scan output data (in XML or JSON) in the HTTP response content.

e.g. (GET) just query for: https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?seq=ENTK_HUMAN&output=xml

Service url: https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi
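
As an illustration, a query URL can be assembled and fetched with the Python standard library. This is a minimal sketch rather than an official client: the function names and the 300-second time-out are assumptions, the response is assumed to be UTF-8 text, and the parameter names are the ones described in the table below. Using urllib.parse.quote encodes spaces as %20, new lines as %0A and semicolons as %3B, as in the example URLs of this manual.

```python
import urllib.parse
import urllib.request

SERVICE_URL = "https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi"


def build_scan_url(**params: str) -> str:
    """Build a GET URL for the service from keyword parameters such as seq, sig or output."""
    query = urllib.parse.urlencode(params, quote_via=urllib.parse.quote)
    return f"{SERVICE_URL}?{query}"


def run_scan(timeout_seconds: float = 300.0, **params: str) -> str:
    """Send the GET query and return the raw response body.

    The service answers directly (no job id), so a generous time-out is used
    for complex queries; the value here is an arbitrary assumption.
    """
    url = build_scan_url(**params)
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
        return response.read().decode("utf-8")
```

For example, build_scan_url(seq="ENTK_HUMAN", output="xml") produces the GET example shown above, and run_scan(seq="ENTK_HUMAN", output="json") would return the JSON text of the scan.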

Parameters:

GET or POST parameters (name, description):

Name	Correspondence in ScanProsite form	Description
seq	Submit PROTEIN sequences	Sequence(s) to be scanned: UniProtKB accessions e.g. P98073 or identifiers e.g. ENTK_HUMAN^* or PDB identifiers e.g. 4DGJ or sequences in FASTA format. Do not repeat the parameter; multiple sequences can be specified by separating them with new lines (%0A in url). 'seq' takes precedence over 'db', i.e. if they're both specified, 'db' will be ignored. * For UniProtKB/TrEMBL accessions and identifiers, only those of entries belonging to reference proteomes are accepted. Default: seq="" (empty) Examples: https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?seq=P98073 https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?seq=ENTK_HUMAN https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?seq=>ENTK_HUMAN_in_FASTA_%0AMGSKRGISSR... https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?seq=4DGJ https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?seq=P98073%0AQ3SYW2%0AQ867B7%0AP23604%0AQ04962%0AH2QKV6%0AA5PF02%0AF6ZWI6%0AQ56IB8 https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?seq=>ENTK_HUMAN_in_FASTA_%0AMGSKRGISSR...%0A>CO2_BOVIN_in_FASTA_%0AMDPLMAVLCL...
db	Select a PROTEIN sequence database	Target protein database for scans of motifs against whole protein databases: 'sp' (UniProtKB/Swiss-Prot) or 'tr' (UniProtKB/TrEMBL reference proteome sequences) or 'pdb' (PDB). 'seq' takes precedence over 'db', i.e. if they're both specified, 'db' will be ignored. Default: db=sp (if no "seq" and no "db" are specified, the scan is carried out against UniProtKB/Swiss-Prot) Examples: https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?sig=PS00134&output=txt https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?sig=PS00134&output=txt&db=sp https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?sig=PS00134&output=txt&db=pdb
varsplic	Include isoforms	If on (varsplic=1): includes UniProtKB/Swiss-Prot splice variants. Only relevant on scans against UniProtKB/Swiss-Prot. Default: varsplic=0 (off, UniProtKB/Swiss-Prot splice variants are not scanned) Examples: https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?sig=PS50068&output=list https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?sig=PS50068&output=list&varsplic=0 https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?sig=PS50068&output=list&varsplic=1
sig	Enter a MOTIF or a combination of MOTIFS	Motif(s) to scan against: PROSITE accession e.g. PS50240 or identifier e.g. TRYPSIN_DOM or your own pattern e.g. P-x(2)-G-E-S-G(2)-[AS]. Combinations of motifs can also be used. If not specified, all PROSITE motifs are used. Do not repeat parameter; multiple motifs can be specified by separating them with new lines (%0A in url). Default: sig="" (empty) Examples: https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?output=html&varsplic=1&sig=PS50240 https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?output=html&varsplic=1&sig=TRYPSIN_DOM https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?output=html&varsplic=1&sig=P-x(2)-G-E-S-G(2)-[AS] https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?output=html&varsplic=1&sig=PS50240%20and%20PS50068 (sig=PS50240 and PS50068) https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?output=html&varsplic=1&sig=PS50240%20and%20not%20(%20PS00134%20or%20PS00135%20) (sig=PS50240 and not ( PS00134 or PS00135 )) https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?output=html&varsplic=1&varsplic=1&sig=PS50240%20and%20P-x(2)-G-E-S-G(2)-[AS] (sig=PS50240 and P-x(2)-G-E-S-G(2)-[AS])
lineage	Filters On taxonomy	Any taxonomical term e.g. 'Homo sapiens', e.g. 'Fungi%3BArthropoda' or corresponding NCBI TaxID e.g. 9606, e.g. '4751%3B6656' Separate multiple terms with a '%3B'. Only works on scans against UniProtKB/Swiss-Prot and UniProtKB/TrEMBL. Default: lineage="" (empty) Examples: https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?sig=PS50240&output=fasta https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?sig=PS50240&output=fasta&lineage=Homo%20sapiens (lineage=Homo sapiens) https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?sig=PS50240&output=fasta&lineage=9606 https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?sig=PS50240&output=fasta&lineage=Fungi%3BArthropoda (lineage=Fungi;Arthropoda) https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?sig=PS50240&output=fasta&lineage=4751%3B6656 (lineage=4751;6656)
max_x	Number of X characters in a scanned sequence that can be matched by a conserved position in a pattern	Number of X characters in a scanned sequence that can be matched by a conserved position in a pattern. Only relevant if 'sig' is defined and is a pattern. Default: max_x=0 (no X character in a scanned sequence that can be matched by a conserved position in a pattern)
output	Output format	txt, xml, json, nice, html, plain, fasta, tabular, list Default: output=plain Examples: https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?seq=H2QKV6_PANTR https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?seq=H2QKV6_PANTR&output=plain https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?seq=H2QKV6_PANTR&output=txt https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?seq=H2QKV6_PANTR&output=xml https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?seq=H2QKV6_PANTR&output=json https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?seq=H2QKV6_PANTR&output=list https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?seq=H2QKV6_PANTR&output=tabular https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?seq=H2QKV6_PANTR&output=fasta https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?seq=H2QKV6_PANTR&output=html https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?seq=H2QKV6_PANTR&output=nice
skip	Exclude motifs with a high probability of occurrence from the scan	If on (defined, non empty, non zero): excludes motifs with a high probability of occurrence. Only relevant if 'seq' is defined and 'sig' is not defined, i.e. on scans of specific sequence(s) against all PROSITE motifs. Default: skip=1 (on, PROSITE motifs with a high probability of occurrence are excluded from the scan) Examples: https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?seq=3BP1_RAT&output=json https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?seq=3BP1_RAT&output=json&skip=1 https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?seq=3BP1_RAT&output=json&skip=0
lowscore	Run the scan at a high sensitivity (show weak matches for profiles)	If on (lowscore=1): shows matches with low level scores. Only relevant for PROSITE profiles. Default: lowscore=0 (off, PROSITE profiles are scanned with cut-off of level 0) Examples: https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?seq=CO2_BOVIN&output=tabular https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?seq=CO2_BOVIN&output=tabular&lowscore=0 https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?seq=CO2_BOVIN&output=tabular&lowscore=1
noprofile	Exclude profiles from the scan	If on (noprofile=1): does not scan against profiles. Only works if 'seq' is defined and 'sig' is not defined, i.e. on scans of specific sequence(s) against all PROSITE motifs. Default: noprofile=0 (off, PROSITE profiles are included in the scan) Examples: https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?seq=ENTK_HUMAN&output=xml https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?seq=ENTK_HUMAN&output=xml&noprofile=0 https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?seq=ENTK_HUMAN&output=xml&noprofile=1
minhits	Minimal number of hits per matched sequence	Minimal number of hits per matched sequence. Only works if 'sig' and 'db' are defined, i.e. on scans of protein database(s) against specific motif(s). Default: minhits=1 (Scanned sequences with one match or more are reported in the results) Examples: https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?sig=PS50070&output=nice https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?sig=PS50070&output=nice&minhits=1 https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?sig=PS50070&output=nice&minhits=2 https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?sig=PS50070&output=nice&minhits=3 https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?sig=PS50070&output=nice&minhits=4 https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?sig=PS50070&output=nice&minhits=5 https://prosite.expasy.org/cgi-bin/prosite/scanprosite/PSScan.cgi?sig=PS50070&output=nice&minhits=10