|
An average mass is obtained when a peak in the mass
spectrum is not isotopically resolved (i.e., in a low-resolution
mass measurement), or when the A0 component of the isotopic distribution
cannot be unambiguously identified in a high-resolution measurement. When only
some of the peaks in the mass spectrum are isotopically resolved, data
consist of both average and monoisotopic masses. List average masses in the
window for "Average Masses" and monoisotopic masses in the window
for "Monoisotopic Masses".
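The difference between the two mass types can be illustrated with a short sketch. It assumes standard atomic masses (rounded) and uses a hypothetical helper `mass`; it is not ProFound's own calculation.

```python
# Monoisotopic masses use the lightest stable isotope of each element;
# average masses use the natural-abundance atomic weight. Values are rounded.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401,
                "O": 15.99491462, "S": 31.97207117}
AVERAGE = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}


def mass(composition, table):
    """Sum atomic masses for an elemental composition such as {"C": 2, "H": 5}."""
    return sum(table[element] * count for element, count in composition.items())


# Glycine, C2H5NO2, used here only as a small illustrative molecule.
glycine = {"C": 2, "H": 5, "N": 1, "O": 2}
mono = mass(glycine, MONOISOTOPIC)
avg = mass(glycine, AVERAGE)
print(f"monoisotopic: {mono:.4f} Da, average: {avg:.3f} Da")
```

The gap between the two values grows with peptide size, which is why each mass should be entered in the window that matches how it was measured.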
Candidate proteins are ranked by the probability that
"the candidate protein is the sample protein". Similar proteins
are assigned the same rank and are not treated as separate
candidates.
Specify the type of peptide masses: M for neutral
peptides and MH+ for singly protonated peptides.
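As a minimal sketch of how the two mass types relate, the following converts an MH+ value to a neutral mass. It assumes the difference is exactly one proton mass; the function name `to_neutral` is hypothetical and not part of ProFound.

```python
PROTON_MASS = 1.007276  # Da; assumed offset between M and MH+


def to_neutral(mass, mass_type):
    """Return the neutral peptide mass M for a value entered as 'M' or 'MH+'."""
    if mass_type == "M":
        return mass
    if mass_type == "MH+":
        return mass - PROTON_MASS
    raise ValueError(f"unknown mass type: {mass_type!r}")


print(to_neutral(1000.0, "MH+"))
```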
Specify the chemical formula for the modification. Only plus
(+), minus (-), digits (2-9), and letters (A-Z, a-z) are recognized. The
format for the formula is:
[+|-]element[[#occurrence]..[element[#occurrence]]]
An
example is: -CH4S (remove one carbon, four hydrogens, and one sulfur).
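The formula format can be sketched as a small parser. This is an illustration under stated assumptions, not ProFound's parser: the names `parse_modification` and `mass_shift` are hypothetical, the sign is assumed to apply to the whole formula, any digit string is accepted as an occurrence count, and rounded monoisotopic masses are used for a handful of elements.

```python
import re

# Rounded monoisotopic atomic masses (Da) for a few common elements.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401,
                "O": 15.99491462, "P": 30.97376200, "S": 31.97207117}

# One element symbol followed by an optional occurrence count.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_modification(formula):
    """Parse e.g. '-CH4S' into signed element counts {'C': -1, 'H': -4, 'S': -1}."""
    body = formula.strip()
    if not body:
        raise ValueError("empty formula")
    sign = 1
    if body[0] in "+-":
        sign = -1 if body[0] == "-" else 1
        body = body[1:]
    if not body:
        raise ValueError("formula has a sign but no elements")
    counts = {}
    pos = 0
    while pos < len(body):
        match = TOKEN.match(body, pos)
        if match is None:
            raise ValueError(f"unrecognized character at position {pos}: {body[pos]!r}")
        element, number = match.group(1), match.group(2)
        counts[element] = counts.get(element, 0) + sign * (int(number) if number else 1)
        pos = match.end()
    return counts


def mass_shift(formula):
    """Monoisotopic mass change (Da) implied by a modification formula."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_modification(formula).items())


print(parse_modification("-CH4S"), round(mass_shift("-CH4S"), 4))
```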
Specify
the modification associated with the specified cleavage. For instance, a
modification (-CH4S) is associated with cyanogen bromide (CNBr) cleavage.
Specify
the amino acid(s) at which a "user-defined" cleavage occurs.
Multiple cleavage sites can be specified. To select multiple cleavage sites,
hold down the Ctrl key while clicking on the intended amino acids.
Choose
complete modifications when the modifications go to completion at all susceptible
sites (amino acids).
Coverage
is defined as the ratio of the portion of protein sequence covered by matched
peptides to the whole length of protein sequence. Hyperlinked output pages
show graphical and text representations of the matched peptides from the
protein candidates. For each candidate protein, graphs are provided to allow
the user to quickly assess the experimental peptide mass coverage of the
protein and the mass measurement errors.
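The coverage definition above can be sketched as follows. The function `sequence_coverage` is hypothetical; it assumes matched peptides are given as 1-based, inclusive residue ranges and that overlapping peptides are counted only once.

```python
def sequence_coverage(protein_length, matched_ranges):
    """Fraction of residues covered by at least one matched peptide.

    matched_ranges holds (start, end) pairs, 1-based and inclusive.
    """
    covered = [False] * protein_length
    for start, end in matched_ranges:
        for position in range(max(start, 1), min(end, protein_length) + 1):
            covered[position - 1] = True
    return sum(covered) / protein_length


# Three matched peptides on a 100-residue protein; the first two overlap.
print(sequence_coverage(100, [(1, 20), (11, 30), (91, 100)]))
```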
Specify
the sequence database to be searched. The actual available databases are
package dependent. They may include:
· NCBI's nr
· NCBI's dbEST
· SWISS-PROT
· OWL
· PIR
· NRL3D
· User defined sequence database
Specify
the cleavage either by choosing the enzyme or chemical from the
"pre-defined" list or by setting up a "user-defined"
cleavage (see User Defined Cleavage). To set up and use a "user-defined"
cleavage, you must explicitly select the radio button next to
"User-defined".
Specify
the amino acid(s) next to which a cleavage is excluded. To select multiple
amino acids, hold down the Ctrl key while clicking on the intended amino
acids.
The extra
setting allows the user to specify the number of particular amino acids in given
peptides, if known. For example, if the total number of Y or W is known to be
3 for a mass, specify "2+" in the Y/W column for the mass. When
such information is not available, specify "n/a".
The
knowledge of the presence (or absence) and the number of particular amino
acids contained within a given peptide provides constraints in database
searching to reduce the occurrence of database peptides that randomly match
the experimental mass spectral data, thereby improving the confidence level
for identifications. Experimentally, such information can be obtained in a
number of ways. For example, the number of methionine residues in a peptide can
be inferred by observation of pairs of peaks separated by the mass of oxygen
(because methionine residues contained in proteolytic peptides are frequently
found to be partially oxidized).
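A minimal sketch of the peak-pair idea: scan a list of monoisotopic masses for pairs separated by the mass of one oxygen atom. The function name and the 0.02 Da default tolerance are assumptions chosen for illustration.

```python
OXYGEN_MONO = 15.994915  # Da, monoisotopic mass of one oxygen atom


def find_oxidation_pairs(masses, tol=0.02):
    """Return (unoxidized, oxidized) mass pairs that differ by one oxygen."""
    ordered = sorted(masses)
    pairs = []
    for i, low in enumerate(ordered):
        for high in ordered[i + 1:]:
            if abs((high - low) - OXYGEN_MONO) <= tol:
                pairs.append((low, high))
    return pairs


# The first two masses differ by 15.995 Da, suggesting one methionine.
print(find_oxidation_pairs([1000.50, 1016.495, 1200.60]))
```

A peptide showing one such pair would be a candidate for at least one methionine residue; a second peak a further oxygen mass away would suggest two.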
The number that is entered into this box
represents the maximum expectation value allowed for sequences in the
displayed results.
ProFound does not use raw search engine scores to
rate matches between sequences and spectra. Instead it uses expectation
values calculated from these scores. The simple interpretation of an
expectation value is the number of matches that would be expected to have a
particular score, if the matches were completely random. Therefore, the
smaller the expectation value, the more likely that a particular match is a
true match, rather than a random one.
For example, an expectation value of 1 means that
at least 1 similar match would be expected when searching a database that did
not include the protein sequence that truly matches your MS data. An expectation
value of 0.0001 means that a similar match would be found approximately once
in every 10000 similarly sized databases that did not contain the sequence that
truly matches your MS data.
This system of estimating the risk of a random
match versus a true match is used in most conventional sequence homology
matching systems, such as BLAST. It has the distinct advantage of being
independent of the scoring system: the expectation value is calculated from a
distribution of scored sequences, rather than from a particular result.
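The following sketch shows how an expectation value might be used in practice. It assumes random matches follow a Poisson distribution, as in BLAST statistics, so that the probability of at least one random match is 1 - exp(-E). Both function names and the hit list are hypothetical.

```python
import math


def prob_at_least_one_random_match(expectation):
    """Poisson probability of one or more random matches at this expectation value."""
    return 1.0 - math.exp(-expectation)


def filter_by_expectation(hits, max_expectation):
    """Keep (name, expectation) hits at or below the maximum allowed value."""
    return [(name, e) for name, e in hits if e <= max_expectation]


hits = [("protein A", 0.0001), ("protein B", 0.5), ("protein C", 3.0)]
print(filter_by_expectation(hits, 1.0))
print(round(prob_at_least_one_random_match(1.0), 3))
```

For small expectation values the probability is nearly equal to the expectation value itself, which is why the two are often read interchangeably.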
This
section specifies "general" information about a search. Please see
specific parameters for details.
The
program treats proteins with similar sequences as a single candidate in the
probability calculation.
When a
group of proteins is found to be similar, they are grouped together under a
single rank, marked with a "+" sign. Currently, Internet Explorer
users can expand or contract the display of the protein group by clicking on
the "+" sign.
Our
company logo and name can be removed when there is a strict restriction on
displaying company logos at academic conference presentations. Please contact
ProteoMetrics to obtain a patch for logo removal.
Specify
the error of peptide mass measurement. The errors for average and
monoisotopic masses are specified independently.
Specify
the type of mass tolerance: Dalton (Da) for absolute mass error, and
percentage (%) or parts per million (ppm) for relative error.
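The three tolerance types can be compared by converting each to an absolute window in Daltons. This is a sketch with a hypothetical function name, not ProFound's implementation.

```python
def tolerance_in_da(mass, value, unit):
    """Convert a mass tolerance to an absolute error in Da for a given mass."""
    if unit == "Da":
        return value                 # already absolute
    if unit == "%":
        return mass * value / 100.0  # relative, percent of the mass
    if unit == "ppm":
        return mass * value / 1e6    # relative, parts per million
    raise ValueError(f"unknown tolerance unit: {unit!r}")


# For a 2000 Da peptide, 50 ppm and 0.005 % describe the same 0.1 Da window.
print(tolerance_in_da(2000.0, 50, "ppm"), tolerance_in_da(2000.0, 0.005, "%"))
```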
This
is the place to enter measured proteolytic peptide masses. There are three
alternative methods for specifying the masses of peptides used to search the
database. These are average mass, monoisotopic mass, and a combination of
average and monoisotopic masses (the last alternative is useful when only
some of the peaks in the mass spectrum are isotopically resolved). For a
given peptide, either the average or monoisotopic mass is entered. Only numbers
and decimal points are recognized. Other characters will be discarded.
Specify
the maximum number of missed cleavage sites within the peptide (yielding
incomplete cleavage peptides). Unless you have a good reason, do not use
a large number for this parameter. A large number may degrade the quality of
the search result.
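To see why a large value inflates the search space, the sketch below digests a sequence and lists every peptide with up to `max_missed` missed cleavages. The defaults follow the trypsin rule (cleave C-terminal to K or R, but not next to P). The function `digest` and the test sequence are illustrative assumptions, not ProFound code.

```python
def digest(sequence, cleave_after="KR", not_before="P", max_missed=0):
    """Return peptides from C-terminal cleavage, allowing missed cleavage sites."""
    if not sequence:
        return []
    # Cut points are positions between residues where cleavage is allowed.
    cuts = [0]
    for i, residue in enumerate(sequence[:-1]):
        if residue in cleave_after and sequence[i + 1] not in not_before:
            cuts.append(i + 1)
    cuts.append(len(sequence))
    peptides = []
    last = len(cuts) - 1
    for i in range(last):
        # Each extra cut point skipped adds one missed cleavage.
        for j in range(i + 1, min(i + 1 + max_missed, last) + 1):
            peptides.append(sequence[cuts[i]:cuts[j]])
    return peptides


print(digest("AKRPGKLR", max_missed=0))
print(digest("AKRPGKLR", max_missed=1))
```

Each additional allowed missed cleavage adds roughly one more candidate peptide per cleavage site, so the chance of random mass matches rises accordingly.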
Specify
the amino acid at which a modification occurs. Modifications with the same chemical
formula can be specified for multiple amino acids. To select multiple amino
acids, hold down the Ctrl key while clicking on the intended amino acids. The
chemical formula of the associated modifications is to be entered explicitly
(see Chemical Formula for Modifications). Two "user-defined" modifications
(Modification 1 and Modification 2) with different chemical formulas can be
specified.
Specify
modification(s) by either choosing from the "pre-defined" list or
setting up "user-defined" modifications. The modifications can be
set either as complete or as partial. Setting up partial modifications should
be done with caution and is highly discouraged. In most cases, choosing
partial modifications only increases random matches, and thus degrades the
quality of the search result.
A
monoisotopic mass is obtained in high-resolution mass measurement, when the
A0 component of the isotopic distribution can be unambiguously identified.
When only some of the peaks in the mass spectrum are isotopically resolved,
data consist of both average and monoisotopic masses. List average masses in
the window for "Average Masses" and monoisotopic masses in the window
for "Monoisotopic Masses".
Partial
modifications (or incomplete modifications) are modifications that do not
occur all the time at susceptible sites (amino acids). Setting up partial
modifications should be done with caution and is highly discouraged. In most
cases, choosing partial modifications only increases random matches, and thus
degrades the quality of the search result.
The
following pre-defined cleavages are supported in the current version of
ProFound:
Name                  Cleave  Don't Cleave Next To  Cleave At   Modification
Endoproteinase Arg C  R       P                     C-terminal
Endoproteinase Asp N  D                             N-terminal
Endoproteinase Glu C  E       P                     C-terminal
Endoproteinase Lys C  K       P                     C-terminal
CNBr                  M                             C-terminal  +CH3S
Trypsin               KR      P                     C-terminal
V8 (D,E)              DE      P                     C-terminal
V8 (E)                E       P                     C-terminal
The
following pre-defined modifications are supported in the current version of
ProFound:
Complete Modifications:
· 4-vinyl-pyridine (C)
· Acrylamide (C)
· Iodoacetamide (C)
· Iodoacetic acid (C)
· Performic acid (C+O3)
· Performic acid (M+O2)
· Performic acid (M+O)
Partial Modifications:
· 4-vinyl-pyridine (C)
· Acrylamide (C)
· Iodoacetamide (C)
· Iodoacetic acid (C)
· Nitration (Y)
· Oxidation (M)
· Performic acid (C+O3)
· Performic acid (M+O2)
· Performic acid (M+O)
· Phosphorylation (S,T,Y)
· Phosphorylation (S,T)
· Phosphorylation (Y)
ProFound
computes the normalized probability that a protein in a database is the protein
being analyzed, based on the data, experimental conditions, and other background
information.
Click
on symbol T to use the sequence analysis tools contained in PROWL to
further analyse the candidate protein sequence. Click on the protein name to
retrieve protein information from the NCBI web site.
Specify
the estimated protein mass range of the sample protein. A narrower range
provides a constraint to a search. When the molecular weight information is
correct, it often improves the performance of the search.
Specify the estimated protein pI range of the sample
protein. A narrower pI range provides a constraint to a search. When the pI
information is correct, it is helpful to a search.
Enter
a note for the search here.
Click
on symbol ® to further identify proteins using the same set of search
parameters with unmatched masses. This "subtraction method" is
another way to search for multiple protein components in a sample (See also ProFound's mixture search).
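The subtraction step can be sketched as removing every measured mass that is explained by the first identified protein and keeping the rest for a second search. The function name, the example masses, and the absolute tolerance in Da are assumptions for illustration.

```python
def unmatched_masses(measured, matched_theoretical, tol_da):
    """Return measured masses not within tol_da of any matched theoretical mass."""
    return [
        m for m in measured
        if not any(abs(m - t) <= tol_da for t in matched_theoretical)
    ]


measured = [1000.50, 1500.70, 2000.90]
matched = [1000.52, 2000.88]  # peptide masses explained by the first candidate
print(unmatched_masses(measured, matched, tol_da=0.05))
```

The returned list would then be submitted as the input masses of the follow-up search with the same parameters.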
Frequently,
a protein sample contains a mixture of proteins. ProFound provides a variety
of methods for the identification of the protein components in a mixture.
These methods can be used in combination in searches.
Method
A: ProFound can be explicitly instructed to identify a single
protein or multiple protein components in a single search. The current
version of ProFound allows simultaneous identification of up to four protein
components in a mixture.
Method B: After a search, ProFound can search
for the additional protein component(s) using unmatched peptide masses.
A
representation of a phylogenetic tree is provided through which the user can
specify the origin of the sample protein, if known. The taxonomic categories
are based on NCBI's tree. Use of the correct taxonomic information
can increase search speed and make the search result more reliable. The
following are the taxonomic categories implemented.
All taxa
  Archaea (Archaebacteria)
  Bacteria (Eubacteria)
    Firmicutes (gram-positive bacteria)
      Bacillus subtilis
      Mycoplasma
      Other Firmicutes (gram-positive bacteria)
    Proteobacteria (purple non-sulfur bacteria)
      Enterobacteria
        Escherichia coli
        Other Enterobacteria
      Other Proteobacteria (purple non-sulfur bacteria)
    Other Bacteria
  Eukaryota (eucaryotes)
    Dictyostelium discoideum
    Fungi
      Pneumocystis carinii
      Saccharomyces cerevisiae (baker's yeast)
      Schizosaccharomyces pombe (fission yeast)
      Other Fungi
    Metazoa (animal)
      Caenorhabditis elegans
      Chordata (chordates)
        Fugu rubripes (pufferfish)
        Danio rerio (zebrafish)
        Mammalia (mammals)
          Primates
            Homo sapiens (human)
            Other Primates
          Rodentia (rodents)
            Mus musculus (house mouse)
            Rattus
            Other Rodentia (rodents)
          Other Mammalia (mammals)
        Xenopus laevis (African clawed frog)
        Other Chordata (chordates)
      Drosophila (fruit flies)
      Other Metazoa (animal)
    Plasmodium falciparum (malaria parasite P. falciparum)
    Viridiplantae (green plants)
      Arabidopsis thaliana (thale cress)
      Oryza sativa (rice)
      Other Viridiplantae (green plants)
    Other Eukaryotes (eucaryotes)
  Viroids
  Viruses
    Hepatitis C virus
    Other Viruses
  Others
  Unclassified
Specify
the number of top candidate proteins in the output display.
When
the cleavage used is not on the list of "pre-defined"
cleavages, you can set up a user-defined cleavage. To do so, you must
explicitly select the radio button next to
"User-defined". Then choose the cleavage site(s) (amino
acids) and the position (N or C terminal). You can also set site(s) next to
which a cleavage should NOT occur. Finally, you can set a modification
that is associated with this cleavage by typing in the chemical formula for
the modification (see Chemical Formula for Modifications).
Up to
two "user-defined" modifications with different chemical formulas
can be set as a supplement to the "pre-defined" modifications. A
"user-defined" modification includes modification site(s), a
modification formula, and a flag to indicate whether it is a complete or
partial modification. The total number of partial modifications,
"pre-defined" and "user-defined" combined, is limited
to four (4).
ProFound
calculates the probability that a candidate in a database search is the
protein being analyzed. However, it is not easy to cast the calculated
probability into the common language of traditional statistics. Here, as an
indicator of the quality of the search result, a Z score is estimated when
the search result is compared against an estimated random match population. The Z
score is the distance to the population mean in units of standard deviation.
It also corresponds to the percentile of the search in the random match
population. For instance, a Z score of 1.65 for a search means that the
search is in the 95th percentile. In other words, about 5% of
random matches could yield higher Z scores than this search.
Conceptually, this “95th percentile” is different from “95% confidence” that
the search is a correct identification.
The
following is a list for Z score and its corresponding percentile in an
estimated random match population:
Z      Percentile
1.282  90.0
1.645  95.0
2.326  99.0
3.090  99.9
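The listed values agree with the standard normal distribution. As a sketch, assuming the random match population is treated as normally distributed, the percentile for any Z score can be computed from the normal cumulative distribution function. The function name is hypothetical.

```python
import math


def z_to_percentile(z):
    """Percentile of a Z score under the standard normal distribution."""
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))


for z in (1.282, 1.645, 2.326, 3.090):
    print(f"{z:.3f}  {z_to_percentile(z):.1f}")
```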
. |