
Table of Contents
1. Standard data input interface
2. Standard result report interface
3. Standard result detail interface
4. Standard result iterator interface


Figure 1. The Standard data input interface.
The items listed in this drop-down box are amino acid modifications that are assumed to have been made to all of the amino acids of a particular type. The actual specification of the modifications is located in the file “sonar_cmods.js”. The format for the modifications is the sign of the modification (+ for addition and - for removal), the chemical formula for the modification and the residue to modify (single letter). Multiple modifications are separated by semi-colons.
For example, to represent the modifications produced by treating a protein with iodoacetamide, the following string is used:
+C2H3O1N1@C
Modifications are applied sequentially, so the following string would produce the same modification:
-H1@C;+C2H4O1N1@C
The symbols for the elements are the conventional symbols, with attention paid to the letter case, i.e., tin is represented by Sn, not SN. The numbers following the element symbols are required, i.e., water should be represented as H2O1, rather than H2O.
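The format rules above can be sketched as a small parser. This is only an illustration in Python, not code taken from Sonar ms/ms: the function names parse_modifications and net_change are hypothetical, and the regular expression encodes just the rules stated in the text (a sign, element symbols with case-sensitive spelling, a mandatory count after every element, and a single-letter residue).

```python
import re
from collections import Counter

# One modification: sign, formula (every element followed by a count), '@', residue.
ITEM = re.compile(r"([+-])((?:[A-Z][a-z]?\d+)+)@([A-Z])")
ELEMENT = re.compile(r"([A-Z][a-z]?)(\d+)")


def parse_modifications(spec):
    """Parse e.g. '-H1@C;+C2H4O1N1@C' into a list of (residue, Counter) steps."""
    steps = []
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semi-colon
        match = ITEM.fullmatch(item)
        if match is None:
            raise ValueError(f"malformed modification: {item!r}")
        sign = 1 if match.group(1) == "+" else -1
        formula = Counter()
        for element, count in ELEMENT.findall(match.group(2)):
            formula[element] += sign * int(count)
        steps.append((match.group(3), formula))
    return steps


def net_change(spec):
    """Add up sequential modifications into one net elemental change per residue."""
    totals = {}
    for residue, formula in parse_modifications(spec):
        totals.setdefault(residue, Counter()).update(formula)
    # Drop elements whose net count is zero.
    return {res: {el: n for el, n in c.items() if n != 0}
            for res, c in totals.items()}


# The two iodoacetamide strings from the text give the same net change on C.
print(net_change("+C2H3O1N1@C") == net_change("-H1@C;+C2H4O1N1@C"))
```

Under these assumptions a formula written as H2O (no count after the oxygen) or SN1 (wrong letter case) is rejected with a ValueError, which mirrors the two rules in the paragraph above.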
ALLOWED:
A character string representing modifications, in the following format:
(cf 1)@X;(cf 2)@X;...(cf n)@X;
where "cf" is the chemical formula of the modification, such as
+C2H3O1N1, and X is a simple letter abbreviation for the appropriate amino acid
residue.
EXAMPLE:
CMOD=+C2H3O1N1@C
HISTORY:
Introduced in v. 1.0
Partial modifications are chemical modifications to amino acids that
may be caused either by post-translational modifications
(e.g., phosphorylation) or by unintended chemical reactions
(e.g., methionine oxidation). These modifications may or may not be
present on any given residue of a particular peptide species, so each residue
of the appropriate type must be checked individually for the presence (or
absence) of the modification.
The format and rules for specifying these modifications are the same as
for the specification of complete modifications (see “Modify”). The definition
of the available modifications is in the file “sonar_pmods.js”.
It is usually difficult to find this type of modification using
conventional protein identification techniques. A much better approach is to
identify the protein and then examine the protein in detail following the
identification.
CGI variable: PMOD
ALLOWED:
A character string representing modifications, in the following format:
(cf 1)@X;(cf 2)@X;...(cf n)@X;
where "cf" is the chemical formula of the modification, such as +O1,
and X is a simple letter abbreviation for the appropriate amino acid residue.
EXAMPLE:
PMOD=+O1@M
HISTORY:
Introduced in v. 1.0
Two types of mass assignment errors may be specified:
1. (P) - parent ion mass error (in daltons); and
2. (D) - daughter ion mass error (in daltons).
The best practice for specifying these errors is to make the parent ion
mass error as wide as practical, while making the daughter ion error as narrow
as practical (e.g., P = 2 Da and D = 0.4 Da for an ion trap run without
zoom scanning). If possible, D = 0.4 Da is a much better value than D = 0.5 Da:
the latter represents the assignment of a nominal mass, while the former is
much more restrictive. Unlike most
other search engines, Sonar ms/ms does not require the specification of the
parent ion mass to give a robust identification, although it will use this
information.
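As a rough illustration of how the two error values behave, the sketch below treats each error as a symmetric window in daltons around a calculated mass. That interpretation, the function name within_error, and the numeric masses are assumptions made for the example; the text does not state how Sonar ms/ms applies the errors internally.

```python
def within_error(measured, calculated, error_da):
    """True if the measured value lies within +/- error_da daltons of the calculated one."""
    return abs(measured - calculated) <= error_da


PARENT_ERROR = 2.0    # P: as wide as practical
DAUGHTER_ERROR = 0.4  # D: as narrow as practical

# A parent ion that is 1.1 Da off still falls inside the wide parent window.
print(within_error(1479.8, 1478.7, PARENT_ERROR))

# A daughter ion that is 0.45 Da off is rejected at D = 0.4 but accepted at D = 0.5.
print(within_error(1067.55, 1067.10, DAUGHTER_ERROR))
print(within_error(1067.55, 1067.10, 0.5))
```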
ALLOWED:
Any positive decimal value greater than 0.0. (default = 2.0)
EXAMPLE:
ERRM=0.5
DESCRIPTION:
This value is the daughter ion mass error, in daltons.
HISTORY:
Introduced in v. 1.0
ALLOWED:
Any positive decimal value greater than 0.0. (default = 2.0)
EXAMPLE:
ERRP=0.1
DESCRIPTION:
This value is the parent ion mass error, in daltons.
HISTORY:
Introduced in v. 1.0
A modified version
of the m/z signal-to-noise peak detection algorithm is used by Sonar ms/ms. For
Finnigan DTA files or Micromass PKL files, a value of 1.4-2.0 gives the best
performance.
ALLOWED:
Any decimal value greater than 1.0. (default = 1.4)
EXAMPLE:
SN=1.4
DESCRIPTION:
This value is the signal-to-noise ratio used to find peaks in the mass spectra
parsed before a search is performed.
HISTORY:
Introduced in v. 1.0
This value affects the way that peptides that have been identified are
displayed on the report. The two different cases are as follows:
1. unchecked: all peptides with expectation values less than the expectation value limit will be displayed. If a peptide is shown in italics, the spectrum that was used to justify the existence of that peptide has also been assigned to another peptide that matched the spectrum better than the italicized peptide.
2. checked: all peptides that would have appeared in italics (see the unchecked case) are removed from the report. Any protein that has no peptide evidence other than italicized peptides does not appear on the report.
ALLOWED:
"no" (default) or "yes"
EXAMPLE:
SHOW_BEST=yes
HISTORY:
Introduced in v. 1.2
This value affects the way that spectra are interpreted.
1. unchecked: the assigned charge of the parent ions is assumed to be correct; or
ALLOWED:
"no" (default); or
"yes"
EXAMPLE:
CHECK_Z=yes
HISTORY:
Introduced in v. 1.2
These selections are only used when NCBI's NR or dbEST databases are
being searched (see the entry for Databases and genomes below). Two selections
are available: they are additive with any overlap between selections handled so
that a particular taxon is only searched once.
The taxonomy selections available are based on three files:
1. sonar_taxa.js;
2. sonar_spec.js; and
3. msms_list.xml (located in the database directory).
The first two files create the drop-down lists available from the user
interface, while the last file contains the appropriate information to
translate a particular value from the lists into a set of files to be searched.
As a part of the general Prowl sequence database collection, NR and dbEST are
divided into a collection of files that contain the sequences for broad (or
narrow) classifications of organisms. The file msms_list.xml defines which
files are included in a search when a particular taxonomic classification has
been chosen. For example, the entry for Viridiplantae is
<db_list database="nr|dbest" tax="Viridiplantae">
<file name="Arabidopsis-thaliana"/>
<file name="Oryza-sativa"/>
<file name="Other-Viridiplantae"/>
</db_list>
By editing this file, new classifications can be made and these
classifications can be entered into the interface JavaScript files. It should
be noted that this file is in XML format, so the quote marks and the case of
letters are not optional, as they are in HTML. If you wish to edit this
file, stay as close to the format of the original file as possible and
keep a backup copy of the original file.
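The translation from a taxonomy selection to a set of files can be sketched with Python's standard XML parser. Several details here are assumptions made for the example: the root element name msms_list, the second db_list entry, the reading of "nr|dbest" as a pipe-separated list of databases, and the function name files_for. Only the Viridiplantae entry comes from the text above.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt; only the Viridiplantae entry is taken from the manual.
SAMPLE = """<msms_list>
  <db_list database="nr|dbest" tax="Viridiplantae">
    <file name="Arabidopsis-thaliana"/>
    <file name="Oryza-sativa"/>
    <file name="Other-Viridiplantae"/>
  </db_list>
  <db_list database="nr|dbest" tax="Oryza-sativa">
    <file name="Oryza-sativa"/>
  </db_list>
</msms_list>"""


def files_for(root, database, selections):
    """Return the files for the selected classifications, each listed only once."""
    files = []
    for db_list in root.iter("db_list"):
        if database not in db_list.get("database", "").split("|"):
            continue
        if db_list.get("tax") not in selections:
            continue
        for entry in db_list.iter("file"):
            name = entry.get("name")
            if name not in files:  # overlapping selections are searched once
                files.append(name)
    return files


root = ET.fromstring(SAMPLE)
print(files_for(root, "nr", ["Viridiplantae", "Oryza-sativa"]))
# ['Arabidopsis-thaliana', 'Oryza-sativa', 'Other-Viridiplantae']
```

Because the parser is a strict XML parser, an attribute written without quote marks makes ET.fromstring raise a ParseError, which is the practical reason for the warning about quotes and letter case.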
ALLOWED:
Any taxonomy value allowed by "msms_list.xml" or
"none" (default).
EXAMPLE:
TAXA=Viridiplantae
DESCRIPTION:
This value is used in combination with "SPEC" and "DBSE" to
determine which files should be searched. If DBSE is "nr" or
"dbest", "TAXA" is compared with the values in
"msms_list.xml" and the appropriate files searched. Files associated
with "SPEC" are also searched.
HISTORY:
Introduced in v. 1.0
ALLOWED:
Any taxonomy value allowed by "msms_list.xml" or
"none" (default).
EXAMPLE:
SPEC=Homo-sapiens
DESCRIPTION:
This value is used in combination with "TAXA" and "DBSE" to
determine which files should be searched. If DBSE is "nr" or
"dbest", "SPEC" is compared with the values in
"msms_list.xml" and the appropriate files searched. Files associated
with "TAXA" are also searched.
HISTORY:
Introduced in v. 1.0
It is possible to search three different types of sequence databases
using this search engine. These types are as follows:
1. Fully translated peptide sequences (e.g., NCBI nr);
2. cDNA/RNA sequence databases (e.g., NCBI dbEST); and
3. genomic DNA sequence databases (e.g., Sanger Centre's P. falciparum).
The RNA and DNA sequences do not have to be complete, contiguous sequences: in fact it is assumed that these sequences are fragmentary, with significant numbers of errors. Fully curated and correct nucleic acid sequences are usually translated into peptide sequences.
The description of which databases are available to be displayed in the
user interface is in the “allds.js” file, which can be found in the /prowl
directory that contains the Sonar ms/ms installation. It is not necessary to edit
this file manually, as database update scripts are available. It may be
desirable to edit the text that is displayed on the screen for clarity,
however.
ALLOWED:
A character string representing a valid sequence database, as defined in the
"/databases/databases.dat" file. See the example for the format.
EXAMPLE:
"/databases/databases.dat" line:
1#NCBInr#nr#..\databases\#1#0##.fasta
DBSE=#1#NCBInr#nr#..\databases\#1#0
where "1" is the database id number, "NCBInr" is the database
name, "nr" is the filename, "..\databases" is the relative
directory containing the file, "1" indicates that the database has
taxonomic division files (0 means no divisions) and "0" means that
the database is peptide sequences (1 means nucleotide sequences).
HISTORY:
Introduced in v. 1.0
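The field layout described in the example above can be read with a few lines of Python. This is a sketch under assumptions: the class name DatabaseEntry and the function name parse_database_line are hypothetical, and the two trailing fields of the line (an empty field and ".fasta") are ignored because the text does not describe them.

```python
from dataclasses import dataclass


@dataclass
class DatabaseEntry:
    db_id: int
    name: str
    filename: str
    directory: str
    has_taxonomy_files: bool  # "1" = taxonomic division files exist
    is_nucleotide: bool       # "1" = nucleotide sequences, "0" = peptide sequences


def parse_database_line(line):
    """Split one '#'-delimited line of databases.dat into its documented fields."""
    fields = line.strip().split("#")
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 fields, got {len(fields)}")
    return DatabaseEntry(
        db_id=int(fields[0]),
        name=fields[1],
        filename=fields[2],
        directory=fields[3],
        has_taxonomy_files=fields[4] == "1",
        is_nucleotide=fields[5] == "1",
    )


entry = parse_database_line(r"1#NCBInr#nr#..\databases\#1#0##.fasta")
print(entry.name, entry.filename, entry.has_taxonomy_files, entry.is_nucleotide)
# NCBInr nr True False
```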
The number that is entered into this box represents the maximum
expectation value allowed for sequences in the displayed results.
Sonar ms/ms does not use raw search engine scores to rate matches
between sequences and spectra. Instead it uses expectation values calculated
from these scores. The simple interpretation of an expectation value is the
number of matches that would be expected to have a particular score, if the
matches were completely random. Therefore, the smaller the expectation value,
the more likely that a particular match is a true match, rather than a random
one.
For example, an expectation value of 1 means that about 1 similar
match would be expected when searching a database that did not include the protein
sequence that truly matches your MS/MS data. An expectation value of 0.0001
means that a similar match would be found approximately once in every 10000
similar sized databases that did not contain the sequence that truly matches
your MS/MS data.
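The arithmetic in this example is simply the reciprocal of the expectation value. The short sketch below restates it in Python; the function names are hypothetical, and treating the limit as inclusive is an assumption, since the text only says that the value entered is the maximum allowed.

```python
def databases_per_random_match(expectation_value):
    """Number of similar-sized random databases searched per chance match."""
    if expectation_value <= 0:
        raise ValueError("expectation values must be greater than 0.0")
    return 1.0 / expectation_value


def within_limit(expectation_value, limit=1.0):
    """True if a result would be displayed for the given maximum (EXPT)."""
    return expectation_value <= limit


print(round(databases_per_random_match(0.0001)))  # 10000
print(within_limit(0.0001), within_limit(10.0))   # True False
```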
There are two types of expectation values shown. The top line
expectation value (in bold) is calculated for the collection of peptides that
have been discovered for the full sequence. The second set of expectation
values are calculated for the individual peptides, without any reference to the
other spectra present in a collection (such as a complete LC MS/MS).
This system of estimating the risk of a random match versus a true match is
used in most conventional sequence homology matching systems, such as BLAST.
It has the distinct advantage of being independent of the scoring system: the
expectation value is calculated from a distribution of scored sequences, rather
than from a particular result.
ALLOWED:
Any decimal value greater than 0.0. (default = 1.0)
EXAMPLE:
EXPT=1.0
HISTORY:
Introduced in v. 1.0
Four different types of instrument configuration are currently
supported. Each ion source/analyzer pair has particular properties and
characteristics that can be used to enhance a particular identification.
1. e-QTOF - quadrupole-time-of-flight analyzer with an electrospray ion source;
2. e-IT - ion trap analyzer with an electrospray ion source;
3. m-QTOF - quadrupole-time-of-flight analyzer with a MALDI ion source; and
4. m-IT - ion trap analyzer with a MALDI ion source.
The instruments available using the standard interface are recorded in
the file “sonar_inst.js”.
ALLOWED:
e-IT (default);
e-QTOF;
m-IT; or
m-QTOF.
EXAMPLE:
INST=m-IT
HISTORY:
Introduced in v. 1.0
When searching peptide sequence databases, protein identifications are
placed into five (5) classifications that are determined by keyword searches of
the database entries:
1. protein - this is a general classification for everything that does not fit the other criteria;
2. cytoskeleton - any protein that is generally considered to be part of the cytoskeleton;
3. ribosome - ribosomal proteins;
4. artifacts - proteins commonly introduced by handling or preparation; and
5. custom - proteins that match the keywords entered into the Custom keywords box.
If no keywords are entered into the Custom keywords box, no custom
category will appear in the results page.
ALLOWED:
Any character string with key words separated by spaces.
(default="").
EXAMPLE:
EXCLUDE=myoglobin cytochrome
HISTORY:
Introduced in v. 1.0
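The keyword-based classification can be pictured as follows. The keyword lists in this sketch are invented for illustration (the manual does not list the built-in keywords here), and checking the custom keywords before the built-in categories is also an assumption; the function name classify is hypothetical.

```python
# Hypothetical keyword lists; the real ones are not given in this section.
BUILT_IN_KEYWORDS = {
    "cytoskeleton": ["actin", "tubulin", "myosin"],
    "ribosome": ["ribosomal"],
    "artifacts": ["keratin", "trypsin"],
}


def classify(description, custom_keywords=""):
    """Return the report category for one database entry description."""
    text = description.lower()
    # Custom keywords are separated by spaces; an empty string means no custom category.
    for word in custom_keywords.lower().split():
        if word in text:
            return "custom"
    for category, keywords in BUILT_IN_KEYWORDS.items():
        if any(word in text for word in keywords):
            return category
    return "protein"  # everything that fits no other category


print(classify("beta-actin [Homo sapiens]"))                            # cytoskeleton
print(classify("myoglobin [Equus caballus]", "myoglobin cytochrome"))   # custom
print(classify("serum albumin precursor"))                              # protein
```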
Some output file formats for MS/MS spectra do not contain the parent ion m/z
ratio. For these file formats, it is necessary to add this information
manually, using this box.
ALLOWED:
any decimal value between 500.0 and 5000.0
EXAMPLE:
MZP=2524.345
DESCRIPTION:
The mass-to-charge ratio of a parent ion that corresponds to the values in the
variable "MZF".
HISTORY:
Introduced in v. 1.0
Some output file formats for MS/MS spectra do not contain the parent ion charge. For these file formats, it is necessary to add this information manually, using this box.
ALLOWED:
An integer value > 0
EXAMPLE:
ZP=2
HISTORY:
Introduced in v. 1.0
The input file name is entered into this box, usually using the
“Browse” button to select the file using a conventional file browsing box. This
file is actually copied to the server that performs the search, so selecting
very large files will lengthen processing time. The file formats
supported are as follows:
1. DTA files (Thermo and Micromass);
2. RAW files (Thermo, requires lcq_dta.exe to be installed); and
3. PKL files (Micromass).
ALLOWED:
A character string that contains the path to a data file.
EXAMPLE:
INPUT_FILE=c:\temp\test.dta
HISTORY:
Introduced in v. 1.0
Enter the m/z values for daughter ions into this box. Enter only one (1) m/z value per line. If you wish to enter intensity information as well as m/z, enter the m/z value, a comma and then the intensity. Do not leave any additional spaces or “tab” characters in the box.
CGI variable: MZF
ALLOWED:
A character string representing a list of mass-to-charge ratio and intensity
pairs. The values are separated by commas. Each pair is separated by a line
feed.
EXAMPLE:
MZF=1000.0,100
2125.3,50
3587.345,1
HISTORY:
Introduced in v. 1.0
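A value in this format can be checked before submission with a short parser. The sketch below is an assumption about reasonable client-side validation, not a description of what Sonar ms/ms does on the server; the function name parse_mzf is hypothetical.

```python
def parse_mzf(text):
    """Parse 'm/z' or 'm/z,intensity' lines into (mz, intensity) tuples.

    The intensity is None when a line holds only an m/z value.
    """
    peaks = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line:
            continue  # skip empty lines, e.g. a trailing line feed
        if " " in line or "\t" in line:
            raise ValueError(f"line {line_number}: spaces and tabs are not allowed")
        mz_text, comma, intensity_text = line.partition(",")
        mz = float(mz_text)
        intensity = float(intensity_text) if comma else None
        peaks.append((mz, intensity))
    return peaks


print(parse_mzf("1000.0,100\n2125.3,50\n3587.345,1"))
# [(1000.0, 100.0), (2125.3, 50.0), (3587.345, 1.0)]
```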
The Sonar report page is a series of simple tables that describe the
proteins (or ESTs) found and a summary of the search conditions that were used.
This page describes the various columns in these tables and supplies some
explanation of the meaning of the terms used.

Figure 2. The Standard result report interface.
The expectation value is a simple statistic that allows the comparison
of the reliability of results. Low expectation values (< 1) correspond to
confident identifications.
For example: an expectation value of 10 means that purely random
matching will produce about ten results as good as the one reported. Similarly,
an expectation value of 1.0 means that 1 random result is expected to be as
good as the one reported. An expectation value of 0.001 means that only 0.001
random results are expected to be as good as the one reported.
The result column contains the evidence supporting the expectation
value. The top line is a link to the NCBI's repository of sequence information,
if possible. In the same box, the entries under the Redundant heading are
database entries that result in identified peptides that are the same as those
assigned to the entry displayed. They may only have a subset of these peptides.
Where the assignment of a spectrum is ambiguous, a choice of assignment is
made using the following rules:
1. if a spectrum can be assigned to more than one peptide, it is assigned to the peptide sequence that has the lowest expectation value;
2. if two peptides have the same expectation value for the same spectrum, the peptide contained in the complete sequence with the best expectation value is chosen; and
3. if two peptides have the same expectation value for the same spectrum and the complete sequences have the same expectation value, both peptides are shown as assigned.
The peptide assignments based on spectra that have been assigned to
other peptides are shown in italics in the display. If the Show best
assignments only box is checked, these peptides are not shown. These peptide
assignments are not shown in the Search iteration page either.
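The three tie-breaking rules amount to picking the smallest (peptide expectation value, protein expectation value) pair and keeping every candidate that shares it. The following Python sketch shows that reading; the function name, the tuple layout and the example peptides are hypothetical, and comparing expectation values for exact equality is a simplification.

```python
def choose_assignments(candidates):
    """Pick the peptide(s) assigned to ONE spectrum.

    candidates: list of (peptide, peptide_expect, protein_expect) tuples.
    """
    # Rule 1: lowest peptide expectation value; rule 2: ties are broken by the
    # lowest expectation value of the complete sequence.
    best = min((pep_e, prot_e) for _, pep_e, prot_e in candidates)
    # Rule 3: candidates that tie on both values are all kept.
    return [pep for pep, pep_e, prot_e in candidates if (pep_e, prot_e) == best]


candidates = [
    ("LVNELTEFAK", 0.01, 1e-5),
    ("LVNEITEFAK", 0.01, 1e-3),  # same peptide value, weaker protein
    ("AAAK", 0.5, 1e-9),         # weaker peptide value
]
print(choose_assignments(candidates))  # ['LVNELTEFAK']
```

Candidates that lose under these rules correspond to the peptides shown in italics on the report.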
These numbers correspond to the number of a-, b- and y-ions (Biemann notation) that were assigned.
The value shown has the form z m/z (m-a), where:
1. z is the measured charge of the parent ion that has been associated with a peptide sequence;
2. m/z is the measured mass-to-charge ratio of the parent ion that has been associated with a peptide sequence; and
3. (m-a) is the mass difference between the mass calculated from the measured parent ion m/z and z values (m) and the assigned peptide mass (a).
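The three values can be derived from a measured parent ion as sketched below. The conversion used here, the neutral mass of a z-fold protonated ion, is an assumption: the manual does not say whether Sonar ms/ms reports m as a neutral mass or as a singly protonated mass, and the function names are hypothetical.

```python
PROTON_MASS = 1.007276  # mass of a proton in daltons


def neutral_mass(mz, z):
    """Neutral mass of an ion observed at m/z with z attached protons (assumed convention)."""
    return z * (mz - PROTON_MASS)


def parent_summary(mz, z, assigned_mass):
    """Return (z, m/z, m - a) for one assigned parent ion."""
    m = neutral_mass(mz, z)
    return z, mz, m - assigned_mass


z, mz, delta = parent_summary(500.0, 2, 997.50)
print(z, mz, round(delta, 3))
```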
Fragmentation diagrams are a very simple-to-read representation of
MS/MS data. The lines drawn above the sequence correspond to the relative
intensity of fragment ions assigned to particular peptide bond cleavages. The
lines are drawn between the two residues corresponding to the cleaved peptide bond.
The length of the line corresponds to the relative intensity of the most
intense ion assigned to a particular bond cleavage. Clicking on the
fragmentation diagram produces a detailed analysis of the peaks assigned and
the peaks that could not be assigned to the proposed sequence.
If a fragmentogram is shown in italics, the spectrum that gave evidence
for the existence of this peptide produced a better assignment to a peptide in
another protein.
The Sonar report is broken up into 1 to 5 sections. These sections are
protein, cytoskeleton, ribosome, and
artifacts (the Custom heading is optional). Results are placed into the
Protein category if they do not fit into the other ones. The keywords used for
each heading are listed below the title line of each section.
If the search was performed on a genomic or EST database, the
cytoskeleton, ribosome, and artifacts sections are not shown, because the
databases are not annotated well enough to use this feature.
This table contains the search parameters used to perform a search. It
appears at the end of the Sonar report.
The Sonar detail report page is a series of simple tables that
describe the peptides that have been assigned by Sonar. This page describes the
various columns in these tables and supplies some explanation of the meaning of
the terms used.

Figure 3. The Standard result detail interface.
The peptide that was assigned to the parent ion's MS/MS spectrum, shown as a fragmentogram. Clicking on the fragmentogram will open a tools page that allows you to manipulate the sequence and calculate fragment masses.
Clicking on the GI number (an NCBI identifier) takes you to a tools page for
manipulating the full sequence of the protein/EST. For ESTs, clicking on the
“frame translation” button takes you to a tools page for manipulating the amino
acid translation of the EST reading frame that contains the assigned peptide.
In DTA or PKL files, the ordinal number corresponding to the MS/MS
spectrum that produced this match.
The parent ion m/z, charge (z), measured mass (m), assigned mass (a),
and the difference between the measured and assigned masses (m-a).
The relative intensity of a fragment ion (1 to 10 ° symbols), the
measured m/z (m), the assigned m/z (a), the difference between the measured and
assigned m/z, the assigned charge (z) and the assignment, in Biemann notation.
The assignment is followed by the pair of residues cleaved to produce the
fragment and a small molecule mass loss in parentheses (if required).
The relative
intensity and m/z values of all fragment ions not assigned by Sonar.

Figure 4. The Standard result iterator interface.
This type of search involves using the results of previous searches to
attempt to mine more information from mass spectra. The user is presented with
the results of the previous search, including a detailed list of the parent
ions that were submitted for the search and the main search parameters that
were used to perform the previous search. The parent ions are listed in three
tables:
1. Unassigned parent ions - these are parent ions that could not be associated with a sequence. All unassigned parent ions are checked for further searching by default.
2. Parent ions similar to assigned ions - these are ions that were not associated with a sequence. However, ions with parent ion masses within the parent ion error value were associated with a sequence, making it possible that these ions are either different charge states of the same peptide, or they are weaker MS/MS spectra than the assigned ions. These ions are unchecked by default.
3. Assigned parent ions - these parent ions were assigned to a sequence and are unchecked by default. If a parent ion was assigned to more than one sequence, the highest scoring sequence is shown.
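The division into three tables can be sketched as follows. The dictionary layout, the function name classify_parent_ions and the example ions are hypothetical; the only rule taken from the text is that an unassigned ion counts as "similar" when its parent ion mass lies within the parent ion error of an assigned ion.

```python
def classify_parent_ions(ions, parent_error):
    """Split parent ions into the three iterator tables.

    ions: list of dicts with 'mass' (float) and 'sequence' (str, or None if unassigned).
    """
    assigned = [ion for ion in ions if ion["sequence"] is not None]
    similar, unassigned = [], []
    for ion in ions:
        if ion["sequence"] is not None:
            continue
        near_assigned = any(
            abs(ion["mass"] - other["mass"]) <= parent_error for other in assigned
        )
        (similar if near_assigned else unassigned).append(ion)
    # Only the 'unassigned' table is checked for further searching by default.
    return {"unassigned": unassigned, "similar": similar, "assigned": assigned}


ions = [
    {"mass": 1000.0, "sequence": "PEPTIDEK"},
    {"mass": 1001.5, "sequence": None},  # within 2 Da of an assigned ion
    {"mass": 1500.0, "sequence": None},
]
tables = classify_parent_ions(ions, parent_error=2.0)
print({name: len(rows) for name, rows in tables.items()})
# {'unassigned': 1, 'similar': 1, 'assigned': 1}
```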
To perform an iterative search, simply reset the main search parameters
at the top of the page, select which of the parent ions you wish included in
the search and press the button. A search iteration can be iterated further:
the number of iterations is indicated at the top of each search page, along
with links to go back to an earlier iteration.
The search controls at the top of the page have the same meanings and definitions as the search controls in Section 1 (see above). The only difference is that any control that is drawn using a JavaScript file in the Standard data input interface is drawn here with the literal values from the JavaScript file that was current when the original search was performed. Therefore, the search conditions available in an iterator page directly reflect the conditions that were present when the original search was performed.