A description of Sonar ms/ms search control parameters
|
|
Modify
|
The items listed in this drop-down box are amino acid modifications that are assumed to have been made to all of the amino acids of a particular type. The actual specification of the modifications is located in the file "sonar_cmods.js". The format for the modifications is the sign of the modification (+ for addition and - for removal), the chemical formula for the modification and the residue to modify (single letter). Multiple modifications are separated by semi-colons.
For example, to represent the modifications produced by treating a protein with iodoacetamide, the following string is used:
+C2H3O1N1@C.
Modifications are applied sequentially, so the following string would produce the same modification:
-H1@C;+C2H4O1N1@C.
The symbols for the elements are the conventional symbols, with attention paid to the letter case, i.e., tin is represented by Sn, not SN. The numbers following the element symbols are required, i.e., water should be represented as H2O1, rather than H2O.
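As a rough illustration of the string format described above, the following Python sketch parses a modification string and computes the net monoisotopic mass shift for each residue. It is not the Sonar ms/ms implementation: the function name parse_mods, the small element table and the decision to reject malformed terms with an error are all assumptions made for this example.

```python
import re

# Monoisotopic masses in Da for a few common elements (extend as needed).
MONOISOTOPIC = {
    "H": 1.00782503, "C": 12.0, "N": 14.00307401,
    "O": 15.99491462, "P": 30.97376200, "S": 31.97207117,
}

# One term: sign, formula (every element followed by a count), '@', residue.
TERM = re.compile(r"^([+-])((?:[A-Z][a-z]?\d+)+)@([A-Z])$")
ELEMENT = re.compile(r"([A-Z][a-z]?)(\d+)")


def parse_mods(spec):
    """Return {residue: net mass shift in Da} for a string like '+C2H3O1N1@C'."""
    shifts = {}
    for term in spec.strip().split(";"):
        match = TERM.match(term.strip())
        if match is None:
            raise ValueError(f"malformed modification: {term!r}")
        sign, formula, residue = match.groups()
        mass = sum(MONOISOTOPIC[el] * int(n) for el, n in ELEMENT.findall(formula))
        # Terms are applied in order, so shifts for the same residue accumulate.
        shifts[residue] = shifts.get(residue, 0.0) + (mass if sign == "+" else -mass)
    return shifts


single = parse_mods("+C2H3O1N1@C")
two_step = parse_mods("-H1@C;+C2H4O1N1@C")
print(round(single["C"], 4), round(two_step["C"], 4))
```

Both strings give the same net shift on cysteine, and a formula without explicit counts (such as H2O) is rejected by the pattern, which mirrors the rule that the numbers are required.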
|
|
Partial mods
|
Partial modifications are chemical modifications to amino acids that may be caused either by post-translational modifications (e.g., phosphorylation) or by unintended chemical reactions (e.g., methionine oxidation). These modifications may or may not be present in a particular peptide species, so each residue of the appropriate type must be checked individually for the presence (or absence) of the modification.
The format and rules for specifying these modifications are the same as for the specification of complete modifications (see above). The definition of the available modifications is in the file "sonar_pmods.js".
It is usually difficult to find this type of modification using conventional protein identification techniques. A much better approach is to identify the protein and then examine the protein in detail following the identification.
|
|
ICAT
|
This selection is useful for mass-encoded modification experiments for quantitation. The modification reagents are assumed to come in two forms: light and heavy. The modification is encoded in a string that looks like the following:
[442.22|8.0@C]
where the first number is the mass of the light modification, the second number is the mass difference between the heavy and light modifications and the letter is the residue to modify. The values for these strings are set in the file "sonar_icat.js".
Note: only peptides containing the modified residue will be shown in the results.
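The encoding above can be unpacked with a few lines of Python. This is only a sketch of how such a string might be read; the function name parse_icat and the returned dictionary keys are assumptions for illustration, not part of Sonar ms/ms.

```python
import re

# '[light mass|heavy minus light@residue]', e.g. '[442.22|8.0@C]'
ICAT = re.compile(r"^\[([0-9.]+)\|([0-9.]+)@([A-Z])\]$")


def parse_icat(spec):
    """Split an ICAT string into the residue and the light and heavy masses."""
    match = ICAT.match(spec.strip())
    if match is None:
        raise ValueError(f"malformed ICAT string: {spec!r}")
    light = float(match.group(1))
    delta = float(match.group(2))
    # The heavy reagent mass is the light mass plus the stated difference.
    return {"residue": match.group(3), "light": light, "heavy": light + delta}


icat = parse_icat("[442.22|8.0@C]")
print(icat["residue"], icat["light"], round(icat["heavy"], 2))
```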
|
|
Filter spectra
|
When more than one database has been selected, each database will be searched sequentially.
If this box is checked, spectra that were identified in a particular database will be removed
from consideration when searching all subsequent databases. For example, if there are 2 MS/MS
spectra in a datafile and 2 databases are selected, then after the first database is searched,
the results are checked. If 1 of the spectra leads to a significant identification, then only the
other spectrum will be compared with the second database. A spectrum is considered identified if
the peptide scores better than the expectation value on the same line as the checkbox, and the
associated protein scores better than the "Expect" value on the line above.
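The filtering behaviour described above can be sketched in Python as follows. The search callable, which is assumed to return a (peptide expectation, protein expectation) pair for one spectrum and one database, is hypothetical, as are all the names used; lower expectation values are taken to be better.

```python
def search_sequentially(spectra, databases, search,
                        peptide_limit, protein_limit, filter_spectra=True):
    """Search each database in turn, optionally dropping identified spectra."""
    remaining = list(spectra)
    results = {}
    for db in databases:
        identified = []
        results[db] = []
        for spectrum in remaining:
            peptide_e, protein_e = search(spectrum, db)
            results[db].append((spectrum, peptide_e, protein_e))
            # Identified: both values are better (lower) than their limits.
            if peptide_e < peptide_limit and protein_e < protein_limit:
                identified.append(spectrum)
        if filter_spectra:
            remaining = [s for s in remaining if s not in identified]
    return results


# Toy stand-in for a real search: spectrum "s1" matches well only in "db1".
def fake_search(spectrum, db):
    return (0.001, 0.001) if (spectrum, db) == ("s1", "db1") else (5.0, 5.0)


out = search_sequentially(["s1", "s2"], ["db1", "db2"], fake_search, 0.1, 0.1)
print([s for s, _, _ in out["db2"]])
```

In this toy run only the second spectrum is compared with the second database, as in the two-spectrum example in the text.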
|
|
Enzyme
|
Protein identification experiments usually begin with an enzymatic cleavage reaction. This drop-down box offers the most commonly used enzymes. The enzyme names are defined in the file "sonar_enzymes.js".
Custom cleavages can be specified using the following format for enzymes that cleave on the C-terminal side of a particular residue (e.g., trypsin):
[KR]{P}
where the residues to cleave at are in square brackets and the residues that cannot follow the residues to cleave are in curly brackets. Up to five residues are allowed in each category. For enzymes that cleave on the N-terminal side of a residue the definition would be written as follows (e.g., endo Asp N):
{}[D]
Placing the curly brackets in the first position indicates the cleavage is on the N-terminal side for the residues in square brackets. The curly brackets can be empty, but the lack of any curly brackets is taken to mean cleavage on the C-terminal side by default.
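To make the cleavage notation concrete, here is a small Python sketch that reads a rule and digests a sequence with it. It is an illustration only: the function names are invented, missed cleavages are ignored, and for N-terminal rules the residues in curly brackets are assumed to be residues that must not precede the cleavage residue (the text above only defines them for the C-terminal case).

```python
import re


def parse_rule(rule):
    """Return (side, cleave_at, blocked) for rules like '[KR]{P}' or '{}[D]'."""
    rule = rule.strip()
    # C-terminal form: square brackets first, curly brackets optional.
    match = re.fullmatch(r"\[([A-Z]{1,5})\](?:\{([A-Z]{0,5})\})?", rule)
    if match:
        return "C", match.group(1), match.group(2) or ""
    # N-terminal form: curly brackets (possibly empty) come first.
    match = re.fullmatch(r"\{([A-Z]{0,5})\}\[([A-Z]{1,5})\]", rule)
    if match:
        return "N", match.group(2), match.group(1)
    raise ValueError(f"malformed cleavage rule: {rule!r}")


def digest(sequence, rule):
    """Cut the sequence at every allowed site (no missed cleavages)."""
    side, cleave_at, blocked = parse_rule(rule)
    cuts = []
    for i, residue in enumerate(sequence):
        if residue not in cleave_at:
            continue
        if side == "C" and i + 1 < len(sequence) and sequence[i + 1] not in blocked:
            cuts.append(i + 1)  # cut after the residue
        elif side == "N" and i > 0 and sequence[i - 1] not in blocked:
            cuts.append(i)      # cut before the residue
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


print(digest("MAKGGRPLLKDE", "[KR]{P}"))
print(digest("MAKGGRPLLKDE", "{}[D]"))
```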
|
|
Errors
|
Two types of mass assignment errors may be specified:
- (P) - parent ion mass error (in Daltons); and
- (D) - daughter ion mass error (in Daltons).
The best practice for specifying these errors is to make the parent ion mass error as wide as practical, while making the daughter ion error as narrow as practical (e.g., P = 2 Da and D = 0.4 Da for an ion trap run without zoom scanning). If possible, D = 0.4 Da is a much better value than D = 0.5 Da: the latter represents the assignment of a nominal mass, while the former is much more restrictive.
Unlike most other search engines, Sonar ms/ms does not require the specification of the parent ion mass to give a robust identification, although it will use this information if it is available.
|
|
Signal-to-noise
|
A modified version of the m/z signal-to-noise peak detection algorithm is used by Sonar ms/ms. For Finnigan DTA files or Micromass PKL files, a value of 1.4-2.0 gives the best performance.
|
|
Show best
|
This value affects the way that peptides that have been identified are displayed on the report. The two different cases are as follows:
- unchecked: all peptides with expectation values better (i.e., lower) than the expectation value limit will be displayed. If a peptide is shown in italics, the spectrum used to justify that peptide has also been assigned to another peptide that matched the spectrum better than the italicized peptide.
- checked: all peptides that would have appeared in italics (see the unchecked case) are deleted from the report. Any protein that has no peptide evidence other than italicized peptides does not appear on the report.
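The two display cases can be sketched in Python as below. The data layout (a list of protein, peptide, spectrum and expectation value tuples) and the function name build_report are assumptions for this example, and lower expectation values are treated as better matches.

```python
def build_report(matches, expect_limit, show_best):
    """matches: list of (protein, peptide, spectrum_id, expectation value)."""
    kept = [m for m in matches if m[3] < expect_limit]

    # Best (lowest) expectation value seen for each spectrum.
    best = {}
    for _, _, spectrum, expect in kept:
        if spectrum not in best or expect < best[spectrum]:
            best[spectrum] = expect

    report = {}
    for protein, peptide, spectrum, expect in kept:
        # Italic: the same spectrum supports another peptide better.
        italic = expect > best[spectrum]
        if show_best and italic:
            continue
        report.setdefault(protein, []).append((peptide, italic))
    return report


matches = [
    ("P1", "AAK", "s1", 0.001),
    ("P2", "GGK", "s1", 0.01),   # same spectrum, weaker match
    ("P1", "LLR", "s2", 0.002),
]
print(build_report(matches, 0.1, show_best=False))
print(build_report(matches, 0.1, show_best=True))
```

With the box checked, protein P2 disappears because its only evidence is a peptide that would have been italicized.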
|
|
Check z
|
This value affects the way that spectra are interpreted.
- unchecked: the assigned charge of the parent ions is assumed to be correct; or
- checked: the charge assigned to the parent ions is assumed to be uncertain: all spectra are checked for parent ion charges of 1, 2 and 3.
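The effect of this setting on the parent mass can be illustrated with a short Python sketch. The function name and arguments are invented for the example; the conversion itself is the usual relation between m/z and neutral mass for protonated ions, M = z × (m/z − proton mass).

```python
PROTON = 1.00727647  # mass of a proton in Da


def candidate_parent_masses(mz, assigned_z, check_z):
    """Return {charge: neutral parent mass} for the charges to be tried."""
    # Unchecked: trust the assigned charge. Checked: try charges 1, 2 and 3.
    charges = (1, 2, 3) if check_z else (assigned_z,)
    return {z: z * (mz - PROTON) for z in charges}


print(candidate_parent_masses(500.0, assigned_z=2, check_z=False))
print(candidate_parent_masses(500.0, assigned_z=2, check_z=True))
```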
|
|
Taxonomy
|
These selections are only used when NCBI's NR or dbEST databases are being searched (see the entry for "Databases and genomes" below). Two selections are available: they are additive with any overlap between selections handled so that a particular taxon is only searched once.
The taxonomy selections available are based on three files:
- sonar_taxa.js;
- sonar_spec.js; and
- msms_list.xml (located in the database directory).
The first two files create the drop-down lists available from the user interface, while the last file contains the appropriate information to translate a particular value from the lists into a set of files to be searched. As a part of the general Prowl sequence database collection, NR and dbEST are divided into a collection of files that contain the sequences for broad (or narrow) classifications of organisms. The file "msms_list.xml" defines which files are included in a search when a particular taxonomic classification has been chosen. For example, the entry for "Viridiplantae" is
<db_list database="nr|dbest" tax="Viridiplantae">
<file name="Arabidopsis-thaliana"/>
<file name="Oryza-sativa"/>
<file name="Other-Viridiplantae"/>
</db_list>
By editing this file, new classifications can be made and these classifications can be entered into the interface JavaScript files. It should be noted that this file is in XML format, so the presence of quote marks and the case of letters are not optional, as they are in HTML. If you wish to edit this file, try to stay as close to the format of the original file as possible (you should keep a backup copy of the original file).
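The lookup from a taxonomic selection to a set of files can be sketched with Python's standard xml.etree.ElementTree module. Several things here are assumptions rather than facts about the real "msms_list.xml": the enclosing root element name, the second (narrower) entry, and the reading of the database attribute as a "|"-separated list of database names. The sketch also shows how two additive selections can be merged so that each file is searched only once.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt; the <lists> root and the second entry are invented.
SAMPLE = """<lists>
<db_list database="nr|dbest" tax="Viridiplantae">
<file name="Arabidopsis-thaliana"/>
<file name="Oryza-sativa"/>
<file name="Other-Viridiplantae"/>
</db_list>
<db_list database="nr|dbest" tax="Arabidopsis thaliana">
<file name="Arabidopsis-thaliana"/>
</db_list>
</lists>"""


def files_for(xml_text, database, taxa):
    """Return the files to search for the chosen taxa, without duplicates."""
    root = ET.fromstring(xml_text)
    files = []
    for entry in root.iter("db_list"):
        if database not in entry.get("database", "").split("|"):
            continue
        if entry.get("tax") not in taxa:
            continue
        for node in entry.iter("file"):
            name = node.get("name")
            if name not in files:  # overlapping selections are searched once
                files.append(name)
    return files


print(files_for(SAMPLE, "nr", ["Viridiplantae", "Arabidopsis thaliana"]))
```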
|
|
Databases
|
It is possible to search three different types of sequence databases using this search engine. These types are as follows:
- Fully translated peptide sequences (e.g., NCBI nr);
- cDNA RNA sequence databases (e.g., NCBI dbEST); and
- genomic DNA sequence databases (e.g., Sanger Center's P. falciparum).
The RNA and DNA sequences do not have to be complete, contiguous sequences: in fact it is assumed that these sequences are fragmentary, with significant numbers of errors. Fully curated and correct nucleic acid sequences are usually translated into peptide sequences.
The databases available for display in the user interface are described in the "allds.js" file, which can be found in the "/prowl" directory that contains the Sonar ms/ms installation. It is not necessary to edit this file manually, as database update scripts are available. It may be desirable, however, to edit the text that is displayed on the screen for clarity.
|
|
Expect
|
The number that is entered into this box represents the maximum expectation value allowed for sequences in the displayed results.
Sonar ms/ms does not use raw search engine scores to rate matches between sequences and spectra. Instead it uses expectation values calculated from these scores. The simple interpretation of an expectation value is the number of matches that would be expected to have a particular score, if the matches were completely random. Therefore, the smaller the expectation value, the more likely that a particular match is a true match, rather than a random one.
For example, an expectation value of 1 means that at least 1 similar match would be expected when searching a database that did not include the protein sequence that truly matches your MS/MS data. An expectation value of 0.0001 means that a similar match would be found approximately once in every 10000 similarly sized databases that did not contain the sequence that truly matches your MS/MS data.
There are two types of expectation values shown. The top line expectation value (in bold) is calculated for the collection of peptides that have been discovered for the full sequence. The second set of expectation values are calculated for the individual peptides, without any reference to the other spectra present in a collection (such as a complete LC MS/MS).
This system of estimating the risk of a random match versus a true match is used in most conventional sequence homology matching systems, such as BLAST. It has the distinct advantage of being independent of the scoring system: the expectation value is calculated from a distribution of scored sequences, rather than from a particular result. See the presentation of how Sonar ms/ms works to learn more about how these values are calculated and used.
|
|
Device
|
Four different types of instrument configuration are currently supported. Each ion source/analyzer pair has particular properties and characteristics that can be used to enhance a particular identification.
- e-QTOF - quadrupole-time-of-flight analyzer with an electrospray ion source;
- e-IT - ion trap analyzer with an electrospray ion source;
- m-QTOF - quadrupole-time-of-flight analyzer with a MALDI ion source; and
- m-IT - ion trap analyzer with a MALDI ion source.
The instruments available using the standard interface are recorded in the file "sonar_inst.js".
|
|
Custom keywords
|
When searching peptide sequence databases, protein identifications are placed into five (5) classifications that are determined by keyword searches of the database entries:
- Protein - this is a general classification for everything that does not fit the other criteria;
- cytoskeleton - any protein that is generally considered to be part of the cytoskeleton;
- ribosome - ribosomal proteins;
- artifacts - proteins commonly introduced by handling or preparation; and
- custom - proteins that match the keywords entered into the "Custom keywords" box.
If no keywords are entered into the "Custom keywords" box, no custom category will appear in the results page.
|
|
Parent m/z
|
Some output file formats for MS/MS spectra do not contain the parent ion m/z. For these file formats, it is necessary to add this information manually, using this box.
|
|
Parent z
|
Some output file formats for MS/MS spectra do not contain the parent ion charge. For these file formats, it is necessary to add this information manually, using this box.
|
Input file (File only)
|
The input file name is entered into this box, usually using the "Browse" button to select the file with a conventional file browsing box. This file is actually copied to the server that performs the search, so very large files will take longer to process. The file formats supported are as follows:
- DTA files (Thermo and Micromass);
- RAW files (Thermo, requires lcq_dta.exe to be installed); and
- PKL files (Micromass).
|
Daughter m/z's (Manual only)
|
Enter the m/z values for daughter ions into this box. Enter only one (1) m/z value per line. If you wish to enter intensity information as well as m/z, enter the m/z value, a comma and then the intensity. Do not leave any additional spaces or "tab" characters in the box.
|
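The manual entry format can be checked with a short Python sketch such as the one below. The function name is invented, and two choices are assumptions for the example rather than documented behaviour: blank lines are skipped, and any line containing a space or tab is rejected with an error.

```python
def parse_daughters(text):
    """Parse one 'mz' or 'mz,intensity' entry per line into a list of tuples."""
    peaks = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line == "":
            continue  # assumption: blank lines are ignored
        if " " in line or "\t" in line:
            raise ValueError(f"line {number}: spaces and tabs are not allowed")
        mz, comma, intensity = line.partition(",")
        # Intensity is optional; None marks an entry with m/z only.
        peaks.append((float(mz), float(intensity) if comma else None))
    return peaks


print(parse_daughters("175.119\n288.2,1500\n"))
```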