User Interface Help Information For Sonar ms/ms

 

Table of Contents

 

1. Standard data input interface

1.1 Modify

1.2 Partial mods

1.3 Errors

1.4 Signal-to-noise

1.5 Show best

1.6 Check z

1.7 Taxonomy

1.8 Database

1.9 Expect

1.10 Device

1.11 Custom keywords

1.12 Parent m/z

1.13 Parent z

1.14 File

1.15 Daughter m/z’s

2. Standard result report interface

2.1 Expect

2.2 Result

2.3 a:b:y

2.4 m/z

2.5 Fragmentogram

2.6 Sections

2.7 Conditions

3. Standard result detail interface

3.1 Assigned sequence

3.2 Full sequence

3.3 Spectrum number

3.4 Parent properties

3.5 Ion assignments

3.6 Unassigned Ions

4. Standard result iterator interface

 

 



1. Standard data input interface

 

 


 


 

 


Figure 1. The Standard data input interface.


1.1 Modify

 

Description

 

The items listed in this drop-down box are amino acid modifications that are assumed to have been made to all of the amino acids of a particular type. The actual specification of the modifications is located in the file “sonar_cmods.js”. The format for the modifications is the sign of the modification (+ for addition and - for removal), the chemical formula for the modification and the residue to modify (single letter). Multiple modifications are separated by semi-colons.

For example, to represent the modifications produced by treating a protein with iodoacetamide, the following string is used:

 

+C2H3O1N1@C

 

Modifications are applied sequentially, so the following string would produce the same modification:

 

-H1@C;+C2H4O1N1@C

 

The symbols for the elements are the conventional symbols, with attention paid to the letter case, i.e., tin is represented by Sn, not SN. The numbers following the element symbols are required, i.e., water should be represented as H2O1, rather than H2O.
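
The format rules above can be checked mechanically. The following Python sketch is illustrative only: it is not part of Sonar ms/ms, and the function name parse_mods is hypothetical. It reads a modification string, sums the net elemental change for each residue, and confirms that the two iodoacetamide strings shown above are equivalent.

```python
import re
from collections import Counter

# One modification: sign, chemical formula, "@", single-letter residue.
# Every element symbol must be followed by a number (H2O1, not H2O).
MOD_RE = re.compile(r"^([+-])((?:[A-Z][a-z]?\d+)+)@([A-Z])$")
ELEMENT_RE = re.compile(r"([A-Z][a-z]?)(\d+)")

def parse_mods(mod_string):
    """Return {residue: {element: net atom count}} for a modification string."""
    net = {}
    for item in filter(None, mod_string.split(";")):
        match = MOD_RE.match(item.strip())
        if match is None:
            raise ValueError(f"badly formed modification: {item!r}")
        sign, formula, residue = match.groups()
        factor = 1 if sign == "+" else -1
        counts = net.setdefault(residue, Counter())
        # Modifications are applied sequentially, so the counts accumulate.
        for element, number in ELEMENT_RE.findall(formula):
            counts[element] += factor * int(number)
    # Drop elements whose net change is zero.
    return {r: {e: n for e, n in c.items() if n != 0} for r, c in net.items()}

direct = parse_mods("+C2H3O1N1@C")
stepwise = parse_mods("-H1@C;+C2H4O1N1@C")
print(direct == stepwise)  # True
```

A string such as +H2O@K is rejected by this sketch because the oxygen count is missing.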

 

CGI variable: CMOD

 

ALLOWED:

A character string representing modifications, in the following format:

(cf 1)@X;(cf 2)@X;...(cf n)@X;

where "cf" is the chemical formula of the modification, such as +C2H3O1N1, and X is the single-letter abbreviation for the appropriate amino acid residue.

EXAMPLE:

CMOD=+C2H3O1N1@C

 HISTORY:

Introduced in v. 1.0

 

 


1.2 Partial mods

 

Description

 

Partial modifications are chemical modifications to amino acids that may be caused either by post-translational modification (e.g., phosphorylation) or by unintended chemical reactions (e.g., methionine oxidation). These modifications may or may not be present at any given residue in a particular peptide species, so each residue of the appropriate type must be checked individually for the presence (or absence) of the modification.

 

The format and rules for specifying these modifications are the same as for the specification of complete modifications (see “Modify”). The definition of the available modifications is in the file “sonar_pmods.js”.

 

It is usually difficult to find this type of modification using conventional protein identification techniques. A much better approach is to identify the protein and then examine the protein in detail following the identification.

 

CGI variable: PMOD

 

ALLOWED:

A character string representing modifications, in the following format:

(cf 1)@X;(cf 2)@X;...(cf n)@X;

where "cf" is the chemical formula of the modification, such as +O1, and X is the single-letter abbreviation for the appropriate amino acid residue.

EXAMPLE:

PMOD=+O1@M

 HISTORY:

Introduced in v. 1.0

 


1.3 Errors

 

Description

 

Two types of mass assignment errors may be specified:

1.  (P) - parent ion mass error (in Daltons); and

2.  (D) - daughter ion mass error (in Daltons).

The best practice for specifying these errors is to make the parent ion mass error as wide as practical, while making the daughter ion error as narrow as practical (e.g., P = 2 Da and D = 0.4 Da for an ion trap run without zoom scanning). If possible, D = 0.4 Da is a much better value than D = 0.5 Da: the latter represents the assignment of a nominal mass, while the former is much more restrictive.  Unlike most other search engines, Sonar ms/ms does not require the specification of the parent ion mass to give a robust identification, although it will use this information.

 

CGI variable: ERRM

 

ALLOWED:

Any positive decimal value greater than 0.0. (default = 2.0)

EXAMPLE:

ERRM=0.5

DESCRIPTION:

This value is the daughter ion mass error, in daltons.

 HISTORY:

Introduced in v. 1.0

 

CGI variable: ERRP

 

ALLOWED:

Any positive decimal value greater than 0.0. (default = 2.0)

EXAMPLE:

ERRP=0.1

DESCRIPTION:

This value is the parent ion mass error, in daltons.

HISTORY:

Introduced in v. 1.0

1.4 Signal-to-noise

 

Description

 

A modified version of the m/z signal-to-noise peak detection algorithm is used by Sonar ms/ms. For Finnigan DTA files or Micromass PKL files, a value of 1.4-2.0 gives the best performance.

 

CGI variable: SN

 

ALLOWED:

Any decimal value greater than 1.0. (default = 1.4)

EXAMPLE:

SN=1.4

DESCRIPTION:

This value is the signal-to-noise ratio used to find peaks in the mass spectra parsed before a search is performed.

HISTORY:

Introduced in v. 1.0

 


1.5 Show best

 

Description

 

This value affects the way that peptides that have been identified are displayed on the report. The two different cases are as follows:

 

1.  unchecked: all peptides with expectation values less than the expectation value limit will be displayed. If a peptide is shown in italics, the spectrum used to justify that peptide has also been assigned to another peptide that matched the spectrum better than the italicized peptide.

2.  checked: all peptides that would have appeared in italics (see the unchecked case) are removed from the report. Any protein that has no peptide evidence other than italicized peptides does not appear on the report.

 

CGI variable: SHOW_BEST

 

ALLOWED:

"no" (default) or "yes"

EXAMPLE:

SHOW_BEST=yes

 HISTORY:

Introduced in v. 1.2

 


1.6 Check z

 

Description

 

This value affects the way that spectra are interpreted.

1.  unchecked: the assigned charge of the parent ions is assumed to be correct; or

2.  checked: the charge assigned to the parent ions is assumed to be uncertain: all spectra are checked for parent ion charges of 1, 2 and 3.

 

CGI variable: CHECK_Z

 

ALLOWED:

"no"(default); or

"yes"


EXAMPLE:

CHECK_Z=yes

HISTORY:

Introduced in v. 1.2

 

 


1.7 Taxonomy

 

Description

 

These selections are only used when NCBI's NR or dbEST databases are being searched (see the entry for Databases and genomes below). Two selections are available: they are additive with any overlap between selections handled so that a particular taxon is only searched once.

The taxonomy selections available are based on three files:

1.  sonar_taxa.js;

2.  sonar_spec.js; and

3.  msms_list.xml (located in the database directory).

The first two files create the drop-down lists available from the user interface, while the last file contains the appropriate information to translate a particular value from the lists into a set of files to be searched. As a part of the general Prowl sequence database collection, NR and dbEST are divided into a collection of files that contain the sequences for broad (or narrow) classifications of organisms. The file msms_list.xml defines which files are included in a search when a particular taxonomic classification has been chosen. For example, the entry for Viridiplantae is

 

<db_list database="nr|dbest" tax="Viridiplantae">

   <file name="Arabidopsis-thaliana"/>

   <file name="Oryza-sativa"/>

   <file name="Other-Viridiplantae"/>

</db_list>

 

By editing this file, new classifications can be made, and these classifications can then be entered into the interface JavaScript files. Note that this file is in XML format, so quote marks and the case of letters are not optional, as they are in HTML. If you wish to edit this file, stay as close to the format of the original file as possible, and keep a backup copy of the original file.
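
One way to check an edited file before using it is to load it with an XML parser. The Python sketch below is an illustration, not part of Sonar ms/ms. The sample text, the root element name msms_list, the single database value "nr" and the function name files_for_taxon are all assumptions made for the example; only the db_list and file elements and their database, tax and name attributes come from the entry shown above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the style of the Viridiplantae entry.
# Attribute values are quoted because XML requires it.
SAMPLE = """<msms_list>
  <db_list database="nr" tax="Viridiplantae">
    <file name="Arabidopsis-thaliana"/>
    <file name="Oryza-sativa"/>
    <file name="Other-Viridiplantae"/>
  </db_list>
</msms_list>"""

def files_for_taxon(xml_text, database, taxon):
    """Return the file names listed for one database/taxon pair."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if not well formed
    names = []
    for db_list in root.iter("db_list"):
        if db_list.get("database") == database and db_list.get("tax") == taxon:
            names.extend(f.get("name") for f in db_list.findall("file"))
    return names

print(files_for_taxon(SAMPLE, "nr", "Viridiplantae"))
```

If quote marks are missing or a tag is left unclosed, ET.fromstring raises ParseError, which makes this a quick well-formedness test after editing.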

 

CGI variable: TAXA

 

ALLOWED:

Any taxonomy value allowed by "msms-list.xml" or "none"(default).

EXAMPLE:

TAXA=Viridiplantae

DESCRIPTION:

This value is used in combination with "SPEC" and "DBSE" to determine which files should be searched. If DBSE is "nr" or "dbest", "TAXA" is compared with the values in "msms-list.xml" and the appropriate files searched. Files associated with "SPEC" are also searched.

HISTORY:

Introduced in v. 1.0

 

CGI variable: SPEC

 

ALLOWED:

Any taxonomy value allowed by "msms-list.xml" or "none"(default).

EXAMPLE:

SPEC=Homo-sapiens

DESCRIPTION:

This value is used in combination with "TAXA" and "DBSE" to determine which files should be searched. If DBSE is "nr" or "dbest", "SPEC" is compared with the values in "msms-list.xml" and the appropriate files searched. Files associated with "TAXA" are also searched.

HISTORY:

Introduced in v. 1.0

 

 


1.8 Database

 

Description

 

It is possible to search three different types of sequence databases using this search engine. These types are as follows:

1.  Fully translated peptide sequences (e.g., NCBI nr);

2.  cDNA RNA sequence databases (e.g., NCBI dbEST); and

3.  genomic DNA sequence databases (e.g., Sanger Center's P. falciparum).

The RNA and DNA sequences do not have to be complete, contiguous sequences: in fact, it is assumed that these sequences are fragmentary, with significant numbers of errors. Fully curated and correct nucleic acid sequences are usually translated into peptide sequences.

The databases available for display in the user interface are described in the “allds.js” file, which can be found in the /prowl directory that contains the Sonar ms/ms installation. It is not necessary to edit this file manually, as database update scripts are available. It may be desirable, however, to edit the text that is displayed on the screen for clarity.

 

CGI variable: DBSE

 

ALLOWED:

A character string representing a valid sequence database, as defined in the "/databases/databases.dat" file. See the example for the format.

EXAMPLE:

"/databases/databases.dat" line:
1#NCBInr#nr#..\databases\#1#0##.fasta
DBSE=#1#NCBInr#nr#..\databases\#1#0
where "1" is the database id number, "NCBInr" is the database name, "nr" is the filename, "..\databases" is the relative directory containing the file, "1" indicates that the database has taxonomic division files (0 means no divisions) and "0" means that the database is peptide sequences (1 means nucleotide sequences).

HISTORY:

Introduced in v. 1.0
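
The relationship between a "databases.dat" line and the DBSE value in the example above can be sketched in Python as follows. This is an illustration under stated assumptions, not code from Sonar ms/ms: the names DatabaseEntry, parse_database_line and dbse_value are hypothetical, and the meaning of the last two fields of the line (an empty field and ".fasta") is not described in this document, so they are ignored here.

```python
from typing import NamedTuple

class DatabaseEntry(NamedTuple):
    db_id: int
    name: str
    filename: str
    directory: str
    has_taxonomy: bool   # "1" = taxonomic division files exist
    is_nucleotide: bool  # "1" = nucleotide, "0" = peptide sequences

def parse_database_line(line):
    """Split one '#'-delimited line into its first six documented fields."""
    fields = line.strip().split("#")
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 fields, got {len(fields)}")
    return DatabaseEntry(int(fields[0]), fields[1], fields[2], fields[3],
                         fields[4] == "1", fields[5] == "1")

def dbse_value(line):
    """Build the DBSE string: '#' followed by the first six fields."""
    return "#" + "#".join(line.strip().split("#")[:6])

line = r"1#NCBInr#nr#..\databases\#1#0##.fasta"
print(parse_database_line(line))
print(dbse_value(line))
```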

 


1.9 Expect

 

Description

 

The number that is entered into this box represents the maximum expectation value allowed for sequences in the displayed results.

Sonar ms/ms does not use raw search engine scores to rate matches between sequences and spectra. Instead it uses expectation values calculated from these scores. The simple interpretation of an expectation value is the number of matches that would be expected to have a particular score, if the matches were completely random. Therefore, the smaller the expectation value, the more likely that a particular match is a true match, rather than a random one.

For example, an expectation value of 1 means that about one similar match would be expected when searching a database that did not include the protein sequence that truly matches your MS/MS data. An expectation value of 0.0001 means that a similar match would be found approximately once in every 10,000 similarly sized databases that did not contain the sequence that truly matches your MS/MS data.

There are two types of expectation values shown. The top line expectation value (in bold) is calculated for the collection of peptides that have been discovered for the full sequence. The second set of expectation values is calculated for the individual peptides, without any reference to the other spectra present in a collection (such as a complete LC-MS/MS run).

This system of estimating the risk of a random match versus a true match is used in most conventional sequence homology matching systems, such as BLAST. It has the distinct advantage of being independent of the scoring system: the expectation value is calculated from a distribution of scored sequences, rather than from a particular result.

 

CGI variable: EXPT

 

ALLOWED:

Any decimal value greater than 0.0. (default = 1.0)

EXAMPLE:

EXPT=1.0

HISTORY:

Introduced in v. 1.0

 


1.10 Device

 

Description

 

Four different types of instrument configuration are currently supported. Each ion source/analyzer pair has particular properties and characteristics that can be used to enhance a particular identification.

1.  e-QTOF - quadrupole-time-of-flight analyzer with an electrospray ion source;

2.  e-IT - ion trap analyzer with an electrospray ion source;

3.  m-QTOF - quadrupole-time-of-flight analyzer with a MALDI ion source; and

4.  m-IT - ion trap analyzer with a MALDI ion source.

The instruments available using the standard interface are recorded in the file “sonar_inst.js”.

 

CGI variable: INST

 

ALLOWED:

e-IT (default);

e-QTOF;

m-IT; or

m-QTOF.

EXAMPLE:

INST=m-IT

HISTORY:

Introduced in v. 1.0


1.11 Custom keywords

 

Description

 

When searching peptide sequence databases, protein identifications are placed into five (5) classifications that are determined by keyword searches of the database entries:

1.  protein  - this is a general classification for everything that does not fit the other criteria;

2.  cytoskeleton - any protein that is generally considered to be part of the cytoskeleton;

3.  ribosome - ribosomal proteins;

4.  artifacts - proteins commonly introduced by handling or preparation; and

5.  custom - proteins that match the keywords entered into the Custom keywords box.

If no keywords are entered into the Custom keywords box, no custom category will appear in the results page.
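
The keyword classification described above can be pictured with the following Python sketch. It is only an illustration: the keyword lists, the function name classify and the order in which categories are tested are assumptions, not the behaviour of Sonar ms/ms. The keywords actually used are listed in the result report under each section heading.

```python
# Hypothetical keyword lists for the built-in categories.
BUILT_IN = {
    "cytoskeleton": ["actin", "tubulin", "myosin"],
    "ribosome": ["ribosomal"],
    "artifacts": ["keratin", "trypsin"],
}

def classify(description, custom_keywords=""):
    """Return the category for one database entry description."""
    text = description.lower()
    # Custom keywords are space-separated, as in the EXCLUDE variable.
    custom = custom_keywords.lower().split()
    if any(keyword in text for keyword in custom):
        return "custom"  # assumption: custom keywords are tested first
    for category, keywords in BUILT_IN.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "protein"  # general class for everything else

print(classify("myoglobin [Equus caballus]", "myoglobin cytochrome"))
print(classify("keratin, type II cytoskeletal 1"))
print(classify("hexokinase"))
```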

 

CGI variable: EXCLUDE

 

ALLOWED:

Any character string with key words separated by spaces. (default="").

EXAMPLE:

EXCLUDE=myoglobin cytochrome

 
HISTORY:

Introduced in v. 1.0

 


1.12 Parent m/z

 

Description

 

Some output file formats for MS/MS spectra do not contain the parent ion m/z ratio. For these file formats, it is necessary to add this information manually, using this box.

 

CGI variable: MZP

 

ALLOWED:

any decimal value between 500.0 and 5000.0

EXAMPLE:

MZP=2524.345

DESCRIPTION:

The mass-to-charge ratio of a parent ion that corresponds to the values in the variable "MZF".

HISTORY:

Introduced in v. 1.0

 

 


1.13 Parent z

 

Description

 

Some output file formats for MS/MS spectra do not contain the parent ion charge. For these file formats, it is necessary to add this information manually, using this box.

 

CGI variable: ZP

 

ALLOWED:

An integer value > 0

EXAMPLE:

ZP=2
 
HISTORY:

Introduced in v. 1.0

 

 


1.14 File

 

Description

 

The input file name is entered into this box, usually by using the “Browse” button to select the file in a conventional file browsing box. The file is copied to the server that performs the search, so selecting a very large file will increase the processing time. The file formats supported are as follows:

 

1.  DTA files (Thermo and Micromass);

2.  RAW files (Thermo, requires lcq_dta.exe to be installed); and

3.  PKL files (Micromass).

 

 

CGI variable: INPUT_FILE

 

ALLOWED:

A character string that contains the path to a data file.

EXAMPLE:

INPUT_FILE=c:\temp\test.dta
 
HISTORY:

Introduced in v. 1.0

 

 


1.15 Daughter m/z’s

 

Description

 

Enter the m/z values for daughter ions into this box. Enter only one (1) m/z value per line. If you wish to enter intensity information as well as m/z, enter the m/z value, a comma and then the intensity. Do not enter any additional spaces or “tab” characters in the box.

 

CGI variable: MZF

 

ALLOWED:

A character string representing a list of mass-to-charge ratio and intensity pairs. The values are separated by commas. Each pair is separated by a line feed.

EXAMPLE:

MZF=1000.0,100
2125.3,50
3587.345,1


HISTORY:

Introduced in v. 1.0
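
As an illustration of the format described above, the following Python sketch reads a daughter ion list into (m/z, intensity) pairs. It is not part of Sonar ms/ms; the function name parse_mzf is hypothetical, and the decisions to skip empty lines and to report a missing intensity as None are assumptions made for the example.

```python
def parse_mzf(text):
    """Parse 'm/z' or 'm/z,intensity' lines into a list of tuples."""
    peaks = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line:
            continue  # assumption: empty lines are ignored
        if " " in line or "\t" in line:
            raise ValueError(f"line {line_number}: spaces or tabs are not allowed")
        mz_text, _, intensity_text = line.partition(",")
        mz = float(mz_text)
        intensity = float(intensity_text) if intensity_text else None
        peaks.append((mz, intensity))
    return peaks

print(parse_mzf("1000.0,100\n2125.3,50\n3587.345,1"))
```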

 

 


2. Standard result report interface

 

The Sonar report page is a series of simple tables that describe the proteins (or ESTs) found and summarize the search conditions that were used. This page describes the various columns in these tables and supplies some explanation of the meaning of the terms used.

 

 


 


Figure 2. The standard result report interface

 

 

 

 

 

 


2.1 Expect

 

The expectation value is a simple statistic that allows the comparison of the reliability of results. Low expectation values (less than 1) correspond to confident identifications.

For example: an expectation value of 10 means that purely random matching will produce about ten results as good as the one reported. Similarly, an expectation value of 1.0 means that 1 random result is expected to be as good as the one reported. An expectation value of 0.001 means that only 0.001 random results are expected to be as good as the one reported.
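
The examples above can also be read as probabilities. If the number of random matches is assumed to follow a Poisson distribution (the interpretation conventionally used for BLAST E-values), the chance of seeing at least one random match as good as the reported one is 1 - exp(-E). The Python sketch below applies that relation. It is an illustration only: this document does not say that Sonar ms/ms reports or uses such a probability, and the function name is hypothetical.

```python
import math

def chance_of_random_match(expectation):
    """P(at least one random match), assuming Poisson-distributed matches."""
    # -expm1(-E) equals 1 - exp(-E) but stays accurate for very small E.
    return -math.expm1(-expectation)

for e in (10.0, 1.0, 0.001):
    print(f"E = {e:g}: P = {chance_of_random_match(e):.4g}")
```

For small expectation values the probability is almost equal to the expectation value itself, which is why the two are often used interchangeably for confident identifications.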

 

 


2.2 Result

 

The result column contains the evidence supporting the expectation value. The top line is a link to the NCBI's repository of sequence information, if possible. In the same box, the entries under the Redundant heading are database entries that result in identified peptides that are the same as those assigned to the entry displayed. They may only have a subset of these peptides.

Where a peptide assignment is ambiguous, a choice of assignment is made using the following rules:

1.  if a spectrum can be assigned to more than one peptide, the spectrum is assigned to the peptide sequence that has the lowest expectation value;

2.  if two peptides have the same expectation value for the same spectrum, the peptide contained in the complete sequence with the best expectation value is chosen; and

3.  if two peptides have the same expectation value for the same spectrum and the complete sequences have the same expectation value, both peptides are shown as assigned.

The peptide assignments based on spectra that have been assigned to other peptides are shown in italics in the display. If the Show best assignments only box is checked, these peptides are not shown. These peptide assignments are not shown in the Search iteration page either.
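
The three rules above can be sketched in Python as follows. This is an illustration of the rules as written, not the Sonar ms/ms implementation: the Candidate record and the function name split_assignments are hypothetical, and the "best" expectation value of a complete sequence is taken to mean the lowest one.

```python
from collections import defaultdict
from typing import NamedTuple

class Candidate(NamedTuple):
    spectrum: int          # spectrum number
    peptide: str           # candidate peptide sequence
    peptide_expect: float  # expectation value of the peptide match
    protein_expect: float  # expectation value of the complete sequence

def split_assignments(candidates):
    """Return (assigned, italic) lists of candidates."""
    by_spectrum = defaultdict(list)
    for candidate in candidates:
        by_spectrum[candidate.spectrum].append(candidate)
    assigned, italic = [], []
    for group in by_spectrum.values():
        # Rule 1: lowest peptide expectation; rule 2: ties are broken by
        # the lowest complete-sequence expectation.
        best = min((c.peptide_expect, c.protein_expect) for c in group)
        for c in group:
            if (c.peptide_expect, c.protein_expect) == best:
                assigned.append(c)  # rule 3: full ties are all assigned
            else:
                italic.append(c)    # shown in italics, or hidden
    return assigned, italic

candidates = [
    Candidate(1, "AAK", 0.01, 1e-5), Candidate(1, "GGK", 0.5, 1e-3),
    Candidate(2, "LLR", 0.1, 1e-3), Candidate(2, "IIR", 0.1, 1e-5),
    Candidate(3, "VVK", 0.2, 1e-4), Candidate(3, "TTK", 0.2, 1e-4),
]
assigned, italic = split_assignments(candidates)
print([c.peptide for c in assigned])  # ['AAK', 'IIR', 'VVK', 'TTK']
print([c.peptide for c in italic])    # ['GGK', 'LLR']
```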

 


2.3 a:b:y

 

These numbers correspond to the number of a-, b- and y-ions (Biemann notation) that were assigned.


2.4 m/z

 

The value shown has the form z m/z (m-a), where:

1.  z is the measured charge of the parent ion that has been associated with a peptide sequence;

2.  m/z is the measured mass-to-charge ratio of the parent ion that has been associated with a peptide sequence; and

3.  (m-a) is the difference between the mass calculated from the measured parent ion m/z and z values (m) and the assigned peptide mass (a).
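
The arithmetic behind these quantities can be sketched in Python as follows. The sketch assumes that m is the neutral peptide mass obtained by removing z protons from the measured ion; this document does not state which mass convention Sonar ms/ms uses, so treat the formula and the function names as assumptions.

```python
PROTON_MASS = 1.007276  # mass of a proton in daltons

def measured_mass(mz, z):
    """Neutral mass m implied by a measured m/z and charge z (assumed convention)."""
    return z * (mz - PROTON_MASS)

def mass_difference(mz, z, assigned_mass):
    """The (m-a) value: measured mass minus assigned peptide mass."""
    return measured_mass(mz, z) - assigned_mass

# The same measured m/z implies a different mass for each possible charge.
for z in (1, 2, 3):
    print(z, round(measured_mass(501.007276, z), 3))
```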

 

 


2.5 Fragmentogram

 

Fragmentation diagrams are a very simple-to-read representation of MS/MS data. The lines drawn above the sequence correspond to the relative intensity of fragment ions assigned to particular peptide bond cleavages. The lines are drawn between the two residues corresponding to the cleaved peptide bond. The length of the line corresponds to the relative intensity of the most intense ion assigned to a particular bond cleavage. Clicking on the fragmentation diagram produces a detailed analysis of the peaks assigned and the peaks that could not be assigned to the proposed sequence.

If a fragmentogram is shown in italics, the spectrum that gave evidence for the existence of this peptide produced a better assignment to a peptide in another protein.

 


2.6 Sections

 

The Sonar report is broken up into 1 to 5 sections. These sections are protein, cytoskeleton, ribosome, and  artifacts (the Custom heading is optional). Results are placed into the Protein category if they do not fit into the other ones. The keywords used for each heading are listed below the title line of each section.

 

If the search was performed on a genomic or EST database, the cytoskeleton, ribosome, and artifacts sections are not shown, because these databases are not annotated well enough to use this feature.

 


2.7 Conditions

 

This table contains the search parameters used to perform a search. It appears at the end of the Sonar report.

 


3. Standard result detail interface

 

The Sonar detail report page is a series of simple tables that describe the peptides that have been assigned by Sonar. This page describes the various columns in these tables and supplies some explanation of the meaning of the terms used.

 

 

 

Figure 3. The Standard result detail interface

 

 


3.1 Assigned sequence

 

The peptide that was assigned to the parent ion's MS/MS spectrum, shown as a fragmentogram. Clicking on the fragmentogram will open a tools page that allows you to manipulate the sequence and calculate fragment masses.

 

 


3.2 Full sequence

 

Clicking on the GI number (an NCBI identifier) takes you to a tools page for manipulating the full sequence of the protein/EST. For ESTs, clicking on the “frame translation” button takes you to a tools page for manipulating the amino acid translation of the EST reading frame that contains the assigned peptide.

 

 


3.3 Spectrum number

 

In DTA or PKL files, the ordinal number corresponding to the MS/MS spectrum that produced this match.

 

 


3.4 Parent properties

 

The parent ion m/z, charge (z), measured mass (m), assigned mass (a), and the difference between the measured and assigned masses (m-a).

 

 


3.5 Ion assignments

 

The relative intensity of a fragment ion (1 to 10 ° symbols), the measured m/z (m), the assigned m/z (a), the difference between the measured and assigned m/z, the assigned charge (z) and the assignment, in Biemann notation. The assignment is followed by the pair of residues cleaved to produce the fragment and a small molecule mass loss in parentheses (if required).

 

 


3.6 Unassigned Ions

 

The relative intensity and m/z values of all fragment ions not assigned by Sonar.

 

 

 

 


4. Standard result iterator interface

 

 

 

 

Figure 4. The Standard result iterator interface.

 

This type of search involves using the results of previous searches to attempt to mine more information from mass spectra. The user is presented with the results of the previous search, including a detailed list of the parent ions that were submitted for the search and the main search parameters that were used to perform the previous search. The parent ions are listed in three tables:

1.  Unassigned parent ions - these are parent ions that could not be associated with a sequence. All unassigned parent ions are checked for further searching by default.

2.  Parent ions similar to assigned ions - these are ions that were not associated with a sequence. However, ions with parent ion masses within the parent ion error value were associated with a sequence, making it possible that these ions are either different charge states of the same peptide, or they are weaker MS/MS spectra than the assigned ions. These ions are unchecked by default.

3.  Assigned parent ions - these parent ions were assigned to a sequence and are unchecked by default. If a parent ion was assigned to more than one sequence, the highest scoring sequence is shown.
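
The way parent ions fall into the three tables above can be sketched in Python as follows. This is an illustration of the description, not the Sonar ms/ms implementation: the function name sort_parent_ions and the input layout are hypothetical, and treating a mass difference exactly equal to the parent ion error as "within" the error is an assumption.

```python
def sort_parent_ions(ions, parent_error):
    """Sort (parent_mass, is_assigned) pairs into the three tables."""
    assigned_masses = [mass for mass, is_assigned in ions if is_assigned]
    tables = {"unassigned": [], "similar": [], "assigned": []}
    for mass, is_assigned in ions:
        if is_assigned:
            tables["assigned"].append(mass)
        elif any(abs(mass - a) <= parent_error for a in assigned_masses):
            # Not assigned, but close in mass to an ion that was assigned.
            tables["similar"].append(mass)
        else:
            tables["unassigned"].append(mass)
    return tables

ions = [(1000.0, True), (1001.5, False), (1500.0, False)]
print(sort_parent_ions(ions, parent_error=2.0))
```

Only the ions in the "unassigned" table would be checked for further searching by default.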

To perform an iterative search, simply reset the main search parameters at the top of the page, select which of the parent ions you wish to include in the search and press the button. A search iteration can be iterated further: the number of iterations is indicated at the top of each search page, along with links to go back to an earlier iteration.

 

The search controls at the top of the page have the same meanings and definitions as the search controls in Section 1 (see above). The only difference is that any control that is drawn using a JavaScript file in the Standard data input interface is drawn here with the literal values from the JavaScript file that was current when the original search was performed. Therefore, the search conditions available in an iterator page directly reflect the conditions that were present when the original search was performed.