Sequence Specific Retention Calculator
SSRCalculator
Version 3.2 ©2007
MB Centre for Proteomics
and Systems Biology
History
The first version of the model correlating hydrophobicity and retention time (r² ~ 0.94 on a sample of ~350 tryptic peptides) was developed for use with 300 Å sorbents and described in a paper in Molecular and Cellular Proteomics.
A larger data set of ~2000 tryptic peptides was used for the development of SSRCalculator Version 2.0, and the model was presented at the 52nd ASMS Conference, Nashville, TN. The r² of the correlation improved to ~0.96.
Both versions 1.0 and 2.0 were made available to the public on the Manitoba Centre for Proteomics website.
Version 3.0 was then developed. The correlation r² improved to ~0.98 with the same set of 2000 peptides.
Version 3.1 extended SSRCalculator's capability to allow the use of 100 Å pore size sorbent (PepMap100, LCPackings-Dionex). Correlation r² ~ 0.98 on a data set of ~2700 peptides.

The Waters XTerra column (pH 10 conditions) was chosen as a candidate for second-dimension RP separation, based on a report by M. Gilar et al.
A pH 10 C18 algorithm, similar to the version 3 algorithm for TFA conditions, was developed. Correlation r² ~ 0.97 on a set of ~3500 peptides.
Version 3.2: TFA algorithms for 100 Å and 300 Å columns were developed using data sets of ~5500 peptides.
 

Model
The model was developed based on the measured retention times of 346 tryptic peptides in the 560-4000 Da mass range, derived from a mixture of 17 protein digests. These peptides were measured in HPLC-MALDI-single MS runs, with peptide identities confirmed by MS/MS. The model relies on summation of the retention coefficients of the individual amino acids, as in previous approaches, but additional terms are introduced that depend on the retention coefficients for amino acids at the N-terminal of the peptide. For the 17-protein mixture, optimization of the two sets of coefficients, along with additional compensation for peptide length and total hydrophobicity, yielded a linear dependence of retention time on hydrophobicity, with an r² value of about 0.94. The model's applicability was tested on columns of different sizes, from nano- to narrow-bore, and for direct sample injection or injection via a pre-column. It can be used for accurate prediction of retention times for tryptic peptides on reversed-phase (300 Å pore size) columns of different sizes with a linear water-acetonitrile gradient and with trifluoroacetic acid as the ion-pairing modifier. Other modifiers (acetic or formic acid) reduce prediction accuracy significantly. The use of the algorithm for RP columns with other pore sizes or alternative end-capping chemistry is also not recommended. Detailed information about the algorithm can be found in the paper published in Molecular and Cellular Proteomics (citation).
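The additive core of the model described above can be illustrated with a minimal sketch. It is not the published algorithm: every coefficient value, the table names, and the function name are placeholders invented for this example, a single N-terminal position is assumed, and the compensation for peptide length and total hydrophobicity is left out.

```python
# Toy sketch of an additive retention model with an extra N-terminal term.
# All numeric values are made-up placeholders, NOT SSRCalculator coefficients.

# Hypothetical retention coefficients for residues anywhere in the chain.
RC = {"L": 10.0, "F": 11.0, "K": -2.0, "G": 0.0, "A": 1.0}
# Hypothetical second set of coefficients, applied to the first residue only.
RC_NTERM = {"L": 8.0, "F": 9.0, "K": -3.0, "G": -1.0, "A": 0.0}


def toy_hydrophobicity(sequence, rc=RC, rc_nterm=RC_NTERM):
    """Sum per-residue coefficients and add one N-terminal term."""
    if not sequence:
        raise ValueError("empty sequence")
    # Plain summation over every residue, as in earlier additive models.
    total = sum(rc[residue] for residue in sequence)
    # Additional term that depends on which residue sits at the N-terminal.
    total += rc_nterm[sequence[0]]
    return total


print(toy_hydrophobicity("GLAK"))  # prints 8.0 with the placeholder tables
```

Passing the two coefficient tables as arguments keeps the summation logic separate from the fitted values, which is where the successive versions of the real model differ.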
The improved SSRCalculator was developed and optimized based on a library of more than 2000 tryptic peptides in the 560-5000 Da mass range, derived from mixtures of a number of protein digests. The resultant peptides were measured in HPLC-MALDI-single MS runs, with peptide identities confirmed by MS/MS. Version 2 of SSRCalculator includes all the features of version 1.
The model is similarly based on the summation of retention coefficients of the individual amino acids, taking into account a number of correction factors related to:
  • retention coefficients for amino acids at the N-terminal of the peptide,
  • peptide length, and
  • total peptide hydrophobicity.
New correction factors in this version are related to:
  • retention coefficients for amino acids at the C-terminal of the peptide,
  • uniformity of distribution (i.e. clustering) of relatively hydrophobic amino acids along the peptide chain, and
  • peptide isoelectric point.
The third version of SSRCalculator was developed for the same set of more than 2000 peptides as version 2. Version 3 of SSRCalculator includes all the features of version 2 and version 1 of SSRCalculator.
As well, new correction factors in this version recognize:
  • the effect on retention time of a peptide's propensity to form helical structures,
  • an additional length-correction for smaller peptides, and
  • the effect of missed tryptic cleavages.
Application of the new correction factors allowed significant improvement in the predictive ability of the model: an r² value of about 0.98 was obtained for the set of 2000 peptides, compared to 0.94 and 0.96 for versions 1 and 2 respectively.
A detailed description of the version 3 algorithm is provided in Krokhin, O.V. Anal. Chem. 2006, 78, 7785-7795.
Version 3.1 was developed for the set of ~2700 tryptic peptides separated on PepMap100 sorbent (LCPackings-Dionex) and confidently identified by off-line HPLC-MALDI MS (MS/MS).
The optimization procedure for this version required adjustment of:
  • corrections related to the size of the peptide, and
  • corrections of retention coefficients of individual amino acids.
Version 3.2 is more robust than 3.1. It was developed and optimized using a dataset of ~5500 peptides for both 300 Å and 100 Å columns.
This version also includes support for the 100 Å C18 XTerra column (pH 10, ammonium formate), which was optimized using a dataset of ~3500 peptides. The algorithm might be suitable for similar C18 supports that are stable at basic pH (e.g. XBridge, Gemini, etc.). It uses sequence-specific corrections similar to those of the TFA models and currently provides prediction accuracy with a correlation r² ~ 0.97.
 

Using SSRCalculator
 
SSRCalculator is applicable to 300 Å and 100 Å pore size reversed-phase C18 silica sorbents and to a wide range of column sizes, starting from nano-flow columns.
The HPLC pump must be able to provide a reproducible linear water-acetonitrile gradient and maintain a constant flow rate throughout the entire HPLC run.
The model was developed using trifluoroacetic acid (TFA) as the ion-pairing modifier. Both eluents A and B contained 0.1% TFA. Using acetic or formic acid instead will not give the same prediction accuracy as TFA.
Proteins or protein mixtures should be reduced, alkylated with iodoacetamide and digested with trypsin. Peptides with free cysteines, or with cysteines alkylated by other protective agents (iodoacetic acid, 4-vinyl-pyridine, etc.), are retained differently from those alkylated with iodoacetamide. Their retention therefore cannot be predicted using the current version of SSRCalculator.
We recommend the sample be purified (by dialysis for example) of excess reduction/alkylation agents and chaotropic agents before digestion.
We recommend that the resulting peptide mixture be lyophilized and redissolved in a 0.5% aqueous solution of the ion-pairing modifier used for eluent preparation (TFA, formic or acetic acid).
It is best to have preliminary information about the amount of sample loaded during injection. Column overloading may result in changing peak shapes and, as a consequence, lowered accuracy of retention time prediction. We recommend the use of a UV detector (especially for unknown samples) to provide additional information about the amount of the sample and the quality of separation over that given by MS directly.
SSRCalculator accepts any number of peptides expressed as sequences of single-letter amino acid codes. Separate multiple peptide sequences with new line characters (the Enter key) or forward slashes ("/").
e.g.
   LCENIAGHLK
   HMDGYGSHTFK
   DALLFPSFIHSQK
   NPVNYFAEVEQLAFDPSNMPPGIEPSPDK
   ITSDFR
or
   LCENIAGHLK/HMDGYGSHTFK/DALLFPSFIHSQK
   DALLFPSFIHSQK/NPVNYFAEVEQLAFDPSNMPPGIEPSPDK
   ITSDFR
Version 1 interprets the 20 amino acid single-letter codes and the "new line" and "/" characters, and ignores all other text in the Sequences window.
Version 2 accepts additional single-letter codes beyond the 20 standard codes and the sequence separators "/" and "new line":
The code B is treated as a synonym of D, Aspartic acid.
Similarly Z is treated as a synonym of E, Glutamic acid.
Finally, X is treated as an unknown placeholder amino acid. While X has no retention coefficient to contribute, its inclusion may affect the correction factors based on sequence length and clustering.
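The input rules above can be sketched in a few lines of Python. This only illustrates the stated rules and is not the calculator's actual parser: the function name is hypothetical, and treating only upper-case letters as amino acid codes is an assumption.

```python
import re

STANDARD = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard single-letter codes
SYNONYMS = {"B": "D", "Z": "E"}         # version 2 synonyms described above
UNKNOWN = "X"                           # placeholder residue, kept in the sequence


def parse_sequences(text):
    """Split on new lines or '/', map B->D and Z->E, ignore all other text."""
    peptides = []
    for chunk in re.split(r"[\n/]", text):
        residues = []
        for ch in chunk:
            ch = SYNONYMS.get(ch, ch)
            if ch in STANDARD or ch == UNKNOWN:
                residues.append(ch)
            # Anything else (spaces, digits, punctuation) is silently dropped.
        if residues:
            peptides.append("".join(residues))
    return peptides


print(parse_sequences("LCENIAGHLK/HMDGYGSHTFK\nITSDFR"))
```

Keeping X in the parsed sequence matters because, as noted above, it still counts toward the length and clustering corrections even though it carries no retention coefficient.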
The resulting dependence of retention time (RT) on peptide hydrophobicity (HP) is a linear function
RT = A + B*HP
where intercept A is the gradient delay time (specific to each HPLC system) and slope B is a value related to the slope of the acetonitrile gradient. B is the same for different HPLC systems as long as the same linear gradient slope is used. For example, a water/acetonitrile gradient with a 1.32% increase in acetonitrile per minute results in slope B ~ 0.47. Shallower gradients give higher slopes, as retention times increase: halving the gradient to a 0.66% increase in acetonitrile per minute doubled the slope to B ~ 0.94. There are three different approaches to calibrating an HPLC system.
Calibration using external standard digest of a known protein: The standard protein should be chosen to provide a number of well defined peptides of different hydrophobicities (human/bovine albumin/transferrin are recommended). Digest your protein as described earlier. Perform separation and extract mass (sequence) data of the peptides along with their retention times. Calculate hydrophobicities for identified peptides using SSRCalculator. Plot RT vs. HP dependence and determine parameters A and B for your HPLC system and conditions used. You can reuse your parameters for analysis of successive unknown samples under the same conditions as long as the system provides reproducible LC separation.
Calibration using internal standard digest of a known protein: Add the standard protein to your unknown mixture prior to digestion. The amount of standard protein should be lower than the amounts of the unknown proteins in the sample, but still sufficient to provide confident identification of peptides from the standard protein.
A detailed procedure for calibration using a digest of a known protein (horse myoglobin) is presented in Krokhin, O.V. Anal. Chem. 2006, 78, 7785-7795.
Calibration using internal standard digest of an “unknown” protein: A number of peptides confidently identified during your HPLC-MS(MS/MS) run of an unknown sample can be used to calibrate the HPLC system. Very often the most abundant proteins in real biological samples (albumins for example) can be used for the internal standard.
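Whichever calibration approach is used, determining A and B comes down to a straight-line fit of measured RT against calculated HP. A minimal sketch using ordinary least squares is shown below. The function name is hypothetical and the data points are synthetic, generated from A = 5 min and B = 0.47 purely for illustration. Real HP values would come from SSRCalculator and real RT values from your own run.

```python
def fit_calibration(hp_values, rt_values):
    """Least-squares fit of RT = A + B*HP; returns (A, B, r_squared)."""
    n = len(hp_values)
    if n != len(rt_values) or n < 2:
        raise ValueError("need at least two (HP, RT) pairs of equal length")
    mean_hp = sum(hp_values) / n
    mean_rt = sum(rt_values) / n
    sxx = sum((h - mean_hp) ** 2 for h in hp_values)
    syy = sum((r - mean_rt) ** 2 for r in rt_values)
    sxy = sum((h - mean_hp) * (r - mean_rt)
              for h, r in zip(hp_values, rt_values))
    if sxx == 0:
        raise ValueError("all HP values are identical; slope is undefined")
    b = sxy / sxx                 # slope B
    a = mean_rt - b * mean_hp     # intercept A (gradient delay time)
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return a, b, r_squared


# Synthetic calibration peptides: HP from the calculator, RT in minutes.
hp = [10.0, 20.0, 30.0, 40.0]
rt = [9.7, 14.4, 19.1, 23.8]
a, b, r2 = fit_calibration(hp, rt)
print(f"A = {a:.2f} min, B = {b:.2f}, r2 = {r2:.3f}")
```

The r² returned by the fit gives a quick check on how well the calibration peptides follow the expected linear dependence before the parameters are reused for other samples.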
Protein identification and characterization are two major proteomics tasks. SSRCalculator facilitates both procedures by accurate prediction of peptides’ retention times during RP HPLC separations.
MS protein identification: A linear dependence of RT on HP for the set of peptides tentatively assigned to an identified protein adds confidence to the MS identification of that protein based on its peptide mass fingerprint.
Characterization of the protein often requires complete sequence coverage. The retention times of any missing fragments can be calculated using SSRCalculator, and MS spectra of respective fractions can be inspected manually.
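Locating the fraction for a missing fragment can be sketched as follows, assuming the system has already been calibrated (A and B known) and that fractions of equal duration were collected from a known start time. The function and parameter names are hypothetical, and the numbers in the usage line are illustrative only.

```python
import math


def fraction_for_peptide(hp, a, b, fraction_minutes, start_minute=0.0):
    """Return (predicted RT, 1-based fraction number) for a peptide.

    hp               -- hydrophobicity calculated for the missing peptide
    a, b             -- calibration intercept and slope (RT = A + B*HP)
    fraction_minutes -- duration of each collected fraction (assumed constant)
    start_minute     -- time at which fraction 1 started (assumed known)
    """
    if fraction_minutes <= 0:
        raise ValueError("fraction_minutes must be positive")
    rt = a + b * hp
    if rt < start_minute:
        raise ValueError("predicted RT falls before fraction collection started")
    number = math.floor((rt - start_minute) / fraction_minutes) + 1
    return rt, number


rt, number = fraction_for_peptide(hp=30.0, a=5.0, b=0.47, fraction_minutes=1.0)
print(f"predicted RT {rt:.1f} min, inspect fraction {number}")
```

Since the prediction carries some error, it is reasonable to inspect the neighbouring fractions as well as the one returned.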
 

Future Development
A group of researchers at the Manitoba Centre for Proteomics is continually working to improve the predictive ability of SSRCalculator. Successive versions of the program will be made available by the Manitoba Centre for Proteomics.
For questions related to SSRCalculator predictive algorithm please contact:
Oleg Krokhin
 
For questions related to SSRCalculator software development contact:
John Cortens
 

References
Krokhin, O.V.; Ying, S.; Cortens, J.P.; Ghosh, D.; Spicer, V.; Ens, W.; Standing, K.G.; Beavis, R.C.; Wilkins, J.A.
“Use of Peptide Retention Time Prediction for Protein Identification by off-line Reversed-Phase HPLC-MALDI MS/MS”
Anal. Chem. 78, 6265-69 (2006).
Krokhin, O.V.
“Sequence Specific Retention Calculator - a novel algorithm for peptide retention prediction in ion-pair RP-HPLC:
application to 300 and 100 Å pore size C18 sorbents”
Anal. Chem. 78, 7785-95 (2006).
For C18 pH 10 column conditions, see:
Gilar, M.; Olivova, P.; Daly, A.E.; Gebler, J.C.
Anal. Chem. 77, 6426-34 (2005).
Krokhin, O.V.; Spicer, V.; Ens, W.; Standing, K.G.; Wilkins, J.A.
“2D HPLC-MALDI MS analysis of complex protein mixtures with peptide retention prediction in both dimensions”
55th ASMS Conference on Mass Spectrometry and Allied Topics, Indianapolis, USA, oral presentation (2007).
O. V. Krokhin, R. Craig, V. Spicer, W. Ens, K. G. Standing, R. C. Beavis, J. A. Wilkins
“An improved model for prediction of retention times of tryptic peptides in ion-pair reverse-phase HPLC: its application to protein peptide mapping by off-line HPLC-MALDI MS”
Molecular and Cellular Proteomics 2004 Sep;3(9):908-19.
O. V. Krokhin, S. Ying, R. Craig, V. Spicer, W. Ens, K. G. Standing, R. C. Beavis, J. A. Wilkins
“New sequence-specific correction factors for prediction of peptide retention in RP-HPLC: application to protein identification by off-line HPLC-MALDI-MS”
52nd ASMS Conference on Mass Spectrometry and Allied Topics, Nashville, TN, May 23-27 (2004), TPZ 503.