
Quantitative Proteomics for Systems Biology

Posted: 29 May 2009 | Dr Claire Eyers, Deputy Director, Michael Barber Centre for Mass Spectrometry, University of Manchester

The pharmaceutical industry continues to experience a high attrition rate during the latter stages of small molecule therapeutic development, most disappointingly during the late and highly expensive Phase II and Phase III trials1. If left unchecked, it is likely that this late-stage failure in drug development will only increase the already staggering cost of getting pharmaceuticals to market. The failure of drugs at this stage in development occurs primarily because of problems with toxicity and/or failure to produce a significant effect in whole animal models (lack of efficacy)1. The increasingly popular approach of systems biology is perceived by many as a potential solution for overcoming these problems, enabling the design of effective, safe therapeutics on a realistic R&D budget.

Systems biology aims to use comprehensive knowledge of a biological system, be that a pathway, cell, tissue or organism, to help understand how that system will behave under experimentally untested conditions. If a biological system can be understood in sufficient depth, artificial manipulation using a computer model may provide important insights into the behaviour of that system, for example, under disease conditions. The mathematical models of these systems can be analysed to determine those parts of the network that contribute most to a particular outcome or disease state, and used to efficiently test the effects of manipulating defined points in the pathway. Subsequently, hypotheses can be generated about the potential effects of altering these biological networks at one or more points, which can then be more readily tested in ‘traditional’ benchtop experiments2. It is believed that ultimately this holistic approach to understanding biological networks will help to solve a range of complex biological problems, and in particular, enable greater understanding of how these networks could best be perturbed to solve healthcare related issues with limited necessity for traditional ‘wet’ biological experiments, minimising resources (time, money) and waste.

An ultimate goal in understanding biological systems would be the development of a ‘virtual physiological human’ (VPH)2,3, which, if achieved, would play a significant role not only in improving our understanding of complex biological pathways and the interplay between them, but also in aiding the design of better therapeutics. Crucially, however, accurate development of mathematical models requires knowledge of the biochemical interactions of the network or system and the input of various coefficients and parameters: the state of each of the components (modified/unmodified; active/inactive), their kinetic constants (Vmax, kcat), the affinity of their binding interactions (and the functional effects of those interactions), their absolute concentration and the change in their concentration (rate of synthesis versus degradation), the heterogeneity of their cellular localisation and the rate of translocation of components between different parts of the cell.
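
These requirements can be made concrete with a toy example. The Python sketch below (not taken from the article) illustrates how measured parameters such as Vmax, Km and an absolute starting concentration feed a kinetic model of a single enzymatic step; all numerical values are hypothetical placeholders for the kinds of quantities a real systems-biology model would take from experiment.

from scipy.integrate import solve_ivp

VMAX = 0.5    # umol / (L * s), hypothetical maximal rate
KM = 20.0     # umol / L, hypothetical Michaelis constant
S0 = 100.0    # umol / L, hypothetical initial substrate concentration

def rate(t, y):
    """Michaelis-Menten consumption of a single substrate S."""
    s = y[0]
    return [-VMAX * s / (KM + s)]

# Simulate 10 minutes of the reaction and report the remaining substrate.
sol = solve_ivp(rate, (0.0, 600.0), [S0])
print("Substrate remaining after 10 min: %.1f umol/L" % sol.y[0][-1])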

Traditionally, most of the parameters used in these models have been obtained from a combination of published studies performed at different times. However, as the analysis of specific systems as whole networks increases in popularity, more projects are being designed to provide the specific parameters required to build the models of interest. For example, ELISAs and quantitative western blotting have been used to provide information about the absolute amount of specific proteins within cell extracts4. Quantitative immunoblotting of this nature requires antibodies specific to the proteins of interest, which may need to be specifically generated. Additionally, it is a time-consuming and sometimes frustrating process, often with a limited range over which accurate sample quantification can be achieved. The linear range of quantification is highly dependent on both the avidity of the antibody/antigen interaction and the sensitivity of the method used for antibody detection (e.g. film versus phosphorimager). Moreover, all samples under investigation must be compared with a verified protein standard, and unfortunately accurate, consistent quantitative western blotting of a specific protein takes time to establish and is highly dependent on the skills of the experimentalist.
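
For comparison, the standard-curve step that underpins this kind of quantitative immunoblotting or ELISA can be sketched in a few lines of Python. The intensities and standard loads below are hypothetical, and a real assay would additionally need to confirm that the unknown sample falls within the linear range of the calibration curve.

import numpy as np

# Known amounts of a verified protein standard (ng) and their measured signals.
std_ng     = np.array([5.0, 10.0, 20.0, 40.0, 80.0])
std_signal = np.array([1.1e4, 2.2e4, 4.1e4, 8.3e4, 1.6e5])

# Fit a linear calibration curve: signal = slope * amount + intercept.
slope, intercept = np.polyfit(std_ng, std_signal, 1)

# Interpolate an unknown sample from its measured signal.
unknown_signal = 5.5e4
unknown_ng = (unknown_signal - intercept) / slope
print("Estimated amount: %.1f ng" % unknown_ng)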

Over the last few years, accurate methodologies for determining the absolute amounts of proteins in biological extracts have been transformed, largely due to developments in the field of quantitative mass spectrometry (MS). MS separates analytes based on their mass-to-charge (m/z) ratio, but is not in itself an inherently quantitative analytical technique. The intensity of the signal output is highly dependent on a number of physico-chemical properties of the analyte: in addition to mass and charge, hydrophobicity, isoelectric point, gas-phase basicity and flexibility (among others) all play a role in the detection of peptides by MS5-8. Quantification of proteins, or more specifically of the peptides of which they are composed, cannot therefore be achieved simply by recording the MS signal intensity, but relies instead on alternative strategies. Crucially, the vast majority of protein quantification is actually performed by analysis of one or more constituent peptides. Peptides, particularly those generated by tryptic proteolysis, are optimally suited in terms of their m/z range to MS analysis using current instrumentation and are therefore often used as surrogates to elucidate protein quantity. The accuracy and precision of protein quantification is thus dependent on the selection of these peptide surrogates. Peptides that are not readily observed by MS (defined as non-proteotypic)9, that are not unique to the protein of interest, or that contain sites of modification will result in erroneous protein quantification. Peptides used for determining protein quantity must therefore be selected with care, and the recent development of computational algorithms to aid in the selection of these quantification or Q-peptides has made this task significantly easier6,8,10.
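
As a rough illustration of the kind of filtering a Q-peptide selection tool performs, the sketch below carries out a simplified in silico tryptic digest and applies a few naive rules (peptide length, avoidance of methionine and cysteine). The published predictors cited above use trained models of peptide detectability; this sketch, with a hypothetical protein sequence, only conveys the idea.

import re

def tryptic_digest(sequence):
    """Cleave C-terminal to K or R, except before P (simplified rule)."""
    return [p for p in re.split(r'(?<=[KR])(?!P)', sequence) if p]

def candidate_q_peptides(sequence, min_len=6, max_len=25):
    peptides = tryptic_digest(sequence)
    return [p for p in peptides
            if min_len <= len(p) <= max_len     # m/z range suited to MS analysis
            and 'M' not in p and 'C' not in p]  # avoid easily modified residues

protein = "MKWVTFISLLLLFSSAYSRGVFRRDTHKSEIAHRFKDLGEEHFK"  # hypothetical sequence
print(candidate_q_peptides(protein))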

Stable isotope dilution has been used for decades in quantitative elemental analysis by MS, but has more recently been employed in the quantification of metabolites and peptides. This strategy relies on the principle that different isotopes of the same element generate identical mass spectrometric signals, yet can be differentiated by virtue of a change in their m/z. A peptide containing naturally occurring levels of element isotopes can therefore be distinguished from a stable isotope-labelled or ‘heavy’ version of the same peptide sequence by a predictable difference in its m/z (defined by the incorporated stable isotope), while retaining the same relative signal intensity (see Figure 1). Stable isotope-labelled peptides were first employed in proteomics as standards for determining relative protein levels a decade ago11-13, and recent advances have enabled similar strategies to be used for absolute protein quantification14,15: if a known quantity of the isotope-labelled Q-peptide is analysed concurrently with the sample of interest, the absolute amount of the analyte can be determined by comparison (see Figure 1)14,15. Data generated in these types of studies are critical for providing parameter inputs for molecular models.
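
The arithmetic behind this comparison is simple, and a minimal sketch follows: because the light (native) and heavy (labelled standard) forms of the same peptide behave identically in the mass spectrometer, the light/heavy signal ratio multiplied by the known amount of spiked standard gives the amount of native peptide. The peak areas and spike amount below are hypothetical.

heavy_spiked_fmol = 50.0    # known amount of isotope-labelled Q-peptide added
light_peak_area   = 8.4e5   # XIC peak area, native (light) peptide
heavy_peak_area   = 6.0e5   # XIC peak area, labelled (heavy) standard

native_fmol = heavy_spiked_fmol * (light_peak_area / heavy_peak_area)
print("Native peptide: %.1f fmol on column" % native_fmol)   # 70.0 fmol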

Figure 1

Two methods are primarily used for the generation of stable-isotope labelled peptides as surrogates for peptide quantification: the AQUA (absolute quantification)15,16 and QconCAT (quantification concatamers of peptides)14,17,18 strategies (see Figure 2). AQUA relies on the individual synthesis, purification and quantification of each of the reference Q-peptides using the relevant isotope-labelled amino acid. In contrast, an artificial protein, QconCAT, can be used to generate a number (≤40) of labelled peptides simultaneously by concatenation of the peptide cDNA sequences and generation of an isotope-labelled protein following expression in E. coli in the required isotope-labelled medium (usually minimal medium containing [13C6]-Lys/Arg). Proteolysis of the affinity-purified QconCAT then yields a stoichiometric mixture of all the reference Q-peptides encoded by the concatenated DNA. While AQUA is best used for the quantification of a limited number of proteins, where only a select number of reference peptides need to be generated, projects requiring the absolute quantification of a significant number of proteins, for example all the proteins in a defined pathway, will benefit from the higher-throughput QconCAT methodology. Both methods have their limitations: many peptides cannot be readily synthesised, or exhibit problems with solubilisation or stability19; some QconCAT constructs may fail to express owing to issues with RNA secondary structure, which influences the efficiency of translation of these artificial proteins19. However, the order of Q-peptides in a QconCAT construct can be readily optimised to minimise the likelihood of expression problems17. Concerns have been raised that failure of QconCAT proteins to yield equimolar amounts of the Q-peptides following proteolysis will significantly influence the accuracy of quantification. However, properties of the selected Q-peptides that prevent complete proteolysis20 and the generation of limit peptides from the QconCAT will also hinder complete proteolytic digestion of the native protein. Accurate quantification using either method of peptide generation is thus reliant on the appropriate selection of representative peptides, namely those that undergo complete proteolysis in the native protein, with the accuracy of quantification of any given protein being improved by using multiple (usually two) peptide surrogates. Label-free methods for determining the absolute amounts of proteins in cellular extracts are also being developed21, although the precision of the current strategies falls well short of that attainable using isotope-labelled peptides and/or proteins as references14,21,22.
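
A simplified sketch of the QconCAT idea is shown below: hypothetical Q-peptide sequences are concatenated into a single artificial protein so that tryptic digestion releases them in strict 1:1 stoichiometry, and expression in [13C6]-Lys/Arg medium shifts each tryptic peptide by roughly +6 Da per labelled residue. The peptide sequences are illustrative only.

# Hypothetical tryptic Q-peptides, each ending in K or R.
Q_PEPTIDES = ["LVNELTEFAK", "YLYEIAR", "GDLGIEIPAEK"]

# Concatenation: digestion of the expressed construct regenerates each peptide once.
qconcat = "".join(Q_PEPTIDES)
print("QconCAT construct:", qconcat)

HEAVY_SHIFT_PER_RESIDUE = 6.02013   # Da, 13C6 versus 12C6 Lys or Arg
for pep in Q_PEPTIDES:
    n_labelled = pep.count("K") + pep.count("R")
    print("%-12s heavy - light mass difference: %.3f Da"
          % (pep, n_labelled * HEAVY_SHIFT_PER_RESIDUE))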

Figure 2

Quantification using isotope-labelled peptides has been achieved using a variety of mass spectrometers, exploiting a range of analytical strategies23. The most popular, and arguably the most sensitive, is the application of either selected or multiple reaction monitoring (SRM or MRM, respectively) using a triple quadrupole (QqQ) mass spectrometer (see Figure 3). Using this strategy, selected peptides in a complex mixture can be quantitatively analysed with high precision and sensitivity. SRM focuses the MS analysis on peptides (precursor ions) of a pre-defined m/z, which fragment during collision-induced dissociation (CID) to yield a specific fragment or product ion, also of defined m/z. The first quadrupole is set to transmit only precursor ions at the m/z of interest, the second quadrupole functions as the collision cell, and the third quadrupole filters the signal for a defined product ion. A typical proteomics experiment that aims to identify peptide sequences will involve the separation of all the peptides in the mixture by reversed-phase liquid chromatography (LC) prior to MS analysis of all the product ions for a particular precursor. However, because SRM analyses are focused, with the mass analysers spending the vast majority of their time monitoring pre-defined ions of interest, the sensitivity of quantification may be enhanced by up to two orders of magnitude. Additionally, several such precursor/product ion transitions can be monitored during the course of a typical LC-MS experiment. Assigning specific transitions to defined regions in the LC elution profile increases the number of peptides that can be monitored in a single experiment without compromising sensitivity. Depending on the transitions selected, it is conceivable that a given product ion could be generated by multiple peptides of the same m/z. The specificity of peptide analysis is therefore often increased by employing multiple reaction monitoring (MRM), in which two or more fragment ions are monitored for a given precursor ion (see Figure 3). To optimise the sensitivity of these quantitative analyses, the transitions used in SRM/MRM experiments need to be defined with care; emphasis should be given to the most intense fragment ions, and MRM experiments limited to no more than four transitions. The fragment ion transitions can be selected based on previous tandem MS experiments, where information regarding the intensity of the fragment ions has been evaluated, or predicted based on the known mechanism of CID and the predictable influence of certain amino acids during fragmentation.
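
To make the notion of a transition concrete, the sketch below assembles a small transition list for a hypothetical doubly charged tryptic peptide, pairing its precursor m/z with the m/z values of a few singly charged y-ions calculated from standard monoisotopic residue masses. In practice the choice of product ions would be guided by prior tandem MS data or fragmentation rules, as noted above; the y-ions chosen here are arbitrary.

RESIDUE_MASS = {'G': 57.02146, 'A': 71.03711, 'S': 87.03203, 'P': 97.05276,
                'V': 99.06841, 'T': 101.04768, 'L': 113.08406, 'I': 113.08406,
                'N': 114.04293, 'D': 115.02694, 'Q': 128.05858, 'K': 128.09496,
                'E': 129.04259, 'M': 131.04049, 'H': 137.05891, 'F': 147.06841,
                'R': 156.10111, 'Y': 163.06333, 'W': 186.07931, 'C': 103.00919}
WATER, PROTON = 18.01056, 1.00728

def precursor_mz(peptide, charge=2):
    """m/z of the intact peptide at the given charge state."""
    mass = sum(RESIDUE_MASS[a] for a in peptide) + WATER
    return (mass + charge * PROTON) / charge

def y_ion_mz(peptide, n):
    """m/z of the singly charged y-n fragment ion."""
    return sum(RESIDUE_MASS[a] for a in peptide[-n:]) + WATER + PROTON

pep = "YLYEIAR"   # hypothetical Q-peptide
print("precursor m/z (2+): %.3f" % precursor_mz(pep))
for n in (4, 5, 6):
    print("  transition -> y%d: %.3f" % (n, y_ion_mz(pep, n)))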

Figure 3

The absolute amount of the native peptide is then calculated by comparing the extracted ion chromatogram (XIC) signals (peak area or peak intensity) for the transitions corresponding to the unlabelled native peptide and the isotope-labelled reference peptide of known amount. Manual analysis of large sets of quantitative data of this nature can be excessively time-consuming and prone to error. However, programs such as MaxQuant and QconCATanalyser24,25 have been developed to aid in the automated searching and quantification of MS data employing stable-isotope dilution methods, which both improve the precision of quantification and significantly reduce the time for data analysis.
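
A minimal sketch of this final calculation is given below: each monitored transition yields a light/heavy ratio from its XIC peak areas, the ratios are averaged, and the result is scaled by the known amount of heavy standard. All values are hypothetical, and dedicated software such as the packages mentioned above would also handle peak integration and quality control.

HEAVY_STANDARD_FMOL = 25.0   # known amount of isotope-labelled standard spiked in

# (light peak area, heavy peak area) for three transitions of the same peptide.
transitions = [(4.1e5, 2.0e5), (2.6e5, 1.3e5), (1.05e5, 0.50e5)]

ratios = [light / heavy for light, heavy in transitions]
mean_ratio = sum(ratios) / len(ratios)
print("Mean light/heavy ratio: %.2f" % mean_ratio)
print("Native peptide: %.1f fmol" % (mean_ratio * HEAVY_STANDARD_FMOL))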

The future

In the future, quantification experiments of the type described are likely to become routine. An appropriate experimental pipeline employing Q-peptide prediction software, automated generation of purified isotope-labelled reference peptides, optimised multidimensional peptide separation and further development of software suitable for the analysis of quantitative MS data from a variety of instrument platforms should expedite this process. It is therefore expected that proteome quantification for ‘simple’ model organisms, grown under specified conditions, will become available in the next few years. However, it is likely that global quantification of the human proteome is still some way off, in no small part due to the presence of some 30,000 proteins, each expressed in perhaps 10 to 100 differentially modified and spliced variants. Therefore, while the idea of generating quantitative data for development of a ‘VPH’ is highly exciting and certainly revolutionary, in reality it remains firmly in the realms of science fiction: quantitative information would be required for each expressed protein in each cell type, under a wide range of physiological conditions. Critically, most current quantitative studies also fail to take into account the myriad of post-translational modifications that regulate protein function and which, as far as mathematical modellers are concerned, define each modified form as a separate ‘species’. A particular note of optimism comes from the field of protein phosphorylation, where quantitative approaches based on the principles of isotope dilution have been successfully exploited to calculate the stoichiometry of phosphorylation at a number of different sites. It is likely that this methodology could be extended to all phosphorylated peptides in a biological system, and indeed to all modified peptides26-29. However, quantification of these sites of post-translational modification requires that they first be identified, and this is one aspect of proteomics analysis that remains far from complete for any cell type or organism.

Acknowledgements

I would like to thank Professor Hans Westerhoff for stimulating discussion and the Royal Society for support by award of a Dorothy Hodgkin Research Fellowship and a Research Grant (RG080513).

References

  1. Kola, I. and Landis, J. (2004), Can the pharmaceutical industry reduce attrition rates? Nat Rev Drug Discov, 3(8): p. 711-5.
  2. Kell, D.B. (2007), The virtual human: towards a global systems biology of multiscale, distributed biochemical network models. IUBMB Life, 59(11): p. 689-95.
  3. Viceconti, M., Clapworthy, G., and Van Sint Jan, S. (2008), The Virtual Physiological Human – a European initiative for in silico human modelling. J Physiol Sci, 58(7): p. 441-6.
  4. Ghaemmaghami, S., Huh, W.K., Bower, K., Howson, R.W., Belle, A., Dephoure, N., O’Shea, E.K., and Weissman, J.S. (2003), Global analysis of protein expression in yeast. Nature, 425(6959): p. 737-41.
  5. Tang, H., Arnold, R.J., Alves, P., Xun, Z., Clemmer, D.E., Novotny, M.V., Reilly, J.P., and Radivojac, P. (2006), A computational approach toward label-free protein quantification using predicted peptide detectability. Bioinformatics, 22(14): p. e481-8.
  6. Fusaro, V.A., Mani, D.R., Mesirov, J.P., and Carr, S.A. (2009), Prediction of high-responding peptides for targeted protein assays by mass spectrometry. Nat Biotechnol, 27(2): p. 190-8.
  7. Mallick, P., Schirle, M., Chen, S.S., Flory, M.R., Lee, H., Martin, D., Ranish, J., Raught, B., Schmitt, R., Werner, T., Kuster, B., and Aebersold, R. (2007), Computational prediction of proteotypic peptides for quantitative proteomics. Nat Biotechnol, 25(1): p. 125-31.
  8. Eyers, C.E., Lau, K.W., Wedge, D.C., Kell, D.B., Gaskell, S.J., and Hubbard, S.J. (2009), Prediction of QconCAT peptides for absolute quantitative proteomics using consensus machine learning approaches. J Prot Res (submitted).
  9. Kuster, B., Schirle, M., Mallick, P., and Aebersold, R. (2005), Scoring proteomes with proteotypic peptide probes. Nat Rev Mol Cell Biol, 6(7): p. 577-83.
  10. Wedge, D., Lau, K.W., Gaskell, S.J., Hubbard, S.J., Kell, D.B., and Eyers, C.E. (2007), Peptide detectability prediction following ESI mass spectrometry using genetic programming. In GECCO 2007: Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation.
  11. Oda, Y., Huang, K., Cross, F.R., Cowburn, D., and Chait, B.T. (1999), Accurate quantitation of protein expression and site-specific phosphorylation. Proc Natl Acad Sci U S A, 96(12): p. 6591-6.
  12. Gygi, S.P., Rist, B., Gerber, S.A., Turecek, F., Gelb, M.H., and Aebersold, R. (1999), Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat Biotechnol, 17(10): p. 994-9.
  13. Pasa-Tolic, L., Jensen, P.K., Anderson, G.A., Lipton, M.S., Peden, K.K., Martinovic, S., Tolic, N., Bruce, J.E., and Smith, R.D. (1999), High Throughput Proteome-Wide Precision Measurements of Protein Expression Using Mass Spectrometry. J Am Chem Soc, 121(34): p. 7949-7950.
  14. Beynon, R.J., Doherty, M.K., Pratt, J.M., and Gaskell, S.J. (2005), Multiplexed absolute quantification in proteomics using artificial QCAT proteins of concatenated signature peptides. Nat Methods, 2(8): p. 587-9.
  15. Gerber, S.A., Rush, J., Stemman, O., Kirschner, M.W., and Gygi, S.P. (2003), Absolute quantification of proteins and phosphoproteins from cell lysates by tandem MS. Proc Natl Acad Sci U S A, 100(12): p. 6940-5.
  16. Kirkpatrick, D.S., Gerber, S.A., and Gygi, S.P. (2005), The absolute quantification strategy: a general procedure for the quantification of proteins and post-translational modifications. Methods, 35(3): p. 265-73.
  17. Pratt, J.M., Simpson, D.M., Doherty, M.K., Rivers, J., Gaskell, S.J., and Beynon, R.J. (2006), Multiplexed absolute quantification for proteomics using concatenated signature peptides encoded by QconCAT genes. Nat Protoc, 1(2): p. 1029-43.
  18. Rivers, J., Simpson, D.M., Robertson, D.H., Gaskell, S.J., and Beynon, R.J. (2007), Absolute multiplexed quantitative analysis of protein expression during muscle development using QconCAT. Mol Cell Proteomics, 6(8): p. 1416-27.
  19. Mirzaei, H., McBee, J., Watts, J., and Aebersold, R. (2007), Comparative evaluation of current peptide production platforms used in absolute quantification in proteomics. Mol Cell Proteomics.
  20. Siepen, J.A., Keevil, E.J., Knight, D., and Hubbard, S.J. (2007), Prediction of missed cleavage sites in tryptic peptides aids protein identification in proteomics. J Proteome Res, 6(1): p. 399-408.
  21. Silva, J.C., Gorenstein, M.V., Li, G.Z., Vissers, J.P., and Geromanos, S.J. (2006), Absolute quantification of proteins by LCMSE: a virtue of parallel MS acquisition. Mol Cell Proteomics, 5(1): p. 144-56.
  22. Bantscheff, M., Schirle, M., Sweetman, G., Rick, J., and Kuster, B. (2007), Quantitative mass spectrometry in proteomics: a critical review. Anal Bioanal Chem, 389(4): p. 1017-31.
  23. Pan, S., Aebersold, R., Chen, R., Rush, J., Goodlett, D.R., McIntosh, M.W., Zhang, J., and Brentnall, T.A. (2009), Mass spectrometry based targeted protein quantification: methods and applications. J Proteome Res, 8(2): p. 787-97.
  24. Nilse, L. and Hubbard, S.J. (2008), Software tools for the analysis of quantitative proteomics data. In Genomes to Systems, Manchester, UK.
  25. Cox, J. and Mann, M. (2008), MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol, 26(12): p. 1367-72.
  26. Gerber, S.A., Kettenbach, A.N., Rush, J., and Gygi, S.P. (2007), The absolute quantification strategy: application to phosphorylation profiling of human separase serine 1126. Methods Mol Biol, 359: p. 71-86.
  27. Johnson, H., Eyers, C.E., Eyers, P.A., Beynon, R.J., and Gaskell, S.J. (2009), Rigorous determination of the stoichiometry of protein phosphorylation using mass spectrometry. J Am Soc Mass Spectrom, (Submitted).
  28. Ciccimaro, E., Hanks, S.K., Yu, K.H., and Blair, I.A. (2009), Absolute Quantification of Phosphorylation on the Kinase Activation Loop of Cellular Focal Adhesion Kinase by Stable Isotope Dilution Liquid Chromatography/Mass Spectrometry. Anal Chem.
  29. Singh, S., Springer, M., Steen, J., Kirschner, M.W., and Steen, H. (2009), FLEXIQuant: A Novel Tool for the Absolute Quantification of Proteins, and the Simultaneous Identification and Quantification of Potentially Modified Peptides. J Proteome Res.
