European PROSPECTS for proteomics and systems biology
Posted: 3 December 2008 | Dr Anne Katrin Werenskiold, Project Manager, Interaction Proteome, Max Planck Institute of Biochemistry
In its Research Framework Programme 7 (2007-2013), the European Commission sets the focus in health research on bringing the huge high-throughput data collection efforts of earlier programmes to the systems level. The ultimate goal of systems biology is the integration of various types of experimental data into models that represent and simulate physiological cell function. To this end, information on gene and protein expression and its dynamics, on protein localisation, modification states and activation, and on the interaction of proteins in larger supra-molecular complexes and their functional roles needs to be compiled from numerous sources and formats and brought into context using advanced computational tools.
PROSPECTS is a large collaborative project funded by the European Commission with €12 million over five years. It brings together leading experts in various disciplines to move the proteomics field to a ‘second generation’ state, where it will be able to deal with transient complexes and near complete descriptions of a proteome in time and space. The project will specifically focus on diseases related to folding stress. Unique insights into the molecular basis of multiple, in particular neurodegenerative, syndromes can therefore be expected.
Current proteome analyses are limited to a ‘static’ view in that they largely provide an inventory of proteins at a given point in time. Due to experimental constraints, this inventory represents the average of a significant amount of sample used for the analysis. The sequestration of proteins into different cell compartments, their relative abundance in each of these and the dynamics of protein distribution are neglected in this type of investigation.
The far-reaching goal of ‘second generation proteomics’, to quantitatively measure the dynamics of the whole set of proteins in an experimental sample, will require advances in multiple fields. The PROSPECTS consortium has taken on this task and will pursue a battery of approaches, outlined in the sections below.
High end instrumentation
At the level of technology, further significant improvements in the sensitivity and throughput of protein separation and analysis are essential to allow the determination of a full set of proteins in minute amounts of sample, ideally at the single cell level. These advances will set the stage for the mechanistic and functional investigation of the molecular consequences of defined perturbations in the cell.
The technology core of the PROSPECTS consortium comprises the coordinating Mann group (Max Planck Institute of Biochemistry, MPIB, Martinsried) and its industrial partner Thermo Fisher Scientific, Bremen, the leading manufacturer of mass spectrometry (MS) instrumentation. The two partners have collaborated successfully for years on improving MS instrumentation, in part through their joint involvement in the ongoing European project Interaction Proteome [www.interaction-proteome.org]. They now team up with Proxeon (Odense, DK), a specialist in the design of hardware and software for proteome analysis by nano-scale liquid chromatography. Together, the partners will work to improve the pre-fractionation of peptides for MS analysis and to push MS instrumentation towards further increased sensitivity. A recent publication hints at one possible route: it shows that isocratic solid phase extraction-liquid chromatography (SPE-LC), as recently introduced by Proxeon in collaboration with researchers from the University of Southern Denmark (SDU), combined with MS analysis on Thermo’s flagship instrument, the Orbitrap, provides a robust and sensitive platform for proteomic analysis [1]. Complementing the advances in MS-based peptide analysis, novel protocols will be established to exploit the new technology optimally and to facilitate and speed up proteomic analyses.
Investigation of labile protein complexes
In a second area of PROSPECTS, efforts will focus on individual macromolecular protein complexes. So far, structural analysis of large complexes has been limited to species that are stable enough to survive purification protocols. This precondition has excluded any fragile or only transiently formed complexes from investigation and the analysis of labile complexes that cannot be biochemically purified still remains a major challenge. PROSPECTS has devised multiple approaches to tackle this issue.
One strategy, headed by the Baumeister group at MPIB, will use Single Particle Analysis (SPA) by cryoelectron microscopy. One advantage of this technique is the rapid freezing of the biological sample, which preserves even fragile complexes. With the new generation Titan Krios microscope, which has only recently become available, visualisation of large complexes at nanometre resolution is feasible. The main challenge ahead is the pressing need to automate the SPA pipeline in order to reduce the processing time for single complexes from months to days. Once the throughput of the SPA pipeline has increased to the point of routinely delivering structures of complexes at nanometre resolution within days, the technique will be optimally suited to bridge the gap between cell and structural biology.
In another approach, mass spectrometry will be used to characterise large complexes. The Robinson lab (University of Cambridge) has pioneered this development and built a custom-made instrument for the analysis of large complexes in the gas phase after electrospraying. It has demonstrated the power of the technology in the recent analysis of a large (800 kDa) protein complex. Using this combination of electrospray and mass spectrometry, the group characterised the subunit interactions between the 13 components of the complex, the stoichiometry of its individual stable sub-modules, and the interactions between those sub-modules [2]. This biochemical approach to determining the overall topology of protein complexes bridges the structural and mass spectrometric efforts.
In addition to freezing or electrospraying, new tools will be developed for the stabilisation of labile interactions through chemical cross-linking. In this approach by the Aebersold group (ETH, Zurich), cross-linked peptides will be analysed as indicators of protein-protein interactions and protein conformation. Two main obstacles to this strategy lie in the complexity and low abundance of the cross-linked peptides in the sample mixture to be analysed by mass spectrometry, and in the computational analysis of the experimental results. These basic issues have already been solved and the feasibility of this strategy was described recently [3]. Care will be taken to ensure that the cross-linking protocols developed are compatible with the analysis of large complexes by either mass spectrometry or cryoelectron microscopy.
The work on fragile complexes will be complemented by in silico homology modelling of the experimentally characterised complexes. Making use of the data obtained in the experimental partner laboratories, the Serrano group (Centre de Regulacio Genomica, Barcelona) will devise and improve methods to model the interactions. Based on pre-existing high resolution structures, the structure and interactions of proteins belonging to the same family can be inferred by homology. Such models enable a prediction as to whether all components of a complex identified by the experimentalists are compatible with the formation of a single type of complex. The definition of binding sites for different interaction partners might indicate mutually exclusive interactions in the case of overlapping binding motifs. Such a finding would provide evidence for structural variability and the existence of multiple variants of a given complex. In this way, the computational tools will provide information instrumental in constraining and validating protein structures and interactions. Furthermore, this strategy will lead on to an analysis of dynamic changes in the relative abundance of variants of individual complexes “in space and time”, as they might occur, for example, as a consequence of changing environmental conditions. A variety of tools already available for the analysis of interaction networks has been reviewed recently [4].
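The reasoning about overlapping binding motifs can be illustrated with a minimal Python sketch. It is a deliberate simplification and not the Serrano group's method: binding sites are reduced to inclusive residue ranges on a shared protein, whereas structure-based modelling works with three-dimensional interfaces. The partner names and ranges are hypothetical.

```python
from itertools import combinations

def overlaps(site_a, site_b):
    """True if two inclusive residue ranges (start, end) share at least one residue."""
    return site_a[0] <= site_b[1] and site_b[0] <= site_a[1]

def mutually_exclusive_partners(binding_sites):
    """Return pairs of partners whose binding sites on the shared protein overlap.

    binding_sites maps a partner name to the (start, end) residue range it
    occupies on the shared protein. Overlapping ranges suggest that the two
    partners cannot bind at the same time.
    """
    return [
        (p, q)
        for p, q in combinations(sorted(binding_sites), 2)
        if overlaps(binding_sites[p], binding_sites[q])
    ]

# Hypothetical binding sites on one shared protein
sites = {"partnerA": (10, 40), "partnerB": (35, 60), "partnerC": (100, 130)}
print(mutually_exclusive_partners(sites))
```

In this toy input, partnerA and partnerB compete for residues 35 to 40, which would point to at least two variants of the complex, while partnerC could be present in either.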
Proteome dynamics in space (and time)
As outlined above, present proteomics studies largely do not take into account the specific subcellular distribution of identified proteins as they represent the “average proteome” of a large sample. Since proteins can interact only if residing in the same cell compartment in close vicinity to their partners, spatial distribution of proteins is key in regard to protein interaction and cell function. So far, proteomics studies have been confined to the demonstration of the presence of proteins in defined cellular compartments without determining their relative abundance in others. PROSPECTS will address this issue and investigate the relative distribution of the larger part of the proteome in the three major compartments of the cell, the membrane, cytosol and nucleus. A ‘Protein Localisation Index’ will be introduced as a reference for the relative quantitative distribution of proteins in these subcellular compartments.
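The article does not give a formula for the ‘Protein Localisation Index’. As a minimal sketch, assuming the index is simply the fraction of a protein's total quantified amount found in each of the three compartments, it could be computed as follows in Python. The function name, the input layout and the example values are hypothetical, and the abundances are assumed to be already scaled to whole-compartment amounts.

```python
from typing import Dict

COMPARTMENTS = ("membrane", "cytosol", "nucleus")

def localisation_index(abundance: Dict[str, float]) -> Dict[str, float]:
    """Return the fraction of a protein's total signal found in each compartment.

    Assumption: 'abundance' holds one quantitative value per compartment,
    comparable across compartments. Missing compartments count as zero.
    """
    total = sum(abundance.get(c, 0.0) for c in COMPARTMENTS)
    if total <= 0:
        raise ValueError("protein was not quantified in any compartment")
    return {c: abundance.get(c, 0.0) / total for c in COMPARTMENTS}

# Hypothetical protein found mostly in the nucleus
index = localisation_index({"membrane": 10.0, "cytosol": 30.0, "nucleus": 60.0})
print(index)
```

A normalised profile of this kind sums to one for every protein, which makes proteins of very different absolute abundance directly comparable.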
This large scale analysis of subcellular proteomes will mostly rely on biochemical fractionation and advanced quantitative MS analysis, as developed within the technology core of the project described above.
With increasing throughput and reduced sample sizes, an investigation of time-dependent changes in the relative distribution of proteins in different subcellular locations becomes feasible. Various studies have already used quantitative proteomic technologies such as SILAC or iTRAQ to measure the relative abundance of hundreds of cellular proteins, and their stimulus-induced changes, in single experiments after differential labelling with stable isotopes [5].
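The idea behind relative quantification with stable isotope labels can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure of any cited study: each peptide is assumed to have been measured as a (light, heavy) intensity pair, and the protein ratio is summarised as the median log2(heavy/light), which is one common choice among several. The function name and intensities are hypothetical.

```python
import math
from statistics import median

def protein_log2_ratio(peptide_pairs):
    """Median log2(heavy/light) over the peptide intensity pairs of one protein.

    peptide_pairs is a list of (light_intensity, heavy_intensity) tuples.
    Pairs with a non-positive intensity are skipped as not quantifiable.
    """
    ratios = [
        math.log2(heavy / light)
        for light, heavy in peptide_pairs
        if light > 0 and heavy > 0
    ]
    if not ratios:
        raise ValueError("no quantifiable peptide pairs")
    return median(ratios)

# Hypothetical intensities for three peptides of one protein
pairs = [(100.0, 200.0), (50.0, 100.0), (80.0, 320.0)]
print(protein_log2_ratio(pairs))
```

A value of 1 corresponds to a twofold increase in the heavy-labelled (for example, stimulated) sample; the median keeps a single outlying peptide from dominating the protein-level estimate.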
In an independent approach, live cell fluorescence imaging can complement the findings from the global biochemical analysis. The use of fluorescently labelled reporter proteins enables the investigation of protein localisation, dynamics and interaction over an extended time period in single living cells. As a non-invasive technique, it has the unique advantage of permitting a follow-up of events in time lapse video analysis in the same cell.
The dual approach outlined here has already been used successfully in a joint investigation of the dynamics of nucleolar proteins by three PROSPECTS partners, the laboratories of M. Mann (MPIB), A. Lamond (University of Dundee) and J. Andersen (University of Southern Denmark) [6]. This study provides a proof of concept for the power of combining proteomics and live cell imaging to resolve protein dynamics in a cell organelle. The goal of annotating the human proteome, however, will require enhanced throughput. The impact of the new technologies on the analysis of dynamic processes, with particular regard to the cell nucleus, has been reviewed recently [7].
Complementary to the in vitro work, the relevance of ‘Protein Localisation Index’ profiles in “real life” will be assessed in collaboration with the Swedish Protein Atlas project (Uhlén group, KTH, Stockholm) (www.proteinatlas.org). This huge antibody resource offers the unique opportunity to cross-validate high-throughput data obtained by MS-based proteomics against immuno-histological findings with antibodies against thousands of human proteins, using both tissue array data and confocal microscopy. Through this collaboration, the cell line data obtained in the above analyses will be extended to normal and pathological human tissue, thus exploring the biomedical relevance of the data.
In particular, the modification of the proteome in cells exposed to folding stress and as a function of age will be monitored, representing the central model for the application of the novel technologies developed. This field is of direct relevance to neurodegenerative and other ageing-related diseases. These studies will be coordinated by the Hartl group (MPIB, Martinsried), experts in chaperone-dependent protein folding. They recently elucidated mechanisms underlying the misfolding of proteins into aggregates that are causal in neurodegenerative diseases, including Alzheimer’s, Parkinson’s and Huntington’s disease.
Data warehousing and modelling
As is evident from the above, the project will require highly advanced data technology to cope with the huge datasets expected from the large scale studies. Not only will these data collections have to be stored in an easily retrievable manner, but data in different formats obtained by multiple technologies will also have to be integrated. Current databases do not conceptually cover all types of information to be produced by PROSPECTS. The bioinformatics core of the project will therefore have to pioneer new concepts, for example to represent the quantitative localisation information. This work will be led by the bioinformatics experts in the Linial group (Hebrew University of Jerusalem) and be based on their PANDORA system (Protein ANnotation Diagram ORiented Analysis) [http://www.pandora.cs.huji.ac.il/]. PANDORA is publicly available at no cost and dedicated to in-depth biological analysis of large protein sets using annotation analysis and several annotation sources. Current functionalities already include data mining and the retrieval of information and multiple types of data acquired in the scientific community. An example of an annotation graph is depicted in Figure 2.
This work requires close collaboration with other existing databases in the project, such as ProteinAtlas (KTH), Peptide Atlas (ETH, ISB Seattle) and MAPU (MPIB, Martinsried), and with international database standardisation efforts to ensure that any new concepts fit seamlessly into generally agreed standards.
The modelling section of the project will build on the expertise of the Serrano group (CRG, Barcelona) and their modelling platform SmartCell. This environment is exceptional in that it allows the incorporation of any cellular geometry and already provides for small cell volume elements. It thus already meets some of the specific requirements of PROSPECTS. This basic model will be adapted to eukaryotic cells and extended to cope with the dynamic and spatial information gained in the project.
The comprehensive quantitative and spatially resolved analysis of the human proteome will transform proteomics from providing simple lists of components to generating a quantitative annotation of how and when proteins are localised in specific subcellular locations. This is not only a prerequisite for a ‘systems-wide’ understanding of biological function, but will also set the stage for an effective application of this knowledge in biomedicine.
References
1. Hørning OB, Kjeldsen F, Theodorsen S, Vorm O, Jensen ON. Isocratic solid phase extraction-liquid chromatography (SPE-LC) interfaced to high-performance tandem mass spectrometry for rapid protein identification. J Proteome Res. 2008;7:3159-67.
2. Zhou M, Sandercock AM, Fraser CS, Ridlova G, Stephens E, Schenauer MR, Yokoi-Fong T, Barsky D, Leary JA, Hershey JW, Doudna JA, Robinson CV. Special Feature: Mass spectrometry reveals modularity and a complete subunit interaction map of the eukaryotic translation factor eIF3. Proc Natl Acad Sci U S A. 2008. Epub ahead of print July 1, 2008. doi: 10.1073/pnas.0801313105
3. Rinner O, Seebacher J, Walzthoeni T, Mueller LN, Beck M, Schmidt A, Mueller M, Aebersold R. Identification of cross-linked peptides from large sequence databases. Nat Methods. 2008;5:315-8.
4. Kiel C, Beltrao P, Serrano L. Analyzing protein interaction networks using structural information. Annu Rev Biochem. 2008;77:415-41.
5. Andersen JS, Mann M. Organellar proteomics: turning inventories into insights. EMBO Rep. 2006;7:874-9.
6. Lam YW, Lamond AI, Mann M, Andersen JS. Analysis of nucleolar protein dynamics reveals the nuclear degradation of ribosomal proteins. Curr Biol. 2007;17:749-60.
7. Trinkle-Mulcahy L, Lamond AI. Toward a high-resolution view of nuclear dynamics. Science. 2007;318:1402-7.