
Next Generation Sequencing: Current realities in cancer biology

Posted: 16 February 2011

The rate of progress in the molecular cell biological sciences has become dramatic. This is fuelled in part by developments in technology, none more so than in the field of nucleic acid sequencing. So-called Next Generation Sequencing platforms promise to revolutionise our understanding of the importance of genetic differences between individuals. According to the modern personalised or stratified medicine paradigms, this will revolutionise current practice in early detection, diagnosis, prognosis, treatment and even prevention. Revolutions are apt to disappoint, and drug pipelines have yet to justify such optimism, yet molecular geneticists can already point to notable successes such as the completion of their flagship project, the human genome, in 2001, on time and within budget. What are the current realities? The field of cancer serves as an excellent test and suggests that advances are being made incrementally but rapidly.

DNA sequencing

Cancer is a genetic disease in which inheritance can play a part. However, the huge complexity of acquired genetic changes is of major importance. Functional consequences for a subset of these changes have already been determined, and efficacious new molecular therapies have been brought to patients as a result. Herceptin targeting of the HER2 receptor in breast cancer, Erbitux targeting of EGFR in colorectal cancer and Gleevec targeting of the BCR-ABL gene fusion product in chronic myelogenous leukaemia all serve as paradigms of how knowledge about underlying oncogenic changes at the genetic level can be exploited to develop targeted therapies for use on a personalised basis, dependent on the presence of the alteration in the individual patient. A major remaining challenge is to understand and act on the overwhelming remainder of the complexity, which varies markedly between individuals and contributes to the development of resistance mechanisms.

First, it is useful to consider the progress already made using earlier technologies, including Sanger dideoxy sequencing, which was almost the exclusive method for over 30 years. Despite their limitations, the older technologies have yielded important insights into the significance of genetic changes in cancer and provide a platform on which the newer technologies can build. Sequencing studies have largely screened for mutations in exonic regions because of their presumed functional importance. Long, accurate reads allowed large PCR-amplified regions from genes of interest to be sequenced in a high-throughput manner, using batch processes borrowed from the human genome sequencing project. Early comprehensive studies examined gene families, in particular the protein kinase superfamily because it contains many important effector molecules implicated in cancer. The ‘kinome’ of 25 primary breast cancers yielded 92 somatic mutations [1]. These were distributed unevenly between patients: almost half of the tumours yielded no mutation and over half of the mutations occurred in one individual, suggestive of a novel mutator phenotype in the latter. There was a significant excess of non-synonymous mutations, suggesting that a subset contribute to cancer development, the so-called ‘drivers’, and conversely that not all changes are relevant, the so-called ‘passengers’ that owe their existence to co-occurrence with positively selected changes.

Extended studies screened complete exomes, including those of the common breast and colorectal cancers and the aggressive pancreatic cancers and gliomas [2-4]. These indicated that a few genes such as TP53 were commonly mutated in many cases and in different cancer types, whilst a larger number of genes were mutated at a low frequency. PALB2 was identified as a susceptibility gene for pancreatic cancer [5], and IDH1 was found to be commonly mutated in glioma, where it was associated with a more favourable prognosis, further supporting the case for genetic stratification of cancers. Bioinformatics tools allowed mutations to be classified according to their likely significance based on their mutation frequency, disruptive effect and presence within conserved domains. The most convincing measures were based on evidence of clustering, especially around interface residues or active sites, as found for GALNT5 and TGM3. Extensively curated gene-based signalling pathways and networks, especially when mutations were combined with genome-wide copy number alterations and expression data, allowed wider predictions to be made regarding the importance of key cellular pathways. Simple contingency tables could be set up to test whether alterations affected particular pathways more than would be expected by chance. In pancreatic cancers, 12 core pathways were affected in at least 67 per cent of the cases, with KRAS, G1/S, Hedgehog, TGF-beta and Wnt/Notch being affected in 100 per cent of the tumours.
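The contingency table test mentioned above can be illustrated with a short sketch. It is not the procedure used in the cited studies; it assumes a simple 2x2 table (gene in pathway or not, gene mutated or not) and a one-sided Fisher exact test, and the function name and the example counts are hypothetical.

```python
from math import comb


def pathway_enrichment_p(mutated_in_pathway, pathway_size, mutated_total, genes_total):
    """One-sided Fisher exact p-value for a 2x2 contingency table.

    Returns the probability of seeing at least `mutated_in_pathway` mutated
    genes in a pathway of `pathway_size` genes if the `mutated_total` mutated
    genes were scattered at random among `genes_total` genes.
    """
    upper = min(pathway_size, mutated_total)
    all_tables = comb(genes_total, pathway_size)
    tail = sum(
        comb(mutated_total, i) * comb(genes_total - mutated_total, pathway_size - i)
        for i in range(mutated_in_pathway, upper + 1)
    )
    return tail / all_tables


# Hypothetical counts: 20,000 genes screened, 300 of them mutated,
# and 6 of the 50 genes in one curated pathway are among the mutated set.
p_value = pathway_enrichment_p(6, 50, 300, 20000)
print(f"one-sided p = {p_value:.3g}")
```

Because many pathways are usually tested at once, a p-value obtained this way would still need correcting for multiple testing before a pathway is called significant.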

Despite the success of these studies, significant challenges remain for the personalised cancer medicine paradigms. Few of the alterations discovered in the previous studies have been experimentally validated ex vivo, and validating them all would be beyond the capacity of current methods. The importance of the alterations has been statistically inferred and, given the large numbers involved, false discovery rates may be high. Relatively narrow screens were performed, which were poorly powered to sample the total possible variation. Typically, one million naturally occurring single base polymorphisms will differ between any two individuals. There are also extensive, natural polymorphic copy number variants which may encompass structural rearrangements including complex inversions. Somatic alterations can also include large-scale gains and losses of material, plus intra- and interchromosomal rearrangements, of which the consequent gene fusions, like TMPRSS2-ERG in prostate cancer, have been considered the most significant [6]. Tumour samples are an admixture of tumour and normal cells, which places increased demands on methods of scoring genetic variation because the tumour DNA signal may be diluted by that from the normal cells. The normal cells can often be in the majority, whilst the tumour cells can be genetically heterogeneous. The whole exome studies first passaged the tumours through mice to remove the normal cells, which is itself technically challenging. High density array platforms have made inroads towards addressing this complexity, especially through comparative genome hybridisation (CGH), but their resolution is limited by the probe density per genome and they provide little insight into structural rearrangements [7,8].
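The dilution effect of normal cells can be made concrete with a small calculation. The sketch below is illustrative only and does not come from the cited studies; it assumes a mutation present in every tumour cell, diploid normal cells, and a hypothetical function name and default values.

```python
def expected_variant_fraction(purity, tumour_copies=2, mutated_copies=1, normal_copies=2):
    """Expected fraction of DNA molecules at a locus that carry a somatic mutation.

    purity         -- fraction of cells in the sample that are tumour cells (0 to 1)
    tumour_copies  -- copies of the locus per tumour cell
    mutated_copies -- how many of those copies carry the mutation
    normal_copies  -- copies of the locus per normal cell (2 for an autosome)
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must lie between 0 and 1")
    total_alleles = purity * tumour_copies + (1.0 - purity) * normal_copies
    if total_alleles == 0:
        raise ValueError("locus is absent from every cell in the sample")
    return purity * mutated_copies / total_alleles


# A heterozygous mutation in a sample that is only 30 per cent tumour:
fraction = expected_variant_fraction(0.3)
print(f"expected mutant fraction: {fraction:.2f}")
```

Under these assumptions a heterozygous mutation in a sample that is 30 per cent tumour is carried by only about 15 per cent of the molecules, well below the 50 per cent expected for an inherited heterozygous polymorphism.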

This is where Next Generation Sequencers come in. The highest throughput machines produce relatively short reads of 50 to 100 bases per template. Their power is achieved by reading hundreds of millions of templates together. Coverage allows each base to be read 30 to 100 times for accuracy and, since each read represents a single original molecule, polymorphisms can be distinguished with high probability by comparing their relative frequency, even in admixtures. Moreover, each template can be read from both ends, allowing structural rearrangements to be detected by mapping the opposite ends of individual molecules back to clearly different parts of the genome. When the opposite ends of large numbers of short paired-end reads map to different parts of the genome, this can only arise if those parts of the genome have become contiguous through genomic rearrangement. Initial studies mapped the genomic locations of structural alterations. Complete coverage of the genome is not required to detect the rearrangements since they are inferred from the non-contiguous ends. The rearrangements turned out to be extensive but quite variable within cancer types, some tumours having few such changes and others many. In breast cancer, most were intrachromosomal rearrangements but there was little evidence for recurrent gene fusions [9]. Counting the reads per genomic region also allowed copy number changes to be assessed more quantitatively and accurately than by CGH. Copy number changes were often associated with rearrangements leading to amplicons [10]. Extended exon sequencing studies by Next Generation Sequencers require robust methods of target enrichment. These have proven problematic, with few techniques achieving a satisfactory balance between representing all of the intended target regions at adequate depth and minimising off-target representation. The most successful methods rely on target capture by hybridisation to synthetic oligonucleotide probe sequences [11].

Provided that adequate bioinformatic resource is available, it is technically easier to sequence whole genomes, but this clearly limits the number of cases possible per study. The output of the current leading platforms, produced by ABI / Life Technologies and Illumina, has however increased nearly 1000-fold in less than two years and now stands at approximately 500 billion bases per machine per run. Each of the highest capacity machines can accurately compare between five and 10 human genomes simultaneously. This has spurred efforts to fully characterise cancer genomes.
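The paired-end logic described in this section can be sketched in a few lines. This is a simplified illustration, not the pipeline used in the cited studies. It assumes a library in which a normally mapped pair has its left read on the forward strand and its right read on the reverse strand, and that mapped positions are already available; the class, function names, thresholds and category labels are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    chrom1: str
    pos1: int
    strand1: str  # '+' or '-'
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(pair, max_insert=1000):
    """Label a mapped read pair by the kind of rearrangement it would suggest."""
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"
    # Order the two ends by genomic position before checking orientation.
    left, right = sorted([(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)])
    if left[1] == right[1]:
        return "inversion_like"          # both ends on the same strand
    if left[1] == "-" and right[1] == "+":
        return "duplication_like"        # ends point away from each other
    if right[0] - left[0] > max_insert:
        return "deletion_like"           # ends further apart than the library allows
    return "concordant"


def candidate_rearrangements(pairs, min_support=2, bin_size=1000, max_insert=1000):
    """Group discordant pairs into coarse bins and keep bins with enough support."""
    support = Counter()
    for p in pairs:
        kind = classify_pair(p, max_insert)
        if kind != "concordant":
            key = (kind, p.chrom1, p.pos1 // bin_size, p.chrom2, p.pos2 // bin_size)
            support[key] += 1
    return {key: n for key, n in support.items() if n >= min_support}


pairs = [
    ReadPair("chr1", 100, "+", "chr1", 400, "-"),   # normal pair
    ReadPair("chr1", 100, "+", "chr8", 400, "-"),   # ends on different chromosomes
    ReadPair("chr1", 150, "+", "chr8", 450, "-"),   # second pair supporting the same join
]
print(candidate_rearrangements(pairs))
```

Requiring several independent pairs to support the same join is the design choice that matters here: a single discordant pair is more likely to be a mapping or library artefact than a real rearrangement.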

The International Cancer Genome Consortium is an international initiative created to characterise genetic variation across 50 cancer types of importance across the globe, together with their subtypes. Its strategy is based on the premise that, given enough data, the significance of mutations can be inferred from their frequency. To detect mutations occurring at a frequency of three per cent or more, it will require the genomes of 500 cancers per type to be sequenced, or 25,000 in total [12], requiring 100,000 billion accurate bases to be determined from over 300 million billion raw bases. In practice, the heterogeneity of specific types cannot be known in advance and subtypes may have differing mutation spectra, further increasing the final numbers needed. For example, Sorlie and colleagues identified six different breast cancer types based on global gene expression analysis [13]. This classification has been broadly accepted, in part because it associates with outcome data and also because it is consistent with accepted histopathological markers, including triple negative, basal, HER2 positive and ESR1 (ER) positive types. Similarly, chronic lymphocytic leukaemias vary according to the mutational status of their rearranged immunoglobulin heavy chain variable (IGHV) genes.
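The cohort arithmetic above can be checked with a short sketch. The figures of 50 cancer types, 500 tumours per type and a three per cent mutation frequency come from the text; the binomial model, which treats tumours as independent and ignores background mutation rates and subtype structure, and the function name are assumptions made for illustration.

```python
from math import comb


def prob_at_least(k, n, f):
    """Probability that at least k of n tumours carry a mutation of frequency f."""
    below_k = sum(comb(n, i) * f**i * (1 - f) ** (n - i) for i in range(k))
    return 1.0 - below_k


cancer_types = 50
tumours_per_type = 500
mutation_frequency = 0.03

total_tumours = cancer_types * tumours_per_type            # 25,000 genomes overall
expected_carriers = tumours_per_type * mutation_frequency  # tumours per type with the mutation

print("tumours to sequence:", total_tumours)
print("expected carriers per cancer type:", expected_carriers)
print("P(at least 5 carriers among 500):", prob_at_least(5, tumours_per_type, mutation_frequency))
```

On this simple model a gene mutated in three per cent of cases would be expected in about 15 of the 500 tumours of a given type, which is the kind of recurrence a frequency-based test needs to separate it from background.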

Complete sequencing of a lung cancer and a melanoma has provided new insights into the extent and nature of acquired changes. In the lung cancer, over 22,000 somatic changes were found, and in the melanoma there were over 33,000 [14,15]. These reflected the prevailing mutagens: the changes were largely tobacco related in the lung cancer, and two thirds were UV light related in the melanoma. This again brings into question the nature of ‘drivers’ and ‘passengers’. A significant proportion of the changes could each have a small, undetectable effect that would be hard to manipulate therapeutically. Alternatively, there may be a few important changes in a background of ‘passengers’. Further sequencing will help to distinguish between these possibilities. An important message can be drawn from the melanoma study. Conventional sequencing had identified mutation of BRAF as an important ‘driver’ in 66 per cent of melanomas [16]. Drugs that antagonise the mutationally activated BRAF have been developed and are achieving unprecedented responses for this notoriously resistant tumour type in early stage trials [17]. Despite the seemingly overwhelming background of genetic alterations in melanoma, there may be far fewer large-effect changes that can be targeted. Melanoma appears consistent with the oncogene addiction hypothesis, which asserts that each cancer is dependent on one or a few oncogenic changes.

Next Generation Sequencing is unlikely to be restricted to screening for functional changes. High read depths open the possibility of extremely sensitive detection, enabling applications such as the detection of early stage cancer or of residual disease in resection margins. Response or relapse could similarly be monitored in bodily fluids like blood or ascites. Pharmacogenomic studies may be similarly transformed. Single nucleotide polymorphism derived haplotypes, currently used in association studies to compare genotypes with outcome, are commonly monitored via assay platforms based on detection by hybridisation or mass spectrometry. These can be costly and unwieldy but, most importantly, are based on early estimates of human genetic variation. The pilot phase of the ongoing 1000 Genomes Project, in which Next Generation Sequencing is being used to read whole genomes for 1000 people, reports whole genomes for 179 individuals and exon sequences for 697 individuals representing seven populations [18]. Fifteen million new polymorphisms have been catalogued, more than doubling the existing collection. Rare variants with a large impact may be more important to the population as a whole than common variants with a small impact. Sequencing may prove to be more effective in association studies than current approaches. Clinical studies will require faster turnaround times than current systems achieve. In this respect, new platforms like the Ion Torrent are a welcome development. This silicon chip based detection currently has lower throughput than the leading platforms and higher costs per base. It does however offer simple operation and a small footprint with a two hour turnaround time, a clear advantage for real time clinical decisions like patient stratification.
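How read depth translates into sensitivity can be estimated with a simple model. The sketch below is a rough illustration rather than a validated assay design: it assumes each read samples one molecule independently, ignores sequencing error entirely, and uses hypothetical function names and thresholds.

```python
from math import comb


def detection_probability(depth, fraction, min_reads):
    """Chance of seeing at least `min_reads` variant reads at a given depth."""
    missed = sum(
        comb(depth, i) * fraction**i * (1 - fraction) ** (depth - i)
        for i in range(min_reads)
    )
    return 1.0 - missed


def min_depth_for_detection(fraction, min_reads=3, power=0.95, max_depth=100_000):
    """Smallest depth at which the variant is detected with the requested probability.

    Returns None if that probability is not reached by `max_depth`.
    """
    for depth in range(min_reads, max_depth + 1):
        if detection_probability(depth, fraction, min_reads) >= power:
            return depth
    return None


# Depth needed to see at least three reads from a variant present
# in one per cent of the molecules, 95 per cent of the time.
print(min_depth_for_detection(0.01))
```

In practice the per-base error rate of the platform sets a floor on the variant fraction that can be trusted, so a real design would also need a model of false positive reads.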

References

  1. Stephens, P., et al., A screen of the complete protein kinase gene family identifies diverse patterns of somatic mutations in human breast cancer. Nat Genet, 2005. 37(6): p. 590-592
  2. Wood, L.D., et al., The Genomic Landscapes of Human Breast and Colorectal Cancers. Science, 2007. 318(5853): p. 1108-1113
  3. Jones, S., et al., Core Signaling Pathways in Human Pancreatic Cancers Revealed by Global Genomic Analyses. Science, 2008. 321(5897): p. 1801-1806
  4. Parsons, D.W., et al., An Integrated Genomic Analysis of Human Glioblastoma Multiforme. Science, 2008. 321(5897): p. 1807-1812
  5. Jones, S., et al., Exomic Sequencing Identifies PALB2 as a Pancreatic Cancer Susceptibility Gene. Science, 2009. 324(5924): p. 217
  6. Tomlins, S.A., et al., Recurrent Fusion of TMPRSS2 and ETS Transcription Factor Genes in Prostate Cancer. Science, 2005. 310(5748): p. 644-648
  7. Beroukhim, R., et al., The landscape of somatic copy-number alteration across human cancers. Nature, 2010. 463(7283): p. 899-905
  8. Bignell, G.R., et al., Signatures of mutation and selection in the cancer genome. Nature, 2010. 463(7283): p. 893-898
  9. Stephens, P.J., et al., Complex landscapes of somatic rearrangement in human breast cancer genomes. Nature, 2009. 462(7276): p. 1005-1010
  10. Campbell, P.J., et al., Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nat Genet, 2008. 40(6): p. 722-729
  11. Mamanova, L., et al., Target-enrichment strategies for next-generation sequencing. Nat Meth, 2010. 7(2): p. 111-118
  12. International network of cancer genome projects. Nature, 2010. 464(7291): p. 993-998
  13. Sorlie, T., et al., Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proceedings of the National Academy of Sciences of the United States of America, 2001. 98(19): p. 10869-10874
  14. Pleasance, E.D., et al., A comprehensive catalogue of somatic mutations from a human cancer genome. Nature, 2009. advance online publication
  15. Pleasance, E.D., et al., A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature, 2009. advance online publication
  16. Davies, H., et al., Mutations of the BRAF gene in human cancer. Nature, 2002. 417(6892): p. 949-954
  17. Flaherty, K.T., et al., Inhibition of Mutated, Activated BRAF in Metastatic Melanoma. New England Journal of Medicine, 2010. 363(9): p. 809-819
  18. A map of human genome variation from population-scale sequencing. Nature, 2010. 467(7319): p. 1061-1073

About the Author

Professor D. Ross Sibson directs the CCR, Applied Cancer Biology Labs in the CR-UK Cancer Centre at the University of Liverpool. He has a long-standing public and private sector interest in high throughput gene-based analysis (Unilever Research, Amersham International, MRC – biological manager UK-HGMP, Glaxo). He is the inventor on multiple sequencing-related patents and the former Director of Research at InteraSeq. He is active in Next Generation Sequencing and is currently concerned with how systems analysis can inform the exploitation of cancer biomarkers.
