DNA sequencing technologies and emerging applications in drug discovery
Posted: 13 December 2011
In recent years, the development of Next Generation DNA Sequencing (NGS) technology has significantly impacted molecular biology research, resulting in many new insights and discoveries. NGS technology goes beyond traditional DNA sequencing with applications that reach across the central dogma of molecular biology from DNA to RNA and protein science. Drug discovery is beginning to benefit from the diversity of NGS, with applications in evidence across various therapeutic areas, such as oncology, immunology and infectious diseases.
DNA is the molecule of life, containing the information for the synthesis of RNA molecules and proteins, which in turn form structural components of the cell or catalyse essential biochemical processes. Understanding the sequence of DNA, which is made from the four basic building blocks or ‘nucleotides’, A, G, C and T, has resulted in great insights and discoveries in cellular biology, pathology and disease, culminating in the Human Genome Project, which achieved the remarkable feat of determining the sequence of the three billion bases of the human genome.
The field of DNA sequencing has witnessed some key milestones in technology development since the description of the first revolutionary DNA sequencing techniques in 1977 [1,2]. The Sanger dideoxy sequencing method, developed by the Nobel Laureate Fred Sanger, underwent the most significant improvements and became the first automated sequencing platform in the late 20th century. Advancements in the Sanger process were partly motivated by the advent of the USD 3 billion Human Genome Project, which required the development of high-throughput techniques [3,4] (Figure 1A).
Despite automation, the throughput of the Sanger technique remained a limitation for large-scale sequencing projects. Recent significant advances have addressed this, resulting in the development of next-generation sequencing technology (Figure 1B).
The first NGS technology to be developed was based on the novel pyrosequencing method [5] and was commercially released as the 454 sequencing platform in 2005 [6,7]. Additional platforms followed, including the Solexa/Illumina and SOLiD/Life Technologies sequencers (Figure 1B). Although differing in their chemistries and processes, the platforms have broadly similar workflows. The NGS method begins by shearing genomic or cDNA molecules into small fragments, followed by massively parallel PCR amplification and sequencing of individual DNA molecules to produce short-read DNA sequences [8]. These short reads are then aligned by informatics methods which look for overlaps between reads to reconstruct the sequence of the starting template DNA molecule. The 454 sequencing method employs sequential enzymatic incorporation of nucleotides (Figure 2A), where incorporation releases inorganic pyrophosphate, which is subsequently converted into a chemiluminescent signal [9]. This signal is detected by a charge-coupled device (CCD) camera and converted into a DNA sequence, in which the light intensity is proportional to the number of incorporated nucleotides [10]. In contrast, the Illumina technology is based on an alternative sequencing-by-synthesis approach, in which nucleotides have fluorescently labelled reversible terminators attached [11] (Figure 2B). The reversible terminators are sequentially incorporated into the growing DNA strand, and imaged to identify the incorporated base. The terminator moiety is then removed to allow for the incorporation of the next reversible terminator. In determining which NGS technology to use, important factors include cost per run, sample preparation complexity, run time, simplicity of data analysis and read lengths generated [13].
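The proportionality between light intensity and the number of incorporated nucleotides can be illustrated with a minimal Python sketch. It is only a toy model, not the base-calling software of any platform: the function name `decode_flowgram`, the cyclic flow order `TACG` and the example intensities are assumptions made for illustration.

```python
def decode_flowgram(intensities, flow_order="TACG"):
    """Turn per-flow light intensities into a base sequence.

    Assumes (hypothetically) that nucleotides are flowed in a fixed
    repeating order and that each signal has been normalised so that
    1.0 corresponds to a single incorporated nucleotide.
    """
    bases = []
    for i, signal in enumerate(intensities):
        nucleotide = flow_order[i % len(flow_order)]
        # Intensity is treated as proportional to the homopolymer length,
        # so rounding to the nearest integer gives the number of bases
        # incorporated during this flow (noise below 0.5 counts as zero).
        count = int(max(0.0, signal) + 0.5)
        bases.append(nucleotide * count)
    return "".join(bases)


# Made-up signals for two cycles of the T, A, C, G flows.
flows = [1.02, 0.05, 2.10, 0.97, 0.03, 1.04, 0.00, 2.95]
print(decode_flowgram(flows))  # TCCGAGGG
```

The sketch also hints at why long runs of the same base are harder to call with this chemistry: the decision between, say, seven and eight incorporations rests on a small relative difference in signal.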
The arrival of third-generation sequencing technology, in which single DNA molecules are sequenced in real time, promises even faster sequencing with higher data outputs per machine run [12] (Figure 1C). These platforms, which include nanopore sequencing and technologies which monitor polymerase base incorporation in real time, could provide additional benefits, including longer read lengths, rapid run times and reduction in the amount of starting sample required.
Applications in drug discovery
The application of NGS technology within academic laboratories has been rapid and has resulted in many new exciting discoveries. Here we describe some current and potential applications of NGS technology in the drug discovery and development process (Figure 3).
FIGURE 3 Applications of Next Generation Sequencing Technologies. Listed are some of the different applications possible on a single NGS sequencing machine which span across the central dogma of molecular biology (‘DNA makes RNA makes protein’)
Whole genome DNA sequencing
One of the areas where NGS has had a large impact is in sequencing of whole genomes. Whole genome sequencing using NGS allows great depth of sequence coverage in one machine run, which substantially reduces both time and cost as compared to traditional Sanger sequencing.
Such is the throughput of the technology that whole bacterial and viral genomes can now be routinely sequenced in one experiment, for example to study the mechanisms behind drug resistance. In another application, NGS is being used to investigate genomic sequence diversity in bacterial cell populations. NGS is transforming these metagenomic studies by generating DNA sequence data from the bacterial community as a whole, which can then be used to identify the individual bacterial types present, for example within the human gut. A recent study investigated the microbiota community of the human gut and uncovered a gene set some 150 times larger than that in humans [14]. Metagenomic studies of case and control subjects are also allowing insight into the role of bacterial diversity in disease, for example aiding drug discovery by attempting to understand bacterial cell population diversity and its role in inflammatory bowel disease and Crohn’s disease phenotypes [15].
NGS has also been successfully used in target identification for antibacterials. In one study of Mycobacterium tuberculosis, a compound was identified which was effective in a whole cell assay but whose mechanism of action was unknown. Following generation of resistant mutants, NGS was used to sequence both sensitive and resistant strains, which led to the identification of mutations in a single gene that were present in the resistant but not the sensitive strains. Subsequent complementation assays identified this gene as the target of the compound [16].
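The comparison of sensitive and resistant strains can be sketched in a few lines of Python. This is a simplified illustration rather than the analysis used in the cited study: it assumes the genomes are already assembled and aligned to equal length, and the function names, gene names, coordinates and sequences are all hypothetical toy data.

```python
from collections import Counter


def mutated_genes(sensitive, resistant, genes):
    """Return the names of genes in which `resistant` differs from `sensitive`.

    `genes` maps a gene name to a half-open (start, end) interval of
    0-based positions in the aligned sequences.
    """
    hits = set()
    for pos, (s_base, r_base) in enumerate(zip(sensitive, resistant)):
        if s_base != r_base:
            for name, (start, end) in genes.items():
                if start <= pos < end:
                    hits.add(name)
    return hits


def candidate_targets(sensitive, resistant_strains, genes):
    """Genes that carry a mutation in every resistant strain."""
    counts = Counter()
    for seq in resistant_strains:
        counts.update(mutated_genes(sensitive, seq, genes))
    return sorted(g for g, n in counts.items() if n == len(resistant_strains))


# Toy data: one sensitive strain, two independently derived resistant mutants.
genes = {"geneA": (0, 6), "geneB": (8, 16)}
sensitive = "ATGGCCTTAGGCATCG"
resistant_strains = [
    "ATGGCCTTAGACATCG",  # one substitution, inside geneB
    "ATGTCCTTAGGCCTCG",  # substitutions inside geneA and geneB
]
print(candidate_targets(sensitive, resistant_strains, genes))  # ['geneB']
```

Requiring the gene to be hit in every resistant mutant is one simple way to separate a likely target from incidental mutations; as the text notes, a candidate found this way still needs confirmation, for example by complementation.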
RNA studies using RNA-Seq
Insights into disease processes and compound mechanism of action can be gained by the study of RNA; both mRNA and potentially small RNAs such as microRNA (miRNA).
Transcriptomics is the qualitative and quantitative study of RNA expression, which focuses on the differential regulation of genes. Gene expression measurements can potentially be used in disease staging, target validation, or as disease or pharmacodynamic biomarkers. NGS RNA sequencing (RNA-Seq) has enriched transcriptome studies due to its ability to generate large and detailed datasets [17]. In addition to accurate measurement of gene expression levels, the technology also provides high-resolution information about transcript splice variation.
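As a rough illustration of how read counts become a measure of differential regulation, the Python sketch below scales raw counts to counts per million and compares two samples as a log2 fold change. This is one common, simplified convention and is not a method described in the article; the function names, the pseudocount and the example counts are assumptions, and real analyses use replicates and statistical testing.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2(treated / control) per gene after library-size normalisation.

    The pseudocount (an assumption of this sketch) avoids division by
    zero for genes with no reads in one of the samples.
    """
    t_cpm = counts_per_million(treated)
    c_cpm = counts_per_million(control)
    shared = t_cpm.keys() & c_cpm.keys()
    return {
        gene: math.log2((t_cpm[gene] + pseudocount) / (c_cpm[gene] + pseudocount))
        for gene in shared
    }


# Made-up read counts for two genes in a control and a treated sample.
control = {"geneA": 500, "geneB": 500}
treated = {"geneA": 750, "geneB": 250}
for gene, lfc in sorted(log2_fold_change(treated, control).items()):
    print(gene, round(lfc, 2))
```

A positive value indicates higher expression in the treated sample and a negative value lower expression; here geneA rises and geneB falls by roughly a factor of two.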
RNA-Seq is also capable of detecting transcripts created from gene fusion events, which have been observed in tumour cells. For example, a study by Levin et al. used RNA-Seq to sequence a tumour transcriptome. The sensitivity of sequencing with RNA-Seq enabled the detection of low-abundance transcripts. As a result, novel splicing and gene fusions were reliably identified in addition to the quantification of gene expression [18]. This application of NGS could be used to identify such fusion transcripts with a view to developing compounds which selectively target the fusion protein product, a pharmacological approach which was taken in the development of Gleevec to target the Bcr-Abl fusion protein [19].
A further application of RNA-Seq is small RNA profiling, in which known non-coding RNAs and novel RNA sequences can be detected [20]. Small RNA profiling encompasses a number of different classes of RNAs, which include microRNAs (miRNAs) and small interfering RNAs (siRNAs) [21]. Each of these non-coding RNAs is involved in the regulation of gene expression. The average length of a mature miRNA is 22 nucleotides, which makes NGS an ideal platform to profile miRNAs due to the short-read nature of the sequencing process [22]. Currently, 1424 human miRNA species have been discovered (miRBase release 17.0, www.mirbase.org) in a variety of tissues and bodily fluids [23], including carcinoma [24], embryonic stem cells [25], plasma/serum, saliva, urine and blood. The numerous sources of miRNAs and their stability make them potential candidates for biomarkers. For example, Liu et al. created a miRNA profile using NGS to discover a group of five serum miRNAs that are differentially expressed in gastric cancer patients [26]. The discovery of such biomarkers could find application in the early diagnosis of disease [26] and perhaps also in sub-grouping subjects in clinical trials based upon their miRNA profiles. Accurate miRNA profiling using NGS still has some challenges, including, for example, discerning closely related miRNA sequence variants called isomiRs [22,25].
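The fit between short reads and roughly 22-nucleotide miRNAs can be shown with a small Python sketch that keeps only reads in a miRNA-like length window and tallies identical sequences. The function name, the 18 to 25 nucleotide window and the reads themselves are assumptions for illustration; adapter trimming and matching against a database such as miRBase are deliberately left out.

```python
from collections import Counter


def profile_small_rnas(reads, min_len=18, max_len=25):
    """Count identical reads whose length falls in a miRNA-like window.

    The default window is an assumed, commonly used range around the
    ~22 nt average length of mature miRNAs; reads are assumed to have
    had their sequencing adapters removed already.
    """
    kept = [r for r in reads if min_len <= len(r) <= max_len]
    return Counter(kept)


# Made-up reads: two candidate small RNAs, one fragment that is too
# short and one that is too long to be a mature miRNA.
candidate_1 = "ACGT" * 5 + "AC"   # 22 nt
candidate_2 = "TTGGCCAA" * 2 + "TTGG"  # 20 nt
reads = [candidate_1] * 3 + [candidate_2, "ACGT", "G" * 30]

profile = profile_small_rnas(reads)
for sequence, count in profile.most_common():
    print(len(sequence), sequence, count)
```

Exact-sequence counting like this treats every variant as a separate species, which is one reason closely related isomiRs are difficult to tell apart from sequencing errors.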
Applications in protein science
Although the majority of NGS applications described to date have centred on DNA and RNA, an emerging area of application is in the study of proteins. One such area is epigenetics, which is defined as the study of heritable changes in phenotype and effects on gene expression by mechanisms other than alterations in the DNA sequence. Examples include post-translational modifications of histone proteins, such as methylation and acetylation, or methylation of DNA bases, many of which are implicated in the control of gene expression [13].
One novel application of NGS, called ChIP-seq, aims to monitor such changes. In the ChIP-seq process, chromatin immunoprecipitation (ChIP) is combined with large-scale sequencing to produce whole-genome maps of the locations of transcription factors or modified histones of interest [27]. ChIP-seq studies of histone modifications have also increased knowledge around gene regulation in cancer research, especially when comparing normal to tumour cells. NGS is providing a substantial amount of new data in the field of epigenetics, which may lead to the identification of new drug targets in oncology and other therapy areas [28].
An intriguing new application of NGS in protein analysis is ribosome profiling. This method identifies the positions of active ribosomes that are bound to a target mRNA, providing a whole cell view of protein translation [29]. The NGS process captures ribosome-protected mRNA, sequences the ribosome binding sites and maps the sequence reads back to the reference genome, identifying the proteins which are being translated. Ingolia et al. showed how ribosome profiling in yeast can be used to determine the translational regulation of gene expression [29]. This method could potentially be applied to the study of tumours, as protein translation is often deregulated in these cells.
Conclusions and discussions
NGS is already finding application in many stages of the drug discovery process, from target identification through to personalised medicine, and is also improving our knowledge of complex biological systems and diseases. The technology offers significant benefits when compared to conventional Sanger sequencing, due to its much lower cost-per-base and substantial data sets generating billions of bases of data in a single sequencing run. The technology can carry out a range of applications all on a single platform that extend across DNA, RNA and protein studies. Finally, the emerging third-generation sequencing technologies promise even larger datasets at reduced cost and the potential to sequence from small amounts of starting material, which could benefit clinical studies. It is likely that DNA sequencing technologies will continue to develop at a rapid pace in the coming years, yielding novel approaches and applications in the drug discovery process.
References
1. Maxam, A.M. and Gilbert, W. (1977) A new method for sequencing DNA. Proc. Natl. Acad. Sci. USA 74, 560-564
2. Sanger, F. et al. (1977) DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74, 5463-5467
3. Bhullar, B. (2010) The Sequencing Revolution: enabling personal genomics and personalised medicine. European Pharmaceutical Review 5, 49-52
4. Lander, E.S. et al. (2001) Initial sequencing and analysis of the human genome. Nature 409, 860-921
5. Ronaghi, M. et al. (1998) A sequencing method based on real-time pyrophosphate. Science 281, 363-365
6. Ansorge, W.J. (2009) Next-generation DNA sequencing techniques. N. Biotechnol. 25, 195-203
7. Margulies, M. et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 376-380
8. Su, Z. et al. (2011) Next-generation sequencing and its applications in molecular diagnostics. Expert Rev. Mol. Diagn. 11, 333-343
9. Rothberg, J.M. and Leamon, J.H. (2008) The development and impact of 454 sequencing. Nat. Biotechnol. 26, 1117-1124
10. Metzker, M.L. (2010) Sequencing technologies – the next generation. Nat. Rev. Genet. 11, 31-46
11. Bentley, D.R. et al. (2008) Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53-59
12. Schadt, E.E. et al. (2010) A window into third-generation sequencing. Hum. Mol. Genet. 19, R227-R240
13. Mardis, E.R. (2008) The impact of next-generation sequencing technology on genetics. Trends Genet. 24, 133-141
14. Qin, J. et al. (2010) A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464, 59-67
15. Willing, B.P. et al. (2010) A pyrosequencing study in twins shows that gastrointestinal microbial profiles vary with inflammatory bowel disease phenotypes. Gastroenterology 139, 1844-1854
16. Andries, K. et al. (2005) A diarylquinoline drug active on the ATP synthase of Mycobacterium tuberculosis. Science 307, 223-227
17. Wang, Z. et al. (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 10, 57-63
18. Levin, J.Z. et al. (2009) Targeted next-generation sequencing of a cancer transcriptome enhances detection of sequence variants and novel fusion transcripts. Genome Biol. 10, R115
19. Deininger, M.W. and Druker, B.J. (2003) Specific targeted therapy of chronic myelogenous leukemia with imatinib. Pharmacol. Rev. 55, 401-423
20. Ozsolak, F. and Milos, P.M. (2011) RNA sequencing: advances, challenges and opportunities. Nat. Rev. Genet. 12, 87-98
21. Ghildiyal, M. and Zamore, P.D. (2009) Small silencing RNAs: an expanding universe. Nat. Rev. Genet. 10, 94-108
22. Lee, L.W. et al. (2010) Complexity of the microRNA repertoire revealed by next-generation sequencing. RNA 16, 2170-2180
23. Weber, J.A. et al. (2010) The microRNA spectrum in 12 bodily fluids. Clin. Chem. 56, 1733-1741
24. Mizuguchi, Y. et al. (2011) Sequencing and bioinformatics-based analyses of the microRNA transcriptome in hepatitis B-related hepatocellular carcinoma. PLoS One 6, e15304
25. Morin, R.D. et al. (2008) Application of massively parallel sequencing to microRNA profiling and discovery in human embryonic stem cells. Genome Res. 18, 610-621
26. Liu, R. et al. (2011) A five-microRNA signature identified from genome-wide serum microRNA expression profiling serves as a fingerprint for gastric cancer diagnosis. Eur. J. Cancer 47, 784-791
27. Park, P.J. (2009) ChIP–seq: advantages and challenges of a maturing technology. Nat. Rev. Genet. 10, 669-680
28. Neff, T. and Armstrong, S.A. (2009) Chromatin maps, histone modifications and leukemia. Leukemia 23, 1243-1251
29. Ingolia, N.T. et al. (2009) Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324, 218-223