Detecting microRNA targets or siRNA off-targets using expression data

Posted: 3 December 2008 | Anton J. Enright, Group Leader, EMBL – European Bioinformatics Institute

Recently, small RNAs such as microRNAs (miRNAs) have been demonstrated to be important regulators in both plants and animals. In animals, miRNAs repress their target genes through a combination of translational inhibition and mRNA destabilisation. These molecules have been implicated in a multitude of diseases, including cancer, and represent promising candidates for both diagnostics and therapeutics. While substantial progress has been made in the detection, sequencing and profiling of miRNAs, accurately delineating their targets remains difficult. Purely computational approaches hold much promise, yet they still suffer from over-prediction. In this article we describe alternative approaches that combine computational analysis with gene expression data to better detect miRNA effects and their targets. In particular we describe Sylamer¹, a new tool for the detection of miRNA targets and siRNA off-target effects from expression data.

Currently there are 695 confirmed miRNAs in human (miRBase 12)². Each miRNA is expected to have multiple targets; however, few miRNA targets have been experimentally confirmed so far. No high-throughput experimental approach yet exists for accurately determining miRNA target binding. Purely computational approaches are promising, but while they have been shown to have high sensitivity they can suffer from over-prediction³. The key issue faced by computational approaches is that miRNAs are short (21 nt) and that the key region for binding specificity is even shorter (6-8 nt). Finding complementary binding sites in the 3’UTRs of potential target transcripts is hence daunting, as 6 nt complementary sites for any miRNA occur by chance at reasonably high frequencies across the genome.
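To give a feel for the scale of this problem, the following back-of-envelope sketch estimates how often a fixed short word would be expected to occur purely by chance. It assumes independent, equally frequent bases, and the UTR length (1,000 nt) and number of transcripts (20,000) are illustrative assumptions rather than figures from this article.

```python
# Rough estimate of chance seed matches. The function name and all inputs
# below are hypothetical; real 3'UTRs have biased base composition.

def expected_random_matches(word_length: int, utr_length: int, n_utrs: int) -> float:
    """Expected chance occurrences of one fixed word across a set of UTRs,
    assuming independent and equally frequent bases (a simplification)."""
    p_match = 0.25 ** word_length                      # chance a position starts the word
    positions = max(utr_length - word_length + 1, 0)   # possible start positions per UTR
    return p_match * positions * n_utrs

for k in (6, 7, 8):
    hits = expected_random_matches(word_length=k, utr_length=1000, n_utrs=20000)
    print(f"{k} nt word: about {hits:.0f} chance matches")
```

Under these assumptions a 6 nt word is expected several thousand times by chance alone, which is why sequence complementarity on its own is a weak predictor.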

Target prediction

Some computational tools use additional filters to help decide which complementary sites are real and which are likely noise. Such filters include conservation of the site across species, potential thermodynamic energy, positional constraints within the 3’UTR and statistical models. These extra filters have indeed helped⁴, but still fall short of the mark. Furthermore, some of these filters (e.g. conservation) may increase specificity at the expense of sensitivity, as it has been shown that some miRNAs have target sets that are not highly conserved⁵. It is possible that many binding sites predicted by such methods are feasible sites but that the miRNA and its predicted target are never in the same place at the same time. It seems clear that extra information derived experimentally can aid the process of target discovery.

The effect of microRNAs on mRNA expression levels

Initially, it was thought that miRNAs primarily operated by translational silencing and that the action of a miRNA would only be evident at the protein level. However, an experiment by Lee Lim and others at Rosetta Inpharmatics was instrumental in providing the first evidence that the action of miRNAs could also be detected at the mRNA level⁶. Their work showed that miRNAs introduced into HeLa cells had strong effects on mRNA levels and that the transcripts whose expression decreased were strikingly enriched in potential seed matches to the introduced miRNA. Subsequent experiments demonstrated that miRNAs binding to their targets in 3’UTRs appear to stimulate both deadenylation and decapping, which in turn marks the target transcript for degradation⁵. Recent studies combining proteomic analysis and mRNA expression profiling following miRNA perturbation do show cases where protein levels change but mRNA levels remain relatively static⁷,⁸. However, in most cases significant shifts were observed at both the protein and mRNA levels.

The fact that introducing or removing a miRNA from a system of interest causes measurable changes in mRNA and protein levels creates a new way of probing miRNA targets. In the simplest case one can imagine comparing wild-type cells to cells in which a miRNA is over-expressed. One can then compare the gene-expression profiles of these two cell types, working under the assumption that increased levels of the miRNA will stimulate greater repression of its target genes. The expression levels of these putative target genes would decrease significantly, and this decrease can be detected as a fold-change.
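As a minimal sketch of this comparison, the snippet below ranks genes by log2 fold-change between two expression profiles. The function name, the pseudocount and the toy expression values are all assumptions made for illustration; a real analysis would start from normalised, replicated array data.

```python
import math

def rank_by_log_fold_change(control, treated, pseudocount=1.0):
    """Return (gene, log2 fold-change) pairs, most down-regulated first.
    control and treated map gene ids to linear-scale expression values."""
    changes = {}
    for gene in control.keys() & treated.keys():
        ratio = (treated[gene] + pseudocount) / (control[gene] + pseudocount)
        changes[gene] = math.log2(ratio)
    # Sort by fold-change, using the gene id to break ties deterministically.
    return sorted(changes.items(), key=lambda item: (item[1], item[0]))

# Hypothetical values: wild-type cells versus cells over-expressing a miRNA.
wild_type = {"geneA": 120.0, "geneB": 80.0, "geneC": 45.0}
over_expressing = {"geneA": 30.0, "geneB": 85.0, "geneC": 90.0}

for gene, change in rank_by_log_fold_change(wild_type, over_expressing):
    print(gene, round(change, 2))
```

Genes at the top of this list (here the hypothetical geneA) would be the first candidates for repression by the over-expressed miRNA.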

Such observations have previously been used to predict miRNA targets in a number of systems. In an analysis of early zebrafish development, a single miRNA (miR-430) was reinjected into mutant embryos⁵. Expression profiles were taken from mutant embryos and from mutant embryos that had also been injected. A large number of mRNAs showed significant decreases in expression following injection of miR-430. Of 30 candidate target mRNAs, 27 were validated as direct miRNA targets using GFP reporter assays. A similar study compared T-helper (Th1) cells from wild-type mice with those from miR-155-deficient (bic) mutant mice⁹. In this case, genes whose mRNA expression levels increased significantly in the mutant were identified as likely miR-155 target genes, a number of which were subsequently validated using a luciferase reporter assay.

An experimental paradigm

The experimental paradigm for such studies is straightforward (see Figure 1).

Firstly, it is useful to profile the system of interest to determine which miRNAs are expressed or changing significantly. Secondly, a miRNA of interest can be perturbed using, for example, a knock-out or by transfecting an antisense molecule that binds to the miRNA of interest and prevents it from functioning (e.g. antagomir, 2’O-methyl or LNA). Subsequently, mRNA expression profiling or proteomic analysis is used to obtain a readout of the effect of the perturbation. Finally, computational analysis of the expression data determines whether there is a primary effect, how significant the effect is and which candidate target genes are involved.

Establish which miRNAs are important:

miRNA profiling
new technology sequencing.

Perturb miRNAs in the system:

Antisense knock-down
Knock-out mouse model
Knock-in transfection of double stranded miRNA analogue
Over-expression vector.

Profile mRNA or protein levels:

Gene expression
Proteomics.

Analysis and target prediction:

Differential expression analysis (e.g. t-test)
Sylamer.
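The differential expression step can be sketched as follows, using Welch's t-statistic computed per gene from replicate measurements. This is an illustrative choice of test; the function names and the replicate values are hypothetical, and dedicated microarray packages would normally be used in practice.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t-statistic for two lists of replicates (at least two values each)."""
    std_err = math.sqrt(variance(group_a) / len(group_a) +
                        variance(group_b) / len(group_b))
    diff = mean(group_a) - mean(group_b)
    if std_err == 0.0:
        # No spread within either group: report an infinite statistic unless the means agree.
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / std_err

def rank_genes(profiles):
    """Rank genes from most up-regulated to most down-regulated in the first group."""
    scores = {gene: welch_t(a, b) for gene, (a, b) in profiles.items()}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical log-scale intensities: (perturbed replicates, control replicates).
profiles = {
    "geneA": ([9.1, 9.4, 9.0], [7.9, 8.1, 8.0]),
    "geneB": ([6.0, 6.2, 5.9], [6.1, 6.0, 6.1]),
    "geneC": ([4.0, 4.2, 4.1], [5.2, 5.0, 5.1]),
}
for gene, score in rank_genes(profiles):
    print(gene, round(score, 2))
```

The resulting ranked list, rather than a thresholded subset of it, is the natural input for the enrichment analysis described in the next section.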

Computational and Statistical Analysis

A question remains for this type of analysis: are the genes whose expression levels are changing direct targets bound by the miRNA, or indirect secondary effects? One relatively straightforward way to answer this question is to look at the presence or absence of complementary miRNA seed matches in the genes that are changing. If these genes are real direct targets of the miRNA then one would expect them to possess seed matches to the miRNA. Analysis of the frequency and significance of such seed matches in the 3’UTRs of genes that have changed can hence allow one to determine whether the effect is significant and to identify the subset of genes most likely to be direct targets. However, questions remain about what threshold to use when selecting a genelist for such analysis. Gene Set Enrichment Analysis (GSEA)¹⁰ has recently been shown to be useful for the analysis of over-represented terms or annotations in gene lists. Instead of using a single cutoff and thus a single genelist, GSEA uses the full list of genes, ranked according to how much they change in an experiment. This approach removes the need to impose arbitrary cutoffs, instead searching for coordinated shifts in complete pathways or gene sets of biological interest, even if many individual genes do not lie at the top of the ranked genelist¹⁰.
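The seed-match check described above can be illustrated with a short sketch. It derives the 3’UTR word complementary to the seed of a miRNA and counts its occurrences in a set of UTR sequences. The miRNA and UTR sequences are invented for illustration, the function names are hypothetical, and taking positions 2-7 (or 2-8) as the seed is a common convention assumed here rather than something specified in this article.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}

def seed_site(mirna: str, start: int = 2, end: int = 7) -> str:
    """DNA word expected in a 3'UTR that pairs with miRNA positions
    start..end (1-based, inclusive), i.e. the reverse complement of the seed."""
    seed = mirna.upper()[start - 1:end]
    return "".join(COMPLEMENT[base] for base in reversed(seed))

def count_sites(utr: str, word: str) -> int:
    """Count occurrences of word in utr, allowing overlapping matches."""
    utr = utr.upper().replace("U", "T")
    return sum(1 for i in range(len(utr) - len(word) + 1) if utr.startswith(word, i))

# Invented sequences, not real miRNA or transcript data.
toy_mirna = "UAGCAUUCGGAUCCAGUUACGA"
toy_utrs = {
    "geneA": "CCGTAATGCTTTGACAATGCTAG",
    "geneB": "GGGCCCTTTAAACCCGGG",
}

site6 = seed_site(toy_mirna)            # pairs with positions 2-7
site7 = seed_site(toy_mirna, 2, 8)      # pairs with positions 2-8
for gene, utr in toy_utrs.items():
    print(gene, site6, count_sites(utr, site6))
```

Counting sites in a handful of changed genes is only the starting point; whether the counts are significant depends on how often the same word occurs in the genes that did not change.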

This type of analysis can be extended to the case of finding words that are complementary to seed regions of miRNAs or siRNAs in the 3’UTRs of genes whose expression has changed following a perturbation experiment¹¹. If enrichment of such words correlates with the ranking of the 3’UTRs of genes whose expression has changed during a miRNA experiment, part of the expression changes can be attributed to direct effects. This approach has been validated on numerous datasets and shows that particular miRNAs have major effects on tissue or developmental expression profiles when that miRNA is removed or reinjected. Similarly, RNA interference (RNAi) experiments can be assessed to determine whether gene-expression changes resulting from knockdown are likely due to a primary effect or to secondary, miRNA-like, off-target effects¹². Although tools exist for discovering enriched word motifs in sequences, many do not deal with ranked sequences or cannot be directly applied to the problem of miRNA seed analysis. Recently we demonstrated a new method, Sylamer¹, for the analysis of miRNA binding in expression data. The method is both powerful and extremely fast, making it ideal for genome-wide datasets.

Sylamer

The Sylamer algorithm¹ takes a list of genes with their 3’UTRs, ranked from up-regulated to down-regulated following a miRNA or RNAi experiment. It computes hypergeometric P-values for the occurrence of words in these 3’UTRs, including the seed matches that have perfect complementarity to the 5′ end (seed) of a miRNA or siRNA. This is performed across nested leading bins of the ranked sequences, analogous to Gene Set Enrichment Analysis. The output is used to produce an intuitive landscape plot that tracks occurrence biases for all seeds across the gene ranking. This enables verification of the hypothesis that miRNAs or siRNAs are directly affecting expression, while also identifying the fraction of genes changing due to this effect. Unlike previous approaches, the method is fast enough to allow genome-wide analysis of all known miRNA seeds in large-scale experiments. Analysis of all known miRNA seeds for a human genome-wide experiment takes less than a minute. Below we demonstrate the utility and accuracy of this type of approach on several example miRNA and siRNA datasets.
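The idea of testing nested leading bins with a hypergeometric test can be sketched in a few lines. This is a simplified toy, not the Sylamer implementation: it scores only presence or absence of a word per UTR, tests only over-representation, and makes no correction for sequence composition or length. The function names, the bin step and the toy sequences are assumptions made for illustration.

```python
import sys
from math import comb, log10

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n items are drawn without replacement from N items,
    K of which are hits."""
    top = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1))
    return favourable / comb(N, n)

def enrichment_landscape(ranked_utrs, word, bin_step):
    """Return (bin_size, -log10 P) for each nested leading bin of the ranking."""
    has_word = [word in utr for utr in ranked_utrs]
    N, K = len(has_word), sum(has_word)
    landscape = []
    for n in range(bin_step, N + 1, bin_step):
        k = sum(has_word[:n])                    # hits among the top n sequences
        p = max(hypergeom_upper_tail(k, n, K, N), sys.float_info.min)
        landscape.append((n, -log10(p)))
    return landscape

# Toy ranking: the three top-ranked UTRs carry the word, the other nine do not.
toy_ranked = ["GGAATGCTCC", "TTAATGCTAA", "CAATGCTGGA"] + ["GGGCCCGGGC"] * 9
for size, score in enrichment_landscape(toy_ranked, "AATGCT", bin_step=3):
    print(size, round(score, 2))
```

Plotting the score against bin size for every word gives a landscape of the kind described above; in this toy the score is highest for the first bin and falls to zero once the whole list is included. Exact integer binomials keep the sketch short but would be slow for genome-sized lists.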

Examples

To determine the effectiveness of our approach for the detection of enriched or depleted miRNA binding signals, we applied it to two published datasets. The first dataset derives from a mouse knockout model of miR-155 (bic)⁹. In this case gene expression data were obtained for T-helper (Th1) cells from both knockout and wild-type animals. Each gene on the array for which a 3’UTR was available was ranked from most up-regulated to most down-regulated according to its fold-change t-statistic. Our goal is to determine reliably whether the greatest contributions to gene-expression changes are direct effects resulting from the absence of miR-155 in the knockout (i.e. loss of miR-155 mediated repression). The sorted genelist and associated 3’UTR sequences were supplied to Sylamer. The resulting enrichment analysis plot (see Figure 2a) clearly shows that most words drift randomly without showing any significance.

A strong signal is, however, evident for the 6 nt (P ≤ 1×10⁻⁴¹), 7 nt (P ≤ 1×10⁻³⁶) and 8 nt (P ≤ 1×10⁻²⁵) words corresponding to the seed region of miR-155, peaking at ≈500 genes. This indicates that these most up-regulated genes are enriched in potential miR-155 binding sites and that their observed over-expression is likely due to the absence of miR-155 in the knockout sample.

In another example we take gene-expression data from maternal zygotic Dicer mutant (MZ-Dicer) zebrafish embryos⁵,¹³. Here we aim to assess the role of an early developmental miRNA by comparing mutant fish against mutant fish injected with synthetic miR-430. The mutant fish cannot produce significant quantities of functional miRNAs, as the Dicer enzyme (required for mature miRNA excision) is non-functional¹³. In this case the perturbation involves a miRNA being reintroduced to a system where miRNAs are not present. If miR-430 is significantly affecting gene expression, we expect the effect to be most evident in down-regulated genes (i.e. gain of miR-430 mediated repression). The resulting enrichment plots obtained using Sylamer (see Figure 2b) show that most words exhibit no significant enrichment or depletion across the genelist, with the exception of those words directly corresponding to the seed region of miR-430.

As expected, this signal is observed in the down-regulated section of the genelist (P ≤ 1×10⁻²⁶ at 6 nt). This reconfirms the hypothesis that injection of miR-430 leads to direct repression of its set of target transcripts and yields a set of genes likely to be highly enriched in real miR-430 targets and excellent candidates for further validation⁵.
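One plausible way to turn such a landscape into a candidate list is to locate the bin at which the seed word peaks and keep the genes ranked before that point whose 3’UTR contains the word. The sketch below assumes a landscape given as (bin size, score) pairs; the function name, gene identifiers, sequences and scores are all hypothetical, and this selection rule is an illustrative reading of the approach rather than a documented procedure.

```python
def candidate_targets(ranked_genes, utrs, word, landscape):
    """Return the peak bin size and the genes ranked before the peak
    whose 3'UTR contains the word."""
    peak_size, _ = max(landscape, key=lambda point: point[1])
    leading = ranked_genes[:peak_size]
    return peak_size, [gene for gene in leading if word in utrs[gene]]

# Hypothetical ranking, sequences and landscape scores.
ranked_genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
utrs = {
    "g1": "CCAATGCTGG", "g2": "GGGCCCGGGC", "g3": "TTAATGCTTT",
    "g4": "AATGCTCCCC", "g5": "GGGTTTGGGC", "g6": "CCCAATGCTA",
}
landscape = [(2, 1.3), (4, 2.1), (6, 0.0)]

peak, candidates = candidate_targets(ranked_genes, utrs, "AATGCT", landscape)
print(peak, candidates)
```

In this toy the peak falls at four genes, so g6 is excluded even though its UTR carries the word, since it lies outside the part of the ranking where the signal is concentrated.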

Application to RNAi experiment expression data

RNA interference (RNAi) is an increasingly common approach for studying the effect of knocking down a particular gene of interest. Frequently, gene-expression studies are undertaken after a gene has been knocked down in order to determine the effect of knockdown of the primary target on mRNA expression levels and to identify possible downstream pathways and regulatory targets. However, it has been shown that many off-target effects observed in RNA interference experiments may be due to siRNAs acting as miRNAs on unintended genes¹². This can create serious issues for genome-wide screens, as designed siRNAs may be unintentionally affecting the expression of tens or even hundreds of genes. When assessing expression data from an RNAi experiment, this type of approach can be used to determine whether miRNA-like effects are present. Ideally one wants to see little or no enrichment or depletion of words complementary to the siRNA, in which case any gene-expression changes observed are most likely secondary effects following successful knockdown of the intended target gene. Conversely, if an siRNA is binding other transcripts (off-targets), we expect to observe specific enrichment of words complementary to the 5’ end of that siRNA in down-regulated genes. The size and extent of any observed enrichment may also be used to evaluate how serious this effect is. Of course, smart-pooling of multiple siRNAs to a target gene should alleviate this effect; however, this type of analysis could still be useful for validating large-scale screens.

A previous study used microarrays to measure the effects of transfecting different siRNAs into HeLa cells¹². Using these data we can produce, for each transfection experiment, a genelist ranked according to fold-change, starting with the most down-regulated genes (those most likely to be direct off-targets). In the first example (see Figure 2c) the siRNA does not seem to exhibit off-target effects, as no particular sequences are enriched or depleted, and the expression effects observed are likely due to knockdown of the intended target.

However, in the second example a significant enrichment of words matching the 5′ end of the siRNA (see Figure 2d) is observed.

Here the effect on the expression profile appears to be due to a miRNA-like effect, since the only significant words are those that match the beginning of the siRNA. The use of Sylamer in these cases can help to identify screens that have worked as planned and to flag those screens where significant miRNA-like off-target effects are observed.

Discussion

The examples shown above illustrate the power of using enrichment analysis to detect miRNA seed sequences in genelists. Although not explicitly designed for siRNA analysis, we believe such approaches may also be useful for validating hits in large-scale siRNA screens. This approach allows one to determine rapidly whether a miRNA-like effect is observed, to quantify the extent of the effect and to isolate the likely set of genes involved. The examples shown above utilise mRNA expression profiling as a readout of the miRNA perturbation, although proteomics data could also be used as long as they can produce a ranked genelist of protein levels. It is not strictly required to perturb miRNAs directly, although this type of experiment typically gives the best results. One might also obtain reasonable results by comparing, for example, wild-type cells to cancerous cells. The Sylamer software described in this article is freely available from http://www.ebi.ac.uk/enright/sylamer.

References

1. van Dongen, S., Abreu-Goodger, C. & Enright, A.J. Detecting microRNA binding and siRNA off-target effects from expression data. Nature Methods (2008).
2. Griffiths-Jones, S., Saini, H.K., van Dongen, S. & Enright, A.J. miRBase: tools for microRNA genomics. Nucleic Acids Research 36, D154-158 (2008).
3. Sethupathy, P., Megraw, M. & Hatzigeorgiou, A.G. A guide through present computational approaches for the identification of mammalian microRNA targets. Nature Methods 3, 881-886 (2006).
4. Grimson, A. et al. MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol Cell 27, 91-105 (2007).
5. Giraldez, A.J. et al. Zebrafish MiR-430 promotes deadenylation and clearance of maternal mRNAs. Science 312, 75-79 (2006).
6. Lim, L.P. et al. Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs. Nature 433, 769-773 (2005).
7. Baek, D. et al. The impact of microRNAs on protein output. Nature 455, 64-71 (2008).
8. Selbach, M. et al. Widespread changes in protein synthesis induced by microRNAs. Nature 455, 58-63 (2008).
9. Rodriguez, A. et al. Requirement of bic/microRNA-155 for normal immune function. Science 316, 608-611 (2007).
10. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102, 15545-15550 (2005).
11. Farh, K.K. et al. The widespread impact of mammalian MicroRNAs on mRNA repression and evolution. Science 310, 1817-1821 (2005).
12. Birmingham, A. et al. 3′ UTR seed matches, but not overall identity, are associated with RNAi off-targets. Nature Methods 3, 199-204 (2006).
13. Giraldez, A.J. et al. MicroRNAs regulate brain morphogenesis in zebrafish. Science 308, 833-838 (2005).

Issue

Issue 6 2008

Related organisations

European Bioinformatics Institute, European Molecular Biology Laboratory (EMBL)
