1 Université Pierre et Marie Curie, Laboratoire de Biologie Computationnelle et Quantitative, 15 rue de l'Ecole de Médecine, 75006 Paris, France.
2 CNRS, UMR7238, 75006 Paris, France.
3 Centre for Molecular Medicine and Therapeutics at the Child and Family Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada.
4 Institut Universitaire de France, 75005 Paris, France.
E-mail: Anthony.Mathelier@gmail.com and Alessandra.Carbone@lip6.fr.
Deregulation of gene expression is one of the main characteristics of cancer cells. The involvement of microRNAs (miRNAs), a class of small non-coding RNAs that act in post-transcriptional regulation of gene expression, in this process has rapidly become evident. Like protein-coding genes, miRNAs can act as tumor suppressors or oncogenes (in which case they are called oncomirs). Recent studies have highlighted the involvement of the paralogous miRNA clusters mir-17-92 and mir-106a-363 in carcinogenesis. This points to the importance of a local and structural organization of miRNAs with a potential impact on cancers. We performed computational predictions, at the human genome scale, of structural clusters of miRNAs sharing the same characteristics as the two previously described clusters. We show a functional organization of miRNAs in structural clusters where the predicted miRNA targets are enriched for cancer-associated genes. We also show co-localization of structural clusters of miRNAs with genes involved in signaling pathways known to be disrupted in cancer. Taken together, the results provide new insights into the organization of miRNAs in the human genome along with their potential impact on carcinogenesis.
MicroRNAs (miRNAs) are a class of endogenous ∼18-25 nt-long RNAs involved in post-transcriptional regulation in animals and plants. They play crucial roles in cell physiology and control a plethora of key cellular processes by negatively regulating the expression of target genes.
While computational tools usually predict miRNAs one at a time, recent studies have shown that miRNAs can be organized into clusters sharing similar biological functions. Indeed, miRNAs can group together along the human genome to form stable secondary structures made of several hairpins hosting miRNAs in their stems. Alignment of miRNA sequences lying within the same cluster or in different clusters revealed a significant number of miRNA paralogs shared among and within clusters, implying an evolutionary process that preserves the potentially conserved roles of these molecules.
In this report, we define structural clusters of miRNAs as genomic regions, typically smaller than 1-2 kb, that fold into a secondary structure presenting several hairpins, with miRNAs located on the hairpin stems. The known miRNA clusters mir-17-92 (Hayashita et al., 2005) and mir-106a-363 (Landais et al., 2007) satisfy these structural conditions, and we looked for other clusters with the same characteristics in the human genome. Note that the miRNA clusters mir-17-92 and mir-106a-363 are known to play a role in human tumour development.
We report here the results described in Mathelier and Carbone, 2013, focusing on the potential involvement of structural clusters of miRNAs in cancer. Using newly developed computational methods, we predicted structural clusters of miRNAs at the human genome scale. The predictions were made using three different approaches: (1) finding structural clusters of paralogous miRNAs from the genomic sequence only, (2) constructing structural clusters of miRNAs predicted using small RNA-seq (sRNA-seq) data, and (3) combining paralogous miRNA prediction and sRNA-seq data. The precursors of miRNAs (pre-miRNAs) were predicted using the MIReNA tool (Mathelier and Carbone, 2010), which has been described as a first choice for predicting new miRNAs in mammals (Li et al., 2012). Predictions were validated a posteriori as bona fide miRNAs by analysing expected characteristics of miRNAs and pre-miRNAs. We highlighted that structural clusters of miRNAs co-localize with genes related to signal transduction pathways known to be involved in carcinogenesis. Finally, a functional analysis of potential targets of the predicted miRNAs confirms a regulatory role of most predicted miRNAs in structural clusters and highlights their potential involvement in cancer.
I. Overview of predicted miRNA structural clusters
The computational tool developed in Mathelier and Carbone, 2013 predicts structural clusters by looking for repeated sequences in palindromic regions, by using deep-sequencing reads as potential miRNAs forming structural clusters, or by combining the two kinds of information (see the Methods section of Mathelier and Carbone, 2013 for details). While the first strategy relies only on the human genomic sequence to make predictions, the two other strategies use data from small RNA-sequencing technology. We stress that the sRNA-seq data sets used are mainly derived from cancer cells: breast cancer cells, melanoma and pigment cells, and cervical cancer cells were used for predicting structural clusters of miRNAs. As the computational method was developed from the characteristics of the two already known miRNA structural clusters, the newly predicted structural clusters display characteristics similar to those of the known mir-17-92 and mir-106a-363 clusters.
Figure 1: Structural clusters and structural cluster regions. A: Structural cluster, known as mir-17-92, predicted on human chromosome 13 from deep-sequencing data. miRNAs validated by the algorithm are highlighted in red. This structural cluster was filtered with RepeatMasker. B: Example of a structural cluster region, where two structural clusters (red) are located along the chromosome (dark grey) together with their targets. Protein-coding genes are represented in green. miRNAs occurring in a structural cluster target genes that are located in the same structural cluster region, as indicated by yellow links. Recall that, in a structural cluster region, each structural cluster must lie no more than δ kb from at least one other structural cluster of the region, where δ is typically several hundred. Each region is delimited by adding δ kb at each end, beyond the outermost structural clusters of the region. The shaded areas (light grey) below the chromosome highlight these conditions.
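The region definition recalled in the caption of Figure 1 can be illustrated with a minimal sketch. It is not the implementation used in Mathelier and Carbone, 2013: the function names, the single-chromosome interval representation, and the choice of single-linkage chaining (consecutive clusters are joined when the gap between them is at most δ kb) are assumptions made for illustration. Isolated clusters, which have no neighbour within δ kb, are assumed not to form a region.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) in bp on a single chromosome


def _extend(group: List[Interval], delta: int, chrom_length: int) -> Interval:
    """Add delta bp beyond the outermost clusters, clipped to the chromosome."""
    lo = max(0, min(s for s, _ in group) - delta)
    hi = min(chrom_length, max(e for _, e in group) + delta)
    return (lo, hi)


def cluster_regions(clusters: List[Interval], delta_kb: float,
                    chrom_length: int) -> List[Interval]:
    """Chain clusters whose gap to the previous group is <= delta_kb (assumption)."""
    delta = int(delta_kb * 1000)
    regions: List[Interval] = []
    group: List[Interval] = []
    for start, end in sorted(clusters):
        # Close the current group when the next cluster is farther than delta.
        if group and start - max(e for _, e in group) > delta:
            if len(group) >= 2:  # a lone cluster has no neighbour within delta
                regions.append(_extend(group, delta, chrom_length))
            group = []
        group.append((start, end))
    if len(group) >= 2:
        regions.append(_extend(group, delta, chrom_length))
    return regions


# Hypothetical coordinates: two nearby clusters and one isolated cluster.
example = [(1_000_000, 1_001_000), (1_200_000, 1_201_500), (5_000_000, 5_001_000)]
regions = cluster_regions(example, delta_kb=300, chrom_length=10_000_000)
```

With δ = 300 kb, the first two hypothetical clusters are chained into one region extended by 300 kb on each side, while the third cluster is left out because it has no neighbour within δ.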
When applied to the human genome, the method predicted 416 structural clusters containing 1713 miRNAs (see Figure 1 for examples). To validate a posteriori the predicted miRNAs as bona fide miRNAs, we showed that 70% of miRNA predictions based on either sequence analysis or deep-sequencing data contain seeds (i.e. subsequences corresponding to positions 2-8 in the miRNA) identical to those of known miRNAs (see Table 1). The presence of already identified seeds in miRNAs of structural clusters increases the level of confidence in the predictive approach, as seeds are a signature used in target prediction (Lewis et al., 2003; Lewis et al., 2007). Furthermore, a very large fraction of structural cluster sequences predicted from deep-sequencing data, and 10% of those predicted using paralogs, contain at least one known miRNA sequence from miRBase (100% identity is required); many of these are human miRNAs.
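The seed comparison described above (positions 2-8 of the mature miRNA) can be sketched as follows. This is only an illustration: the function names are hypothetical, the sequences are toy examples, and accepting DNA-alphabet input by converting T to U is an added convenience, not something stated in the original work.

```python
from typing import Iterable, Set


def seed(mirna: str) -> str:
    """Return nucleotides 2-8 (1-based, inclusive) of a mature miRNA sequence."""
    rna = mirna.upper().replace("T", "U")  # tolerate DNA-alphabet input
    return rna[1:8]


def fraction_with_known_seed(predicted: Iterable[str],
                             known: Iterable[str]) -> float:
    """Fraction of predicted miRNAs whose seed is identical to a known seed."""
    known_seeds: Set[str] = {seed(k) for k in known}
    predicted = list(predicted)
    if not predicted:
        return 0.0
    hits = sum(seed(p) in known_seeds for p in predicted)
    return hits / len(predicted)


# Toy sequences (illustrative only).
known = ["CAAAGUGCUUACAGUGCAGGUAG"]
predicted = ["TAAAGTGCTTATAGTGCAGGTAG",   # same seed, DNA alphabet
             "UGGCUCAGUUCAGCAGGAACAG"]     # different seed
print(fraction_with_known_seed(predicted, known))  # 0.5
```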
When considering predictions obtained from sRNA-seq data, we discovered 12 structural clusters containing miRNAs that are all mapped by reads coming from the same deep-sequencing experiment: eight structural clusters belong to a data set from cervical cancer cells and the others to melanoma and pigment cells. Predictions were made by combining all sRNA-seq data sets together. Note that the known mir-17-92 and mir-106a-363 structural clusters could only be predicted by combining sRNA-seq reads coming from multiple experiments. Indeed, miRNAs hosted in the stems of the structural clusters appeared to overlap with reads from specific experiments: four of the five predicted miRNAs in mir-17-92 and four of the six in mir-106a-363 come from the same experiment. This highlights that, even though miRNAs are organized in clusters, their expression can be cell-type or tissue specific. This observation supports search criteria that pool reads from different experiments.
Table 1: Structural cluster predictions on human chromosomes. Predictions are made with the three paths of the algorithm, based respectively on paralogous sequences (paral), deep-sequencing reads (deep), and a combination of the two kinds of data (comb). The total number of predicted structural clusters (SCs; total), the number of predicted SCs lying in intronic regions (intron), and the number of predicted SCs lying in intergenic regions (inter) are reported for each method. The number of known miRNAs (with 100% sequence identity) occurring in predicted SC sequences and the number of SCs containing at least one predicted miRNA with the same seed as a known miRNA are also reported (last two columns). (Recall that two miRNAs have the same seed if their nucleotides at positions 2-8 are the same.) The number of known miRNAs is computed on the miRBase dataset. The number of known human miRNAs is given in parentheses. The total number of predictions obtained by the three methods is reported in the last line. Identical predictions (see Methods in Mathelier and Carbone, 2013) are counted once. Note that the table is derived from the original predictions made in Mathelier and Carbone, 2013 using the hg18 version of the human genome.
Our predictions suggest that most of the 99 structural clusters predicted from sRNA-seq data are only partially processed under specific conditions. Indeed, 75 of these 99 structural clusters contain at least two miRNAs coming from the same experimental data set, 22 contain at least three, six contain at least four, and one contains at least five.
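The tally above can be reproduced in principle with a short sketch, given for each structural cluster the set of sRNA-seq data sets supporting each of its miRNAs. The data layout, the function names, and the data set labels below are hypothetical; the sketch only illustrates the counting, not the original pipeline.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def max_same_experiment(mirna_datasets: Iterable[Set[str]]) -> int:
    """Largest number of miRNAs in one cluster supported by a single data set."""
    counts: Counter = Counter()
    for datasets in mirna_datasets:
        counts.update(set(datasets))  # each miRNA counts once per data set
    return max(counts.values(), default=0)


def tally_at_least(clusters: Dict[str, List[Set[str]]],
                   thresholds=(2, 3, 4, 5)) -> Dict[int, int]:
    """For each k, count clusters with >= k miRNAs from the same data set."""
    best = [max_same_experiment(mirnas) for mirnas in clusters.values()]
    return {k: sum(b >= k for b in best) for k in thresholds}


# Hypothetical clusters: one entry per miRNA, listing supporting data sets.
clusters = {
    "sc1": [{"cervical"}, {"cervical", "melanoma"}, {"breast"}],
    "sc2": [{"melanoma"}, {"breast"}],
    "sc3": [{"breast"}, {"breast"}, {"breast"}, {"breast"}],
}
print(tally_at_least(clusters))  # {2: 2, 3: 1, 4: 1, 5: 0}
```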
II. miRNA structural clusters co-localize with genes associated with specific signal transduction pathways involved in carcinogenesis
We studied whether structural clusters of miRNAs are functionally organized along the human chromosomes. We hypothesized that genes sharing similar biological functions with miRNA clusters might co-localize along the human genome, allowing for improved transcriptional efficiency. To evaluate this hypothesis, we analyzed increasingly large regions (of a few million bases) around the predicted structural clusters and counted the number of genes in these regions that are involved in specific biological pathways. All KEGG subclasses and pathways were analyzed. We showed that groups of genes involved in specific biological pathways co-localize with structural clusters of miRNAs in a statistically significant manner along the human chromosomes. For instance, the immune system diseases and sensory system subclasses are highlighted.
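One way to assess such a co-localization is to compare the number of pathway genes falling in structural cluster regions with the counts obtained for randomly selected gene sets of the same size. The sketch below assumes an empirical (permutation) p-value; the statistical test actually used is described in Mathelier and Carbone, 2013 and may differ. Function names, the single-chromosome coordinate model, and the toy data are hypothetical.

```python
import random
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[int, int]


def count_in_regions(positions: Sequence[int], regions: List[Interval]) -> int:
    """Count positions inside any region (regions sorted and non-overlapping)."""
    starts = [s for s, _ in regions]
    hits = 0
    for p in positions:
        i = bisect_right(starts, p) - 1  # last region starting at or before p
        if i >= 0 and p <= regions[i][1]:
            hits += 1
    return hits


def colocalization_pvalue(pathway_genes: Sequence[str],
                          gene_positions: Dict[str, int],
                          regions: List[Interval],
                          n_perm: int = 1000,
                          seed: int = 0) -> Tuple[int, float]:
    """Observed count and empirical p-value against random gene sets."""
    rng = random.Random(seed)
    regions = sorted(regions)
    observed = count_in_regions([gene_positions[g] for g in pathway_genes], regions)
    all_positions = list(gene_positions.values())
    k = len(pathway_genes)
    at_least = sum(
        count_in_regions(rng.sample(all_positions, k), regions) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction keeps the empirical p-value strictly positive.
    return observed, (1 + at_least) / (1 + n_perm)


# Toy data: 1000 genes spaced 1 kb apart; the pathway genes all sit in the region.
gene_positions = {f"g{i}": i * 1000 for i in range(1000)}
pathway = [f"g{i}" for i in range(10)]
observed, p = colocalization_pvalue(pathway, gene_positions, [(0, 49_000)])
```

Repeating the computation for regions of increasing size gives curves of the kind compared against random gene selection.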
We further explored genes involved in signal transduction pathways, as they have been recurrently shown to be involved in cancer development and have been defined as preferential targets for cancer therapy (Bode and Dong, 2005; Reddy and Couvreur, 2010). Taken all together, the signal transduction pathways do not significantly co-localize with structural clusters. Nevertheless, some specific pathways are enriched in structural cluster regions. Namely, the Wnt (150 genes, P = 0.015), Notch (47 genes, P = 0.017), and Hedgehog (56 genes, P = 0.048) pathways display a non-random gene distribution in structural cluster regions (see Figure 2). These signal transduction pathways were previously pinpointed as prime candidates for miRNA-mediated regulation, and several reported examples suggest that miRNAs act as generators of graded responses or as amplifiers in signalling pathways, both for single pathways and for signalling cross-talk (Inui et al., 2010). Note that previous works highlighted the role of the Wnt (see Polakis, 2000; Anastas and Moon, 2013 for reviews), Notch (Hu et al., 2012; Al-Hussaini et al., 2012; Lobry et al., 2011), and Hedgehog (Amakye et al., 2013; Gupta et al., 2010) pathways in cancer development, and all three pathways have been described as preferred therapeutic targets for cancer treatment.
Figure 2: Curves relating coverage of structural cluster regions to genes belonging to specific KEGG pathways: Wnt, Notch and Hedgehog. The curve generated by randomly selecting genes is also plotted. Similar curves are reported in the Supplementary Material of Mathelier and Carbone, 2013 for all statistically significant analyses of KEGG subclasses.
III. miRNAs in structural clusters target cancer genes
We extended our functional analysis of the miRNAs contained in computed structural clusters by predicting their potential targets. Using the miRanda (John et al., 2004) and PITA (Kertesz et al., 2007) tools, we predicted targets in the 3'UTRs and CDSs of human genes for all the miRNAs computed in the structural clusters (see Methods of Mathelier and Carbone, 2013 for details). The following results were obtained using predictions from miRanda, but very similar results are derived from PITA's predictions.
A Gene Ontology (GO) functional enrichment analysis of the targeted genes shows that almost half (43%) of the biological process (BP) GO terms associated with miRNA/3'UTR pairs are involved in regulation and have the motif 'regulation of' in their name (see Mathelier and Carbone, 2013). This suggests that predicted miRNAs might be involved in the degradation of transcription factors, as is the case for the two already known miRNAs of chromosome 13 (mir-17-92 structural cluster), which regulate protein E2F1 and are regulated by c-Myc, which also regulates E2F1 (O'Donnell et al., 2005).
When considering KEGG pathways, one of the pathways with the most enriched gene set in miRNA targets is 'pathways in cancer' (P< 5.4e-9). Moreover, we observe 14 pathways corresponding to different types of cancer ranked as statistically significant among all KEGG pathways (see Figure 3A and Oxford Journals). Note that the 'melanogenesis' (P< 1.1e-5) and the 'melanoma' (P< 2.8e-5) pathways are highlighted. This specific result needs to be considered in the context of the sRNA-seq data sets that we used, several of which are derived from skin cells. Moreover, 115 miRNAs (20%) targeting 3'UTRs associated with 'melanogenesis' and 'melanoma', out of a total of 550, are contained in structural clusters identified using deep-sequencing data derived from the corresponding data sets. The enrichment for targets involved in melanogenesis and melanoma among miRNAs predicted from skin sRNA-seq data reinforces our confidence in predicting structural clusters of bona fide miRNAs using deep-sequencing reads. Finally, we observed significant enrichment for pathways already known to be involved in carcinogenesis, such as apoptosis, and for important signalling pathways such as P53, Wnt, MAPK, Hedgehog, mTOR, VEGF, and Notch (see Oxford Journals). Note the enrichment of signalling pathways previously highlighted as co-localizing with miRNA structural clusters (see Figure 3B): these results show that they are also potential targets of these miRNAs.
Figure 3: KEGG pathways containing genes whose 3'UTRs are targeted by some predicted miRNAs. Functional analysis is performed on a set of miRNA/3'UTR pairs (see Mathelier and Carbone, 2013). A. KEGG pathways showing a Benjamini corrected p-value < 5e-4 are drawn. B. KEGG pathways showing a Benjamini corrected p-value < 0.05 and associated with genes co-localizing with structural clusters (p-value < 0.05, see previous section) are drawn. A-B. Each node of the graph represents an enriched KEGG pathway, where the size of the node is proportional to the number of targeted genes (from 45 to 237 genes in A and from 37 to 188 genes in B). The higher the opacity of a node, the lower the associated p-value. An edge between two nodes indicates that the pathways share genes. The larger the width of the edge, the larger the number of shared genes (from 3 to 127 in A and from 4 to 55 in B). Cancer KEGG pathways are painted in red, signalling pathways are painted in green, and other pathways are painted in grey. See Mathelier and Carbone, 2013 for a full table also reporting GO terms and PIR keywords Swiss-Prot Database analyses.
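The enrichment p-values and their correction can be illustrated with a minimal sketch. It assumes a one-sided hypergeometric test per pathway followed by the Benjamini-Hochberg procedure; the exact test and software used for the analysis are described in Mathelier and Carbone, 2013 and may differ. Function names and the example numbers are hypothetical.

```python
from math import comb
from typing import List


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k): n target genes drawn from N genes, K of which are in the pathway."""
    upper = min(K, n)
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical example: 20000 genes, 300 in the pathway,
# 2000 predicted targets, 60 of which are in the pathway.
p_raw = hypergeom_sf(60, 20000, 300, 2000)
p_adj = benjamini_hochberg([p_raw, 0.2, 0.04])
```

Pathways would then be retained when their adjusted p-value falls below the chosen cutoff, for example the thresholds quoted in the caption of Figure 3.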
Using a more stringent set of predicted targets (obtained by lowering the threshold on miRNA/target predictions; see Mathelier and Carbone, 2013 for details), we observe a stronger signal, with BP GO terms associated with 'regulation of' covering >48% of terms with P< 0.01 and >45% with P< 0.1 (see Oxford Journals). Considering KEGG pathways, 12 pathways corresponding to different types of cancer are again highlighted as statistically significant, which confirms the involvement of predicted miRNAs in cancer development (see Oxford Journals). The pathways 'melanogenesis' (P< 2.4e-4) and 'melanoma' (P< 0.017), as well as several signalling pathways, are among those identified, as already pointed out in the functional analysis of structural cluster regions.
Table 2: Functional analysis of structural clusters. For each pathway, the number of mRNAs in the pathway containing at least one target and the number of miRNAs with at least one target in these mRNAs are reported in the last two columns. For the set of miRNAs targeting genes associated with a specific pathway, we report the number of structural clusters (SCs) containing at least one of the miRNAs in the set (2nd column), the number of SCs with all their miRNAs in the set (3rd column), and the ratio of these two numbers (4th column). Pathways with a ratio ≥ 30% (blue) and ≥ 40% (green) are highlighted. In the 2nd and 3rd columns, the numbers in parentheses correspond to structural clusters predicted with deep-sequencing data. Pathways correspond to those in Figure 3A. See Mathelier and Carbone, 2013 for a full table also reporting GO terms and PIR keywords Swiss-Prot Database analyses.
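The three per-pathway quantities described in the caption of Table 2 can be computed as in the sketch below. The function name and data layout are hypothetical, and the ratio is assumed to be the number of SCs with all their miRNAs in the set divided by the number of SCs with at least one miRNA in the set.

```python
from typing import Dict, Set, Tuple


def cluster_coherence(clusters: Dict[str, Set[str]],
                      targeting_mirnas: Set[str]) -> Tuple[int, int, float]:
    """Return (#SCs with >= 1 miRNA in the set, #SCs with all miRNAs in the set, ratio)."""
    with_any = [sc for sc, mirnas in clusters.items() if mirnas & targeting_mirnas]
    # A cluster counts as 'all' when every one of its miRNAs targets the pathway.
    with_all = [sc for sc in with_any if clusters[sc] <= targeting_mirnas]
    ratio = len(with_all) / len(with_any) if with_any else 0.0
    return len(with_any), len(with_all), ratio


# Hypothetical structural clusters and miRNAs targeting one pathway.
clusters = {"sc1": {"a", "b"}, "sc2": {"c", "d"}, "sc3": {"e"}}
targeting = {"a", "b", "c"}
print(cluster_coherence(clusters, targeting))  # (2, 1, 0.5)
```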
It has been previously shown that miRNAs can also target genes by binding to CDS regions and not only to 3'UTRs (Tay et al., 2008). For target predictions within CDSs, the analysis confirmed the observations already made for 3'UTR targets and highlighted the same statistically significant terms on different data sets. Among KEGG pathways, we observe 'pathways in cancer' as the first highlighted term, followed by specific cancers, signalling pathways and several cardiomyopathies (see Oxford Journals).
It is important to highlight that the miRNA/target predictions require a high miRNA/target binding energy and that we observe a tendency for miRNAs targeting genes from the same functional class to be localized in the same structural clusters (see Table 2). For instance, when considering structural clusters with at least one miRNA targeting genes in KEGG 'pathways in cancer', we observe that for ∼50% of these clusters, all the miRNAs target genes in these pathways. Furthermore, the functional analysis of miRNAs predicted from paralogous sequences and from deep-sequencing data, taken separately, provides results comparable to those described earlier in the text. This shows that miRNAs predicted with a specific methodology do not bias the functional target analysis. Finally, among targets obtained from deep-sequencing data, we observe a stronger signal for 'pathways in cancer' and 'melanogenesis' in the KEGG data set, in agreement with the use of reads coming from melanoma cells.
The discovery that the structural clusters mir-17-92 and mir-106a-363 are involved in cancer development created the need for a computational tool that helps to characterize potential structural clusters within the human chromosomes as new candidates for experimental tests. In Mathelier and Carbone, 2013, we predicted structural clusters of miRNAs.
Predictions were made following three different methodologies: (1) using paralogous miRNAs derived from genomic sequence analysis, (2) predicting miRNAs/pre-miRNAs using sRNA-seq data, and (3) combining the two previous methods. Among the structural clusters predicted from deep-sequencing data, 86% contain miRNAs with known seeds. We highlighted 13 new structural clusters whose miRNAs are identified by reads occurring either in cervical cancer or in melanoma and pigment cell experiments.
We showed that structural cluster regions (i.e. genomic regions surrounding structural clusters) are enriched for genes involved in cancer-related pathways such as the Wnt, Hedgehog, and Notch pathways. A functional analysis of target genes strongly supports a regulatory role for most predicted miRNAs and, notably, a strong involvement of predicted miRNAs in the regulation of cancer pathways. Our findings highlight miRNA regulation mechanisms (potentially affected by mutation) as potential causes of signalling pathway dysfunction. Experiments testing the in silico gene target identification are required.
The predictions of structural clusters of miRNAs were originally made using the hg18 version of the human genome. The locations of the predicted miRNAs/pre-miRNAs have been lifted over to the hg38 version of the human genome and can be found as .bed files at UPMC.