Regulatory networks and 5′ partner usage of miRNA host gene fusions in breast cancer

Abstract Genomic rearrangements in cancer cells can create gene fusions where the juxtaposition of two different genes leads to the production of chimeric proteins or altered gene expression through promoter‐swapping. We have previously shown that fusion transcripts involving microRNA (miRNA) host genes contribute to deregulation of miRNA expression regardless of the protein‐coding potential of these transcripts. Many different genes can also be used as 5′ partners by a miRNA host gene in what we named recurrent miRNA‐convergent fusions. Here, we have explored the properties of 5′ partners in fusion transcripts that involve miRNA hosts in breast tumours from The Cancer Genome Atlas (TCGA). We hypothesised that firstly, 5′ partner genes should belong to pathways and transcriptional programmes that reflect the tumour phenotype and secondly, there should be a selection for fusion events that shape miRNA expression to benefit the tumour cell through the known hallmarks of cancer. We found that the set of 5′ partners in miRNA host fusions is non‐random, with overrepresentation of highly expressed genes in pathways active in cancer including epithelial‐to‐mesenchymal transition, translational regulation and oestrogen signalling. Furthermore, many miRNAs were upregulated in samples with host gene fusions, including established oncogenic miRNAs such as mir‐21 and the mir‐106b~mir‐93~mir‐25 cluster. To the list of mechanisms for deregulation of miRNA expression, we have added fusion transcripts that change the promoter region. We propose that this adds material for genetic selection and tumour evolution in cancer cells and that miRNA host fusions can act as tumour ‘drivers’.


What's new?
Fusion transcripts have been identified in many tumour types, but it remains unclear how many of them represent functional tumour drivers versus passenger events. Among several mechanisms causing deregulation of miRNA expression in cancer, genomic rearrangements can create microRNA host gene fusions that alter microRNA expression regardless of the protein-coding potential of the fusion transcript. The authors show that the 5′ partners of host fusions are highly expressed genes in subtype-specific pathways active in breast cancer. Host fusions can thereby provide material for genetic selection and tumour evolution in cancer cells, with miRNA host fusions potentially acting as tumour drivers.

We have previously shown that the host genes of intronically encoded small noncoding RNAs including both microRNA (miRNA) and small nucleolar RNA (snoRNA) are overrepresented in fusion transcripts in breast cancer. 6 Analyses of fusion transcripts have mainly focused on the production of chimeric proteins, but the co-transcriptional processing of miRNAs from primary transcripts 7 implies that the coding potential of the transcript is irrelevant from the perspective of miRNA expression. We coined the term miRNA-convergent fusions to describe a class of fusion transcripts where the exact identity and function of the 5′ partner gene is unimportant, and in which multiple and different 5′ partners can drive the expression of a given miRNA. 6 For these fusion transcripts, recurrence is therefore only defined as multiple occurrences of fusions involving the same miRNA host gene as a fusion partner.
The role of miRNAs and their associated Argonaute proteins in the regulation of gene expression is well established. Regulation primarily occurs through base-pairing of the miRNA to partially complementary target sites in the mRNAs of target genes, leading to mRNA destabilisation or translational inhibition. 8 A considerable number of miRNAs have been reported to act as tumour suppressors or oncogenes 9 and several different mechanisms have been described for deregulation of miRNA expression in cancer. 10 Examples include genomic copy number alterations, epigenetic factors such as promoter methylation status, and post-transcriptional regulation of miRNA processing; gene fusions involving miRNA host genes have now also been added to this list. Our earlier work demonstrated that the 5′ partners of fusion transcripts involving a miRNA host gene as 3′ partner had higher expression than the 5′ partners of non-host genes, and that specific miRNAs were upregulated in samples with host gene fusions. 6 Here we have explored the properties of miRNA host gene fusions in breast tumours from the TCGA 11 and SCAN-B 12 cohorts with the hypotheses that (a) the 5′ partner genes should belong to pathways and transcriptional programmes that reflect the tumour phenotype and (b) there should be a selection for fusion events that shape miRNA expression to benefit the tumour cell through known hallmarks of cancer such as increased survival, proliferation, angiogenesis, or migration. We find that the 5′ partners of miRNA host genes are associated with higher expression and lower promoter methylation. They are regulated by key transcription factors in cancer cells and act in pathways related to the malignant phenotype. Finally, we identify fusion transcripts as mechanisms for upregulation of oncogenic miRNAs including mir-21 and the mir-106b~mir-93~mir-25 cluster in breast cancer.

| Classification of samples
To reduce the number of TCGA samples with missing data, tumour ER and HER2 status were defined by the RNA expression of ESR1 and ERBB2, respectively. For each receptor, the distribution of expression values in fragments per kilobase of exon model and million reads (FPKM) was compared between the immunohistochemically (IHC) determined status groups, and an expression threshold separating positive from negative samples was defined (Figure S1). The threshold for ER-positive samples was ESR1 FPKM = 5.7 and for HER2-positive samples ERBB2 FPKM = 73.5. IHC receptor status was available for all SCAN-B samples and was therefore used instead of an FPKM threshold. PAM50 molecular subtypes in the SCAN-B cohort were obtained as previously described. 15
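The expression-based receptor calls can be illustrated with a minimal sketch. The two thresholds are the ones reported above; the function name, the sample values, and the choice to treat a value equal to the threshold as positive are assumptions made for illustration only.

```python
# Thresholds reported in the text (FPKM).
ESR1_FPKM_THRESHOLD = 5.7
ERBB2_FPKM_THRESHOLD = 73.5


def classify_receptor_status(esr1_fpkm, erbb2_fpkm):
    """Return expression-based ER and HER2 status for one tumour.

    Assumption: a value equal to the threshold counts as positive.
    """
    return {
        "ER": "positive" if esr1_fpkm >= ESR1_FPKM_THRESHOLD else "negative",
        "HER2": "positive" if erbb2_fpkm >= ERBB2_FPKM_THRESHOLD else "negative",
    }


# Hypothetical (ESR1, ERBB2) FPKM values, not taken from the cohort.
samples = {"tumour_A": (42.0, 12.3), "tumour_B": (1.1, 250.0)}
calls = {name: classify_receptor_status(*fpkm) for name, fpkm in samples.items()}
```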

| Expression and promoter methylation analysis
Methylation and gene expression matrices were obtained from TCGA. For expression analysis, we calculated the average expression of each 5′ fusion partner for each category of 3′ partner gene (3′ host including/excluding miRNA and 3′ not host), as well as average expression in samples where the 5′ partner was not involved in fusion events. The equivalent analysis was performed for 3′ partner genes. For the promoter methylation analysis, we calculated the average methylation levels of CpG islands located within −1000 to +200 bases of the transcription start site for each gene. Student's t-test was used to test for differences in log2-transformed methylation beta values and expression levels between different groups of 5′ fusion partner genes.
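The promoter window used above (−1000 to +200 bases relative to the transcription start site) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: CpG sites are represented as single positions, the window is mirrored for genes on the minus strand, and all names and values are hypothetical.

```python
from statistics import mean


def promoter_window(tss, strand, upstream=1000, downstream=200):
    """Return (start, end) of the -1000 to +200 window, taking strand into account."""
    if strand == "+":
        return tss - upstream, tss + downstream
    # On the minus strand, "upstream" lies at higher genomic coordinates.
    return tss - downstream, tss + upstream


def mean_promoter_beta(probes, tss, strand):
    """Average the beta values of CpG positions that fall in the promoter window.

    probes: iterable of (position, beta) pairs on the gene's chromosome.
    Returns None when no position lies inside the window.
    """
    start, end = promoter_window(tss, strand)
    betas = [beta for pos, beta in probes if start <= pos <= end]
    return mean(betas) if betas else None


# Hypothetical CpG positions and beta values.
probes = [(9_100, 0.10), (9_900, 0.20), (10_150, 0.30), (10_500, 0.90)]
plus_strand_mean = mean_promoter_beta(probes, tss=10_000, strand="+")
minus_strand_mean = mean_promoter_beta(probes, tss=10_000, strand="-")
```

The per-gene averages would then be log2-transformed and compared between groups of 5′ partners; that step is left out here.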

| Differential miRNA expression analysis
Total expression per mature miRNA in miRBase release 22 14 was calculated with a custom Perl script from the TCGA miRNA isoform quantification files using converted genomic coordinates. Differential expression analysis was performed using the exactTest implemented in edgeR. 19

| Analysis of whole genome sequencing data
[…] instructions. For miRNAs, cDNA synthesis from 100 ng DNase-treated total RNA was performed as described. 24 Dilution of cDNA and PCR was done as before. Primer sequences are available in Table S1.

| Statistical analyses
All statistical analyses were performed in R versions 3.6.3 to 4.0.5. The Benjamini-Hochberg procedure was used to adjust P-values for multiple testing and to control the false discovery rate in all statistical tests.
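For readers unfamiliar with the Benjamini-Hochberg adjustment, the following sketch shows the calculation on a small list of P-values. It is written in Python for illustration; the analyses themselves were run in R, where `p.adjust` with `method = "BH"` gives the same adjusted values. The input values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P-values.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```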

| Fusion transcripts in the TCGA breast cancer cohort
To explore 5′ partners and regulatory networks in miRNA host gene fusions, we have based our analyses on RNA-Seq data for 1092 breast tumours from TCGA (TCGA-BRCA cohort). For some analyses we have also included data for the 1540 breast tumours in our previous study (SCAN-B cohort). 6 There are many software tools available to detect fusion transcripts from RNA-Seq data. We chose to use FusionCatcher 13 due to its high sensitivity and comparatively low false discovery rate. 25 Clinical data for the TCGA samples are summarised in Table 1. Several samples had missing or ambiguous status for the two main prognostic and treatment-predictive biomarkers in breast cancer: oestrogen receptor alpha (ER, gene symbol ESR1) and Erb-B2 receptor tyrosine kinase 2 (HER2, gene symbol ERBB2).
We therefore decided to define ER and HER2 status by the expression of ESR1 and ERBB2, respectively, since expression levels of these genes are available for every sample in the cohort from the RNA-Seq data. Following this definition, 75% of samples were ER-positive and 15% HER2-positive, a distribution that is in line with current literature 26 (see Table 1 and Figure S1). After applying the gene expression-based cut-off, the receptor status changed for 6.7% and 11.7% of samples previously annotated as positive or negative for ER or HER2, respectively. PAM50 molecular subtype classification was obtained from TCGA. 11 In total, we detected over 274 000 fusion transcript events in 1092 samples, an average of 251 fusions per sample (Tables S2 and S3). A total of 16 530 unique genes were involved as fusion partners (Table S2).

| microRNA host genes are overrepresented in fusion transcripts
We have previously shown that miRNA host genes are overrepresented as fusion partners in the SCAN-B breast cancer cohort. 6 To confirm this in the TCGA data, we constructed a logistic regression model that considered miRNA host gene status, gene size, and the interaction between the two factors (Figure 1A). Host genes of miRNAs were indeed also overrepresented in the TCGA fusions (P = 7.62 × 10⁻⁷, Wald test). The model was limited to all expressed protein-coding genes in the TCGA-BRCA cohort.
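One way to write down a model of the kind described above is sketched here. The symbols are introduced for illustration and are not taken from the original: y_g indicates whether gene g occurs as a fusion partner, h_g whether it hosts a miRNA, and s_g its size. Whether gene size was transformed before fitting is not stated, so s_g is left generic.

```latex
\[
\operatorname{logit}\,\Pr(y_g = 1)
  = \beta_0 + \beta_1 h_g + \beta_2 s_g + \beta_3\, h_g s_g
\]
\[
z_j = \frac{\hat{\beta}_j}{\operatorname{SE}(\hat{\beta}_j)}
\]
```

The second expression is the standard Wald statistic for a single coefficient, which is compared against a standard normal distribution to obtain a P-value.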
Looking at the genomic location of fusion partners, 90.8% of fusion events were inter-chromosomal and 9.2% were intra-chromosomal. As shown in Figure 1B, the fusion partners of intra-chromosomal fusions were significantly more likely to be further apart when the 3′ partner was a miRNA host (P < 2e-16, Fisher's exact test). The median distance between fusion partners was 21. […]
In accordance with the model of miRNA-convergent fusions we hypothesise that there is a selection for 5′ partners that modulate miRNA expression in a way that provides an advantage to the cancer cell. We therefore analysed the properties of 5′ partners of miRNA host fusion transcripts, starting with their expression level. As seen in Figure 1C, the 5′ fusion partners of miRNA host genes had significantly higher expression levels than other 5′ partners (P < 2e-16, Student's t-test). Host genes of canonical snoRNAs were removed from the lists of non-host 5′ partners as they have also been shown to be overrepresented among fusion transcripts. 27 […]
As a background for the overrepresentation analyses we used all expressed genes in the TCGA-BRCA cohort. All significant gene sets are included in Table S4. We performed the analogous enrichment analysis for 3′ partner genes in fusions with miRNA hosts as 5′ partners for the different molecular subtypes in the TCGA data (Table S5) […] Table S4.
FIGURE 3 Enriched REACTOME pathways and transcription factor targets for the 5′ fusion partners of miRNA hosts, split by molecular subtype for the TCGA-BRCA cohort. Colour intensity shows the fraction of genes in each REACTOME pathway that is regulated by a given transcription factor, as predicted by UniBind
[…] 1 (AP-1), were overrepresented in all subtypes and in many cases […]
[…] Figure 4A, the mature miRNA products miR-21-5p and miR-21-3p were both significantly upregulated in tumours with 3′ VMP1 fusions compared with tumours without host gene fusions (P = 2.2e-7 and P = 1.2e-22, edgeR exact test). 19 Five samples with VMP1 fusion transcripts had matched WGS data, all confirming the existence of these fusions at the DNA level (Figure 4B).
Several mature miRNAs of the mir-106b~mir-93~mir-25 cluster were also significantly overexpressed in tumours with 3′ fusions of the host gene MCM7 (Figure 4C). The corresponding P-values for miR-106b-5p and -3p were .0011 and 6.5e-7, for miR-93-5p and -3p .013 and 2.7e-7, and for miR-25-5p and -3p 9.8e-4 and 1.6e-5 (edgeR exact test). 19 Fusions of MCM7 have previously been reported in the SCAN-B breast cancer cohort, 6 as well as in ovarian and prostate cancer from the TCGA project. 5 Here we found fusion transcripts with MCM7 as 3′ partner in eight tumours; however, none of them had WGS data available. All fusions used different 5′ partners, most of them with higher expression than the host gene (Figure 4D).
From these data it is also possible to identify putative new target genes that have been predicted as targets by TargetScan and that are negatively correlated with the miRNAs across the TCGA-BRCA samples. Examples for miR-21-5p are the proteins of SRSF1, a regulator of alternative splicing, 46 and XBP1, a transcription factor involved in the unfolded protein response 47 (Pearson's r = −.27 and −.26; P = 8.8e-9 and 3.4e-8). ESR1, the oestrogen receptor alpha, was predicted as a target for miR-106b-5p and miR-93-3p, and both miRNAs were negatively correlated with the mRNA (Pearson's r = −.20 and −.15; P = 1.3e-7 and .02). ESR1 protein was also negatively correlated with miR-106b-p using two different antibodies (Pearson's r = −.36 and −.18; P = 3.1e-31 and 3.0e-5). Furthermore, PEA15, a multifunctional protein involved in DNA damage response, 48 was predicted as a target for miR-106b-5p and miR-25-5p and was negatively correlated on the protein, but not mRNA, level (Pearson's r = −.22 and −.15; P = 3.9e-9 and .006).
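The selection logic described above, keeping predicted targets whose expression is negatively correlated with the miRNA, can be sketched in a few lines. This is an illustration only: the gene names and expression values are hypothetical, the prediction set stands in for a TargetScan result, and the significance testing applied in the actual analysis is left out.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def putative_targets(mirna_expr, gene_expr, predicted_targets):
    """Keep predicted targets that are negatively correlated with the miRNA."""
    hits = {}
    for gene in predicted_targets:
        r = pearson_r(mirna_expr, gene_expr[gene])
        if r < 0:
            hits[gene] = r
    return hits


# Hypothetical expression values across four samples.
mirna = [1.0, 2.0, 3.0, 4.0]
genes = {
    "GENE_A": [8.0, 6.0, 4.0, 2.0],  # predicted target, anticorrelated
    "GENE_B": [1.0, 2.0, 2.5, 4.0],  # predicted target, positively correlated
    "GENE_C": [5.0, 4.0, 4.5, 1.0],  # anticorrelated but not a predicted target
}
hits = putative_targets(mirna, genes, predicted_targets=["GENE_A", "GENE_B"])
```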
Finally, we also selected 11 tumours with miRNA host gene fusions and available RNA from the SCAN-B breast cancer cohort for experimental validation by real-time quantitative RT-PCR. Seven out of the 11 samples (64%) had a readily detectable fusion transcript and for 9 out of the 11 samples (82%) the miRNA was expressed above the median level of a control group of six tumours (Table S8 and […]
[…] cluster in MCM7. Expression of these miRNAs was anticorrelated with many predicted and experimentally validated target genes across the TCGA breast cancer cohort. These included many tumour suppressors, which is in line with the proposed oncogenic role for the miRNAs, 41 but there were also many oncogenes among the negatively correlated genes (Table S7). This suggests that the functions of these miRNAs might be more complex than previously appreciated, and that there might be subgroups of tumours where their roles are different. Since cellular signalling pathways normally contain both positive and negative regulators, this is not necessarily an unexpected finding, but a clear reminder that no miRNA evolved for the purposes of a cancer cell.
Although several papers have identified fusion transcripts in cancer, 5 global analyses of miRNA host gene fusions are still lacking with the exception of our work in breast cancer. 6 We have previously reported that fusions involving miRNA hosts are common but overlooked for several reasons: research has focused on protein-coding genes and in-frame fusion transcripts, and miRNA-convergent fusions using different 5′ partners may not have been classified as recurrent.
One advantage with the TCGA data is that more than 98% of the samples have small RNA sequencing data, which allowed us to analyse the effects of host fusions on miRNA expression. Still, the number of samples with fusions for a given host gene becomes limiting for the ability to detect differentially expressed miRNAs. Whole genome sequencing data was also available from TCGA for 5 out of the 64 tumours with mir-21 fusions, allowing us to verify the genomic fusions involving mir-21.
The expression of miRNAs can be deregulated in many different ways, including genomic amplification and deletion, altered promoter methylation, transcriptional rate, or processing. To this list we have added convergent fusion transcripts, potentially caused by genomic rearrangements that change the promoter region. We have shown that the set of 5′ partners in miRNA host fusions is non-random and demonstrates overrepresentation of highly expressed genes in subtype-specific pathways active in breast cancer. This adds material to the process of genetic selection and tumour evolution in cancer cells and suggests that miRNA host fusions may provide a growth advantage and function as tumour 'drivers'.

ETHICS STATEMENT
For the SCAN-B data the study was conducted in accordance with the Declaration of Helsinki and has been approved by