Improved detection of clinically relevant fusion transcripts in cancer by machine learning classification
Lund Univ, Sweden.
Lund Univ, Sweden.
Linköping University, Department of Physics, Chemistry and Biology, Bioinformatics. Linköping University, Faculty of Science & Engineering. (Science for Life Laboratory) ORCID iD: 0000-0002-1263-679X
Lund Univ, Sweden.
2023 (English). In: BMC Genomics, E-ISSN 1471-2164, Vol. 24, no 1, article id 783. Article in journal (Refereed). Published
Abstract [en]

Background: Genomic rearrangements in cancer cells can create fusion genes that encode chimeric proteins or alter the expression of coding and non-coding RNAs. In some cancer types, fusions involving specific kinases are used as targets for therapy. Fusion genes can be detected by whole genome sequencing (WGS) and targeted fusion panels, but RNA sequencing (RNA-Seq) has the advantageous capability of broadly detecting expressed fusion transcripts.

Results: We developed a pipeline for validation of fusion transcripts identified in RNA-Seq data using matched WGS data from The Cancer Genome Atlas (TCGA) and applied it to 910 tumors from 11 different cancer types. This resulted in 4237 validated gene fusions, 3049 of them with at least one identified genomic breakpoint. Utilizing validated fusions as true positive events, we trained a machine learning classifier to predict true and false positive fusion transcripts from RNA-Seq data. The final precision and recall metrics of the classifier were 0.74 and 0.71, respectively, in an independent dataset of 249 breast tumors. Application of this classifier to all samples with RNA-Seq data from these cancer types vastly extended the number of likely true positive fusion transcripts and identified many potentially targetable kinase fusions. Further analysis of the validated gene fusions suggested that many are created by intrachromosomal amplification events with microhomology-mediated non-homologous end-joining.

Conclusions: A classifier trained on validated fusion events increased the accuracy of fusion transcript identification in samples without WGS data. This allowed the analysis to be extended to all samples with RNA-Seq data, facilitating studies of tumor biology and increasing the number of detected kinase fusions. Machine learning could thus be used in identification of clinically relevant fusion events for targeted therapy. The large dataset of validated gene fusions generated here presents a useful resource for development and evaluation of fusion transcript detection algorithms.

Place, publisher, year, edition, pages
BMC, 2023. Vol. 24, no 1, article id 783
Keywords [en]
Fusion transcript; Gene fusion; Cancer genomics; Tumor biology; Precision medicine; Machine learning; Microhomology; Kinase
National Category
Bioinformatics and Systems Biology
Identifiers
URN: urn:nbn:se:liu:diva-199979
DOI: 10.1186/s12864-023-09889-y
ISI: 001127573600005
PubMedID: 38110872
OAI: oai:DiVA.org:liu-199979
DiVA, id: diva2:1825759
Note

Funding Agencies: Lund University [2018-05973]; Regional Cancer Centre South - Swedish Research Council

Available from: 2024-01-10 Created: 2024-01-10 Last updated: 2024-10-18

Open Access in DiVA

fulltext (13646 kB), 10 downloads
File information
File name: FULLTEXT02.pdf
File size: 13646 kB
Checksum (SHA-512): d0f3b04683ba611503b04745d0e8c2a1e0e7691ec8b7fd02ea8d5c55e8af2db083273eb903aea5100760de0de05912f9ccdec885f14f6e20a030eafbe533679f
Type: fulltext
Mimetype: application/pdf

By author/editor
Larsson, Malin