Novel methods and software for disease module inference
Linköping University, Department of Physics, Chemistry and Biology, Bioinformatics. Linköping University, Faculty of Science & Engineering.
2023 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Cellular organization is believed to be modular, meaning that cellular functions are carried out by modules composed of clusters of genes, proteins and metabolites that are interconnected, co-regulated or physically interacting. In turn, these modules interact with each other and thereby form complex networks that, taken together, are considered to be the interactome.

Modern high-throughput biological techniques, the so-called omics, have made accurate large-scale quantification of these biological molecules possible. The simultaneous measurement of these molecules gives a picture of the state of a cell at a resolution that was never before possible. Mapping these measurements greatly aids in elucidating the network structure of interactions. The ever-growing size of public repositories for omics data has ushered in the advent of biology as a (big) data science and opens the door for data-hungry machine learning approaches in biology.

Complex diseases are multi-factorial and arise from a combination of genetic, environmental and lifestyle factors. Additionally, diagnosis and treatment are complicated by the fact that these genetic, environmental and lifestyle factors can vary between patients and may or may not give rise to different disease phenotypes that are still classified as the same disease. Genetically, there is substantial heterogeneity among patients, and therefore the emergence of a disease phenotype cannot be attributed to a single genetic mutation but rather to a combination of various mutations that may vary from patient to patient. As complex diseases can have different root causes but give rise to a similar disease phenotype, the implication is that different root causes perturb similar components in the interactome. Most of the work in this thesis is aimed at developing methods and computational pipelines to identify, analyze and evaluate these perturbed disease-specific sub-networks in the interactome, the so-called disease modules.

We started by collecting popular disease module inference methods and combining them in a unified framework, an R package called MODifieR (Paper I). The package uses standardized inputs and outputs, which makes it more user-friendly to run multiple disease module inference methods and to combine the resulting modules. Next, we benchmarked the MODifieR methods on a compendium of transcriptomic and methylomic datasets and combined transcriptomic and methylomic disease modules for multiple sclerosis (MS) into a highly disease-relevant module greatly enriched with known risk factors for MS (Paper II). Subsequently, we extended the functionality of MODifieR with software for transcription factor hub detection in gene regulatory networks in a new framework with a graphical user interface, MODalyseR. We used MODalyseR to find upstream regulators and identified IKZF1 as an important upstream regulator for MS (Paper III). Lastly, we used the growing large-scale repositories of gene expression data to train a variational autoencoder (VAE) to compress and decompress gene expression profiles with the aim of extracting disease modules from the latent space. Utilizing the continuous nature of the latent space in VAEs, we derived the differences in latent space representations between a compendium of complex disease gene expression profiles and matched healthy controls. We then derived disease modules from the decompressed latent space representation of this difference and found the modules highly enriched with disease-associated genes, generally outperforming top differentially expressed genes, the gold standard of transcriptomic analysis of diseases (Paper IV).
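
The latent-space step described for Paper IV can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: toy linear maps stand in for the encoder and decoder of a trained VAE, profiles are averaged per group before taking the difference, and genes are ranked by the absolute decoded value with a hypothetical `top_k` cutoff. It is not the implementation used in the thesis.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def mean_vector(vectors):
    """Element-wise mean of a list of equally long vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def latent_difference_module(encode, decode, disease, control, genes, top_k):
    """Rank genes by the decoded disease-minus-control latent difference."""
    z_disease = mean_vector([encode(profile) for profile in disease])
    z_control = mean_vector([encode(profile) for profile in control])
    delta = [d - c for d, c in zip(z_disease, z_control)]
    decoded = decode(delta)  # back to gene space
    ranked = sorted(zip(genes, decoded), key=lambda pair: abs(pair[1]), reverse=True)
    return [gene for gene, _ in ranked[:top_k]]


# Hypothetical toy stand-ins for a trained encoder and decoder (assumption).
W_ENC = [[1, 1, 0, 0], [0, 0, 1, 1]]
W_DEC = [[0.6, 0], [0.4, 0], [0, 0.5], [0, 0.5]]
genes = ["g1", "g2", "g3", "g4"]          # placeholder gene names
disease = [[3, 3, 1, 1], [5, 3, 1, 1]]    # made-up expression profiles
control = [[1, 1, 1, 1], [1, 1, 1, 1]]

module = latent_difference_module(
    lambda x: matvec(W_ENC, x),
    lambda z: matvec(W_DEC, z),
    disease, control, genes, top_k=2,
)
print(module)
```

In this toy setting only the first latent dimension differs between the groups, so the genes that load on it in the decoder end up in the module.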

To conclude, the main scientific contribution of this thesis lies in the development of software and methods for improving disease module inference, the evaluation of existing inference methods, the creation of new analysis workflows for multi-omics modules, and the introduction of a deep learning-based approach to the disease module inference toolkit. 

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2023, p. 54
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 2282
National Category
Bioinformatics and Computational Biology
Identifiers
URN: urn:nbn:se:liu:diva-191118; DOI: 10.3384/9789180750097; ISBN: 9789180750080 (print); ISBN: 9789180750097 (electronic); OAI: oai:DiVA.org:liu-191118; DiVA id: diva2:1728727
Public defence
2023-02-17, Nobel, B-building, Campus Valla, Linköping, 13:00 (English)
Available from: 2023-01-19. Created: 2023-01-19. Last updated: 2025-02-07. Bibliographically approved
List of papers
1. MODifieR: an Ensemble R Package for Inference of Disease Modules from Transcriptomics Networks
2020 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 36, no 12, p. 3918-3919. Article in journal (Refereed). Published
Abstract [en]

Motivation: Complex diseases are due to the dense interactions of many disease-associated factors that dysregulate genes that in turn form the so-called disease modules, which have been shown to be a powerful concept for understanding pathological mechanisms. There exist many disease module inference methods that rely on somewhat different assumptions, but there is still no gold standard or best-performing method. Hence, there is a need for combining these methods to generate robust disease modules. Results: We developed MODule IdentiFIER (MODifieR), an ensemble R package of nine disease module inference methods from transcriptomics networks. MODifieR uses standardized input and output, making it possible to combine individual modules generated by these methods into more robust disease-specific modules, contributing to a better understanding of complex diseases.
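
One simple way to combine modules from several methods is a vote count over genes. The sketch below illustrates that general idea only: the function name, the `min_support` threshold and the placeholder gene names are hypothetical, and it does not reproduce the MODifieR API or its actual combination strategies.

```python
from collections import Counter


def consensus_module(modules, min_support=2):
    """Return genes that appear in at least `min_support` of the input modules."""
    counts = Counter(gene for module in modules for gene in set(module))
    return {gene for gene, n in counts.items() if n >= min_support}


# Placeholder gene sets standing in for the output of three inference methods.
modules = [
    {"geneA", "geneB", "geneC", "geneD"},
    {"geneA", "geneC", "geneE"},
    {"geneA", "geneD", "geneF"},
]

print(sorted(consensus_module(modules, min_support=2)))  # ['geneA', 'geneC', 'geneD']
```

Raising `min_support` trades module size for agreement between methods; with `min_support=3` only `geneA` remains in this toy input.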

Place, publisher, year, edition, pages
OXFORD UNIV PRESS, 2020
National Category
Bioinformatics and Computational Biology
Identifiers
urn:nbn:se:liu:diva-168277 (URN); 10.1093/bioinformatics/btaa235 (DOI); 000550127500051 (); 32271876 (PubMedID); 2-s2.0-85087321319 (Scopus ID)
Note

Funding agencies: Knowledge Foundation; Swedish Research Council; Swedish Foundation for Strategic Research

Available from: 2020-08-21. Created: 2020-08-21. Last updated: 2025-11-04. Bibliographically approved
2. A validated generally applicable approach using the systematic assessment of disease modules by GWAS reveals a multi-omic module strongly associated with risk factors in multiple sclerosis
2021 (English). In: BMC Genomics, E-ISSN 1471-2164, Vol. 22, no 1, article id 631. Article in journal (Refereed). Published
Abstract [en]

Background: There exist few, if any, practical guidelines for predictive and falsifiable multi-omic data integration that systematically integrate existing knowledge. Disease modules are popular concepts for interpreting genome-wide studies in medicine but have so far not been systematically evaluated and may lead to corroborating multi-omic modules. Results: We assessed eight module identification methods in 57 previously published expression and methylation studies of 19 diseases using GWAS enrichment analysis. Next, we applied the same strategy for multi-omic integration of 20 datasets of multiple sclerosis (MS), and further validated the resulting module using both GWAS and risk-factor-associated genes from several independent cohorts. Our benchmark of modules showed that in immune-associated diseases, modules inferred from clique-based methods were the most enriched for GWAS genes. The multi-omic case study using MS data revealed the robust identification of a module of 220 genes. Strikingly, most genes of the module were differentially methylated upon the action of one or several environmental risk factors in MS (n = 217, P = 10^-47) and were also independently validated for association with five different risk factors of MS, which further stressed the high genetic and epigenetic relevance of the module for MS. Conclusions: We believe our analysis provides a workflow for selecting modules, and our benchmark study may help further improvement of disease module methods. Moreover, we also stress that our methodology is generally applicable for combining and assessing the performance of multi-omic approaches for complex diseases.
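
Enrichment of a module for disease-associated genes is often tested with a one-sided hypergeometric (over-representation) test. The sketch below shows that common approach using only the standard library; it is an assumption that such a test is a reasonable stand-in here, the abstract does not state which statistic the paper used, and all names and numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_annotated, n_module, n_overlap):
    """P(X >= n_overlap) when drawing n_module genes without replacement
    from n_background genes, of which n_annotated are disease-associated."""
    upper = min(n_annotated, n_module)
    total = comb(n_background, n_module)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_module - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Made-up toy numbers: 10 background genes, 4 annotated, module of 3, overlap of 2.
p_value = hypergeom_enrichment_p(10, 4, 3, 2)
print(p_value)
```

A small p-value indicates that the overlap between the module and the annotated gene set is larger than expected for a random gene set of the same size drawn from the same background.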

Place, publisher, year, edition, pages
BMC, 2021
Keywords
Benchmark; Multi-omics; Network modules; Multiple sclerosis; Risk factors; Disease modules; Network analysis; Protein network analysis; Transcriptomics; Methylomics; Data integration; Genome-wide association analysis
National Category
Bioinformatics and Computational Biology
Identifiers
urn:nbn:se:liu:diva-179166 (URN); 10.1186/s12864-021-07935-1 (DOI); 000692402600002 (); 34461822 (PubMedID)
Note

Funding agencies: Swedish Research Council, European Commission [201503807, 2018-02638]; Swedish Foundation for Strategic Research [SB16-0095]; Center for Industrial IT (CENIIT); European Union Horizon 2020/European Research Council Consolidator grant (Epi4MS) [818170]; Knut and Alice Wallenberg Foundation [2019.0089]; Knowledge Foundation [20170298]; Linköping University

Available from: 2021-09-14 Created: 2021-09-14 Last updated: 2025-02-07
3. MODalyseR—a novel software for inference of disease module hub regulators identified a putative multiple sclerosis regulator supported by independent eQTL data
2022 (English). In: Bioinformatics Advances, ISSN 2635-0041, Vol. 2, no 1. Article in journal (Refereed). Published
Abstract [en]

Network-based disease modules have proven to be a powerful concept for extracting knowledge about disease mechanisms, predicting for example disease risk factors and side effects of treatments. Plenty of tools exist for the purpose of module inference, but less effort has been put into simultaneously utilizing knowledge about regulatory mechanisms for predicting disease module hub regulators. We developed MODalyseR, a novel software for identifying disease module regulators and reducing modules to the most disease-associated genes. This pipeline integrates and extends the previously published software packages MODifieR and ComHub and thereby provides a user-friendly network medicine framework combining the concepts of disease modules and hub regulators for precise disease gene identification from transcriptomics data. To demonstrate the usability of the tool, we designed a case study for multiple sclerosis that revealed IKZF1 as a promising hub regulator, which was supported by independent ChIP-seq data. MODalyseR is available as a Docker image at https://hub.docker.com/r/ddeweerd/modalyser with user guide and installation instructions found at https://gustafsson-lab.gitlab.io/MODalyseR/. Supplementary data are available at Bioinformatics Advances online.
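
The notion of a hub regulator can be illustrated by counting, for each regulator in a gene regulatory network, how many of its targets fall inside a disease module. This is a deliberately simplified sketch with hypothetical names and a made-up edge list; it is not the algorithm implemented in ComHub or MODalyseR.

```python
from collections import Counter


def rank_hub_regulators(edges, module_genes):
    """Rank regulators by the number of distinct targets inside the module."""
    # set(edges) drops duplicate regulator-target pairs before counting.
    counts = Counter(tf for tf, target in set(edges) if target in module_genes)
    return counts.most_common()


# Placeholder regulatory edges (regulator, target) and module genes.
edges = [
    ("TF1", "gene1"), ("TF1", "gene2"), ("TF1", "gene3"),
    ("TF2", "gene1"), ("TF2", "gene9"),
    ("TF3", "gene8"),
]
module_genes = {"gene1", "gene2", "gene3"}

print(rank_hub_regulators(edges, module_genes))  # [('TF1', 3), ('TF2', 1)]
```

Regulators with no targets in the module, such as `TF3` here, do not appear in the ranking at all.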

Place, publisher, year, edition, pages
Oxford University Press, 2022
National Category
Bioinformatics and Computational Biology
Identifiers
urn:nbn:se:liu:diva-191117 (URN); 10.1093/bioadv/vbac006 (DOI); 001153137500002 ()
Note

Funding agencies: This work was supported by the Knowledge Foundation [dnr HSK219/26]; Swedish Foundation for Strategic Research [SB16-0011]; and Swedish Research Council [grant 2019-04193].

Available from: 2023-01-19. Created: 2023-01-19. Last updated: 2025-02-07. Bibliographically approved

Open Access in DiVA

fulltext (4850 kB), 549 downloads
File information
File name: FULLTEXT01.pdf
File size: 4850 kB
Checksum (SHA-512): 2816c3eefd5acff3f89670d8e7f627e173e7ac303bac235811ba301ce6b1fd5431db2a7275553f851609d52a32335f96a5d3e6d794a34cf4c0744c625e9f7496
Type: fulltext
Mimetype: application/pdf
Authority records

de Weerd, Hendrik Arnold
