NCAE: data-driven representations using a deep network-coherent DNA methylation autoencoder identify robust disease and risk factor signatures
Linköping University, Department of Physics, Chemistry and Biology, Bioinformatics. Linköping University, Faculty of Science & Engineering. ORCID iD: 0000-0002-6363-6298
Linköping University, Department of Physics, Chemistry and Biology, Bioinformatics. Linköping University, Faculty of Science & Engineering.
Chalmers Univ Technol, Sweden.
Linköping University, Department of Physics, Chemistry and Biology, Bioinformatics. Linköping University, Faculty of Science & Engineering.
2023 (English). In: Briefings in Bioinformatics, ISSN 1467-5463, E-ISSN 1477-4054, Vol. 24, no 5, article id bbad293. Article in journal (Refereed). Published
Abstract [en]

Precision medicine relies on the identification of robust disease and risk factor signatures from omics data. However, current knowledge-driven approaches may overlook novel or unexpected phenomena due to the inherent biases in biological knowledge. In this study, we present a data-driven signature discovery workflow for DNA methylation analysis utilizing network-coherent autoencoders (NCAEs) with biologically relevant latent embeddings. First, we explored the architecture space of autoencoders trained on a large-scale pan-tissue compendium (n = 75 272) of human epigenome-wide association studies. We observed the emergence of co-localized patterns in the deep autoencoder latent space representations that corresponded to biological network modules. We determined the NCAE configuration with the strongest co-localization and centrality signals in the human protein interactome. Leveraging the NCAE embeddings, we then trained interpretable deep neural networks for risk factor (aging, smoking) and disease (systemic lupus erythematosus) prediction and classification tasks. Remarkably, our NCAE embedding-based models outperformed existing predictors, revealing novel DNA methylation signatures enriched in gene sets and pathways associated with the studied condition in each case. Our data-driven biomarker discovery workflow provides a generally applicable pipeline to capture relevant risk factor and disease information. By surpassing the limitations of knowledge-driven methods, our approach enhances the understanding of complex epigenetic processes, facilitating the development of more effective diagnostic and therapeutic strategies.
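
The core idea of compressing methylation profiles into a low-dimensional latent embedding can be illustrated with a minimal sketch. This is not the NCAE architecture or the authors' training code: the data are synthetic, the network has a single small hidden layer, and all names (`betas`, `W_enc`, `W_dec`, `encode`) and hyperparameters are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a methylation matrix (assumption, not real data):
# 200 samples x 50 CpG sites, beta values in (0, 1) driven by 3 hidden factors.
n_samples, n_cpgs, n_latent = 200, 50, 3
factors = rng.normal(size=(n_samples, n_latent))
loadings = rng.normal(size=(n_latent, n_cpgs))
betas = 1.0 / (1.0 + np.exp(-(factors @ loadings)))

X = betas - betas.mean(axis=0)  # centre each CpG site

# One-hidden-layer autoencoder: tanh encoder, linear decoder.
W_enc = rng.normal(scale=0.1, size=(n_cpgs, n_latent))
W_dec = rng.normal(scale=0.1, size=(n_latent, n_cpgs))


def encode(x):
    """Map samples to their latent embedding."""
    return np.tanh(x @ W_enc)


def reconstruction_loss(x):
    """Mean squared error between input and reconstruction."""
    return float(np.mean((encode(x) @ W_dec - x) ** 2))


initial_loss = reconstruction_loss(X)

# Plain full-batch gradient descent on the squared reconstruction error.
learning_rate = 0.01
for _ in range(1000):
    Z = np.tanh(X @ W_enc)
    err = Z @ W_dec - X
    grad_dec = Z.T @ err / n_samples
    grad_enc = X.T @ ((err @ W_dec.T) * (1.0 - Z ** 2)) / n_samples
    W_dec -= learning_rate * grad_dec
    W_enc -= learning_rate * grad_enc

final_loss = reconstruction_loss(X)
embeddings = encode(X)  # shape: (n_samples, n_latent)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

In the workflow described in the abstract, embeddings of this kind serve as inputs to downstream supervised models; the sketch stops at producing the embedding matrix.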

Place, publisher, year, edition, pages
OXFORD UNIV PRESS, 2023. Vol. 24, no 5, article id bbad293
Keywords [en]
deep learning; autoencoders; DNA methylation; transfer learning; biomarkers; systems medicine
National Category
Bioinformatics and Computational Biology
Identifiers
URN: urn:nbn:se:liu:diva-197471
DOI: 10.1093/bib/bbad293
ISI: 001049091000001
PubMedID: 37587790
OAI: oai:DiVA.org:liu-197471
DiVA, id: diva2:1794557
Note

Funding Agencies: Swedish Research Council [2019-04193]; Wallenberg AI, Autonomous Systems and Software Program (WASP); SciLifeLab and Wallenberg National Program for Data-Driven Life Science (DDLS) [WASPDDLS21-040/KAW 2020.0239]

Available from: 2023-09-06 Created: 2023-09-06 Last updated: 2025-11-17
In thesis
1. Explainable deep learning for DNA methylation analysis in health and disease
2025 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Modern clinical decision support requires models that are both accurate and mechanistically interpretable. DNA methylation tracks the cumulative influence of development, lifestyle, and environment on gene regulation, but its dimensionality and tissue specificity complicate analysis and clinical application. This thesis develops explainable deep learning methods that learn coherent biological signals from genome-wide methylation data, aiming to derive reliable biomarkers of aging, disease risk and severity, and system-level health. Central to our approach are deep autoencoders, unsupervised multi-layered neural networks that efficiently compress DNA methylation data into low-dimensional embeddings that preserve relevant biology, paired with interpretability techniques that expose feature contributions and model reasoning, such as perturbation-based latent activation.

By training on large multi-tissue compendia of human DNA methylation samples, we observed that the autoencoders self-organized their latent spaces, recapitulating protein-protein interaction (PPI) modules. Interpreting these structured embeddings yielded pathway-enriched epigenomic signatures that supported accurate epigenetic age estimation and robust classification of disease status and smoking. Building on these findings, we introduced a PPI-guided autoencoder that incorporates a graph-regularized protein interaction prior, encouraging each latent unit to be functionally specific and colocalized within the human interactome. We showed that this soft guidance improved the mechanistic interpretability of downstream models, in this case supervised translators that map between omics modalities (transcriptomics, DNA methylation, genomics).
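
One common way to express a graph-regularized prior is a Laplacian smoothness penalty, sketched below. The thesis abstract does not state the exact form of its regularizer, so this is an assumed formulation, and the toy interaction graph, the weight vectors, and the function name `laplacian_penalty` are all hypothetical. The penalty w^T L w equals the sum of squared weight differences across interacting gene pairs, so a latent unit whose weights sit within one network module is penalized less than one whose weights are scattered across the graph.

```python
import numpy as np

# Toy undirected "interactome" (hypothetical): two modules of three genes
# each, {0, 1, 2} and {3, 4, 5}, joined by a single bridging edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n_genes = 6

adjacency = np.zeros((n_genes, n_genes))
for i, j in edges:
    adjacency[i, j] = adjacency[j, i] = 1.0

# Combinatorial graph Laplacian: degree matrix minus adjacency matrix.
laplacian = np.diag(adjacency.sum(axis=1)) - adjacency


def laplacian_penalty(weights):
    """Return w^T L w, the sum over edges of (w_i - w_j)^2."""
    return float(weights @ laplacian @ weights)


# Two candidate latent units with the same number of active genes.
colocalized = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])  # one module only
scattered = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0])    # spread over both

print("colocalized:", laplacian_penalty(colocalized))
print("scattered:  ", laplacian_penalty(scattered))
```

Adding a term like this to a reconstruction loss, scaled by a tunable coefficient, gives the kind of soft guidance the abstract describes: the network prior nudges the latent units without hard-wiring them to known modules.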

In parallel, we combined autoencoder embeddings with established aging markers to train explainable neural-network age clocks that achieved state-of-the-art cross-tissue precision, while also capturing fine-grained developmental, immune, and metabolic signatures. Finally, we operationalized these representations in a clinical decision-support pipeline that predicts respiratory, cardiovascular, and metabolic system-level health scores from blood methylation, with supervised deep learning models that highlight biological processes associated with each physiological system. Collectively, this work provides a scalable and auditable framework that converts methylomes into interpretable feature sets and actionable indicators for clinical use, enabling early risk assessment, monitoring of treatment responses and lifestyle changes, and informed therapeutic target prioritization.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2025. p. 105
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 2490
Keywords
Deep learning, Autoencoders, DNA methylation, Aging, Health
National Category
Medical Genetics and Genomics
Identifiers
urn:nbn:se:liu:diva-219551 (URN)
10.3384/9789181183320 (DOI)
9789181183313 (ISBN)
9789181183320 (ISBN)
Public defence
2025-12-18, C1, C-building, Campus Valla, Linköping, 09:00 (English)
Note

Funding Agencies: Swedish Heart-Lung Foundation

Available from: 2025-11-17 Created: 2025-11-17 Last updated: 2025-11-17. Bibliographically approved

Open Access in DiVA

fulltext (2130 kB), 132 downloads
File information
File name: FULLTEXT01.pdf
File size: 2130 kB
Checksum (SHA-512): 7a3e1651e642ff30a565199876458d470f7b4c4a9bbf848efde4fd7a122e9d749877f54b5c9a023830b9c7e70615cdd81b4e77f3625144cacd2b529c55156f8b
Type: fulltext
Mimetype: application/pdf


Search in DiVA

By author/editor
Martinez, David; Dwivedi, Sanjiv; Gustafsson, Mika
Total: 132 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 577 hits