Explainable deep learning for DNA methylation analysis in health and disease
Martínez Enguita, David
Linköping University, Department of Physics, Chemistry and Biology, Bioinformatics. Linköping University, Faculty of Science & Engineering.
ORCID iD: 0000-0002-6363-6298
2025 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Modern clinical decision support requires models that are both accurate and mechanistically interpretable. DNA methylation tracks the cumulative influence of development, lifestyle, and environment on gene regulation, but its dimensionality and tissue specificity complicate analysis and clinical application. This thesis develops explainable deep learning methods that learn coherent biological signals from genome-wide methylation data, aiming to derive reliable biomarkers of aging, disease risk and severity, and system-level health. Central to our approach are deep autoencoders, unsupervised multi-layered neural networks that efficiently compress DNA methylation data into low-dimensional embeddings that preserve relevant biology, paired with interpretability techniques that expose feature contributions and model reasoning, such as perturbation-based latent activation.
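
To make the idea of compressing methylation profiles into a low-dimensional embedding concrete, the following is a minimal sketch, not the architecture used in the thesis. It assumes a single hidden layer with a tanh activation, plain gradient descent, and synthetic beta-like values in (0, 1); the function name `train_autoencoder` and all hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np

def train_autoencoder(X, latent_dim=2, lr=0.01, epochs=2000, seed=0):
    """Fit a tiny tanh autoencoder by gradient descent (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    mean = X.mean(axis=0)
    Xc = X - mean                                   # centre each feature
    W_enc = rng.normal(scale=0.1, size=(n_features, latent_dim))
    W_dec = rng.normal(scale=0.1, size=(latent_dim, n_features))
    losses = []
    for _ in range(epochs):
        Z = np.tanh(Xc @ W_enc)                     # low-dimensional embedding
        err = Z @ W_dec - Xc                        # reconstruction error
        losses.append(float(np.sum(err ** 2) / n_samples))
        grad_out = 2.0 * err / n_samples            # d(loss)/d(reconstruction)
        grad_W_dec = Z.T @ grad_out
        grad_pre = (grad_out @ W_dec.T) * (1.0 - Z ** 2)  # back through tanh
        grad_W_enc = Xc.T @ grad_pre
        W_enc -= lr * grad_W_enc
        W_dec -= lr * grad_W_dec

    def encode(A):
        """Map new samples to the learned embedding."""
        return np.tanh((np.asarray(A, dtype=float) - mean) @ W_enc)

    return encode, losses

# Synthetic stand-in for methylation beta values: 200 samples, 50 sites,
# generated from two hidden factors (an assumption, not real data).
rng = np.random.default_rng(42)
factors = rng.normal(size=(200, 2))
loadings = rng.normal(size=(2, 50))
X = 1.0 / (1.0 + np.exp(-(factors @ loadings)))

encode, losses = train_autoencoder(X)
Z = encode(X)
print(Z.shape, losses[0], losses[-1])
```

Here 50 input features are summarized by 2 latent units, and the reconstruction loss is tracked over training. Real methylation arrays have hundreds of thousands of sites and would call for a deeper network and a dedicated deep learning framework.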

By training on large multi-tissue compendia of human DNA methylation samples, we observed that the autoencoders self-organized their latent spaces, recapitulating protein-protein interaction (PPI) modules. Interpreting these structured embeddings yielded pathway-enriched epigenomic signatures that supported accurate epigenetic age estimation and robust classification of disease status and smoking. Building on these findings, we introduced a PPI-guided autoencoder that incorporates a graph-regularized protein interaction prior, encouraging each latent unit to be functionally specific and colocalized within the human interactome. We showed that this soft guidance improved the mechanistic interpretability of downstream models, in this case supervised translators that map between omics modalities (transcriptomics, DNA methylation, genomics).
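
The abstract does not give the form of the graph-regularized prior. One common way to encourage weights to be coherent on a network is a graph Laplacian penalty, tr(WᵀLW), which sums the squared differences between the weight vectors of connected nodes. The sketch below assumes that form purely for illustration; the toy adjacency matrix, the weight matrices, and the function names are hypothetical and are not taken from the thesis.

```python
import numpy as np

def graph_laplacian(A):
    """Unnormalized Laplacian L = D - A of an undirected graph."""
    A = np.asarray(A, dtype=float)
    return np.diag(A.sum(axis=1)) - A

def laplacian_penalty(W, A):
    """tr(W^T L W): sum over edges (i, j) of ||W[i] - W[j]||^2."""
    L = graph_laplacian(A)
    return float(np.trace(W.T @ L @ W))

# Toy interaction network with four genes and two edges: 0-1 and 2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]])

# Rows are genes, columns are latent units.
# Coherent: each latent unit loads on one connected pair of genes.
W_coherent = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
# Scattered: each latent unit loads on genes that are not connected.
W_scattered = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])

penalty_coherent = laplacian_penalty(W_coherent, A)
penalty_scattered = laplacian_penalty(W_scattered, A)
print(penalty_coherent, penalty_scattered)
```

Adding such a term to the reconstruction loss with a small weight acts as soft guidance: latent units whose weights concentrate on interacting genes are penalized less than units whose weights are spread across unconnected genes.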

In parallel, we combined autoencoder embeddings with established aging markers to train explainable neural-network age clocks that achieved state-of-the-art cross-tissue precision, while also capturing fine-grained developmental, immune, and metabolic signatures. Finally, we operationalized these representations in a clinical decision-support pipeline that predicts respiratory, cardiovascular, and metabolic system-level health scores from blood methylation, with supervised deep learning models that highlight biological processes associated with each physiological system. Collectively, this work provides a scalable and auditable framework that converts methylomes into interpretable feature sets and actionable indicators for clinical use, enabling early risk assessment, monitoring of treatment responses and lifestyle changes, and informed therapeutic target prioritization.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2025. p. 105
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 2490
Keywords [en]
Deep learning, Autoencoders, DNA methylation, Aging, Health
National Category
Medical Genetics and Genomics
Identifiers
URN: urn:nbn:se:liu:diva-219551
DOI: 10.3384/9789181183320
ISBN: 9789181183313 (print)
ISBN: 9789181183320 (electronic)
OAI: oai:DiVA.org:liu-219551
DiVA, id: diva2:2014180
Public defence
2025-12-18, C1, C-building, Campus Valla, Linköping, 09:00 (English)
Note

Funding Agencies: Swedish Heart-Lung Foundation

Available from: 2025-11-17 Created: 2025-11-17 Last updated: 2025-11-17. Bibliographically approved
List of papers
1. NCAE: data-driven representations using a deep network-coherent DNA methylation autoencoder identify robust disease and risk factor signatures
2023 (English). In: Briefings in Bioinformatics, ISSN 1467-5463, E-ISSN 1477-4054, Vol. 24, no 5, article id bbad293. Article in journal (Refereed). Published
Abstract [en]

Precision medicine relies on the identification of robust disease and risk factor signatures from omics data. However, current knowledge-driven approaches may overlook novel or unexpected phenomena due to the inherent biases in biological knowledge. In this study, we present a data-driven signature discovery workflow for DNA methylation analysis utilizing network-coherent autoencoders (NCAEs) with biologically relevant latent embeddings. First, we explored the architecture space of autoencoders trained on a large-scale pan-tissue compendium (n = 75 272) of human epigenome-wide association studies. We observed the emergence of co-localized patterns in the deep autoencoder latent space representations that corresponded to biological network modules. We determined the NCAE configuration with the strongest co-localization and centrality signals in the human protein interactome. Leveraging the NCAE embeddings, we then trained interpretable deep neural networks for risk factor (aging, smoking) and disease (systemic lupus erythematosus) prediction and classification tasks. Remarkably, our NCAE embedding-based models outperformed existing predictors, revealing novel DNA methylation signatures enriched in gene sets and pathways associated with the studied condition in each case. Our data-driven biomarker discovery workflow provides a generally applicable pipeline to capture relevant risk factor and disease information. By surpassing the limitations of knowledge-driven methods, our approach enhances the understanding of complex epigenetic processes, facilitating the development of more effective diagnostic and therapeutic strategies.

Place, publisher, year, edition, pages
OXFORD UNIV PRESS, 2023
Keywords
deep learning; autoencoders; DNA methylation; transfer learning; biomarkers; systems medicine
National Category
Bioinformatics and Computational Biology
Identifiers
urn:nbn:se:liu:diva-197471 (URN)
10.1093/bib/bbad293 (DOI)
001049091000001 ()
37587790 (PubMedID)
Note

Funding Agencies: Swedish Research Council [2019-04193]; Wallenberg AI, Autonomous Systems and Software Program (WASP); SciLifeLab and Wallenberg National Program for Data-Driven Life Science (DDLS) [WASPDDLS21-040/KAW 2020.0239]

Available from: 2023-09-06 Created: 2023-09-06 Last updated: 2025-11-17
2. Precise and interpretable neural networks reveal epigenetic signatures of aging across youth in health and disease
2025 (English). In: Frontiers in Aging, E-ISSN 2673-6217, Vol. 5, article id 1526146. Article in journal (Refereed). Published
Abstract [en]

Introduction: DNA methylation (DNAm) age clocks are powerful tools for measuring biological age, providing insights into aging risks and outcomes beyond chronological age. While traditional models are effective, their interpretability is limited by their dependence on small and potentially stochastic sets of CpG sites. Here, we propose that the reliability of DNAm age clocks should stem from their capacity to detect comprehensive and targeted aging signatures.

Methods: We compiled publicly available DNAm whole-blood samples (n = 17,726) comprising the entire human lifespan (0-112 years). We used a pre-trained network-coherent autoencoder (NCAE) to compress DNAm data into embeddings, with which we trained interpretable neural network epigenetic clocks. We then retrieved their age-specific epigenetic signatures of aging and examined their functional enrichments in age-associated biological processes.

Results: We introduce NCAE-CombClock, a novel highly precise (R² = 0.978, mean absolute error = 1.96 years) deep neural network age clock integrating data-driven DNAm embeddings and established CpG age markers. Additionally, we developed a suite of interpretable NCAE-Age neural network classifiers tailored for adolescence and young adulthood. These clocks can accurately classify individuals at critical developmental ages in youth (AUROC = 0.953, 0.972, and 0.927, for 15, 18, and 21 years) and capture fine-grained, single-year DNAm signatures of aging that are enriched in biological processes associated with anatomic and neuronal development, immunoregulation, and metabolism. We showcased the practical applicability of this approach by identifying candidate mechanisms underlying the altered pace of aging observed in pediatric Crohn's disease.

Discussion: In this study, we present a deep neural network epigenetic clock, named NCAE-CombClock, that improves age prediction accuracy in large datasets, and a suite of explainable neural network clocks for robust age classification across youth. Our models offer broad applications in personalized medicine and aging research, providing a valuable resource for interpreting aging trajectories in health and disease.
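
For readers unfamiliar with the two accuracy figures quoted for the age clock, the sketch below shows how mean absolute error and R² are commonly computed from chronological and predicted ages. It assumes the coefficient-of-determination definition of R² (1 minus residual over total sum of squares); the abstract does not state which definition was used, and some studies report the squared Pearson correlation instead. The ages are invented toy values, not results from the paper.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted ages."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Toy example: four individuals (hypothetical values, in years).
chronological_age = [10, 20, 30, 40]
predicted_age = [12, 18, 33, 39]

mae = mean_absolute_error(chronological_age, predicted_age)
r2 = r_squared(chronological_age, predicted_age)
print(mae, r2)
```

In this toy case the absolute errors are 2, 2, 3, and 1 years, giving a mean absolute error of 2.0 years, and the residual and total sums of squares are 18 and 500, giving R² = 0.964.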

Place, publisher, year, edition, pages
FRONTIERS MEDIA SA, 2025
Keywords
DNA methylation; neural networks; age clock; epigenetic age; youth
National Category
Bioinformatics (Computational Biology)
Identifiers
urn:nbn:se:liu:diva-211720 (URN)
10.3389/fragi.2024.1526146 (DOI)
001414074300001 ()
39916723 (PubMedID)
2-s2.0-85216955208 (Scopus ID)
Note

Funding Agencies: Vetenskapsrådet (10.13039/501100004359) [Berzelius-2022-156, Berzelius-2024-5, LiU-compute-2023-38, NAISS 2023/5-303]

Available from: 2025-02-18 Created: 2025-02-18 Last updated: 2025-11-24

Open Access in DiVA

fulltext (9487 kB), 147 downloads
File information
File name: FULLTEXT01.pdf
File size: 9487 kB
Checksum SHA-512:
8fa47f5b63a6f5784b3bacb9d619dfabe97393fda5e4777d90416a13fce9e4c9c4a9257b4e5d1a2defc65f10ac2ebc5c49dbc90c708757d2196669049c43859d
Type: fulltext
Mimetype: application/pdf

Authority records

Martínez Enguita, David
