Representative Synthetic Data for Fair Decision Making
Linköping University, Department of Computer and Information Science, Artificial Intelligence and Integrated Computer Systems. Linköping University, Faculty of Science & Engineering. ORCID iD: 0000-0001-5307-997X
2025 (English) Doctoral thesis, monograph (Other academic)
Abstract [en]

Deep generative models and representation learning techniques have become essential to modern machine learning, enabling both the creation of synthetic data and the extraction of meaningful features for downstream tasks. While synthetic data generation promises to address data scarcity, and representation learning forms the backbone of decision-making systems, both approaches can amplify societal biases present in the training data. In high-stakes applications such as healthcare, criminal justice, and financial services, biased synthetic data can pass discriminatory patterns on to new datasets, while biased representations can lead to unfair decisions that disproportionately harm marginalized groups. The challenge becomes even harder with intersectional bias, where discrimination occurs at the intersection of multiple sensitive attributes, such as race and gender. In this dissertation, we propose approaches for fair synthetic data generation and fair representation learning that mitigate bias with respect to both individual sensitive attributes and their intersections.
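To make the intersectional notion concrete, here is a minimal pure-Python sketch (illustrative only, not code from the thesis; the attribute names and toy records are invented). It forms subgroups as the Cartesian product of two sensitive attributes and measures the worst-case spread in positive-prediction rates, i.e. an intersectional demographic-parity gap:

```python
from itertools import product

def positive_rate(records, group):
    """Fraction of positive predictions among records in `group`."""
    members = [r for r in records if (r["race"], r["gender"]) == group]
    if not members:
        return None
    return sum(r["pred"] for r in members) / len(members)

# Toy records: each has two sensitive attributes and a binary prediction.
records = [
    {"race": "A", "gender": "F", "pred": 1},
    {"race": "A", "gender": "M", "pred": 1},
    {"race": "B", "gender": "F", "pred": 0},
    {"race": "B", "gender": "M", "pred": 1},
    {"race": "A", "gender": "F", "pred": 1},
    {"race": "B", "gender": "F", "pred": 0},
]

races = sorted({r["race"] for r in records})
genders = sorted({r["gender"] for r in records})

# Intersectional subgroups are the Cartesian product of attribute values.
rates = {g: positive_rate(records, g) for g in product(races, genders)}
observed = [v for v in rates.values() if v is not None]

# Intersectional demographic-parity gap: worst-case spread of rates.
dp_gap = max(observed) - min(observed)
print(rates)
print(dp_gap)  # 1.0 here: ("A", "F") always positive, ("B", "F") never
```

Note that a model can look fair on each attribute in isolation while a specific intersection (here, ("B", "F")) is treated very differently, which is exactly why the subgroup-level view matters.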

To address model-specific biases in synthetic data generation, we first introduce Fair Latent Deep Generative Models (FLDGMs), a syntax-agnostic framework that learns low-dimensional fair latent representations via a fairness-aware compression step, then generates a synthetic fair latent space using either GANs or diffusion models, and finally reconstructs high-fidelity data through a decoder. We also present the Bias-transforming GAN (Bt-GAN), which tackles biases in healthcare data by imposing information-theoretic constraints and preserves subgroup representation through density-aware sampling.

For fair representation learning, we develop two novel techniques specifically designed to address intersectional fairness. First, we present a knowledge-distillation-based approach that distills knowledge from an accuracy-focused teacher into a student model enforcing intersectional fairness constraints, including False Positive Rate (FPR) and demographic parity, effectively reducing FPR disparities in multi-class settings. Second, Diff-Fair uses diffusion-based representation learning to minimize mutual information between the learned representation and the sensitive attributes, integrating intersectional and FPR regularizers to reduce demographic and outcome disparities across subgroups while maintaining strong accuracy in binary and multi-class tasks. Finally, to enable systematic evaluation, we introduce FairX, an open-source benchmarking suite that integrates pre-, in-, and post-processing bias mitigation methods alongside fairness, utility, explainability, and synthetic-data evaluation metrics.
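The FPR-disparity quantity that such regularizers target can be sketched in a few lines of pure Python (again illustrative, with invented subgroup keys and toy labels, not the thesis implementation): compute each intersectional subgroup's false positive rate and take the worst-case gap between subgroups.

```python
def false_positive_rate(pairs):
    """FPR = FP / (FP + TN) over (y_true, y_pred) pairs; None if no negatives."""
    negatives = [(y, p) for y, p in pairs if y == 0]
    if not negatives:
        return None
    return sum(p for _, p in negatives) / len(negatives)

# Toy per-subgroup (y_true, y_pred) pairs, keyed by (race, gender).
by_group = {
    ("A", "F"): [(0, 0), (0, 0), (1, 1), (0, 0)],
    ("A", "M"): [(0, 1), (0, 0), (1, 1)],
    ("B", "F"): [(0, 1), (0, 1), (1, 0), (0, 0)],
    ("B", "M"): [(0, 0), (1, 1), (0, 1)],
}

fprs = {g: false_positive_rate(p) for g, p in by_group.items()}
observed = [v for v in fprs.values() if v is not None]

# Worst-case intersectional FPR disparity: the quantity an FPR
# regularizer drives toward zero during training.
fpr_gap = max(observed) - min(observed)
print(fprs)
print(fpr_gap)
```

In a training setup, a differentiable surrogate of this gap would be added to the task loss as a penalty term; the hard max-minus-min form shown here is the evaluation-time metric.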

Along with fair synthetic data, we also investigate how to capture the distribution of time-series data using generative models. We present TransFusion, a diffusion- and transformer-based architecture that generates high-fidelity, long-sequence synthetic time-series data (sequence length up to 384).

Together, these contributions advance the generation of synthetic data and fair representations by integrating bias-transforming models, intersectional fairness constraints, robust benchmarking, and long-sequence synthesis, offering practical tools for building more equitable AI systems.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2025, p. 166
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 2495
National Category
Artificial Intelligence
Identifiers
URN: urn:nbn:se:liu:diva-219906
DOI: 10.3384/9789181183757
ISBN: 9789181183740 (print)
ISBN: 9789181183757 (electronic)
OAI: oai:DiVA.org:liu-219906
DiVA, id: diva2:2019506
Public defence
2026-01-16, Ada Lovelace, B Building, Campus Valla, Linköping, 09:15 (English)
Note

Funding agencies: This work was funded by the Knut and Alice Wallenberg Foundation, Sweden; the ELLIIT Excellence Center at Linköping-Lund for Information Technology, Sweden (portions of this work were carried out using the AIOps/Stellar); and TAILOR, an EU project that aims to provide the scientific foundations for trustworthy AI in Europe. The computations were enabled by the Berzelius resource provided by the Knut and Alice Wallenberg Foundation at the National Supercomputer Centre.

Available from: 2025-12-08 Created: 2025-12-08 Last updated: 2025-12-09 Bibliographically approved

Open Access in DiVA

fulltext (6655 kB), 213 downloads
File information
File name: FULLTEXT01.pdf
File size: 6655 kB
Checksum (SHA-512): 62e5e59ccb31fefbd1cf3467c0ce6e6135da02f50f5776119a497df10d8c294ed86a55528d362d7d201cc21bc4fc6379faf8ac88abe4923aa0d806eed87ea1cf
Type: fulltext
Mimetype: application/pdf

Authority records

Sikder, Md Fahim
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.
