Discrete Diffusion Models for Language Generation
Linköping University, Department of Computer and Information Science.
2025 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Diffusion models have emerged as a powerful class of generative models, achieving state-of-the-art performance in continuous data domains such as image synthesis and video generation. Their core mechanism involves a forward diffusion process that gradually transforms structured data into a Gaussian-like distribution, followed by a learned reverse diffusion process that reconstructs the data. While this framework has proven effective for continuous modalities, its application to discrete data, particularly natural language, remains a challenging and active area of research. Key obstacles include the complexity of modeling discrete token dependencies and the lack of a naturally defined generation order.

This thesis investigates the feasibility and performance of discrete diffusion models in the context of natural language generation. In particular, we examine the Discrete Denoising Diffusion Probabilistic Model (D3PM) and compare its performance against traditional autoregressive (AR) language models. To evaluate and compare the generative capacity of both models, we use common metrics such as Bits Per Token (BPT), Negative Log-Likelihood (NLL), Perplexity (PPL), and batch processing speed.

Experimental results show that the best-performing D3PM model achieves a BPT of 5.72, with a mean value of 8.05 across runs. In contrast, the AR model demonstrates a lower mean BPT of 4.59, indicating better compression and generative efficiency. However, D3PM models achieve higher batch processing speeds, with a maximum of 3.97 batches per second, highlighting their potential for parallel generation.

All evaluations were conducted under consistent conditions, generating 100,000 tokens per model with a fixed batch size of four, to ensure fairness and comparability. This research contributes a detailed comparative analysis of diffusion-based and autoregressive models, offering insights into their respective trade-offs. Ultimately, the findings underscore both the promise and the current limitations of applying diffusion models to discrete sequence modeling, and provide a foundation for future exploration in non-autoregressive language generation frameworks.

The source code used in this research is available at the following GitHub repository: https://github.com/AshenWELI/Discrete-Diffusion-Models-for-Language-Genaration
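As a minimal sketch of the forward (noising) process for discrete data described in the abstract, the snippet below applies one corruption step of a uniform-transition D3PM variant, in which each token is kept with probability 1 - beta and otherwise resampled uniformly over the vocabulary. The vocabulary size, noise rate, and toy sequence are illustrative assumptions, not taken from the thesis code.

import numpy as np

def uniform_transition(K, beta):
    # Q_t: keep a token with probability (1 - beta), otherwise resample
    # uniformly over the K-token vocabulary (each row sums to 1).
    return (1.0 - beta) * np.eye(K) + beta * np.ones((K, K)) / K

def forward_step(tokens, Q, rng):
    # Sample x_t ~ Categorical(Q[x_{t-1}]) independently at each position,
    # using inverse-CDF sampling over each token's transition row.
    probs = Q[tokens]                    # (seq_len, K): one row of Q per token
    cum = probs.cumsum(axis=-1)
    u = rng.random((tokens.shape[0], 1))
    return (u < cum).argmax(axis=-1)

rng = np.random.default_rng(0)
K = 50                                   # toy vocabulary size (assumption)
x = rng.integers(0, K, size=32)          # toy token sequence (assumption)
Q = uniform_transition(K, beta=0.1)      # illustrative noise rate
x_noised = forward_step(x, Q, rng)       # one step of discrete corruption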

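The metrics reported above are fixed transformations of one another: BPT is the mean per-token NLL expressed in bits, and PPL is its exponential. A short sketch of the conversions follows; the NLL value is illustrative (3.18 nats corresponds to a BPT of about 4.59, the AR model's reported mean).

import math

nll_nats = 3.18                  # hypothetical mean per-token NLL, in nats
bpt = nll_nats / math.log(2)     # Bits Per Token: nats -> bits
ppl = math.exp(nll_nats)         # Perplexity: exp(mean NLL in nats)
print(f"BPT = {bpt:.2f}, PPL = {ppl:.2f}")   # BPT = 4.59, PPL = 24.05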
Place, publisher, year, edition, pages
2025, p. 76
Keywords [en]
Machine Learning, Statistics, Autoregressive model, Diffusion Model, D3PM
National Category
Artificial Intelligence
Identifiers
URN: urn:nbn:se:liu:diva-215921
ISRN: LIU-IDA/STAT-A--25/008--SE
DOI: 10.3384/LIU-IDA/STAT-A--25/008--SE
OAI: oai:DiVA.org:liu-215921
DiVA, id: diva2:1980887
Subject / course
Statistics
Presentation
2025-06-04, John von Neumann, Linköping, 13:45 (English)
Available from: 2025-07-03. Created: 2025-07-03. Last updated: 2025-07-03. Bibliographically approved.

Open Access in DiVA

fulltext (2530 kB), 35 downloads
File information
File name: FULLTEXT01.pdf
File size: 2530 kB
Checksum (SHA-512): b828c929d6f5dc7be7c4568d99c0dea65f6a9b441b05a97cb7799725e472120c7a3f9c7b1b6523c5eecbe87fc4e6d57a28616ee7ba17171c9f523854a4f56379
Type: fulltext
Mimetype: application/pdf

Other links

Publisher's full text: https://doi.org/10.3384/LIU-IDA/STAT-A--25/008--SE

Search in DiVA

By author/editor: Thalgamuwe Gedara, Ashen Akalanka Weligalle
By organisation: Department of Computer and Information Science
National category: Artificial Intelligence

Total: 35 downloads
The number of downloads is the sum of all downloads of full texts; it may include, for example, earlier versions that are no longer available.
