Discrete Diffusion Models for Language Generation
2025 (English) Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]
Diffusion models have emerged as a powerful class of generative models, achieving state-of-the-art performance in continuous data domains such as image synthesis and video generation. Their core mechanism involves a forward diffusion process that gradually transforms structured data into a Gaussian-like distribution, followed by a learned reverse diffusion process that reconstructs the data. While this framework has proven effective for continuous modalities, its application to discrete data, particularly natural language, remains a challenging and active area of research. Key obstacles include the complexity of modeling discrete token dependencies and the lack of a naturally defined generation order.

This thesis investigates the feasibility and performance of discrete diffusion models in the context of natural language generation. In particular, we examine the Discrete Denoising Diffusion Probabilistic Model (D3PM) and compare its performance against traditional autoregressive (AR) language models. To evaluate and compare the generative capacity of both models, we use common metrics such as Bits Per Token (BPT), Negative Log-Likelihood (NLL), Perplexity (PPL), and batch processing speed.

Experimental results show that the best-performing D3PM model achieves a BPT of 5.72, with a mean value of 8.05 across runs. In contrast, the AR model demonstrates a lower mean BPT of 4.59, indicating better compression and generative efficiency. However, D3PM models achieve higher batch processing speeds, with a maximum of 3.97 batches per second, highlighting their potential for parallel generation.

All evaluations were conducted under consistent conditions (generating 100,000 tokens per model with a fixed batch size of four) to ensure fairness and comparability.

This research contributes a detailed comparative analysis of diffusion-based and autoregressive models, offering insights into their respective trade-offs. Ultimately, the findings underscore both the promise and the current limitations of applying diffusion models to discrete sequence modeling, and provide a foundation for future exploration in non-autoregressive language generation frameworks.

The source code used in this research is available at the following GitHub repository: https://github.com/AshenWELI/Discrete-Diffusion-Models-for-Language-Genaration
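The evaluation metrics quoted above are related in a standard way: perplexity is the exponential of the mean per-token negative log-likelihood (in nats), and bits per token is that same quantity expressed in base 2. The minimal Python sketch below illustrates these relationships only; it is not the thesis's evaluation code, and the function name and its per-token log-probability input are assumptions made for the example.

    import math

    def sequence_metrics(token_log_probs):
        # Illustrative only: NLL, BPT, and PPL from per-token log-probabilities
        # (natural log), assuming a simple unweighted mean over tokens.
        n = len(token_log_probs)
        nll = -sum(token_log_probs) / n   # mean negative log-likelihood, nats per token
        bpt = nll / math.log(2)           # bits per token (NLL in base 2)
        ppl = math.exp(nll)               # perplexity = exp(mean NLL)
        return nll, bpt, ppl

Under these definitions, for example, a mean BPT of 4.59 corresponds to a per-token NLL of roughly 3.18 nats and a perplexity of about 2^4.59, i.e. roughly 24.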
Place, publisher, year, edition, pages
2025, p. 76
Keywords [en]
Machine Learning, Statistics, Autoregressive model, Diffusion Model, D3PM
National Category
Artificial Intelligence
Identifiers
URN: urn:nbn:se:liu:diva-215921
ISRN: LIU-IDA/STAT-A--25/008--SE
DOI: 10.3384/LIU-IDASTAT-A--25008--SE
OAI: oai:DiVA.org:liu-215921
DiVA, id: diva2:1980887
Subject / course
Statistics
Presentation
2025-06-04, John von Neumann, Linköping, 13:45 (English)
Supervisors
Examiners
Available from: 2025-07-03 Created: 2025-07-03 Last updated: 2025-07-03 Bibliographically approved