R3D2: Realistic 3D Asset Insertion via Diffusion for Autonomous Driving Simulation
Linköping University, Department of Electrical Engineering, Computer Vision; Linköping University, Faculty of Science & Engineering; Zenseact. ORCID iD: 0000-0002-0194-6346
Zenseact; Chalmers University of Technology, Gothenburg, Sweden.
UC Berkeley, USA.
Zenseact.
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Validating autonomous driving (AD) systems requires diverse and safety-critical testing, making photorealistic virtual environments essential. Traditional simulation platforms, while controllable, are resource-intensive to scale and often suffer from a domain gap with real-world data. In contrast, neural reconstruction methods like 3D Gaussian Splatting (3DGS) offer a scalable solution for creating photorealistic digital twins of real-world driving scenes. However, they struggle with dynamic object manipulation and reusability, as their per-scene optimization-based methodology tends to result in incomplete object models with baked-in illumination effects. This paper introduces R3D2, a lightweight, one-step diffusion model designed to overcome these limitations and enable realistic insertion of complete 3D assets into existing scenes by generating plausible rendering effects, such as shadows and consistent lighting, in real time. This is achieved by training R3D2 on a novel dataset: 3DGS object assets are generated from in-the-wild AD data using an image-conditioned 3D generative model, and then synthetically placed into neural rendering-based virtual environments, allowing R3D2 to learn realistic integration. Quantitative and qualitative evaluations demonstrate that R3D2 significantly enhances the realism of inserted assets, enabling use cases like text-to-3D asset insertion and cross-scene/dataset object transfer, allowing for true scalability in AD validation. To promote further research in scalable and realistic AD simulation, we will release our dataset and code; see https://research.zenseact.com/publications/R3D2/
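As a rough illustration of the synthetic-placement step described in the abstract (rendering a generated asset into an existing scene before the diffusion model adds shadows and consistent lighting), here is a minimal alpha-over compositing sketch in Python. The function name and the naive compositing are illustrative assumptions, not the paper's actual pipeline; this is precisely the "copy-paste" baseline that a refinement model like R3D2 would improve upon.

```python
import numpy as np

def composite_asset(scene: np.ndarray, asset_rgba: np.ndarray,
                    top: int, left: int) -> np.ndarray:
    """Naively place an RGBA asset render into an RGB scene using alpha-over
    compositing. No shadows or relighting are produced, which is why the
    result looks 'pasted in' without a subsequent refinement step."""
    out = scene.astype(np.float32).copy()
    h, w = asset_rgba.shape[:2]
    rgb = asset_rgba[..., :3].astype(np.float32)
    alpha = asset_rgba[..., 3:4].astype(np.float32) / 255.0
    region = out[top:top + h, left:left + w]
    out[top:top + h, left:left + w] = alpha * rgb + (1.0 - alpha) * region
    return out.astype(np.uint8)

# Toy usage: a uniform gray 64x64 scene and an opaque red 16x16 asset.
scene = np.full((64, 64, 3), 128, dtype=np.uint8)
asset = np.zeros((16, 16, 4), dtype=np.uint8)
asset[..., 0] = 255   # red channel
asset[..., 3] = 255   # fully opaque
result = composite_asset(scene, asset, top=10, left=10)
```

In the dataset construction the abstract describes, many such placements of generated 3DGS assets would form input/target pairs from which the one-step diffusion model learns to produce the missing rendering effects.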

National Category
Artificial Intelligence; Vehicle and Aerospace Engineering
Identifiers
URN: urn:nbn:se:liu:diva-218892
DOI: 10.48550/arXiv.2506.07826
OAI: oai:DiVA.org:liu-218892
DiVA id: diva2:2007149
Available from: 2025-10-17 Created: 2025-10-17 Last updated: 2025-10-27
In thesis
1. On the Road to Safe Autonomous Driving via Data, Learning, and Validation
2025 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Autonomous driving systems hold the promise of safer and more efficient transportation, with the potential to fundamentally reshape what everyday mobility looks like. However, to realize these promises, such systems must perform reliably in both routine driving and in rare, safety-critical situations. To this end, this thesis addresses three core aspects of autonomous driving development: data, learning, and validation.

First, we tackle the fundamental need for high-quality data by introducing the Zenseact Open Dataset (ZOD) in Paper A. ZOD is a large-scale multimodal dataset collected across diverse geographies, weather conditions, and road types throughout Europe, effectively addressing key shortcomings of existing academic datasets.

We then turn to the challenge of learning from this data. First, we develop a method that bypasses the need for intricate image signal processing pipelines and instead learns to detect objects directly from RAW image data in a supervised setting (Paper B). This reduces the reliance on hand-crafted preprocessing but still requires annotations. Although sensor data is typically abundant in the autonomous driving setting, such annotations become prohibitively expensive at scale. To overcome this bottleneck, we propose GASP (Paper C), a self-supervised method that captures structured 4D representations by jointly modeling geometry, semantics, and dynamics solely from sensor data.

Once models are trained, they must undergo rigorous validation. Yet existing evaluation methods often fall short in realism, scalability, or both. To remedy this, we introduce NeuroNCAP (Paper D), a neural rendering-based closed-loop simulation framework that enables safety-critical testing in photorealistic environments. Building on this, we present R3D2 (Paper E), a generative method for realistic insertion of non-native 3D assets into such environments, further enhancing the scalability and diversity of safety-critical testing.

Together, these contributions provide a scalable set of tools for training and validating autonomous driving systems, supporting progress both in mastering the nominal 99% of everyday driving and in validating behavior in the critical 1% of rare, safety-critical situations.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2025. p. 65
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 2478
National Category
Computer Vision and Learning Systems
Identifiers
urn:nbn:se:liu:diva-219102 (URN)
10.3384/9789181182453 (DOI)
9789181182446 (ISBN)
9789181182453 (ISBN)
Public defence
2025-11-28, Zero, Zenit Building, Campus Valla, Linköping, 09:15 (English)
Note

Funding agencies: This thesis work was supported by the Wallenberg Artificial Intelligence, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation, and by Zenseact AB through their industrial PhD program. The computational resources were provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS) at C3SE, partially funded by the Swedish Research Council through grant agreement no. 2022-06725, and by the Berzelius resource, provided by the Knut and Alice Wallenberg Foundation at the National Supercomputer Centre.

Available from: 2025-10-27 Created: 2025-10-27 Last updated: 2025-10-27. Bibliographically approved

Open Access in DiVA

fulltext from arXiv, CC BY (18683 kB), 39 downloads
File information
File name: FULLTEXT01.pdf
File size: 18683 kB
Checksum (SHA-512): 7b3e62303c0bc76388f0136ce0d0d4a52c9e13fbef859c127e903d913a0d8ba6d4470840dfbe1767db986248e3b14c9c3f47e3efd423e37fad8f7a19331eff29
Type: fulltext
Mimetype: application/pdf

Other links

Publisher's full text

Authority records

Ljungbergh, William; Felsberg, Michael

The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 680 hits