On the Road to Safe Autonomous Driving via Data, Learning, and Validation
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. ORCID iD: 0000-0002-0194-6346
2025 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Autonomous driving systems hold the promise of safer and more efficient transportation, with the potential to fundamentally reshape what everyday mobility looks like. However, to realize these promises, such systems must perform reliably in both routine driving and in rare, safety-critical situations. To this end, this thesis addresses three core aspects of autonomous driving development: data, learning, and validation.

First, we tackle the fundamental need for high-quality data by introducing the Zenseact Open Dataset (ZOD) in Paper A. ZOD is a large-scale multimodal dataset collected across diverse geographies, weather conditions, and road types throughout Europe, effectively addressing key shortcomings of existing academic datasets.

We then turn to the challenge of learning from this data. First, we develop a method that bypasses the need for intricate image signal processing pipelines and instead learns to detect objects directly from RAW image data in a supervised setting (Paper B). This reduces the reliance on hand-crafted preprocessing but still requires annotations. Although sensor data is typically abundant in the autonomous driving setting, such annotations become prohibitively expensive at scale. To overcome this bottleneck, we propose GASP (Paper C), a self-supervised method that captures structured 4D representations by jointly modeling geometry, semantics, and dynamics solely from sensor data.

Once models are trained, they must undergo rigorous validation. Yet existing evaluation methods often fall short in realism, scalability, or both. To remedy this, we introduce NeuroNCAP (Paper D), a neural rendering-based closed-loop simulation framework that enables safety-critical testing in photorealistic environments. Building on this, we present R3D2 (Paper E), a generative method for realistic insertion of non-native 3D assets into such environments, further enhancing the scalability and diversity of safety-critical testing.

Together, these contributions provide a scalable set of tools for training and validating autonomous driving systems, supporting progress both in mastering the nominal 99% of everyday driving and in validating behavior in the critical 1% of rare, safety-critical situations.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2025, p. 65
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 2478
National Category
Computer Vision and Learning Systems
Identifiers
URN: urn:nbn:se:liu:diva-219102
DOI: 10.3384/9789181182453
ISBN: 9789181182446 (print)
ISBN: 9789181182453 (electronic)
OAI: oai:DiVA.org:liu-219102
DiVA, id: diva2:2009185
Public defence
2025-11-28, Zero, Zenit Building, Campus Valla, Linköping, 09:15 (English)
Note

Funding agencies: This thesis work was supported by the Wallenberg Artificial Intelligence, Autonomous Systems and Software Program (WASP), funded by the Knut and Alice Wallenberg Foundation, and by Zenseact AB through their industrial PhD program. The computational resources were provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS) at C3SE, partially funded by the Swedish Research Council through grant agreement no. 2022-06725, and by the Berzelius resource, provided by the Knut and Alice Wallenberg Foundation at the National Supercomputer Centre.

Available from: 2025-10-27. Created: 2025-10-27. Last updated: 2025-10-27. Bibliographically approved.
List of papers
1. Zenseact Open Dataset: A large-scale and diverse multimodal dataset for autonomous driving
2023 (English) In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Institute of Electrical and Electronics Engineers (IEEE), 2023, p. 20121-20131. Conference paper, Published paper (Refereed)
Abstract [en]

Existing datasets for autonomous driving (AD) often lack diversity and long-range capabilities, focusing instead on 360° perception and temporal reasoning. To address this gap, we introduce Zenseact Open Dataset (ZOD), a large-scale and diverse multimodal dataset collected over two years in various European countries, covering an area 9× that of existing datasets. ZOD boasts the highest range and resolution sensors among comparable datasets, coupled with detailed keyframe annotations for 2D and 3D objects (up to 245 m), road instance/semantic segmentation, traffic sign recognition, and road classification. We believe that this unique combination will facilitate breakthroughs in long-range perception and multi-task learning. The dataset is composed of Frames, Sequences, and Drives, designed to encompass both data diversity and support for spatio-temporal learning, sensor fusion, localization, and mapping. Frames consist of 100k curated camera images with two seconds of other supporting sensor data, while the 1473 Sequences and 29 Drives include the entire sensor suite for 20 seconds and a few minutes, respectively. ZOD is the only large-scale AD dataset released under a permissive license, allowing for both research and commercial use. More information, and an extensive devkit, can be found at zod.zenseact.com.
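To make the Frames/Sequences/Drives split concrete, here is a minimal, purely illustrative sketch of how the three record types described in the abstract could be represented. The class and field names are assumptions for illustration only; the actual devkit API at zod.zenseact.com defines its own interfaces.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, simplified containers mirroring the structure in the abstract:
# Frames (one annotated keyframe + ~2 s of supporting sensor data),
# Sequences (~20 s of the full sensor suite), Drives (several minutes).
# Not the real ZOD devkit API.

@dataclass
class SensorClip:
    camera_paths: List[str] = field(default_factory=list)  # image file paths
    lidar_paths: List[str] = field(default_factory=list)   # point-cloud file paths
    duration_s: float = 0.0                                 # clip length in seconds

@dataclass
class ZodFrame:
    frame_id: str
    keyframe_image: str      # curated, annotated camera image
    annotations: dict        # 2D/3D objects, segmentation, traffic signs, ...
    support: SensorClip      # ~2 s of surrounding sensor data

@dataclass
class ZodSequence:
    sequence_id: str
    clip: SensorClip         # ~20 s, full sensor suite

@dataclass
class ZodDrive:
    drive_id: str
    clip: SensorClip         # several minutes, full sensor suite

# Example: build a toy frame record.
frame = ZodFrame(
    frame_id="000001",
    keyframe_image="frames/000001/camera_front.jpg",
    annotations={"objects_3d": [], "traffic_signs": []},
    support=SensorClip(duration_s=2.0),
)
print(frame.frame_id, frame.support.duration_s)
```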

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Series
International Conference on Computer Vision (ICCV), ISSN 1550-5499, E-ISSN 2380-7504
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-209825 (URN)
10.1109/iccv51070.2023.01846 (DOI)
9798350307184 (ISBN)
9798350307191 (ISBN)
Conference
2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 01-06 October, 2023
Available from: 2024-11-14 Created: 2024-11-14 Last updated: 2025-10-27
2. Raw or Cooked?: Object Detection on RAW Images
2023 (English) In: Image Analysis: 22nd Scandinavian Conference, SCIA 2023, Sirkka, Finland, April 18–21, 2023, Proceedings, Part I / [ed] Rikke Gade, Michael Felsberg, Joni-Kristian Kämäräinen, Springer, 2023, Vol. 13885, p. 374-385. Conference paper, Published paper (Refereed)
Abstract [en]

Images fed to a deep neural network have in general undergone several handcrafted image signal processing (ISP) operations, all of which have been optimized to produce visually pleasing images. In this work, we investigate the hypothesis that the intermediate representation of visually pleasing images is sub-optimal for downstream computer vision tasks compared to the RAW image representation. We suggest that the operations of the ISP instead should be optimized towards the end task, by learning the parameters of the operations jointly during training. We extend previous works on this topic and propose a new learnable operation that enables an object detector to achieve superior performance when compared to both previous works and traditional RGB images. In experiments on the open PASCALRAW dataset, we empirically confirm our hypothesis.
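As a hedged illustration of the general idea in this abstract, namely optimizing ISP-like operations jointly with the downstream detector, the following PyTorch sketch places a small preprocessing module with trainable parameters in front of a stand-in detector so that detection gradients update the "ISP". The specific operation (a learnable gamma curve with per-channel gains) is a simplification chosen for the example, not the learnable operation proposed in the paper.

```python
import torch
import torch.nn as nn

class LearnableISP(nn.Module):
    """Toy differentiable 'ISP': learnable per-channel gains and a global gamma.

    Simplified sketch of jointly optimizing ISP operations with the end task;
    not the operation proposed in Paper B.
    """

    def __init__(self, num_channels: int = 4):  # e.g. packed RGGB Bayer planes
        super().__init__()
        self.log_gain = nn.Parameter(torch.zeros(num_channels))  # per-channel gain
        self.log_gamma = nn.Parameter(torch.zeros(1))             # global gamma

    def forward(self, raw: torch.Tensor) -> torch.Tensor:
        # raw: (B, C, H, W), values in [0, 1]
        x = raw * torch.exp(self.log_gain).view(1, -1, 1, 1)
        x = x.clamp(min=1e-6) ** torch.exp(self.log_gamma)
        return x

# Chaining the modules lets gradients from the detection loss reach the ISP.
isp = LearnableISP(num_channels=4)
detector_backbone = nn.Conv2d(4, 16, kernel_size=3, padding=1)  # stand-in detector
model = nn.Sequential(isp, detector_backbone)

raw_batch = torch.rand(2, 4, 64, 64)   # fake packed RAW input
features = model(raw_batch)
loss = features.mean()                 # stand-in for a detection loss
loss.backward()                        # gradients flow into the ISP parameters
print(isp.log_gamma.grad is not None)  # True
```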

Place, publisher, year, edition, pages
Springer, 2023
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 13885
Keywords
Computer Vision, Object detection, RAW images, Image Signal Processing
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-199000 (URN)
10.1007/978-3-031-31435-3_25 (DOI)
2-s2.0-85161382246 (Scopus ID)
9783031314346 (ISBN)
9783031314353 (ISBN)
Conference
Scandinavian Conference on Image Analysis, Sirkka, Finland, April 18–21, 2023
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2023-11-06. Created: 2023-11-06. Last updated: 2025-10-27. Bibliographically approved.
3. GASP: Unifying Geometric and Semantic Self-Supervised Pre-training for Autonomous Driving
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Self-supervised pre-training based on next-token prediction has enabled large language models to capture the underlying structure of text, and has led to unprecedented performance on a large array of tasks when applied at scale. Similarly, autonomous driving generates vast amounts of spatiotemporal data, alluding to the possibility of harnessing scale to learn the underlying geometric and semantic structure of the environment and its evolution over time. In this direction, we propose a geometric and semantic self-supervised pre-training method, GASP, that learns a unified representation by predicting, at any queried future point in spacetime, (1) general occupancy, capturing the evolving structure of the 3D scene; (2) ego occupancy, modeling the ego vehicle path through the environment; and (3) distilled high-level features from a vision foundation model. By modeling geometric and semantic 4D occupancy fields instead of raw sensor measurements, the model learns a structured, generalizable representation of the environment and its evolution through time. We validate GASP on multiple autonomous driving benchmarks, demonstrating significant improvements in semantic occupancy forecasting, online mapping, and ego trajectory prediction. Our results demonstrate that continuous 4D geometric and semantic occupancy prediction provides a scalable and effective pre-training paradigm for autonomous driving. For code and additional visualizations, see https://research.zenseact.com/publications/gasp/
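The phrase "predicting, at any queried future point in spacetime" can be read as a network that maps a scene embedding plus a continuous (x, y, z, t) query to three outputs. The sketch below is a hypothetical illustration of such a query head; the module name, layer sizes, and output dimensions are assumptions, not the GASP architecture.

```python
import torch
import torch.nn as nn

class SpacetimeQueryHead(nn.Module):
    """Hypothetical GASP-style query head (names and sizes are assumptions).

    Given a scene embedding (produced elsewhere from past sensor data) and a
    batch of continuous (x, y, z, t) queries, it predicts:
      1) general occupancy probability,
      2) ego-occupancy probability (will the ego vehicle occupy this point?),
      3) a feature vector to match against distilled vision-foundation features.
    """

    def __init__(self, scene_dim: int = 128, feature_dim: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(scene_dim + 4, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        self.occupancy = nn.Linear(256, 1)
        self.ego_occupancy = nn.Linear(256, 1)
        self.semantic_feat = nn.Linear(256, feature_dim)

    def forward(self, scene_emb: torch.Tensor, query_xyzt: torch.Tensor):
        # scene_emb: (B, scene_dim); query_xyzt: (B, N, 4)
        B, N, _ = query_xyzt.shape
        ctx = scene_emb.unsqueeze(1).expand(B, N, -1)
        h = self.mlp(torch.cat([ctx, query_xyzt], dim=-1))
        return (
            torch.sigmoid(self.occupancy(h)),      # (B, N, 1)
            torch.sigmoid(self.ego_occupancy(h)),  # (B, N, 1)
            self.semantic_feat(h),                 # (B, N, feature_dim)
        )

head = SpacetimeQueryHead()
scene = torch.randn(2, 128)        # stand-in scene embedding
queries = torch.randn(2, 1024, 4)  # (x, y, z, t) query points
occ, ego_occ, feats = head(scene, queries)
print(occ.shape, ego_occ.shape, feats.shape)
```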

National Category
Computer graphics and computer vision; Artificial Intelligence
Identifiers
urn:nbn:se:liu:diva-218890 (URN)
10.48550/arXiv.2503.15672 (DOI)
Available from: 2025-10-17 Created: 2025-10-17 Last updated: 2025-10-27
4. NeuroNCAP: Photorealistic Closed-Loop Safety Testing for Autonomous Driving
2024 (English) In: Computer Vision - ECCV 2024, Part XXX, Springer International Publishing AG, 2024, Vol. 15088, p. 161-177. Conference paper, Published paper (Refereed)
Abstract [en]

We present a versatile NeRF-based simulator for testing autonomous driving (AD) software systems, designed with a focus on sensor-realistic closed-loop evaluation and the creation of safety-critical scenarios. The simulator learns from sequences of real-world driving sensor data and enables reconfigurations and renderings of new, unseen scenarios. In this work, we use our simulator to test the responses of AD models to safety-critical scenarios inspired by the European New Car Assessment Programme (Euro NCAP). Our evaluation reveals that, while state-of-the-art end-to-end planners excel in nominal driving scenarios in an open-loop setting, they exhibit critical flaws when navigating our safety-critical scenarios in a closed-loop setting. This highlights the need for advancements in the safety and real-world usability of end-to-end planners. By publicly releasing our simulator and scenarios as an easy-to-run evaluation suite, we invite the research community to explore, refine, and validate their AD models in controlled, yet highly configurable and challenging sensor-realistic environments.
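A hedged sketch of the closed-loop evaluation pattern described in this abstract: at every step the simulator renders sensor data for the current ego state, the AD model under test produces a plan, the ego state is advanced, and a collision check decides the outcome. All class and function names below are placeholders for illustration, not the released NeuroNCAP interfaces.

```python
from dataclasses import dataclass

@dataclass
class EgoState:
    x: float
    y: float
    speed: float

class NeuralSimulator:
    """Stand-in for a neural-rendering simulator learned from real driving logs."""
    def render(self, ego: EgoState) -> dict:
        return {"camera": None, "lidar": None}  # placeholder sensor observations
    def collision(self, ego: EgoState) -> bool:
        return ego.x > 50.0                     # toy collision condition

class Planner:
    """Stand-in for an end-to-end AD model under test."""
    def plan(self, observation: dict, ego: EgoState) -> float:
        return 1.0                              # commanded forward displacement [m]

def run_scenario(sim: NeuralSimulator, planner: Planner, steps: int = 100) -> bool:
    ego = EgoState(x=0.0, y=0.0, speed=10.0)
    for _ in range(steps):
        obs = sim.render(ego)                   # closed loop: render from current state
        dx = planner.plan(obs, ego)
        ego = EgoState(x=ego.x + dx, y=ego.y, speed=ego.speed)
        if sim.collision(ego):
            return False                        # safety-critical failure
    return True                                 # no collision within the scenario

print("passed:", run_scenario(NeuralSimulator(), Planner()))
```

The key point the sketch captures is the difference from open-loop evaluation: the model's own actions determine the next observation, so planning errors compound rather than being reset by ground-truth states.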

Place, publisher, year, edition, pages
Springer International Publishing AG, 2024
Series
Lecture Notes in Computer Science, ISSN 0302-9743
Keywords
Autonomous driving; Closed-loop simulation; Trajectory planning; Neural rendering
National Category
Embedded Systems
Identifiers
urn:nbn:se:liu:diva-210297 (URN)
10.1007/978-3-031-73404-5_10 (DOI)
001352847300010 ()
2-s2.0-85208599732 (Scopus ID)
9783031734038 (ISBN)
9783031734045 (ISBN)
Conference
18th European Conference on Computer Vision (ECCV), Milan, Italy, Sep 29 - Oct 04, 2024
Note

Funding agencies: Wallenberg AI, Autonomous Systems and Software Program (WASP) - Knut and Alice Wallenberg Foundation; Swedish Research Council [2022-06725]

Available from: 2024-12-09 Created: 2024-12-09 Last updated: 2025-10-27
5. R3D2: Realistic 3D Asset Insertion via Diffusion for Autonomous Driving Simulation
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Validating autonomous driving (AD) systems requires diverse and safety-critical testing, making photorealistic virtual environments essential. Traditional simulation platforms, while controllable, are resource-intensive to scale and often suffer from a domain gap with real-world data. In contrast, neural reconstruction methods like 3D Gaussian Splatting (3DGS) offer a scalable solution for creating photorealistic digital twins of real-world driving scenes. However, they struggle with dynamic object manipulation and reusability as their per-scene optimization-based methodology tends to result in incomplete object models with integrated illumination effects. This paper introduces R3D2, a lightweight, one-step diffusion model designed to overcome these limitations and enable realistic insertion of complete 3D assets into existing scenes by generating plausible rendering effects, such as shadows and consistent lighting, in real time. This is achieved by training R3D2 on a novel dataset: 3DGS object assets are generated from in-the-wild AD data using an image-conditioned 3D generative model, and then synthetically placed into neural rendering-based virtual environments, allowing R3D2 to learn realistic integration. Quantitative and qualitative evaluations demonstrate that R3D2 significantly enhances the realism of inserted assets, enabling use-cases like text-to-3D asset insertion and cross-scene/dataset object transfer, allowing for true scalability in AD validation. To promote further research in scalable and realistic AD simulation, we will release our dataset and code, see https://research.zenseact.com/publications/R3D2/
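A conceptual sketch of the three-stage insertion pipeline the abstract describes: render the reconstructed scene, naively composite a 3DGS asset into it, then let a one-step diffusion model harmonize shadows and lighting. Every function below is a hypothetical placeholder standing in for a component named in the abstract; the released code will define the actual interfaces.

```python
import numpy as np

def render_scene(scene, camera) -> np.ndarray:
    """Placeholder: render the neural reconstruction of a real driving scene."""
    return np.zeros((256, 512, 3), dtype=np.float32)

def composite_asset(image: np.ndarray, asset, pose) -> np.ndarray:
    """Placeholder: naive cut-and-paste rendering of a 3DGS asset into the image."""
    return image  # in practice: rasterize the asset and alpha-blend it in

def harmonize_one_step(naive_composite: np.ndarray) -> np.ndarray:
    """Placeholder: single denoising step of a diffusion model that adds plausible
    shadows and consistent lighting around the inserted asset."""
    return np.clip(naive_composite, 0.0, 1.0)

def insert_asset(scene, camera, asset, pose) -> np.ndarray:
    background = render_scene(scene, camera)
    naive = composite_asset(background, asset, pose)
    return harmonize_one_step(naive)

frame = insert_asset(scene=None, camera=None, asset=None, pose=np.eye(4))
print(frame.shape)  # (256, 512, 3)
```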

National Category
Artificial Intelligence; Vehicle and Aerospace Engineering
Identifiers
urn:nbn:se:liu:diva-218892 (URN)
10.48550/arXiv.2506.07826 (DOI)
Available from: 2025-10-17 Created: 2025-10-17 Last updated: 2025-10-27

Open Access in DiVA

fulltext (9359 kB)
File information
File name: FULLTEXT01.pdf. File size: 9359 kB. Checksum: SHA-512
4506df29f076a0118cd615e6af4cb9bdf54093759c5c61f843552ba39713e89cb29619f235d9e155225dbd07f2bddf58229d862a13bd197ef986c356d70bb990
Type: fulltext. Mimetype: application/pdf


Authority records

Ljungbergh, William
