Raw or Cooked?: Object Detection on RAW Images
Ljungbergh, William. Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Zenseact, Gothenburg, Sweden. ORCID iD: 0000-0002-0194-6346
Johnander, Joakim. Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Zenseact, Gothenburg, Sweden. ORCID iD: 0000-0003-2553-3367
Petersson, Christoffer. Zenseact, Gothenburg, Sweden. ORCID iD: 0000-0002-9203-558X
Felsberg, Michael. Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. ORCID iD: 0000-0002-6096-3648
2023 (English). In: Image Analysis: 22nd Scandinavian Conference, SCIA 2023, Sirkka, Finland, April 18–21, 2023, Proceedings, Part I / [ed] Rikke Gade, Michael Felsberg, Joni-Kristian Kämäräinen. Springer, 2023, Vol. 13885, p. 374-385. Conference paper, Published paper (Refereed).
Abstract [en]

Images fed to a deep neural network have in general undergone several handcrafted image signal processing (ISP) operations, all of which have been optimized to produce visually pleasing images. In this work, we investigate the hypothesis that the intermediate representation of visually pleasing images is sub-optimal for downstream computer vision tasks compared to the RAW image representation. We suggest that the operations of the ISP instead should be optimized towards the end task, by learning the parameters of the operations jointly during training. We extend previous works on this topic and propose a new learnable operation that enables an object detector to achieve superior performance when compared to both previous works and traditional RGB images. In experiments on the open PASCALRAW dataset, we empirically confirm our hypothesis.
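
To make the idea of optimizing ISP operations towards the end task concrete, here is a minimal sketch of joint ISP-detector training in PyTorch. The LearnableGamma module is a hypothetical stand-in for a learnable ISP operation (the paper's actual proposed operation is not reproduced here), and any detector producing a differentiable loss could take the detector's place.

```python
import torch
import torch.nn as nn

class LearnableGamma(nn.Module):
    """A minimal learnable ISP operation: tone mapping with a
    trainable gamma exponent. Hypothetical illustration only,
    not the operation proposed in the paper."""
    def __init__(self, init_gamma: float = 2.2):
        super().__init__()
        # Optimized jointly with the detector via backprop.
        self.log_gamma = nn.Parameter(torch.tensor(init_gamma).log())

    def forward(self, raw: torch.Tensor) -> torch.Tensor:
        # raw: (B, 1, H, W) linear sensor values in [0, 1].
        gamma = self.log_gamma.exp()
        return raw.clamp(min=1e-6) ** (1.0 / gamma)

class RawDetector(nn.Module):
    """Learnable ISP operation followed by an arbitrary detector."""
    def __init__(self, detector: nn.Module):
        super().__init__()
        self.isp = LearnableGamma()
        self.detector = detector

    def forward(self, raw: torch.Tensor):
        # Gradients from the detection loss flow into both the
        # detector weights and the ISP parameters, so the image
        # "development" is tuned for the task, not for visual quality.
        return self.detector(self.isp(raw))
```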

Place, publisher, year, edition, pages
Springer, 2023. Vol. 13885, p. 374-385
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 13885
Keywords [en]
Computer Vision, Object detection, RAW images, Image Signal Processing
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:liu:diva-199000
DOI: 10.1007/978-3-031-31435-3_25
Scopus ID: 2-s2.0-85161382246
ISBN: 9783031314346 (print)
ISBN: 9783031314353 (electronic)
OAI: oai:DiVA.org:liu-199000
DiVA, id: diva2:1809798
Conference
Scandinavian Conference on Image Analysis, Sirkka, Finland, April 18–21, 2023
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2023-11-06. Created: 2023-11-06. Last updated: 2025-10-27. Bibliographically approved.
In thesis
1. On the Road to Safe Autonomous Driving via Data, Learning, and Validation
2025 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

Autonomous driving systems hold the promise of safer and more efficient transportation, with the potential to fundamentally reshape what everyday mobility looks like. However, to realize these promises, such systems must perform reliably in both routine driving and in rare, safety-critical situations. To this end, this thesis addresses three core aspects of autonomous driving development: data, learning, and validation.

First, we tackle the fundamental need for high-quality data by introducing the Zenseact Open Dataset (ZOD) in Paper A. ZOD is a large-scale multimodal dataset collected across diverse geographies, weather conditions, and road types throughout Europe, effectively addressing key shortcomings of existing academic datasets.

We then turn to the challenge of learning from this data. First, we develop a method that bypasses the need for intricate image signal processing pipelines and instead learns to detect objects directly from RAW image data in a supervised setting (Paper B). This reduces the reliance on hand-crafted preprocessing but still requires annotations. Although sensor data is typically abundant in the autonomous driving setting, such annotations become prohibitively expensive at scale. To overcome this bottleneck, we propose GASP (Paper C), a self-supervised method that captures structured 4D representations by jointly modeling geometry, semantics, and dynamics solely from sensor data.
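
At a high level, "jointly modeling geometry, semantics, and dynamics" can be pictured as one network with three heads over 4D space-time queries. The sketch below illustrates that general pattern only; GASP's actual architecture, inputs, and losses are described in Paper C, not here, and every name in this snippet is a hypothetical placeholder.

```python
import torch
import torch.nn as nn

class FourDField(nn.Module):
    """Hypothetical 4D field: maps a space-time query (x, y, z, t)
    to occupancy (geometry), a semantic embedding, and a flow
    vector (dynamics). Illustrative only, not GASP itself."""
    def __init__(self, hidden: int = 256, sem_dim: int = 32):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.occupancy = nn.Linear(hidden, 1)        # geometry
        self.semantics = nn.Linear(hidden, sem_dim)  # semantics
        self.flow = nn.Linear(hidden, 3)             # dynamics

    def forward(self, xyzt: torch.Tensor):
        h = self.backbone(xyzt)  # (N, 4) -> (N, hidden)
        return self.occupancy(h), self.semantics(h), self.flow(h)

# Self-supervision means the targets come from the sensor data
# itself (e.g. lidar returns for occupancy), so no human
# annotations are required.
```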

Once models are trained, they must undergo rigorous validation. Yet existing evaluation methods often fall short in realism, scalability, or both. To remedy this, we introduce NeuroNCAP (Paper D), a neural rendering-based closed-loop simulation framework that enables safety-critical testing in photorealistic environments. Building on this, we present R3D2 (Paper E), a generative method for realistic insertion of non-native 3D assets into such environments, further enhancing the scalability and diversity of safety-critical testing.
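
The distinguishing property of closed-loop testing is that the system under test drives the simulation forward, so its errors compound realistically instead of being overwritten by a replayed log. The skeleton below shows that loop structure in the abstract; the Simulator and Planner interfaces are hypothetical placeholders, not NeuroNCAP's API.

```python
# Hypothetical closed-loop evaluation skeleton (interfaces assumed).
def run_scenario(simulator, planner, max_steps: int = 200) -> dict:
    """Closed loop: the planner's own actions feed back into the
    rendered world, unlike open-loop replay of a fixed recording."""
    obs = simulator.reset()  # rendering of the initial scene
    for _ in range(max_steps):
        action = planner.act(obs)                 # perceive and plan
        obs, done, info = simulator.step(action)  # world reacts
        if done:
            return info  # e.g. collision or success outcome
    return {"outcome": "timeout"}
```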

Together, these contributions provide a scalable set of tools for training and validating autonomous driving systems, supporting progress both in mastering the nominal 99% of everyday driving and in validating behavior in the critical 1% of rare, safety-critical situations.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2025. p. 65
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 2478
National Category
Computer Vision and Learning Systems
Identifiers
urn:nbn:se:liu:diva-219102 (URN)
10.3384/9789181182453 (DOI)
9789181182446 (ISBN)
9789181182453 (ISBN)
Public defence
2025-11-28, Zero, Zenit Building, Campus Valla, Linköping, 09:15 (English)
Note

Funding agencies: This thesis work was supported by the Wallenberg Artificial Intelligence, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation, and by Zenseact AB through their industrial PhD program. The computational resources were provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS) at C3SE, partially funded by the Swedish Research Council through grant agreement no. 2022-06725, and by the Berzelius resource, provided by the Knut and Alice Wallenberg Foundation at the National Supercomputer Centre.

Available from: 2025-10-27. Created: 2025-10-27. Last updated: 2025-10-27. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Ljungbergh, William; Johnander, Joakim; Felsberg, Michael
