DeDoDe v2: Analyzing and Improving the DeDoDe Keypoint Detector
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. ORCID iD: 0000-0002-1019-8634
Chalmers University of Technology, Sweden.
Chinese University of Hong Kong, People's Republic of China; Texas A&M University, TX, USA.
2024 (English) In: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), IEEE Computer Society, 2024, pp. 4245-4253. Conference paper, published paper (refereed)
Abstract [en]

In this paper, we analyze and improve the recently proposed DeDoDe keypoint detector. We focus our analysis on some key issues. First, we find that DeDoDe keypoints tend to cluster together, which we fix by performing non-max suppression on the target distribution of the detector during training. Second, we address issues related to data augmentation. In particular, the DeDoDe detector is sensitive to large rotations. We fix this by including 90-degree rotations as well as horizontal flips. Finally, the decoupled nature of the DeDoDe detector makes evaluation of downstream usefulness problematic. We fix this by matching the keypoints with a pretrained dense matcher (RoMa) and evaluating two-view pose estimates. We find that the original long training is detrimental to performance, and therefore propose a much shorter training schedule. We integrate all these improvements into our proposed detector DeDoDe v2 and evaluate it with the original DeDoDe descriptor on the MegaDepth-1500 and IMC2022 benchmarks. Our proposed detector significantly increases pose estimation results, notably from 75.9 to 78.3 mAA on the IMC2022 challenge. Code and weights are available at github.com/Parskatt/DeDoDe.
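The two training-time fixes mentioned above (non-max suppression on the target distribution, and augmentation with 90-degree rotations plus horizontal flips) can be sketched in PyTorch as follows. This is an illustrative reconstruction, not the released implementation: the function names, the 3x3 suppression window, and the renormalization step are assumptions.

```python
import torch
import torch.nn.functional as F

def nms_target(heatmap: torch.Tensor, kernel_size: int = 3) -> torch.Tensor:
    """Suppress non-maximal entries of a (B, 1, H, W) target distribution.

    Keeping only local maxima discourages clustered keypoints; the result
    is renormalized so each map remains a probability distribution.
    """
    pad = kernel_size // 2
    local_max = F.max_pool2d(heatmap, kernel_size, stride=1, padding=pad)
    suppressed = torch.where(heatmap == local_max, heatmap,
                             torch.zeros_like(heatmap))
    return suppressed / suppressed.sum(dim=(-2, -1), keepdim=True).clamp_min(1e-12)

def augment_90_and_flip(img: torch.Tensor) -> torch.Tensor:
    """Apply a random multiple-of-90-degree rotation and an optional
    horizontal flip to a (B, C, H, W) image batch."""
    k = int(torch.randint(0, 4, (1,)))
    img = torch.rot90(img, k, dims=(-2, -1))
    if torch.rand(1) < 0.5:
        img = torch.flip(img, dims=(-1,))
    return img
```

The same rotation/flip would also have to be applied to the target heatmaps so that image and supervision stay aligned.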

Place, publisher, year, edition, pages
IEEE Computer Society, 2024, pp. 4245-4253
Series
IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, ISSN 2160-7508, E-ISSN 2160-7516
National subject category
Computer Graphics and Computer Vision
Identifiers
URN: urn:nbn:se:liu:diva-212421; DOI: 10.1109/CVPRW63382.2024.00428; ISI: 001327781704041; Scopus ID: 2-s2.0-85198087533; ISBN: 9798350365474 (digital); ISBN: 9798350365481 (print); OAI: oai:DiVA.org:liu-212421; DiVA id: diva2:1945902
Conference
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, June 16-22, 2024
Note

Funding agencies: Wallenberg Artificial Intelligence, Autonomous Systems and Software Program (WASP), Knut and Alice Wallenberg Foundation; the strategic research environment ELLIIT, Swedish government; Swedish Research Council [2022-06725]; Knut and Alice Wallenberg Foundation at the National Supercomputer Centre

Available from: 2025-03-19 Created: 2025-03-19 Last updated: 2025-09-11
Part of thesis
1. Towards the Next Generation of 3D Reconstruction
2025 (English) Doctoral thesis, compilation thesis (other academic)
Abstract [en]

Humans perceive our visual surroundings through the projection of light rays through our pupils and onto the retina. Aided by motion, we gain an understanding of our environment, as well as our location within it. The goal of image-based 3D reconstruction is to imbue machines with similar capabilities. The most prominent paradigm for image-based 3D reconstruction is called Structure-from-Motion (SfM). Traditionally, SfM has been approached through handcrafted algorithms, which are brittle when their assumptions do not hold. Humans, on the other hand, understand their environment intuitively and show remarkable robustness in their ability to localize themselves in, and map, the world.

The main purpose of this thesis is the development of a set of methods that strive toward the next generation of SfM, imbued with intelligence and robustness. In particular, we propose methods on the 2D side (learning of keypoint detectors, features, and dense feature matching) and on the 3D side (threshold-robust relative pose estimation, and registration of SfM maps).

First, we develop models to detect keypoints, producing a set of 2D image coordinates, and models to describe the image, producing features. One of our key contributions is decoupling these tasks, which have typically been learned jointly, into distinct objectives, resulting in major gains in performance, as well as increased modularity. Paper A introduces this decoupled framework, and Paper B further develops the keypoint objective. In Paper C we revisit the keypoint objective from an entirely self-supervised reinforcement learning perspective, yielding several insights, and further gains in performance. 

We further develop methods for dense feature matching, i.e., matching every pixel between two images. In Paper D we propose the first dense feature matcher capable of outperforming sparse matching for relative pose estimation. This is significant, as previous work had generally indicated that the sparse or semi-dense paradigm was preferable. In Paper E we greatly improve on almost all components of the method of Paper D, resulting in an extremely robust dense matcher, capable of matching almost any pair of images. 

We lift our eyes from the 2D image plane into 3D, and investigate relative pose estimation and 3D registration of SfM maps. Relative pose estimation is a difficult task, as non-robust estimation fails in the presence of outliers. Random Sample Consensus (RANSAC), the gold-standard robust estimation method, requires an outlier threshold, which is non-trivial to set, and poor choices result in significantly worse performance. In Paper F, we develop an algorithm to automatically estimate this threshold from an initial guess that is less biased than previous approaches, leading to robust performance.
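The idea behind Paper F, deriving the inlier threshold from the data instead of hand-tuning it, can be illustrated with a toy example. The NumPy sketch below fits a 2D line by least median of squares and then sets the inlier threshold from the robust residual scale; the function names, constants, and the LMedS formulation are my assumptions for illustration, not the algorithm of Paper F.

```python
import numpy as np

def line_residuals(points: np.ndarray, p1: np.ndarray, p2: np.ndarray) -> np.ndarray:
    """Unsigned point-to-line distances for the line through p1 and p2."""
    d = p2 - p1
    n = np.array([-d[1], d[0]]) / max(np.linalg.norm(d), 1e-12)
    return np.abs((points - p1) @ n)

def lmeds_line(points: np.ndarray, iters: int = 200, seed: int = 0) -> np.ndarray:
    """Toy least-median-of-squares line fit with a data-driven threshold.

    Pick the 2-point model with the smallest median absolute residual,
    estimate the inlier noise scale from that median, and classify
    inliers against a threshold derived from the estimated scale.
    Returns a boolean inlier mask.
    """
    rng = np.random.default_rng(seed)
    best_med, best_model = np.inf, None
    for _ in range(iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        med = np.median(line_residuals(points, points[i], points[j]))
        if med < best_med:
            best_med, best_model = med, (points[i], points[j])
    r = line_residuals(points, *best_model)
    # 1.4826 * median absolute residual is a robust estimate of the
    # noise standard deviation under a Gaussian inlier model.
    sigma = 1.4826 * best_med
    return r < 2.5 * max(sigma, 1e-12)
```

Note that the threshold here falls out of the residual distribution of the best model, so no per-dataset tuning is needed; handling degenerate scale estimates and biased initial guesses is exactly where a careful method is required.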

Finally, we investigate registering SfM maps together. This is particularly interesting in distributed settings where, e.g., robots need to localize with respect to each other’s reference frames in order to collaborate. However, in this setting, using image-based localization approaches comes with downsides. In particular, computational complexity, compatibility issues, and privacy concerns severely limit the potential of such systems to be deployed. In Paper G we propose a new paradigm for registering SfM maps through point cloud registration, circumventing the above limitations. Finding that existing registration models trained on 3D scan data fail on this task, we develop a dataset for SfM registration. Training on our proposed dataset greatly improves performance on the task, showing the potential of the proposed paradigm.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2025. p. 121
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 2464
National subject category
Computer Graphics and Computer Vision
Identifiers
URN: urn:nbn:se:liu:diva-217639; DOI: 10.3384/9789181181906; ISBN: 9789181181890; ISBN: 9789181181906
Public defence
2025-10-08, Zero, Hus Zenit, Campus Valla, Linköping, 09:15 (English)
Opponent
Supervisors
Research funder
ELLIIT - The Linköping-Lund Initiative on IT and Mobile Communications; Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2025-09-11 Created: 2025-09-11 Last updated: 2025-09-19

Open Access in DiVA

No full text available in DiVA

Other links

Publisher's full text; Scopus
