Registration Loss Learning for Deep Probabilistic Point Set Registration
Järemo-Lawin, Felix (Linköping University, Department of Electrical Engineering, Computer Vision; Linköping University, Faculty of Science & Engineering)
Forssén, Per-Erik (Linköping University, Department of Electrical Engineering, Computer Vision; Linköping University, Faculty of Science & Engineering; ORCID iD: 0000-0002-5698-5983)
2020 (English). In: 2020 International Conference on 3D Vision (3DV), IEEE, 2020, p. 563-572. Conference paper, published paper (refereed).
Abstract [en]

Probabilistic methods for point set registration have interesting theoretical properties, such as linear complexity in the number of used points, and they easily generalize to joint registration of multiple point sets. In this work, we improve their recognition performance to match state of the art. This is done by incorporating learned features, by adding a von Mises-Fisher feature model in each mixture component, and by using learned attention weights. We learn these jointly using a registration loss learning strategy (RLL) that directly uses the registration error as a loss, by back-propagating through the registration iterations. This is possible as the probabilistic registration is fully differentiable, and the result is a learning framework that is truly end-to-end. We perform extensive experiments on the 3DMatch and Kitti datasets. The experiments demonstrate that our approach benefits significantly from the integration of the learned features and our learning strategy, outperforming the state-of-the-art on Kitti. Code is available at https://github.com/felja633/RLLReg.
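To illustrate the registration loss learning (RLL) idea described above, the sketch below shows in PyTorch how a final alignment error can be back-propagated through differentiable registration iterations into a point feature extractor. The network, the soft-matching alignment step and all tensor shapes are simplified assumptions for illustration only; the actual method is available in the linked repository.

```python
import torch

# Stand-in feature extractor over raw 3D coordinates (hypothetical, not the paper's network).
feature_net = torch.nn.Sequential(
    torch.nn.Linear(3, 32), torch.nn.ReLU(), torch.nn.Linear(32, 16)
)
optimizer = torch.optim.Adam(feature_net.parameters(), lr=1e-3)

def soft_registration(src, tgt, feats_src, feats_tgt, n_iters=5):
    """Toy differentiable alignment: estimate a translation from soft feature matches."""
    for _ in range(n_iters):
        sim = feats_src @ feats_tgt.T            # (N, M) feature similarities
        w = torch.softmax(sim, dim=1)            # soft correspondences
        t = (w @ tgt - src).mean(dim=0)          # closed-form translation update
        src = src + t
    return src

src = torch.randn(100, 3)
tgt = src + torch.tensor([0.3, -0.1, 0.2])       # ground-truth aligned copy

for step in range(10):
    aligned = soft_registration(src, tgt, feature_net(src), feature_net(tgt))
    loss = ((aligned - tgt) ** 2).mean()         # registration error used directly as the loss
    optimizer.zero_grad()
    loss.backward()                              # gradients flow through all registration iterations
    optimizer.step()
```

Because every step of the toy alignment is differentiable, the gradient of the final registration error reaches the feature extractor, which is the essence of training end-to-end through the registration.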

Place, publisher, year, edition, pages
IEEE, 2020. p. 563-572
Series
International Conference on 3D Vision, ISSN 2378-3826
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
URN: urn:nbn:se:liu:diva-173539
DOI: 10.1109/3DV50981.2020.00066
ISI: 000653085200057
ISBN: 978-1-7281-8128-8 (electronic)
ISBN: 978-1-7281-8129-5 (print)
OAI: oai:DiVA.org:liu-173539
DiVA id: diva2:1530341
Conference
International Virtual Conference on 3D Vision, November 25-28, 2020
Note

Funding agencies: ELLIIT Excellence Center; Vinnova through the Visual Sweden network, Vinnova [2019-02261]

Available from: 2021-02-22. Created: 2021-02-22. Last updated: 2022-10-06. Bibliographically approved.
In thesis
1. Learning Representations for Segmentation and Registration
2021 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

In computer vision, the aim is to model and extract high-level information from visual sensor measurements such as images, videos and 3D points. Since visual data is often high-dimensional, noisy and irregular, achieving robust data modeling is challenging. This thesis presents work that addresses challenges in a number of different computer vision problems.

First, the thesis addresses the problem of phase unwrapping for multi-frequency amplitude modulated time-of-flight (ToF) ranging. ToF is used in depth cameras, which have many applications in 3D reconstruction and gesture recognition. While amplitude modulation in time-of-flight ranging can provide accurate depth measurements, it also causes depth ambiguities. This thesis presents a method to resolve the ambiguities by estimating the likelihoods of different hypotheses for the depth values. This is achieved by performing kernel density estimation over the hypotheses in a spatial neighborhood of each pixel in the depth image. The depth hypothesis with the highest estimated likelihood can then be selected as the output depth. This approach yields improvements in the quality of the depth images and extends the effective range in both indoor and outdoor environments.
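A minimal sketch of the described hypothesis-selection idea, assuming a Gaussian kernel and two arbitrary modulation frequencies: depth hypotheses from each frequency are pooled over a spatial neighbourhood, a kernel density is evaluated, and the most likely hypothesis for the center pixel is selected. All constants and helper names are illustrative, not the thesis implementation.

```python
import numpy as np

C = 299_792_458.0                                        # speed of light [m/s]

def hypotheses(wrapped, freq, max_range=30.0):
    """Candidate depths wrapped + k * unambiguous_range, up to max_range."""
    unamb = C / (2.0 * freq)
    k = np.arange(int(np.ceil(max_range / unamb)) + 1)
    return wrapped + k * unamb

def unwrap_center(center_wrapped, neigh_wrapped, freqs, sigma=0.05):
    """center_wrapped: wrapped depth per frequency at the center pixel.
       neigh_wrapped:  (n_neighbours, n_freqs) wrapped depths in the spatial window."""
    pool = np.concatenate([hypotheses(w, f)
                           for row in neigh_wrapped
                           for w, f in zip(row, freqs)])
    candidates = np.concatenate([hypotheses(w, f)
                                 for w, f in zip(center_wrapped, freqs)])
    # Gaussian kernel density of each candidate, estimated over the pooled hypotheses
    dens = np.exp(-0.5 * ((candidates[:, None] - pool[None, :]) / sigma) ** 2).sum(axis=1)
    return candidates[np.argmax(dens)]

freqs = np.array([40e6, 60e6])                           # two assumed modulation frequencies
true_depth = 7.3
wrapped = np.array([true_depth % (C / (2 * f)) for f in freqs])
neigh = wrapped + np.random.normal(0.0, 0.01, size=(8, 2))   # noisy neighbourhood measurements
print(unwrap_center(wrapped, neigh, freqs))              # close to the true 7.3 m
```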

Next, point set registration is investigated, which is the problem of aligning point sets from overlapping depth images or 3D models. Robust registration is fundamental to many vision tasks, such as multi-view 3D reconstruction and object pose estimation for robotics. First, the thesis presents a method for handling density variations in the measured point sets. This is achieved by modeling a latent distribution representing the underlying structure of the scene. Both the model of the scene and the registration parameters are inferred in an Expectation-Maximization-based framework. Second, the thesis introduces a method for integrating features from deep neural networks into the registration model. It is shown that the deep features improve registration performance in terms of accuracy and robustness. Additionally, improved feature representations are generated by training the deep neural network end-to-end, minimizing the registration errors produced by our registration model.
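To make the Expectation-Maximization formulation concrete, the following is a heavily simplified, translation-only sketch of registering two point sets against a shared latent Gaussian mixture that models the scene. Component count, variance and initialisation are assumptions, not the actual model.

```python
import numpy as np

def em_register(sets, n_components=16, n_iters=20, var=0.05):
    rng = np.random.default_rng(0)
    pooled = np.concatenate(sets)
    mu = pooled[rng.choice(len(pooled), n_components, replace=False)]  # latent scene model (component means)
    t = [np.zeros(3) for _ in sets]                                    # per-set translations
    for _ in range(n_iters):
        # E-step: soft assignment of every (translated) point to the latent components
        resp, moved = [], []
        for X, ti in zip(sets, t):
            Xi = X + ti
            d2 = ((Xi[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
            r = np.exp(-0.5 * d2 / var)
            r /= r.sum(axis=1, keepdims=True) + 1e-12
            resp.append(r)
            moved.append(Xi)
        # M-step: update the latent means from all sets, then each set's translation
        num = sum(r.T @ Xi for r, Xi in zip(resp, moved))
        den = sum(r.sum(axis=0) for r in resp)[:, None] + 1e-12
        mu = num / den
        t = [ti + (r @ mu - Xi).mean(axis=0) for r, Xi, ti in zip(resp, moved, t)]
    return t, mu

A = np.random.rand(200, 3)
B = A + np.array([0.2, 0.0, -0.1])
translations, scene_model = em_register([A, B])   # the two translations should roughly compensate the offset
```

Because the latent mixture is shared, the formulation extends naturally to joint registration of more than two point sets, which is one of the properties highlighted in the thesis.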

Further, an approach for 3D point set segmentation is presented. As scene models are often represented using 3D point measurements, segmentation of these is important for general scene understanding. Learning models for segmentation requires a significant amount of annotated data, which is expensive and time-consuming to acquire. The approach presented in the thesis circumvents this by projecting the points into virtual camera views and rendering 2D images. The method can then exploit accurate convolutional neural networks for image segmentation and map the segmentation predictions back to the 3D points. This also allows for transfer of learning from available annotated image data, thereby reducing the need for 3D annotations.
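A small sketch of the projection idea under simple assumptions: 3D points are projected into a virtual pinhole camera, a 2D segmenter (here a dummy stand-in for a CNN) labels the image, and the labels are mapped back to the points. The intrinsics and the stand-in segmenter are hypothetical.

```python
import numpy as np

def project(points, K):
    """Pinhole projection of Nx3 camera-frame points to pixel coordinates."""
    uvw = (K @ points.T).T
    return uvw[:, :2] / uvw[:, 2:3]

def segment_image(height, width):
    """Dummy stand-in for a 2D CNN segmenter: left/right halves get different labels."""
    labels = np.zeros((height, width), dtype=np.int64)
    labels[:, width // 2:] = 1
    return labels

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])                                      # assumed virtual-camera intrinsics
points = np.random.uniform([-1, -1, 2], [1, 1, 4], size=(1000, 3))  # points in front of the camera

uv = project(points, K)
labels_2d = segment_image(480, 640)
u = np.clip(np.round(uv[:, 0]).astype(int), 0, 639)                 # clamp to the image bounds
v = np.clip(np.round(uv[:, 1]).astype(int), 0, 479)
point_labels = labels_2d[v, u]                                      # 2D predictions mapped back to the 3D points
```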

Finally, the thesis explores the problem of video object segmentation (VOS), where the task is to track and segment target objects in each frame of a video sequence. Accurate VOS requires a robust model of the target that can adapt to different scenarios and objects. This needs to be achieved using only a single labeled reference frame as training data for each video sequence. To address the challenges in VOS, the thesis introduces a parametric target model, optimized to predict a target label derived from the mask annotation. The target model is integrated into a deep neural network, where its predictions guide a decoder module to produce target segmentation masks. The deep network is trained on labeled video data to output accurate segmentation masks for each frame. Further, it is shown that by training the entire network model in an end-to-end manner, it can learn a representation of the target that provides increased segmentation accuracy. 
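The following sketch illustrates the general notion of a parametric target model (not the thesis architecture): a small convolutional model over per-pixel features is optimised on the single annotated reference frame, and its predictions on later frames would guide a decoder that produces the final mask. The feature extractor, shapes and optimiser settings are assumptions.

```python
import torch

feat_dim = 64
target_model = torch.nn.Conv2d(feat_dim, 1, kernel_size=3, padding=1)   # the parametric target model
opt = torch.optim.SGD(target_model.parameters(), lr=0.1)

ref_feats = torch.randn(1, feat_dim, 60, 108)             # backbone features of the reference frame (assumed)
ref_label = (torch.rand(1, 1, 60, 108) > 0.8).float()     # target label derived from the mask annotation

for _ in range(50):                                        # optimise on the single annotated reference frame
    loss = torch.nn.functional.binary_cross_entropy_with_logits(target_model(ref_feats), ref_label)
    opt.zero_grad()
    loss.backward()
    opt.step()

new_feats = torch.randn(1, feat_dim, 60, 108)              # features of a later frame
coarse_scores = target_model(new_feats)                    # would guide a decoder that outputs the final mask
```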

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2021. p. 75
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 2151
Keywords
Computer Vision, point set registration, video object segmentation, time-of-flight, point set segmentation, deep learning, expectation maximization
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:liu:diva-176054
DOI: 10.3384/diss.diva-176054
ISBN: 9789179296230
Public defence
2021-08-27, Ada Lovelace, B-building, Campus Valla, Linköping, 13:00 (English)
Available from: 2021-07-20. Created: 2021-06-02. Last updated: 2025-02-07. Bibliographically approved.

Open Access in DiVA

fulltext (2525 kB)
File information
File name: FULLTEXT01.pdf
File size: 2525 kB
Checksum (SHA-512): bab6e03a23b2b1ffc732ed1550ddbdb818b11d25368c696950bb24a38100fd9013f77c761eeaf488c11a4813378909fd3bbb54dd27cff9974c2125b6a5136e2e
Type: fulltext
MIME type: application/pdf

Other links

Publisher's full text
Link to preprint version at arxiv.org

Authority records

Järemo-Lawin, Felix
Forssén, Per-Erik
