Wandt, Bastian
Publications (8 of 8)
Xiong, Z., Jonnarth, A., Eldesokey, A., Johnander, J., Wandt, B. & Forssén, P.-E. (2024). Hinge-Wasserstein: Estimating Multimodal Aleatoric Uncertainty in Regression Tasks. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Paper presented at 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 17-18 June 2024 (pp. 3471-3480). IEEE, abs/1803.04765
2024 (English). In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), IEEE, 2024, Vol. abs/1803.04765, p. 3471-3480. Conference paper, Published paper (Refereed)
Abstract [en]

Computer vision systems that are deployed in safety-critical applications need to quantify their output uncertainty. We study regression from images to parameter values, where it is common to detect uncertainty by predicting probability distributions. In this context, we investigate the regression-by-classification paradigm, which can represent multimodal distributions without a prior assumption on the number of modes. Through experiments on a specifically designed synthetic dataset, we demonstrate that traditional loss functions lead to poor probability distribution estimates and severe overconfidence in the absence of full ground truth distributions. To alleviate these issues, we propose hinge-Wasserstein – a simple improvement of the Wasserstein loss that reduces the penalty for weak secondary modes during training. This enables prediction of complex distributions with multiple modes, and allows training on datasets where full ground truth distributions are not available. In extensive experiments, we show that the proposed loss leads to substantially better uncertainty estimation on two challenging computer vision tasks: horizon line detection and stereo disparity estimation.
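
The idea of a Wasserstein loss with a hinge can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the threshold `gamma`, the choice to apply the hinge to the predicted per-bin mass, and the renormalisation step are all hypothetical, and the published formulation may differ. The sketch only shows how a hinge can make weak secondary modes free of penalty when the target is a single-mode histogram.

```python
from itertools import accumulate


def wasserstein_1d(p, q):
    """Sum of absolute CDF differences between two histograms on the same
    unit-spaced bins (the 1D Wasserstein-1 distance when both sum to 1)."""
    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


def hinge_wasserstein(pred, target, gamma=0.15):
    """Hypothetical hinge variant: per-bin predicted mass below `gamma`
    is clipped to zero, the remainder is renormalised, and the plain
    1D Wasserstein distance is computed on the result."""
    hinged = [max(0.0, p - gamma) for p in pred]
    total = sum(hinged)
    if total == 0.0:
        # Nothing exceeds the threshold: fall back to the plain loss.
        return wasserstein_1d(pred, target)
    hinged = [h / total for h in hinged]
    return wasserstein_1d(hinged, target)


# A prediction with a strong mode at bin 1 and a weak secondary mode at bin 3.
pred = [0.02, 0.80, 0.02, 0.14, 0.02]
target = [0.0, 1.0, 0.0, 0.0, 0.0]  # only one mode is annotated

plain = wasserstein_1d(pred, target)
hinged = hinge_wasserstein(pred, target)
```

In this toy case the plain loss penalises the secondary mode, whereas the hinged variant does not, because the secondary mode stays below `gamma`.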

Place, publisher, year, edition, pages
IEEE, 2024
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-208088 (URN); 10.1109/cvprw63382.2024.00351 (DOI); 9798350365474 (ISBN); 9798350365481 (ISBN)
Conference
2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 17-18 June 2024
Available from: 2024-10-02 Created: 2024-10-02 Last updated: 2025-02-07. Bibliographically approved
Hägerlind, J., Hentati-Sundberg, J. & Wandt, B. (2024). Temporally-consistent 3D Reconstruction of Birds. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR): CV4Animals: Computer Vision for Animal Behavior. Paper presented at IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 17-21, 2024.
2024 (English). In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR): CV4Animals: Computer Vision for Animal Behavior, 2024. Conference paper, Poster (with or without abstract) (Other academic)
Abstract [en]

This paper deals with 3D reconstruction of seabirds, which recently came into the focus of environmental scientists as valuable bio-indicators for environmental change. Such 3D information is beneficial for analyzing the bird's behavior and physiological shape, for example by tracking motion, shape, and appearance changes. From a computer vision perspective, birds are especially challenging due to their rapid and oftentimes non-rigid motions. We propose an approach to reconstruct the 3D pose and shape from monocular videos of a specific breed of seabird - the common murre. Our approach comprises a full pipeline of detection, tracking, segmentation, and temporally consistent 3D reconstruction. Additionally, we propose a temporal loss that extends current single-image 3D bird pose estimators to the temporal domain. Moreover, we provide a real-world dataset of 10000 frames of video observations, which on average capture nine birds simultaneously, comprising a large variety of motions and interactions, including a smaller test set with bird-specific keypoint labels. Using our temporal optimization, we achieve state-of-the-art performance for the challenging sequences in our dataset.

Keywords
pose estimation, 3D reconstruction, articulated mesh, bird, common murre, temporal
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-208815 (URN)
Conference
IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 17-21, 2024
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

This work was partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation.

The computations were enabled by the Berzelius resource provided by the Knut and Alice Wallenberg Foundation at the National Supercomputer Centre.

Available from: 2024-10-28 Created: 2024-10-28 Last updated: 2025-02-07. Bibliographically approved
Zwölfer, M., Heinrich, D., Wandt, B., Rhodin, H., Spörri, J. & Nachbauer, W. (2023). A graph-based approach can improve keypoint detection of complex poses: a proof-of-concept on injury occurrences in alpine ski racing. Scientific Reports, 13(1), Article ID 21465.
2023 (English). In: Scientific Reports, E-ISSN 2045-2322, Vol. 13, no 1, article id 21465. Article in journal (Refereed) Published
Abstract [en]

For most applications, 2D keypoint detection works well and offers a simple and fast tool to analyse human movements. However, there remain many situations where even the best state-of-the-art algorithms reach their limits and fail to detect human keypoints correctly. Such situations occur especially when individual body parts are occluded or twisted, or when the whole person is flipped. Twisted and rotated body positions of this kind occur frequently when analysing injuries in alpine ski racing. To improve the detection of keypoints for this application, we developed a novel method that refines keypoint estimates by rotating the input videos. We select the best rotation for every frame with a graph-based global solver. Thereby, we improve keypoint detection of an arbitrary pose estimation algorithm, in particular for 'hard' keypoints. In the current proof-of-concept study, we show that our approach outperforms standard keypoint detection in all categories and all metrics, by a large margin in injury-related out-of-balance and fall situations, and that it also outperforms previous methods in performance and robustness. The Injury Ski II dataset was made publicly available, aiming to facilitate the investigation of sports accidents based on computer vision in the future.
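
Selecting one rotation per frame with a global solver can be sketched as a best-path search over a frame-by-rotation graph. The sketch below is an illustration under stated assumptions, not the authors' solver: the function `select_rotations`, the per-rotation confidence scores, and the smoothness weight `smooth` are hypothetical, and the graph costs used in the paper are not given in the abstract. Dynamic programming (a Viterbi-style pass) picks the rotation sequence that maximises total detector confidence minus a penalty for large rotation changes between consecutive frames.

```python
def select_rotations(scores, angles, smooth=0.01):
    """Pick one rotation per frame.

    scores[t][k] is an assumed detector confidence for frame t rotated by
    angles[k] degrees (higher is better). `smooth` penalises angular jumps
    between consecutive frames.
    """
    n = len(angles)

    def jump(a, b):
        # Smallest angular difference in degrees between two candidates.
        d = abs(angles[a] - angles[b]) % 360
        return min(d, 360 - d)

    best = list(scores[0])  # best path value ending in each rotation
    back = []               # back-pointers, one list per later frame
    for t in range(1, len(scores)):
        prev, cur = [], []
        for k in range(n):
            j = max(range(n), key=lambda i: best[i] - smooth * jump(i, k))
            prev.append(j)
            cur.append(best[j] - smooth * jump(j, k) + scores[t][k])
        back.append(prev)
        best = cur

    # Trace the best path backwards from the best final rotation.
    k = max(range(n), key=lambda i: best[i])
    path = [k]
    for prev in reversed(back):
        k = prev[k]
        path.append(k)
    path.reverse()
    return [angles[k] for k in path]


angles = [0, 90, 180, 270]
scores = [
    [0.9, 0.1, 0.10, 0.1],
    [0.5, 0.1, 0.55, 0.1],  # frame 1 slightly prefers 180 degrees in isolation
    [0.9, 0.1, 0.10, 0.1],
]
smoothed = select_rotations(scores, angles, smooth=0.01)
per_frame = select_rotations(scores, angles, smooth=0.0)
```

With the smoothness term enabled, the isolated preference for a 180 degree flip in the middle frame is overruled by its neighbours; with `smooth=0.0` the solver degenerates to an independent per-frame choice.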

Place, publisher, year, edition, pages
Nature Publishing Group, 2023
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-202769 (URN); 10.1038/s41598-023-47875-2 (DOI); 001253614500009 (); 38052814 (PubMedID); 2-s2.0-85178850878 (Scopus ID)
Available from: 2024-04-19 Created: 2024-04-19 Last updated: 2025-03-28
Song, C., Zhang, Y., Peng, W., Mohaghegh, P., Wandt, B. & Rhodin, H. (2023). AudioViewer: Learning to Visualize Sounds. In: 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). Paper presented at 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 02-07 January, 2023 (pp. 2205-2215). Institute of Electrical and Electronics Engineers (IEEE)
2023 (English). In: 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Institute of Electrical and Electronics Engineers (IEEE), 2023, p. 2205-2215. Conference paper, Published paper (Refereed)
Abstract [en]

A long-standing goal in the field of sensory substitution is enabling sound perception for deaf and hard of hearing (DHH) people by visualizing audio content. Different from existing models that translate to hand sign language, between speech and text, or text and images, we target immediate and low-level audio to video translation that applies to generic environment sounds as well as human speech. Since such a substitution is artificial, without labels for supervised learning, our core contribution is to build a mapping from audio to video that learns from unpaired examples via high-level constraints. For speech, we additionally disentangle content from style, such as gender and dialect. Qualitative and quantitative results, including a human study, demonstrate that our unpaired translation approach maintains important audio features in the generated video and that videos of faces and numbers are well suited for visualizing high-dimensional audio features that can be parsed by humans to match and distinguish between sounds and words. Project website: https://chunjinsong.github.io/audioviewer

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Series
IEEE Workshop on Applications of Computer Vision (WACV), ISSN 2472-6737, E-ISSN 2642-9381
National Category
Natural Language Processing
Identifiers
urn:nbn:se:liu:diva-209826 (URN); 10.1109/wacv56688.2023.00224 (DOI); 9781665493468 (ISBN); 9781665493475 (ISBN)
Conference
2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 02-07 January, 2023
Available from: 2024-11-14 Created: 2024-11-14 Last updated: 2025-02-07
Holmquist, K. & Wandt, B. (2023). Diffpose: Multi-hypothesis human pose estimation using diffusion models. Paper presented at ICCV 2023, Paris, France, October 4-6, 2023.
2023 (English). Conference paper, Published paper (Refereed)
Abstract [en]

Traditionally, monocular 3D human pose estimation employs a machine learning model to predict the most likely 3D pose for a given input image. However, a single image can be highly ambiguous and induces multiple plausible solutions for the 2D-3D lifting step, which results in overly confident 3D pose predictors. To this end, we propose DiffPose, a conditional diffusion model that predicts multiple hypotheses for a given input image. Compared to similar approaches, our diffusion model is straightforward and avoids intensive hyperparameter tuning, complex network structures, mode collapse, and unstable training. Moreover, we tackle the problem of over-simplification of the intermediate representation of the common two-step approaches which first estimate a distribution of 2D joint locations via joint-wise heatmaps and consecutively use their maximum argument for the 3D pose estimation step. Since such a simplification of the heatmaps removes valid information about possibly correct, though labeled unlikely, joint locations, we propose to represent the heatmaps as a set of 2D joint candidate samples. To extract information about the original distribution from these samples, we introduce our embedding transformer which conditions the diffusion model. Experimentally, we show that DiffPose improves upon the state of the art for multi-hypothesis pose estimation by 3-5% for simple poses and outperforms it by a large margin for highly ambiguous poses.
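
The contrast between taking the heatmap maximum and keeping a set of joint candidates can be shown with a small sketch. It is an illustration only: the helper names `argmax_joint` and `sample_joint_candidates`, the toy heatmap, and the sampling scheme (drawing pixel locations with probability proportional to the heatmap value) are assumptions, not the sampling procedure or the embedding transformer described in the paper.

```python
import random


def argmax_joint(heatmap):
    """Return the (x, y) location of the single highest heatmap value."""
    cells = [(v, x, y) for y, row in enumerate(heatmap) for x, v in enumerate(row)]
    _, x, y = max(cells)
    return (x, y)


def sample_joint_candidates(heatmap, n, seed=0):
    """Draw n (x, y) candidates with probability proportional to the
    heatmap value, so secondary peaks remain represented."""
    rng = random.Random(seed)
    coords = [(x, y) for y, row in enumerate(heatmap) for x in range(len(row))]
    weights = [v for row in heatmap for v in row]
    return rng.choices(coords, weights=weights, k=n)


# Toy 4x4 heatmap for one joint with a main peak at (1, 1) and a
# weaker but plausible second peak at (3, 2).
heatmap = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.6, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.4],
    [0.0, 0.0, 0.0, 0.0],
]

single = argmax_joint(heatmap)
candidates = sample_joint_candidates(heatmap, n=50)
```

The argmax discards the second peak entirely, while the sampled set only ever contains locations with non-zero heatmap mass and can include both peaks.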

National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-198612 (URN)
Conference
ICCV 2023, Paris, France, October 4-6, 2023.
Available from: 2023-10-20 Created: 2023-10-20 Last updated: 2025-02-07. Bibliographically approved
He, X., Wandt, B. & Rhodin, H. (2023). LatentKeypointGAN: Controlling Images via Latent Keypoints. In: 2023 20th Conference on Robots and Vision (CRV). Paper presented at 2023 20th Conference on Robots and Vision (CRV), Montreal, QC, Canada, 06-08 June 2023. Institute of Electrical and Electronics Engineers (IEEE)
2023 (English). In: 2023 20th Conference on Robots and Vision (CRV), Institute of Electrical and Electronics Engineers (IEEE), 2023. Conference paper, Published paper (Refereed)
Abstract [en]

Generative adversarial networks (GANs) can now generate photorealistic images. However, how to best control the image content remains an open challenge. We introduce LatentKeypointGAN, a two-stage GAN internally conditioned on a set of keypoints and associated appearance embeddings, providing control of the position and style of the generated objects and their respective parts. A major difficulty that we address is disentangling the image into spatial and appearance factors with little domain knowledge and supervision signals. We demonstrate in a user study and quantitative experiments that LatentKeypointGAN provides an interpretable latent space that can be used to re-arrange the generated images by re-positioning, adding, removing, and exchanging keypoint embeddings, such as generating portraits by combining the eyes and mouth from different images. Notably, our method does not require labels as it is self-supervised and thereby applies to diverse application domains, such as editing portraits, indoor rooms, and full-body human poses. In addition, the explicit generation of keypoints and matching images enables a new, GAN-based method for unsupervised keypoint detection.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Series
Canadian Conference on Computer and Robot Vision, CRV
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-209827 (URN); 10.1109/crv60082.2023.00009 (DOI); 9798350341393 (ISBN); 9798350341409 (ISBN)
Conference
2023 20th Conference on Robots and Vision (CRV), Montreal, QC, Canada, 06-08 June 2023
Available from: 2024-11-14 Created: 2024-11-14 Last updated: 2025-02-07
Wandt, B., Little, J. J. & Rhodin, H. (2022). ElePose: Unsupervised 3D Human Pose Estimation by Predicting Camera Elevation and Learning Normalizing Flows on 2D Poses. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Paper presented at 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 18-24 June, 2022 (pp. 6635-6645). Institute of Electrical and Electronics Engineers (IEEE), 1
2022 (English). In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Institute of Electrical and Electronics Engineers (IEEE), 2022, Vol. 1, p. 6635-6645. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Series
Conference on Computer Vision and Pattern Recognition (CVPR), ISSN 1063-6919, E-ISSN 2575-7075
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-209387 (URN); 10.1109/CVPR52688.2022.00652 (DOI); 9781665469463 (ISBN); 9781665469470 (ISBN)
Conference
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 18-24 June, 2022
Available from: 2024-11-11 Created: 2024-11-11 Last updated: 2025-02-07
Wandt, B., Rudolph, M., Zell, P., Rhodin, H. & Rosenhahn, B. (2021). Canonpose: Self-supervised monocular 3d human pose estimation in the wild. Paper presented at CVPR (pp. 13294-13304).
2021 (English). Conference paper, Published paper (Refereed)
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-209379 (URN)
Conference
CVPR
Available from: 2024-11-11 Created: 2024-11-11 Last updated: 2025-02-07. Bibliographically approved