Felsberg, Michael, Professor. ORCID iD: orcid.org/0000-0002-6096-3648
Publications (10 of 218)
Athanasiadis, I., Lindsten, F. & Felsberg, M. (2025). Prior Learning in Introspective VAEs. Transactions on Machine Learning Research (06), 1-41
Prior Learning in Introspective VAEs
2025 (English) In: Transactions on Machine Learning Research, E-ISSN 2835-8856, no. 06, p. 1-41. Article in journal (Refereed), Published
Abstract [en]

Variational Autoencoders (VAEs) are a popular framework for unsupervised learning and data generation. A plethora of methods have been proposed for improving VAEs, with the incorporation of adversarial objectives and the integration of prior learning mechanisms being prominent directions. Regarding the former, an indicative instance is the recently introduced family of Introspective VAEs, which aim to ensure that a low likelihood is assigned to unrealistic samples. In this study, we focus on the Soft-IntroVAE (S-IntroVAE), one of only two members of the Introspective VAE family, the other being the original IntroVAE. We select S-IntroVAE for its state-of-the-art status and its training stability. In particular, we investigate the implications of incorporating a multimodal and trainable prior into S-IntroVAE. Namely, we formulate the prior as a third player and show that, when trained in cooperation with the decoder, it constitutes an effective way of prior learning, which shares the Nash equilibrium with the vanilla S-IntroVAE. Furthermore, based on a modified formulation of the optimal ELBO in S-IntroVAE, we develop theoretically motivated regularizations, namely (i) adaptive variance clipping to stabilize training when learning the prior and (ii) responsibility regularization to discourage the formation of inactive prior modes. Finally, we perform a series of targeted experiments on a 2D density estimation benchmark and in an image generation setting comprising the (F)-MNIST and CIFAR-10 datasets, demonstrating the effect of prior learning in S-IntroVAE on generation and representation learning.
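To make the abstract's ingredients concrete, the following is a minimal PyTorch sketch of a trainable mixture-of-Gaussians prior with clamped variances and per-mode responsibilities. The mode count, clipping bounds, and parameterization are illustrative assumptions, not the paper's implementation (in particular, the paper's adaptive clipping schedule is not reproduced).

```python
import math
import torch
import torch.nn as nn

class MixturePrior(nn.Module):
    """Trainable prior p(z) = sum_k pi_k N(z; mu_k, sigma_k^2 I) -- a sketch."""
    def __init__(self, n_modes=10, latent_dim=32, min_log_var=-4.0, max_log_var=4.0):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_modes))           # mixture weights
        self.mu = nn.Parameter(torch.randn(n_modes, latent_dim))   # component means
        self.log_var = nn.Parameter(torch.zeros(n_modes, latent_dim))
        self.min_log_var, self.max_log_var = min_log_var, max_log_var

    def _component_log_probs(self, z):
        # Clamp variances, in the spirit of the paper's variance clipping
        # (fixed bounds here; the adaptive schedule is an omitted detail).
        log_var = self.log_var.clamp(self.min_log_var, self.max_log_var)
        diff = z.unsqueeze(1) - self.mu                            # (B, K, D)
        return -0.5 * ((diff ** 2) / log_var.exp()
                       + log_var + math.log(2 * math.pi)).sum(-1)  # (B, K)

    def log_prob(self, z):
        log_pi = torch.log_softmax(self.logits, dim=0)             # (K,)
        return torch.logsumexp(log_pi + self._component_log_probs(z), dim=1)

    def responsibilities(self, z):
        # Posterior over modes; penalizing low average responsibilities is
        # one way to discourage inactive prior modes, as the abstract suggests.
        log_pi = torch.log_softmax(self.logits, dim=0)
        return torch.softmax(log_pi + self._component_log_probs(z), dim=1)
```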

Keywords
probability theory and statistics, computer sciences
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-214678 (URN)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2025-06-12 Created: 2025-06-12 Last updated: 2025-06-12. Bibliographically approved
Edstedt, J., Bökman, G., Wadenbäck, M. & Felsberg, M. (2024). DeDoDe: Detect, Don’t Describe — Describe, Don’t Detect for Local Feature Matching. In: 2024 International Conference on 3D Vision (3DV). Paper presented at the International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT), Davos, Switzerland, 18-21 March, 2024. Institute of Electrical and Electronics Engineers (IEEE)
DeDoDe: Detect, Don’t Describe — Describe, Don’t Detect for Local Feature Matching
2024 (English) In: 2024 International Conference on 3D Vision (3DV), Institute of Electrical and Electronics Engineers (IEEE), 2024. Conference paper, Published paper (Refereed)
Abstract [en]

Keypoint detection is a pivotal step in 3D reconstruction, whereby sets of (up to) K points are detected in each view of a scene. Crucially, the detected points need to be consistent between views, i.e., correspond to the same 3D point in the scene. One of the main challenges with keypoint detection is the formulation of the learning objective. Previous learning-based methods typically jointly learn descriptors with keypoints, and treat the keypoint detection as a binary classification task on mutual nearest neighbours. However, basing keypoint detection on descriptor nearest neighbours is a proxy task, which is not guaranteed to produce 3D-consistent keypoints. Furthermore, this ties the keypoints to a specific descriptor, complicating downstream usage. In this work, we instead learn keypoints directly from 3D consistency. To this end, we train the detector to detect tracks from large-scale SfM. As these points are often overly sparse, we derive a semi-supervised two-view detection objective to expand this set to a desired number of detections. To train a descriptor, we maximize the mutual nearest neighbour objective over the keypoints with a separate network. Results show that our approach, DeDoDe, achieves significant gains on multiple geometry benchmarks. Code is provided at github.com/Parskatt/DeDoDe.
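As a point of reference for the mutual-nearest-neighbour objective mentioned above, here is a generic sketch of the MNN matching criterion in PyTorch; the tensor shapes and cosine-similarity normalization are assumptions, and this is not the DeDoDe training code.

```python
import torch

def mutual_nearest_neighbours(desc_a, desc_b):
    """Return index pairs (i, j) where desc_a[i] and desc_b[j] are each
    other's nearest neighbour. Assumes L2-normalized descriptors:
    desc_a of shape (N, D), desc_b of shape (M, D)."""
    sim = desc_a @ desc_b.t()                 # (N, M) cosine similarities
    nn_ab = sim.argmax(dim=1)                 # best match in B for each A
    nn_ba = sim.argmax(dim=0)                 # best match in A for each B
    idx_a = torch.arange(desc_a.shape[0])
    mutual = nn_ba[nn_ab] == idx_a            # A -> B -> A returns to start
    return idx_a[mutual], nn_ab[mutual]
```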

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Series
2024 International Conference on 3D Vision (3DV), ISSN 2378-3826, E-ISSN 2475-7888
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-204892 (URN); 10.1109/3dv62453.2024.00035 (DOI); 001250581700028; 9798350362459 (ISBN); 9798350362466 (ISBN)
Conference
International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT), Davos, Switzerland, 18-21 March, 2024.
Note

Funding Agencies|Wallenberg Artificial Intelligence, Autonomous Systems and Software Program (WASP) - Knut and Alice Wallenberg Foundation; strategic research environment ELLIIT - Swedish government; Swedish Research Council [2022-06725]; Knut and Alice Wallenberg Foundation at the National Supercomputer Centre

Available from: 2024-06-17 Created: 2024-06-17 Last updated: 2025-02-07
Sanchez Aimar, E., Helgesen, N., Xu, Y., Kuhlmann, M. & Felsberg, M. (2024). Flexible Distribution Alignment: Towards Long-Tailed Semi-supervised Learning with Proper Calibration. In: Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol (Ed.), Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part LIV. Paper presented at 18th European Conference, Milan, Italy, September 29–October 4, 2024 (pp. 307-327). Springer Nature Switzerland, 15112
Flexible Distribution Alignment: Towards Long-Tailed Semi-supervised Learning with Proper Calibration
2024 (English) In: Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part LIV / [ed] Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol, Springer Nature Switzerland, 2024, Vol. 15112, p. 307-327. Conference paper, Published paper (Refereed)
Abstract [en]

Long-tailed semi-supervised learning (LTSSL) represents a practical scenario for semi-supervised applications, challenged by skewed labeled distributions that bias classifiers. This problem is often aggravated by discrepancies between labeled and unlabeled class distributions, leading to biased pseudo-labels, neglect of rare classes, and poorly calibrated probabilities. To address these issues, we introduce Flexible Distribution Alignment (FlexDA), a novel adaptive logit-adjusted loss framework designed to dynamically estimate and align predictions with the actual distribution of unlabeled data and achieve a balanced classifier by the end of training. FlexDA is further enhanced by a distillation-based consistency loss, promoting fair data usage across classes and effectively leveraging underconfident samples. This method, encapsulated in ADELLO (Align and Distill Everything All at Once), proves robust against label shift, significantly improves model calibration in LTSSL contexts, and surpasses previous state-of-the-art approaches across multiple benchmarks, including CIFAR100-LT, STL10-LT, and ImageNet127, addressing class imbalance challenges in semi-supervised learning. Our code is available at https://github.com/emasa/ADELLO-LTSSL.
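To make the logit-adjustment mechanism concrete, the snippet below sketches a generic logit-adjusted cross-entropy of the kind FlexDA builds on. Here the class prior is a fixed input, whereas FlexDA dynamically estimates the unlabeled-data distribution during training; that estimation, and the distillation loss, are not reproduced.

```python
import torch
import torch.nn.functional as F

def logit_adjusted_loss(logits, targets, class_prior, tau=1.0):
    """Cross-entropy with logits shifted by log class priors.

    class_prior: (C,) probabilities summing to 1. Head classes receive a
    larger additive term, which pushes the decision rule toward balance.
    Generic illustration; not the paper's FlexDA objective.
    """
    adjusted = logits + tau * torch.log(class_prior + 1e-12)
    return F.cross_entropy(adjusted, targets)
```

For example, with class_prior = torch.tensor([0.7, 0.2, 0.1]), frequent classes are handicapped at training time so that the unadjusted logits behave like a balanced classifier at test time.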

Place, publisher, year, edition, pages
Springer Nature Switzerland, 2024
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 15112
National Category
Computer Systems
Identifiers
urn:nbn:se:liu:diva-209223 (URN); 10.1007/978-3-031-72949-2_18 (DOI); 001352860600018; 2-s2.0-85208545165 (Scopus ID); 9783031729485 (ISBN); 9783031729492 (ISBN)
Conference
18th European Conference, Milan, Italy, September 29–October 4, 2024
Note

Funding Agencies|Wallenberg Artificial Intelligence, Autonomous Systems and Software Program (WASP) - Knut and Alice Wallenberg Foundation; Swedish Research Council [2022-06725]; Knut and Alice Wallenberg Foundation at the National Supercomputer Centre

Available from: 2024-11-06 Created: 2024-11-06 Last updated: 2024-12-17
Jonnarth, A., Zhang, Y. & Felsberg, M. (2024). High-fidelity Pseudo-labels for Boosting Weakly-Supervised Segmentation. In: 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). Paper presented at the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, Jan 3-8, 2024 (pp. 999-1008). Institute of Electrical and Electronics Engineers (IEEE)
High-fidelity Pseudo-labels for Boosting Weakly-Supervised Segmentation
2024 (English) In: 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 999-1008. Conference paper, Published paper (Refereed)
Abstract [en]

Image-level weakly-supervised semantic segmentation (WSSS) reduces the usually vast data annotation cost by relying on surrogate segmentation masks during training. The typical approach involves training an image classification network using global average pooling (GAP) on convolutional feature maps. This enables the estimation of object locations based on class activation maps (CAMs), which identify the importance of image regions. The CAMs are then used to generate pseudo-labels, in the form of segmentation masks, to supervise a segmentation model in the absence of pixel-level ground truth. Our work is based on two techniques for improving CAMs: importance sampling, which is a substitute for GAP, and the feature similarity loss, which utilizes the heuristic that object contours almost always align with color edges in images. However, both are based on the multinomial posterior with softmax, and implicitly assume that classes are mutually exclusive, which turns out to be suboptimal in our experiments. Thus, we reformulate both techniques based on binomial posteriors of multiple independent binary problems. This has two benefits: their performance is improved, and they become more general, resulting in an add-on method that can boost virtually any WSSS method. This is demonstrated on a wide variety of baselines on the PASCAL VOC dataset, improving the region similarity and contour quality of all implemented state-of-the-art methods. Experiments on the MS COCO dataset further show that our proposed add-on is well-suited for large-scale settings. Our code implementation is available at https://github.com/arvijj/hfpl.
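The core reformulation, multinomial softmax versus independent binomial posteriors, can be stated in a few lines; the sketch below is illustrative and not the authors' code.

```python
import torch

def multinomial_posterior(cams):
    # Softmax across the class dimension: implicitly assumes that classes
    # are mutually exclusive at every pixel.
    return torch.softmax(cams, dim=1)              # (B, C, H, W)

def binomial_posteriors(cams):
    # One independent sigmoid per class: each class becomes its own binary
    # problem, so co-occurring classes no longer compete for probability mass.
    return torch.sigmoid(cams)                     # (B, C, H, W)
```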

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
weakly supervised, semantic segmentation, importance sampling, feature similarity, class activation maps
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-202446 (URN); 10.1109/WACV57701.2024.00105 (DOI); 001222964601011; 2-s2.0-85191946457 (Scopus ID)
Conference
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, Jan 3-8, 2024
Note

Funding Agencies|Wallenberg AI, Autonomous Systems and Software Program (WASP) - Knut and Alice Wallenberg (KAW) Foundation; Swedish Research Council [2022-06725]

Available from: 2024-04-15 Created: 2024-04-15 Last updated: 2025-03-20. Bibliographically approved
Jonnarth, A., Zhao, J. & Felsberg, M. (2024). Learning Coverage Paths in Unknown Environments with Deep Reinforcement Learning. In: Ruslan Salakhutdinov, Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, Felix Berkenkamp (Ed.), Proceedings of the 41st International Conference on Machine Learning. Paper presented at the International Conference on Machine Learning, 21-27 July 2024, Vienna, Austria (pp. 22491-22508). PMLR
Learning Coverage Paths in Unknown Environments with Deep Reinforcement Learning
2024 (English) In: Proceedings of the 41st International Conference on Machine Learning / [ed] Ruslan Salakhutdinov, Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, Felix Berkenkamp, PMLR, 2024, p. 22491-22508. Conference paper, Published paper (Refereed)
Abstract [en]

Coverage path planning (CPP) is the problem of finding a path that covers the entire free space of a confined area, with applications ranging from robotic lawn mowing to search-and-rescue. When the environment is unknown, the path needs to be planned online while mapping the environment, which cannot be addressed by offline planning methods that do not allow for a flexible path space. We investigate how suitable reinforcement learning is for this challenging problem, and analyze the involved components required to efficiently learn coverage paths, such as action space, input feature representation, neural network architecture, and reward function. We propose a computationally feasible egocentric map representation based on frontiers, and a novel reward term based on total variation to promote complete coverage. Through extensive experiments, we show that our approach surpasses the performance of both previous RL-based approaches and highly specialized methods across multiple CPP variations.
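The total-variation reward idea can be illustrated with a few lines of PyTorch. The shaping coefficient and the difference-based reward below are assumptions for illustration, not the paper's exact reward function.

```python
import torch

def total_variation(coverage_map):
    """Anisotropic total variation of a 2D coverage map of shape (H, W).

    Lower TV corresponds to a more compact, less fragmented covered region,
    so penalizing TV can promote complete coverage. Illustrative sketch.
    """
    dh = (coverage_map[1:, :] - coverage_map[:-1, :]).abs().sum()
    dw = (coverage_map[:, 1:] - coverage_map[:, :-1]).abs().sum()
    return dh + dw

# Hypothetical shaping term with an assumed coefficient lambda_tv:
# reward += lambda_tv * (total_variation(prev_map) - total_variation(curr_map))
```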

Place, publisher, year, edition, pages
PMLR, 2024
Series
Proceedings of Machine Learning Research, ISSN 2640-3498 ; 235
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-207087 (URN)
Conference
International Conference on Machine Learning, 21-27 July 2024, Vienna, Austria
Note

Funding agencies: the Wallenberg AI, Autonomous Systems and Software Program (WASP), funded by the Knut and Alice Wallenberg (KAW) Foundation; the Vinnova project Human Centered Autonomous Regional Airport, Dnr 2022-02678. The computational resources were provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS), partially funded by the Swedish Research Council through grant agreement no. 2022-06725, and by the Berzelius resource, provided by the KAW Foundation at the National Supercomputer Centre (NSC).

Available from: 2024-08-30 Created: 2024-08-30 Last updated: 2024-09-13
Melnyk, P., Felsberg, M., Wadenbäck, M., Robinson, A. & Le, C. (2024). On Learning Deep O(n)-Equivariant Hyperspheres. In: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix (Ed.), Proceedings of the 41st International Conference on Machine Learning. Paper presented at the 41st International Conference on Machine Learning, Vienna, Austria, 21-27 July 2024 (pp. 35324-35339). PMLR, 235
On Learning Deep O(n)-Equivariant Hyperspheres
2024 (English) In: Proceedings of the 41st International Conference on Machine Learning / [ed] Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix, PMLR, 2024, Vol. 235, p. 35324-35339. Conference paper, Poster (with or without abstract) (Refereed)
Abstract [en]

In this paper, we utilize hyperspheres and regular n-simplexes and propose an approach to learning deep features equivariant under the transformations of nD reflections and rotations, encompassed by the powerful group of O(n). Namely, we propose O(n)-equivariant neurons with spherical decision surfaces that generalize to any dimension n, which we call Deep Equivariant Hyperspheres. We demonstrate how to combine them in a network that directly operates on the basis of the input points, and propose an invariant operator based on the relation between two points and a sphere, which, as we show, turns out to be a Gram matrix. Using synthetic and real-world data in nD, we experimentally verify our theoretical contributions and find that our approach is superior to the competing methods for O(n)-equivariant benchmark datasets (classification and regression), demonstrating a favorable speed/performance trade-off. The code is available on GitHub.
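The O(n) invariance of the Gram matrix, the property behind the invariant operator mentioned above, is easy to verify numerically. The following is a minimal standalone check, not the paper's code.

```python
import torch

def gram_matrix(points):
    """Gram matrix of a point set X of shape (N, n): G = X X^T.

    G is invariant under any orthogonal transform Q in O(n), since
    (XQ)(XQ)^T = X Q Q^T X^T = X X^T.
    """
    return points @ points.t()

n = 5
X = torch.randn(8, n)
# Random orthogonal matrix via QR decomposition of a Gaussian matrix.
Q, _ = torch.linalg.qr(torch.randn(n, n))
assert torch.allclose(gram_matrix(X @ Q), gram_matrix(X), atol=1e-4)
```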

Place, publisher, year, edition, pages
PMLR, 2024
Series
Proceedings of Machine Learning Research, ISSN 2640-3498 ; 235
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-206460 (URN)
Conference
41st International Conference on Machine Learning, Vienna, Austria, 21-27 July 2024
Available from: 2024-08-14 Created: 2024-08-14 Last updated: 2025-02-07
Edstedt, J., Sun, Q., Bökman, G., Wadenbäck, M. & Felsberg, M. (2024). RoMa: Robust Dense Feature Matching. In: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Paper presented at the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16-22 June 2024 (pp. 19790-19800). Institute of Electrical and Electronics Engineers (IEEE)
RoMa: Robust Dense Feature Matching
2024 (English) In: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 19790-19800. Conference paper, Published paper (Refereed)
Abstract [en]

Feature matching is an important computer vision task that involves estimating correspondences between two images of a 3D scene, and dense methods estimate all such correspondences. The aim is to learn a robust model, i.e., a model able to match under challenging real-world changes. In this work, we propose such a model, leveraging frozen pretrained features from the foundation model DINOv2. Although these features are significantly more robust than local features trained from scratch, they are inherently coarse. We therefore combine them with specialized ConvNet fine features, creating a precisely localizable feature pyramid. To further improve robustness, we propose a tailored transformer match decoder that predicts anchor probabilities, which enables it to express multimodality. Finally, we propose an improved loss formulation through regression-by-classification with subsequent robust regression. We conduct a comprehensive set of experiments that show that our method, RoMa, achieves significant gains, setting a new state-of-the-art. In particular, we achieve a 36% improvement on the extremely challenging WxBS benchmark. Code is provided at github.com/Parskatt/RoMa.
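The regression-by-classification idea named in the abstract can be sketched generically: classify over discretized anchor values, then take the softmax expectation for a continuous estimate. The snippet below illustrates that mechanism only; it is not RoMa's match decoder.

```python
import torch

def regression_by_classification(logits, anchors):
    """Predict a coordinate as the expectation over discretized anchors.

    logits: (B, K) scores over K anchor positions; anchors: (K,) values.
    Classification over anchors lets the model express multimodal beliefs,
    while the expectation still yields a continuous estimate.
    """
    probs = torch.softmax(logits, dim=-1)
    return probs @ anchors                    # (B,) expected coordinate
```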

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Series
Conference on Computer Vision and Pattern Recognition (CVPR), ISSN 1063-6919, E-ISSN 2575-7075
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-207702 (URN); 10.1109/CVPR52733.2024.01871 (DOI); 001342515503014; 2-s2.0-85199525100 (Scopus ID); 9798350353006 (ISBN); 9798350353013 (ISBN)
Conference
2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16-22 June 2024
Note

Funding Agencies|Wallenberg Artificial Intelligence, Autonomous Systems and Software Program (WASP) - Knut and Alice Wallenberg Foundation; strategic research environment ELLIIT - Swedish government; Swedish Research Council [2022-06725]; Knut and Alice Wallenberg Foundation at the National Supercomputer Centre

Available from: 2024-09-17 Created: 2024-09-17 Last updated: 2025-02-12
Melnyk, P., Robinson, A., Felsberg, M. & Wadenbäck, M. (2024). TetraSphere: A Neural Descriptor for O(3)-Invariant Point Cloud Analysis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2024. Paper presented at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16-22 June, 2024 (pp. 5620-5630). IEEE Computer Society
TetraSphere: A Neural Descriptor for O(3)-Invariant Point Cloud Analysis
2024 (English) In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2024, IEEE Computer Society, 2024, p. 5620-5630. Conference paper, Published paper (Refereed)
Abstract [en]

In many practical applications, 3D point cloud analysis requires rotation invariance. In this paper, we present a learnable descriptor invariant under 3D rotations and reflections, i.e., the O(3) actions, utilizing the recently introduced steerable 3D spherical neurons and vector neurons. Specifically, we propose an embedding of the 3D spherical neurons into 4D vector neurons, which leverages end-to-end training of the model. In our approach, we perform TetraTransform, an equivariant embedding of the 3D input into 4D constructed from the steerable neurons, and extract deeper O(3)-equivariant features using vector neurons. This integration of the TetraTransform into the VN-DGCNN framework, termed TetraSphere, increases the number of parameters negligibly, by less than 0.0002%. TetraSphere sets a new state-of-the-art in classifying randomly rotated real-world object scans from the challenging subsets of ScanObjectNN. Additionally, TetraSphere outperforms all equivariant methods on randomly rotated synthetic data: classifying objects from ModelNet40 and segmenting parts of the ShapeNet shapes. Thus, our results reveal the practical value of steerable 3D spherical neurons for learning in 3D Euclidean space.

Place, publisher, year, edition, pages
IEEE Computer Society, 2024
Series
IEEE Conference on Computer Vision and Pattern Recognition, ISSN 1063-6919, E-ISSN 2575-7075
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-207318 (URN); 10.1109/CVPR52733.2024.00537 (DOI); 001322555906003; 9798350353006 (ISBN); 9798350353013 (ISBN)
Conference
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16-22 June, 2024.
Note

Funding Agencies|Wallenberg AI, Autonomous Systems and Software Program (WASP); Swedish Research Council [2022-04266]; strategic research environment ELLIIT

Available from: 2024-09-04 Created: 2024-09-04 Last updated: 2025-02-07
Edstedt, J., Athanasiadis, I., Wadenbäck, M. & Felsberg, M. (2023). DKM: Dense Kernelized Feature Matching for Geometry Estimation. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Paper presented at the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17-24 June 2023 (pp. 17765-17775). IEEE Communications Society
DKM: Dense Kernelized Feature Matching for Geometry Estimation
2023 (English) In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Communications Society, 2023, p. 17765-17775. Conference paper, Published paper (Refereed)
Abstract [en]

Feature matching is a challenging computer vision task that involves finding correspondences between two images of a 3D scene. In this paper we consider the dense approach instead of the more common sparse paradigm, thus striving to find all correspondences. Perhaps counter-intuitively, dense methods have previously shown inferior performance to their sparse and semi-sparse counterparts for estimation of two-view geometry. This changes with our novel dense method, which outperforms both dense and sparse methods on geometry estimation. The novelty is threefold: First, we propose a kernel regression global matcher. Secondly, we propose warp refinement through stacked feature maps and depthwise convolution kernels. Thirdly, we propose learning dense confidence through consistent depth and a balanced sampling approach for dense confidence maps. Through extensive experiments we confirm that our proposed dense method, Dense Kernelized Feature Matching, sets a new state-of-the-art on multiple geometry estimation benchmarks. In particular, we achieve an improvement on MegaDepth-1500 of +4.9 and +8.9 AUC@5° compared to the best previous sparse method and dense method respectively. Our code is provided at the following repository: https://github.com/Parskatt/DKM.
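As a rough illustration of the kernel-regression idea behind the global matcher, here is a Nadaraya-Watson style sketch mapping features in one image to coordinates in the other; the kernel choice, temperature, and shapes are assumptions, not DKM's architecture.

```python
import torch

def kernel_regression_match(feats_a, feats_b, coords_b, beta=10.0):
    """Kernel regression from features to coordinates.

    For each feature in image A, regress a coordinate in image B as a
    kernel-weighted average of B's pixel coordinates.
    feats_a: (N, D), feats_b: (M, D), coords_b: (M, 2).
    """
    # Exponential (softmax) kernel over feature similarity.
    weights = torch.softmax(beta * (feats_a @ feats_b.t()), dim=1)  # (N, M)
    return weights @ coords_b                                       # (N, 2)
```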

Place, publisher, year, edition, pages
IEEE Communications Society, 2023
Series
Proceedings: IEEE Conference on Computer Vision and Pattern Recognition, ISSN 1063-6919, E-ISSN 2575-7075
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-197717 (URN); 10.1109/cvpr52729.2023.01704 (DOI); 001062531302008; 9798350301298 (ISBN); 9798350301304 (ISBN)
Conference
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17-24 June 2023
Note

This work was supported by the Wallenberg Artificial Intelligence, Autonomous Systems and Software Program (WASP), funded by the Knut and Alice Wallenberg Foundation, and by the strategic research environment ELLIIT, funded by the Swedish government. The computational resources were provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS), partially funded by the Swedish Research Council through grant agreement no. 2022-06725, and by the Berzelius resource, provided by the Knut and Alice Wallenberg Foundation at the National Supercomputer Centre.

Available from: 2023-09-11 Created: 2023-09-11 Last updated: 2025-02-07. Bibliographically approved
Holmquist, K., Klasén, L. & Felsberg, M. (2023). Evidential Deep Learning for Class-Incremental Semantic Segmentation. In: Rikke Gade, Michael Felsberg, Joni-Kristian Kämäräinen (Ed.), Image Analysis, SCIA 2023. Paper presented at SCIA 2023, the 23rd Scandinavian Conference on Image Analysis, Sirkka, Finland, April 18–21, 2023 (pp. 32-48). Springer
Evidential Deep Learning for Class-Incremental Semantic Segmentation
2023 (English) In: Image Analysis, SCIA 2023 / [ed] Rikke Gade, Michael Felsberg, Joni-Kristian Kämäräinen, Springer, 2023, p. 32-48. Conference paper, Published paper (Refereed)
Abstract [en]

Class-Incremental Learning is a challenging problem in machine learning that aims to extend previously trained neural networks with new classes. This is especially useful if the system is able to classify new objects despite the original training data being unavailable. Although the semantic segmentation problem has received less attention than classification, it poses distinct problems and challenges, since previous and future target classes can be unlabeled in the images of a single increment. In this case, the background, past, and future classes are correlated, and there exists a background shift.

In this paper, we address the problem of how to model unlabeled classes while avoiding spurious feature clustering of future uncorrelated classes. We propose to use Evidential Deep Learning to model the evidence of the classes as a Dirichlet distribution. Our method factorizes the problem into a separate foreground class probability, calculated by the expected value of the Dirichlet distribution, and an unknown class (background) probability corresponding to the uncertainty of the estimate. In our novel formulation, the background probability is implicitly modeled, avoiding the feature-space clustering that comes from forcing the model to output a high background score for pixels that are not labeled as objects. Experiments on the incremental Pascal VOC and ADE20k benchmarks show that our method is superior to the state of the art, especially when repeatedly learning new classes with an increasing number of increments.
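The Dirichlet-based factorization described above follows the standard evidential deep learning recipe, which can be sketched compactly. The head below is a generic illustration (softplus evidence is an assumption), not the paper's exact per-pixel formulation.

```python
import torch
import torch.nn.functional as F

def dirichlet_from_logits(logits):
    """Evidential head: non-negative evidence -> Dirichlet concentration.

    With alpha = evidence + 1, the expected class probability is
    alpha / sum(alpha), and the remaining uncertainty mass
    u = K / sum(alpha) can serve as an implicit background probability.
    logits: (B, K) per-pixel class scores for K foreground classes.
    """
    evidence = F.softplus(logits)                  # (B, K), non-negative
    alpha = evidence + 1.0                         # Dirichlet parameters
    strength = alpha.sum(dim=-1, keepdim=True)     # Dirichlet strength
    expected_prob = alpha / strength               # foreground class probs
    uncertainty = logits.shape[-1] / strength      # unknown/background mass
    return expected_prob, uncertainty
```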

Place, publisher, year, edition, pages
Springer, 2023
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 13886
Keywords
Class-incremental learning, Continual-learning, Semantic Segmentation
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-193265 (URN); 10.1007/978-3-031-31438-4_3 (DOI); 9783031314377 (ISBN); 9783031314384 (ISBN)
Conference
SCIA 2023, the 23rd Scandinavian Conference on Image Analysis, Sirkka, Finland, April 18–21, 2023
Available from: 2023-04-26 Created: 2023-04-26 Last updated: 2025-02-07. Bibliographically approved