Publications (10 of 210)
Edstedt, J., Bökman, G., Wadenbäck, M. & Felsberg, M. (2024). DeDoDe: Detect, Don’t Describe — Describe, Don’t Detect for Local Feature Matching. In: 2024 International Conference on 3D Vision (3DV). Paper presented at International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT), Davos, Switzerland, 18-21 March, 2024. Institute of Electrical and Electronics Engineers (IEEE)
DeDoDe: Detect, Don’t Describe — Describe, Don’t Detect for Local Feature Matching
2024 (English) In: 2024 International Conference on 3D Vision (3DV), Institute of Electrical and Electronics Engineers (IEEE), 2024. Conference paper, Published paper (Refereed)
Abstract [en]

Keypoint detection is a pivotal step in 3D reconstruction, whereby sets of (up to) K points are detected in each view of a scene. Crucially, the detected points need to be consistent between views, i.e., correspond to the same 3D point in the scene. One of the main challenges with keypoint detection is the formulation of the learning objective. Previous learning-based methods typically jointly learn descriptors with keypoints, and treat the keypoint detection as a binary classification task on mutual nearest neighbours. However, basing keypoint detection on descriptor nearest neighbours is a proxy task, which is not guaranteed to produce 3D-consistent keypoints. Furthermore, this ties the keypoints to a specific descriptor, complicating downstream usage. In this work, we instead learn keypoints directly from 3D consistency. To this end, we train the detector to detect tracks from large-scale SfM. As these points are often overly sparse, we derive a semi-supervised two-view detection objective to expand this set to a desired number of detections. To train a descriptor, we maximize the mutual nearest neighbour objective over the keypoints with a separate network. Results show that our approach, DeDoDe, achieves significant gains on multiple geometry benchmarks. Code is provided at https://github.com/Parskatt/DeDoDe.
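
For readers unfamiliar with the mutual nearest neighbour criterion that the descriptor is trained to maximize, here is a minimal sketch; names are illustrative and this is not the authors' released code:

```python
# Minimal sketch of mutual nearest neighbour (MNN) matching between two
# descriptor sets; illustrative only, not the DeDoDe training code.
import torch

def mutual_nearest_neighbours(desc_a: torch.Tensor, desc_b: torch.Tensor):
    """desc_a: (N, D), desc_b: (M, D); both assumed L2-normalized."""
    sim = desc_a @ desc_b.t()            # (N, M) cosine similarities
    nn_ab = sim.argmax(dim=1)            # best match in B for each point in A
    nn_ba = sim.argmax(dim=0)            # best match in A for each point in B
    idx_a = torch.arange(desc_a.shape[0])
    mutual = nn_ba[nn_ab] == idx_a       # the A -> B -> A round trip returns home
    return idx_a[mutual], nn_ab[mutual]  # index pairs of mutual matches
```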

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Series
2024 International Conference on 3D Vision (3DV), ISSN 2378-3826, E-ISSN 2475-7888
National Category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-204892 (URN)
10.1109/3dv62453.2024.00035 (DOI)
9798350362459 (ISBN)
9798350362466 (ISBN)
Conference
International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT), Davos, Switzerland, 18-21 March, 2024.
Available from: 2024-06-17 Created: 2024-06-17 Last updated: 2024-06-17
Jonnarth, A., Zhang, Y. & Felsberg, M. (2024). High-fidelity Pseudo-labels for Boosting Weakly-Supervised Segmentation. In: 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). Paper presented at IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, Jan 3-8, 2024 (pp. 999-1008). Institute of Electrical and Electronics Engineers (IEEE)
High-fidelity Pseudo-labels for Boosting Weakly-Supervised Segmentation
2024 (English) In: 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 999-1008. Conference paper, Published paper (Refereed)
Abstract [en]

Image-level weakly-supervised semantic segmentation (WSSS) reduces the usually vast data annotation cost by relying on surrogate segmentation masks during training. The typical approach involves training an image classification network using global average pooling (GAP) on convolutional feature maps. This enables the estimation of object locations based on class activation maps (CAMs), which identify the importance of image regions. The CAMs are then used to generate pseudo-labels, in the form of segmentation masks, to supervise a segmentation model in the absence of pixel-level ground truth. Our work is based on two techniques for improving CAMs: importance sampling, which is a substitute for GAP, and the feature similarity loss, which utilizes a heuristic that object contours almost always align with color edges in images. However, both are based on the multinomial posterior with softmax, and implicitly assume that classes are mutually exclusive, which turns out to be suboptimal in our experiments. Thus, we reformulate both techniques based on binomial posteriors of multiple independent binary problems. This has two benefits: their performance is improved, and they become more general, resulting in an add-on method that can boost virtually any WSSS method. This is demonstrated on a wide variety of baselines on the PASCAL VOC dataset, improving the region similarity and contour quality of all implemented state-of-the-art methods. Experiments on the MS COCO dataset further show that our proposed add-on is well-suited for large-scale settings. Our code implementation is available at https://github.com/arvijj/hfpl.
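
The central reformulation is easy to state in code. Below is a sketch of the contrast between the multinomial (softmax) posterior and independent binomial (sigmoid) posteriors, assuming a standard multi-label classification head; it is illustrative, not taken from the paper's repository:

```python
# Contrast between the two posterior formulations discussed above;
# a sketch, not the paper's implementation.
import torch
import torch.nn.functional as F

logits = torch.randn(4, 20)                    # (batch, classes) image-level scores
labels = torch.randint(0, 2, (4, 20)).float()  # multi-label targets

# Multinomial posterior (softmax): classes compete and are implicitly
# assumed mutually exclusive.
p_multinomial = logits.softmax(dim=1)

# Independent binomial posteriors (sigmoid): one binary problem per class,
# so several classes can be present in the same image.
p_binomial = logits.sigmoid()
loss = F.binary_cross_entropy_with_logits(logits, labels)
```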

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
weakly supervised, semantic segmentation, importance sampling, feature similarity, class activation maps
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-202446 (URN)
10.1109/WACV57701.2024.00105 (DOI)
Conference
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, Jan 3-8, 2024
Available from: 2024-04-15 Created: 2024-04-15 Last updated: 2024-04-24. Bibliographically approved
Edstedt, J., Athanasiadis, I., Wadenbäck, M. & Felsberg, M. (2023). DKM: Dense Kernelized Feature Matching for Geometry Estimation. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Paper presented at 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17-24 June 2023 (pp. 17765-17775). IEEE Communications Society
DKM: Dense Kernelized Feature Matching for Geometry Estimation
2023 (English) In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Communications Society, 2023, p. 17765-17775. Conference paper, Published paper (Refereed)
Abstract [en]

Feature matching is a challenging computer vision task that involves finding correspondences between two images of a 3D scene. In this paper we consider the dense approach instead of the more common sparse paradigm, thus striving to find all correspondences. Perhaps counter-intuitively, dense methods have previously shown inferior performance to their sparse and semi-sparse counterparts for estimation of two-view geometry. This changes with our novel dense method, which outperforms both dense and sparse methods on geometry estimation. The novelty is threefold: First, we propose a kernel regression global matcher. Secondly, we propose warp refinement through stacked feature maps and depthwise convolution kernels. Thirdly, we propose learning dense confidence through consistent depth and a balanced sampling approach for dense confidence maps. Through extensive experiments we confirm that our proposed dense method, Dense Kernelized Feature Matching, sets a new state-of-the-art on multiple geometry estimation benchmarks. In particular, we achieve an improvement on MegaDepth-1500 of +4.9 and +8.9 AUC@5° compared to the best previous sparse method and dense method respectively. Our code is provided at the following repository: https://github.com/Parskatt/DKM.
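
As a rough illustration of what a kernel regression global matcher does, here is a Nadaraya-Watson-style sketch: each feature in image A regresses a coordinate in image B as a kernel-weighted average of B's coordinates. The paper's actual regressor is more elaborate, and all names here are illustrative:

```python
# Minimal kernel-regression matcher in the spirit of the global matcher
# described above; a sketch under simplifying assumptions, not DKM itself.
import torch

def kernel_regression_match(feats_a, feats_b, coords_b, beta=10.0):
    """feats_a: (N, D), feats_b: (M, D), coords_b: (M, 2) pixel coordinates
    in image B. Returns a regressed B-coordinate for every feature in A."""
    sim = feats_a @ feats_b.t()             # (N, M) feature similarities
    weights = (beta * sim).softmax(dim=1)   # exponential kernel weights
    return weights @ coords_b               # (N, 2) kernel-weighted coordinates
```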

Place, publisher, year, edition, pages
IEEE Communications Society, 2023
Series
Proceedings: IEEE Conference on Computer Vision and Pattern Recognition, ISSN 1063-6919, E-ISSN 2575-7075
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-197717 (URN)
10.1109/cvpr52729.2023.01704 (DOI)
001062531302008 (ISI)
9798350301298 (ISBN)
9798350301304 (ISBN)
Conference
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17-24 June 2023
Note

This work was supported by the Wallenberg Artificial Intelligence, Autonomous Systems and Software Program (WASP), funded by the Knut and Alice Wallenberg Foundation, and by the strategic research environment ELLIIT funded by the Swedish government. The computational resources were provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS), partially funded by the Swedish Research Council through grant agreement no. 2022-06725, and by the Berzelius resource, provided by the Knut and Alice Wallenberg Foundation at the National Supercomputer Centre.

Available from: 2023-09-11 Created: 2023-09-11 Last updated: 2023-11-30. Bibliographically approved
Holmquist, K., Klasén, L. & Felsberg, M. (2023). Evidential Deep Learning for Class-Incremental Semantic Segmentation. In: Rikke Gade, Michael Felsberg, Joni-Kristian Kämäräinen (Eds.), Image Analysis. SCIA 2023. Paper presented at SCIA 2023, 23rd Scandinavian Conference on Image Analysis, Sirkka, Finland, April 18–21, 2023 (pp. 32-48). Springer
Evidential Deep Learning for Class-Incremental Semantic Segmentation
2023 (English) In: Image Analysis. SCIA 2023 / [ed] Rikke Gade, Michael Felsberg, Joni-Kristian Kämäräinen, Springer, 2023, p. 32-48. Conference paper, Published paper (Refereed)
Abstract [en]

Class-incremental learning is a challenging problem in machine learning that aims to extend previously trained neural networks with new classes. This is especially useful when the system should be able to classify new objects even though the original training data is unavailable. Although the semantic segmentation problem has received less attention than classification, it poses distinct problems and challenges, since previous and future target classes can be unlabeled in the images of a single increment. In this case, the background, past, and future classes are correlated, and there exists a background shift.

In this paper, we address the problem of how to model unlabeled classes while avoiding spurious feature clustering of future uncorrelated classes. We propose to use Evidential Deep Learning to model the evidence of the classes as a Dirichlet distribution. Our method factorizes the problem into a separate foreground class probability, calculated by the expected value of the Dirichlet distribution, and an unknown class (background) probability corresponding to the uncertainty of the estimate. In our novel formulation, the background probability is implicitly modeled, avoiding the feature space clustering that comes from forcing the model to output a high background score for pixels that are not labeled as objects. Experiments on the incremental Pascal VOC and ADE20k benchmarks show that our method is superior to the state of the art, especially when repeatedly learning new classes with an increasing number of increments.
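
The factorization described above maps onto standard evidential deep learning quantities: per-class evidence e_k >= 0 gives Dirichlet parameters alpha_k = e_k + 1 with strength S = sum_k alpha_k; the expected foreground probability is alpha_k / S, and the uncertainty mass K / S serves as the implicit background probability. A minimal sketch, assuming a softplus evidence head; the paper's exact heads and losses are not reproduced here:

```python
# Sketch of the evidential factorization: Dirichlet mean as foreground
# probability, uncertainty mass as implicit background probability.
import torch
import torch.nn.functional as F

def evidential_probs(logits: torch.Tensor):
    """logits: (B, K, H, W) raw outputs for K known foreground classes."""
    evidence = F.softplus(logits)               # non-negative evidence e_k
    alpha = evidence + 1.0                      # Dirichlet parameters alpha_k
    strength = alpha.sum(dim=1, keepdim=True)   # S = sum_k alpha_k
    p_fg = alpha / strength                     # expected class probabilities
    p_bg = logits.shape[1] / strength           # uncertainty K / S as background
    return p_fg, p_bg
```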

Place, publisher, year, edition, pages
Springer, 2023
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 13886
Keywords
Class-incremental learning, Continual-learning, Semantic Segmentation
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-193265 (URN)
10.1007/978-3-031-31438-4_3 (DOI)
9783031314377 (ISBN)
9783031314384 (ISBN)
Conference
SCIA 2023, 23rd Scandinavian Conference on Image Analysis. Sirkka, Finland, April 18–21, 2023
Available from: 2023-04-26 Created: 2023-04-26 Last updated: 2024-04-27. Bibliographically approved
Zhang, Y., Robinson, A., Magnusson, M. & Felsberg, M. (2023). Leveraging Optical Flow Features for Higher Generalization Power in Video Object Segmentation. In: 2023 IEEE International Conference on Image Processing: Proceedings. Paper presented at 2023 IEEE International Conference on Image Processing (ICIP), 8–11 October 2023, Kuala Lumpur, Malaysia (pp. 326-330). IEEE
Leveraging Optical Flow Features for Higher Generalization Power in Video Object Segmentation
2023 (English) In: 2023 IEEE International Conference on Image Processing: Proceedings, IEEE, 2023, p. 326-330. Conference paper, Published paper (Refereed)
Abstract [en]

We propose to leverage optical flow features for higher generalization power in semi-supervised video object segmentation. Optical flow is usually exploited as additional guidance information in many computer vision tasks. In video object segmentation, however, it has mainly been used in unsupervised settings or to warp or refine previously predicted masks. Different from the latter, we propose to directly leverage the optical flow features in the target representation. We show that this enriched representation improves the encoder-decoder approach to the segmentation task. A model to extract the combined information from the optical flow and the image is proposed, which is then used as input to the target model and the decoder network. Unlike previous methods, e.g. in tracking, where concatenation is used to integrate information from image data and optical flow, a simple yet effective attention mechanism is exploited in our work. Experiments on DAVIS 2017 and YouTube-VOS 2019 show that integrating the information extracted from optical flow into the original image branch results in a strong performance gain, especially on unseen classes, which demonstrates its higher generalization power.
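
As an illustration of attention-based fusion as opposed to concatenation, a minimal gating module is sketched below; the module name and exact structure are assumptions for illustration, not the paper's implementation:

```python
# Sketch: fuse image and optical-flow features with an attention gate
# instead of channel concatenation; illustrative only.
import torch
import torch.nn as nn

class FlowAttentionFusion(nn.Module):
    """Gate image features with attention weights predicted from flow features."""
    def __init__(self, channels: int):
        super().__init__()
        self.to_attention = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, image_features, flow_features):
        # Flow features predict per-channel, per-location weights that
        # modulate the image branch (residual gating keeps the original signal).
        attention = torch.sigmoid(self.to_attention(flow_features))
        return image_features * (1.0 + attention)
```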

Place, publisher, year, edition, pages
IEEE, 2023
Keywords
Optical flow features, Attention mechanism, Semi-supervised VOS, Generalization power
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:liu:diva-199057 (URN)
10.1109/ICIP49359.2023.10222542 (DOI)
001106821000063 (ISI)
9781728198354 (ISBN)
9781728198361 (ISBN)
Conference
2023 IEEE International Conference on Image Processing (ICIP), 8–11 October 2023, Kuala Lumpur, Malaysia
Available from: 2023-11-08 Created: 2023-11-08 Last updated: 2024-03-12
Ljungbergh, W., Johnander, J., Petersson, C. & Felsberg, M. (2023). Raw or Cooked?: Object Detection on RAW Images. In: Rikke Gade, Michael Felsberg, Joni-Kristian Kämäräinen (Eds.), Image Analysis: 22nd Scandinavian Conference, SCIA 2023, Sirkka, Finland, April 18–21, 2023, Proceedings, Part I. Paper presented at Scandinavian Conference on Image Analysis, Sirkka, Finland, April 18–21, 2023 (pp. 374-385). Springer, 13885
Raw or Cooked?: Object Detection on RAW Images
2023 (English) In: Image Analysis: 22nd Scandinavian Conference, SCIA 2023, Sirkka, Finland, April 18–21, 2023, Proceedings, Part I / [ed] Rikke Gade, Michael Felsberg, Joni-Kristian Kämäräinen, Springer, 2023, Vol. 13885, p. 374-385. Conference paper, Published paper (Refereed)
Abstract [en]

Images fed to a deep neural network have in general undergone several handcrafted image signal processing (ISP) operations, all of which have been optimized to produce visually pleasing images. In this work, we investigate the hypothesis that the intermediate representation of visually pleasing images is sub-optimal for downstream computer vision tasks compared to the RAW image representation. We suggest that the operations of the ISP instead should be optimized towards the end task, by learning the parameters of the operations jointly during training. We extend previous works on this topic and propose a new learnable operation that enables an object detector to achieve superior performance when compared to both previous works and traditional RGB images. In experiments on the open PASCALRAW dataset, we empirically confirm our hypothesis.
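
To make the idea of a task-optimized ISP concrete, here is a minimal sketch of a differentiable ISP stage trained jointly with a detector. The parameterization (white-balance gains plus a tone curve) is an assumption for illustration, not the paper's proposed learnable operation:

```python
# Sketch: a learnable ISP stage placed in front of a detector so its
# parameters are optimized for the detection loss rather than for
# visual pleasantness; illustrative, not the paper's operation.
import torch
import torch.nn as nn

class LearnableISP(nn.Module):
    """Differentiable white balance + tone curve."""
    def __init__(self):
        super().__init__()
        self.gains = nn.Parameter(torch.ones(3))        # per-channel gain
        self.gamma = nn.Parameter(torch.tensor(0.45))   # tone-curve exponent

    def forward(self, raw_rgb):
        # raw_rgb: (B, 3, H, W), demosaiced linear RAW in [0, 1]
        x = raw_rgb * self.gains.view(1, 3, 1, 1)
        return x.clamp(min=1e-6) ** self.gamma

# detector = nn.Sequential(LearnableISP(), backbone, detection_head)
# is then trained end-to-end on the detection objective.
```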

Place, publisher, year, edition, pages
Springer, 2023
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 13885
Keywords
Computer Vision, Object detection, RAW images, Image Signal Processing
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-199000 (URN)
10.1007/978-3-031-31435-3_25 (DOI)
2-s2.0-85161382246 (Scopus ID)
9783031314346 (ISBN)
9783031314353 (ISBN)
Conference
Scandinavian Conference on Image Analysis, Sirkka, Finland, April 18–21, 2023
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2023-11-06 Created: 2023-11-06 Last updated: 2023-11-09. Bibliographically approved
Brissman, E., Johnander, J., Danelljan, M. & Felsberg, M. (2023). Recurrent Graph Neural Networks for Video Instance Segmentation. International Journal of Computer Vision, 131, 471-495
Recurrent Graph Neural Networks for Video Instance Segmentation
2023 (English) In: International Journal of Computer Vision, ISSN 0920-5691, E-ISSN 1573-1405, Vol. 131, p. 471-495. Article in journal (Refereed). Published
Abstract [en]

Video instance segmentation is one of the core problems in computer vision. Formulating a purely learning-based method, which models the generic track management required to solve the video instance segmentation task, is a highly challenging problem. In this work, we propose a novel learning framework where the entire video instance segmentation problem is modeled jointly. To this end, we design a graph neural network that in each frame jointly processes all detections and a memory of previously seen tracks. Past information is considered and processed via a recurrent connection. We demonstrate the effectiveness of the proposed approach in comprehensive experiments. Our approach operates online at over 25 FPS and obtains 16.3 AP on the challenging OVIS benchmark, setting a new state-of-the-art. We further conduct detailed ablative experiments that validate the different aspects of our approach. Code is available at https://github.com/emibr948/RGNNVIS-PlusPlus.
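
A schematic of the recurrent track-memory idea follows, assuming a GRU-style state update driven by soft detection-to-track associations; names and shapes are illustrative, not the paper's architecture:

```python
# Sketch: per-frame recurrent update of track embeddings from associated
# detections, in the spirit of the recurrent connection described above.
import torch
import torch.nn as nn

class TrackMemory(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.update = nn.GRUCell(dim, dim)

    def forward(self, track_states, det_embeddings, assignment):
        # track_states: (T, dim), det_embeddings: (N, dim),
        # assignment: (T, N) soft association weights from the graph network.
        messages = assignment @ det_embeddings   # aggregate matched detections
        return self.update(messages, track_states)
```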

Place, publisher, year, edition, pages
Springer, 2023
Keywords
Detection; Tracking; Segmentation; Video
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-190196 (URN)
10.1007/s11263-022-01703-8 (DOI)
000885236800001 (ISI)
Note

Funding Agencies: Wallenberg Artificial Intelligence, Autonomous Systems and Software Program (WASP) - Knut and Alice Wallenberg Foundation; Excellence Center at Linköping-Lund in Information Technology (ELLIIT)

Available from: 2022-11-29 Created: 2022-11-29 Last updated: 2023-11-02. Bibliographically approved
Melnyk, P., Felsberg, M. & Wadenbäck, M. (2022). Steerable 3D Spherical Neurons. In: Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, Sivan Sabato (Eds.), Proceedings of the 39th International Conference on Machine Learning. Paper presented at International Conference on Machine Learning, Baltimore, Maryland, USA, 17-23 July 2022 (pp. 15330-15339). PMLR, 162
Steerable 3D Spherical Neurons
2022 (English) In: Proceedings of the 39th International Conference on Machine Learning / [ed] Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, Sivan Sabato, PMLR, 2022, Vol. 162, p. 15330-15339. Conference paper, Published paper (Refereed)
Abstract [en]

Emerging from low-level vision theory, steerable filters found their counterpart in prior work on steerable convolutional neural networks equivariant to rigid transformations. In our work, we propose a steerable feed-forward learning-based approach that consists of neurons with spherical decision surfaces and operates on point clouds. Such spherical neurons are obtained by conformal embedding of Euclidean space and have recently been revisited in the context of learning representations of point sets. Focusing on 3D geometry, we exploit the isometry property of spherical neurons and derive a 3D steerability constraint. After training spherical neurons to classify point clouds in a canonical orientation, we use a tetrahedron basis to quadruplicate the neurons and construct rotation-equivariant spherical filter banks. We then apply the derived constraint to interpolate the filter bank outputs and, thus, obtain a rotation-invariant network. Finally, we use a synthetic point set and real-world 3D skeleton data to verify our theoretical findings. The code is available at https://github.com/pavlo-melnyk/steerable-3d-neurons.
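
The spherical decision surface underlying these neurons can be sketched via the conformal embedding mentioned above. This follows the standard hypersphere-neuron construction (a point embedded so that a dot product with a sphere's parameter vector measures signed distance to the sphere), not the paper's full steerable filter bank:

```python
# Sketch of a spherical decision surface via conformal embedding;
# standard hypersphere-neuron construction, illustrative only.
import torch

def embed_point(x: torch.Tensor) -> torch.Tensor:
    """Conformal embedding of Euclidean points x: (..., 3) -> (..., 5)."""
    sq = (x ** 2).sum(dim=-1, keepdim=True)
    return torch.cat([x, -torch.ones_like(sq), -0.5 * sq], dim=-1)

def sphere_neuron(center: torch.Tensor, radius: float) -> torch.Tensor:
    """Sphere (center, radius) as a 5-vector of weights. Its dot product with
    an embedded point equals 0.5 * (radius**2 - ||x - center||**2), so the
    sign of the activation tells whether x lies inside the decision sphere."""
    sq = (center ** 2).sum(dim=-1, keepdim=True)
    return torch.cat([center, 0.5 * (sq - radius ** 2), torch.ones_like(sq)], dim=-1)

# activation = embed_point(x) @ sphere_neuron(c, r)
```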

Place, publisher, year, edition, pages
PMLR, 2022
Series
Proceedings of Machine Learning Research, ISSN 2640-3498
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-187149 (URN)
000900064905021 (ISI)
Conference
International Conference on Machine Learning, Baltimore, Maryland, USA, 17-23 July 2022
Note

Funding: Wallenberg AI, Autonomous Systems and Software Program (WASP); Swedish Research Council [2018-04673]; strategic research environment ELLIIT

Available from: 2022-08-08 Created: 2022-08-08 Last updated: 2023-05-10
Felsberg, M. (2022). Visual tracking: Tracking in scenes containing multiple moving objects. In: E. R. Davies, Matthew A. Turk (Eds.), Advanced Methods and Deep Learning in Computer Vision (pp. 305-336). London: Elsevier
Visual tracking: Tracking in scenes containing multiple moving objects
2022 (English) In: Advanced Methods and Deep Learning in Computer Vision / [ed] E. R. Davies, Matthew A. Turk, London: Elsevier, 2022, p. 305-336. Chapter in book (Refereed)
Place, publisher, year, edition, pages
London: Elsevier, 2022
National Category
Other Engineering and Technologies not elsewhere specified
Identifiers
urn:nbn:se:liu:diva-188549 (URN)9780128221099 (ISBN)
Available from: 2022-09-16 Created: 2022-09-16 Last updated: 2022-11-11. Bibliographically approved
Holmquist, K., Klasén, L. & Felsberg, M. (2021). Class-Incremental Learning for Semantic Segmentation - A study. In: 2021 Swedish Artificial Intelligence Society Workshop (SAIS). Paper presented at 2021 Swedish Artificial Intelligence Society Workshop (SAIS), 14-15 June 2021, Sweden (pp. 25-28). IEEE
Class-Incremental Learning for Semantic Segmentation - A study
2021 (English) In: 2021 Swedish Artificial Intelligence Society Workshop (SAIS), IEEE, 2021, p. 25-28. Conference paper, Published paper (Refereed)
Abstract [en]

One of the main challenges of applying deep learning in robotics is the difficulty of efficiently adapting to new tasks while maintaining performance on previous tasks. The problem of incrementally learning new tasks commonly suffers from catastrophic forgetting, in which previous knowledge is lost. Class-incremental learning for semantic segmentation addresses this problem: we want to learn new semantic classes without having access to labeled data for previously learned classes. This is a problem in industry, where few pre-trained models and open datasets exactly match the requirements at hand. In these cases it is both expensive and labour-intensive to collect an entirely new fully-labeled dataset; instead, collecting a smaller dataset and labeling only the new classes is much more efficient in terms of data collection. In this paper we present the class-incremental learning problem for semantic segmentation, discuss related work in terms of the more thoroughly studied classification task, and experimentally validate the current state of the art for semantic segmentation. This lays the foundation as we discuss some of the problems that still need to be investigated and improved upon in order to reach a new state of the art for class-incremental semantic segmentation.

Place, publisher, year, edition, pages
IEEE, 2021
Keywords
Industries, Deep learning, Conferences, Semantics, Labeling, Task analysis, Artificial intelligence
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-189039 (URN)
10.1109/sais53221.2021.9483955 (DOI)
000855522600007 (ISI)
9781665442367 (ISBN)
9781665442374 (ISBN)
Conference
2021 Swedish Artificial Intelligence Society Workshop (SAIS), 14-15 June 2021, Sweden
Funder
Vinnova
Note

Funding agencies: Vinnova [2020-02838]

Available from: 2022-10-08 Created: 2022-10-08 Last updated: 2023-03-01. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-6096-3648