Järemo-Lawin, Felix
Publications (7 of 7)
Robinson, A., Järemo-Lawin, F., Danelljan, M., Khan, F. S. & Felsberg, M. (2020). Learning Fast and Robust Target Models for Video Object Segmentation. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Paper presented at Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13-19 June 2020 (pp. 7404-7413). IEEE, Article ID 9156406.
Learning Fast and Robust Target Models for Video Object Segmentation
2020 (English). In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2020, p. 7404-7413, article id 9156406. Conference paper, Published paper (Refereed)
Abstract [en]

Video object segmentation (VOS) is a highly challenging problem since the initial mask, defining the target object, is only given at test-time. The main difficulty is to effectively handle appearance changes and similar background objects, while maintaining accurate segmentation. Most previous approaches fine-tune segmentation networks on the first frame, resulting in impractical frame-rates and risk of overfitting. More recent methods integrate generative target appearance models, but either achieve limited robustness or require large amounts of training data. We propose a novel VOS architecture consisting of two network components. The target appearance model consists of a light-weight module, which is learned during the inference stage using fast optimization techniques to predict a coarse but robust target segmentation. The segmentation model is exclusively trained offline, designed to process the coarse scores into high quality segmentation masks. Our method is fast, easily trainable and remains highly effective in cases of limited training data. We perform extensive experiments on the challenging YouTube-VOS and DAVIS datasets. Our network achieves favorable performance, while operating at higher frame-rates compared to state-of-the-art. Code and trained models are available at https://github.com/andr345/frtm-vos.
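
A minimal sketch of the two-component design described above, assuming PyTorch; the module names, tensor shapes and the decoder interface are hypothetical, not taken from the released code:

    import torch

    def learn_target_model(features, coarse_labels, steps=100, lr=0.1):
        # features: (C, H, W) backbone features of the initial frame.
        # coarse_labels: (1, H, W) downsampled version of the given target mask.
        C = features.shape[0]
        # A single 1x1 convolution plays the role of the light-weight target model.
        target_model = torch.nn.Conv2d(C, 1, kernel_size=1)
        opt = torch.optim.SGD(target_model.parameters(), lr=lr)
        for _ in range(steps):  # fast optimization performed at inference time
            scores = target_model(features.unsqueeze(0))  # (1, 1, H, W)
            loss = torch.nn.functional.binary_cross_entropy_with_logits(
                scores, coarse_labels.unsqueeze(0).float())
            opt.zero_grad()
            loss.backward()
            opt.step()
        return target_model

    def segment_frame(decoder, frame_features, target_model):
        # Coarse but robust scores from the online-learned target model ...
        with torch.no_grad():
            coarse = target_model(frame_features.unsqueeze(0))
        # ... refined into a high-quality mask by the offline-trained decoder.
        return decoder(frame_features.unsqueeze(0), coarse)

Only the small target model is optimized at test time; the decoder that upgrades its coarse scores into full masks is trained entirely offline, which is what keeps the method fast and effective with limited training data.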

Place, publisher, year, edition, pages
IEEE, 2020
Series
Computer Society Conference on Computer Vision and Pattern Recognition, ISSN 1063-6919, E-ISSN 2575-7075
Keywords
Image segmentation; Robustness; Object segmentation; Adaptation models; Data models; Training; Target tracking
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-168133 (URN); 10.1109/CVPR42600.2020.00743 (DOI); 001309199900006 (ISI); 2-s2.0-85094324768 (Scopus ID); 978-1-7281-7168-5 (ISBN)
Conference
Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13-19 June 2020
Available from: 2020-08-17 Created: 2020-08-17 Last updated: 2025-02-07
Bhat, G., Järemo-Lawin, F., Danelljan, M., Robinson, A., Felsberg, M., Van Gool, L. & Timofte, R. (2020). Learning What to Learn for Video Object Segmentation. In: Vedaldi, A., Bischof, H., Brox, T. & Frahm, J.-M. (Eds.), Computer Vision – ECCV 2020. Paper presented at the European Conference on Computer Vision, Glasgow, UK, August 23–28, 2020 (pp. 777-794). Springer.
Learning What to Learn for Video Object Segmentation
2020 (English). In: Computer Vision – ECCV 2020 / [ed] Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M., 2020, p. 777-794. Conference paper, Published paper (Refereed)
Abstract [en]

Video object segmentation (VOS) is a highly challenging problem, since the target object is only defined by a first-frame reference mask during inference. The problem of how to capture and utilize this limited information to accurately segment the target remains a fundamental research question. We address this by introducing an end-to-end trainable VOS architecture that integrates a differentiable few-shot learner. Our learner is designed to predict a powerful parametric model of the target by minimizing a segmentation error in the first frame. We further go beyond the standard few-shot learning paradigm by learning what our target model should learn in order to maximize segmentation accuracy. We perform extensive experiments on standard benchmarks. Our approach sets a new state-of-the-art on the large-scale YouTube-VOS 2018 dataset by achieving an overall score of 81.5, corresponding to a 2.6% relative improvement over the previous best result. The code and models are available at https://github.com/visionml/pytracking.
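
The differentiable few-shot learner can be sketched as an unrolled inner optimization; in the PyTorch fragment below (shapes and the label generator are hypothetical), create_graph=True is what allows the outer training loop to back-propagate through the inner steps:

    import torch
    import torch.nn.functional as F

    def few_shot_inner_loop(features, inner_labels, w, steps=5, lr=1e-2):
        # features: (1, C, H, W); inner_labels: (1, 1, H, W);
        # w: (1, C, 3, 3) target-model weights with requires_grad=True.
        for _ in range(steps):
            scores = F.conv2d(features, w, padding=1)
            inner_loss = ((scores - inner_labels) ** 2).mean()
            # create_graph=True keeps each gradient step differentiable,
            # so the outer segmentation loss can flow through the loop.
            (grad,) = torch.autograd.grad(inner_loss, w, create_graph=True)
            w = w - lr * grad
        return w

    # "Learning what to learn": a trained label generator, rather than the
    # raw first-frame mask, defines what the target model should regress, e.g.
    # inner_labels = label_generator(first_frame_mask)  # hypothetical module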

Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 12347
National Category
Engineering and Technology; Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:liu:diva-168716 (URN); 10.1007/978-3-030-58536-5_46 (DOI); 001500572000046 (ISI); 2-s2.0-85097234947 (Scopus ID); 978-3-030-58535-8 (ISBN); 978-3-030-58536-5 (ISBN)
Conference
European Conference on Computer Vision, Glasgow, UK, August 23–28, 2020
Note

Funding agencies: This work was partly supported by the ETH Zürich Fund (OK), a Huawei Technologies Oy (Finland) project, an Amazon AWS grant, Nvidia, the ELLIIT Excellence Center, the Wallenberg AI, Autonomous Systems and Software Program (WASP) and the SSF project Symbicloud.

Available from: 2020-08-28 Created: 2020-08-28 Last updated: 2026-02-20
Järemo-Lawin, F. & Forssén, P.-E. (2020). Registration Loss Learning for Deep Probabilistic Point Set Registration. In: 2020 International Conference on 3D Vision (3DV). Paper presented at the International Virtual Conference on 3D Vision, November 25-28, 2020 (pp. 563-572). IEEE.
Registration Loss Learning for Deep Probabilistic Point Set Registration
2020 (English). In: 2020 International Conference on 3D Vision (3DV), IEEE, 2020, p. 563-572. Conference paper, Published paper (Refereed)
Abstract [en]

Probabilistic methods for point set registration have interesting theoretical properties, such as linear complexity in the number of used points, and they easily generalize to joint registration of multiple point sets. In this work, we improve their recognition performance to match the state of the art. This is done by incorporating learned features, by adding a von Mises-Fisher feature model in each mixture component, and by using learned attention weights. We learn these jointly using a registration loss learning strategy (RLL) that directly uses the registration error as a loss, by back-propagating through the registration iterations. This is possible as the probabilistic registration is fully differentiable, and the result is a learning framework that is truly end-to-end. We perform extensive experiments on the 3DMatch and KITTI datasets. The experiments demonstrate that our approach benefits significantly from the integration of the learned features and our learning strategy, outperforming the state of the art on KITTI. Code is available at https://github.com/felja633/RLLReg.
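
The von Mises-Fisher feature model mentioned above admits a compact sketch (PyTorch, with illustrative shapes). Because this term, like the rest of the registration, is differentiable, the registration error can be back-propagated through the EM iterations to train the feature extractor, which is the essence of the RLL strategy:

    import torch
    import torch.nn.functional as F

    def vmf_log_likelihood(x, mu, kappa):
        # x: (N, D) per-point feature vectors; mu: (K, D) mean directions of
        # the K mixture components; kappa: (K,) concentration parameters.
        x = F.normalize(x, dim=-1)    # the vMF density lives on the unit sphere
        mu = F.normalize(mu, dim=-1)
        # Log-density up to the kappa-dependent normalizing constant.
        return kappa * (x @ mu.t())   # (N, K)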

Place, publisher, year, edition, pages
IEEE, 2020
Series
International Conference on 3D Vision, ISSN 2378-3826
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:liu:diva-173539 (URN); 10.1109/3DV50981.2020.00066 (DOI); 000653085200057 (ISI); 978-1-7281-8128-8 (ISBN); 978-1-7281-8129-5 (ISBN)
Conference
International Virtual Conference on 3D Vision, November 25-28, 2020
Note

Funding agencies: ELLIIT Excellence Center; Vinnova through the Visual Sweden network [2019-02261]

Available from: 2021-02-22 Created: 2021-02-22 Last updated: 2022-10-06. Bibliographically approved.
Kristan, M., Leonardis, A., Matas, J., Felsberg, M., Pflugfelder, R., Kämäräinen, J.-K., . . . Ma, Z. (2020). The Eighth Visual Object Tracking VOT2020 Challenge Results. In: Adrien Bartoli & Andrea Fusiello (Eds.), Computer Vision: ECCV 2020 Workshops, Glasgow, UK, August 23–28, 2020. Paper presented at the ECCV 2020 European Conference on Computer Vision (pp. 547-601). Springer, Lecture Notes in Computer Science, vol. 12539.
The Eighth Visual Object Tracking VOT2020 Challenge Results
2020 (English). In: Computer Vision: ECCV 2020 Workshops, Glasgow, UK, August 23–28, 2020 / [ed] Adrien Bartoli; Andrea Fusiello, 2020, Vol. 12539, p. 547-601. Conference paper, Published paper (Refereed)
Abstract [en]

The Visual Object Tracking challenge VOT2020 is the eighth annual tracker benchmarking activity organized by the VOT initiative. Results of 58 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The VOT2020 challenge was composed of five sub-challenges focusing on different tracking domains: (i) VOT-ST2020 focused on short-term tracking in RGB, (ii) VOT-RT2020 focused on "real-time" short-term tracking in RGB, (iii) VOT-LT2020 focused on long-term tracking, namely coping with target disappearance and reappearance, (iv) VOT-RGBT2020 focused on short-term tracking in RGB and thermal imagery and (v) VOT-RGBD2020 focused on long-term tracking in RGB and depth imagery. Only the VOT-ST2020 datasets were refreshed. A significant novelty is the introduction of a new VOT short-term tracking evaluation methodology, and the introduction of segmentation ground truth in the VOT-ST2020 challenge – bounding boxes will no longer be used in the VOT-ST challenges. A new VOT Python toolkit that implements all these novelties was introduced. The performance of the tested trackers typically exceeds standard baselines by a large margin. The source code for most of the trackers is publicly available from the VOT page. The dataset, the evaluation kit and the results are publicly available at the challenge website (http://votchallenge.net).

Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 12539
Keywords
Depth; Long-term trackers; Performance evaluation protocol; RGB; RGBD; RGBT; Short-term trackers; State-of-the-art benchmark; Thermal imagery; Visual object tracking
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-179796 (URN); 10.1007/978-3-030-68238-5_39 (DOI); 2-s2.0-85101374294 (Scopus ID); 9783030682378 (ISBN)
Conference
ECCV 2020 European Conference on Computer Vision, Glasgow, UK, August 23–28, 2020
Available from: 2021-10-02 Created: 2021-10-02 Last updated: 2025-02-07
Robinson, A., Järemo-Lawin, F., Danelljan, M. & Felsberg, M. (2019). Discriminative Learning and Target Attention for the 2019 DAVIS Challenge on Video Object Segmentation. In: CVPR 2019 workshops: DAVIS Challenge on Video Object Segmentation. Paper presented at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Discriminative Learning and Target Attention for the 2019 DAVIS Challenge on Video Object Segmentation
2019 (English). In: CVPR 2019 workshops: DAVIS Challenge on Video Object Segmentation, 2019. Conference paper, Published paper (Refereed)
Abstract [en]

In this work, we address the problem of semi-supervised video object segmentation, where the task is to segment a target object in every image of the video sequence, given a ground truth only in the first frame. To be successful, it is crucial to robustly handle unpredictable target appearance changes and distracting objects in the background. In this work we obtain a robust and efficient representation of the target by integrating a fast and light-weight discriminative target model into a deep segmentation network. Trained during inference, the target model learns to discriminate between the local appearances of target and background image regions. Its predictions are enhanced to accurate segmentation masks in a subsequent refinement stage. To further improve the segmentation performance, we add a new module trained to generate global target attention vectors, given the input mask and image feature maps. The attention vectors add semantic information about the target from a previous frame to the refinement stage, complementing the predictions provided by the target appearance model. Our method is fast and requires no network fine-tuning. We achieve a combined J and F score of 70.6 on the DAVIS 2019 test-challenge data.
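
A plausible minimal form of such a global target attention vector is mask-weighted average pooling of the feature map; the function below is a hypothetical sketch, not the paper's exact module:

    import torch

    def target_attention_vector(feature_map, mask):
        # feature_map: (C, H, W) features of a previous frame; mask: (H, W) in [0, 1].
        weighted = feature_map * mask.unsqueeze(0)
        # One value per channel, summarizing the target's appearance semantically.
        return weighted.sum(dim=(1, 2)) / mask.sum().clamp(min=1e-6)  # (C,)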

Keywords
video object segmentation, computer vision, machine learning
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-163334 (URN)
Conference
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Available from: 2020-02-01 Created: 2020-02-01 Last updated: 2025-02-07
Järemo Lawin, F., Danelljan, M., Khan, F. S., Forssén, P.-E. & Felsberg, M. (2018). Density Adaptive Point Set Registration. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Paper presented at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, United States, 18-22 June, 2018 (pp. 3829-3837). IEEE.
Density Adaptive Point Set Registration
2018 (English). In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, 2018, p. 3829-3837. Conference paper, Published paper (Refereed)
Abstract [en]

Probabilistic methods for point set registration have demonstrated competitive results in recent years. These techniques estimate a probability distribution model of the point clouds. While such a representation has shown promise, it is highly sensitive to variations in the density of 3D points. This fundamental problem is primarily caused by changes in the sensor location across point sets. We revisit the foundations of the probabilistic registration paradigm. Contrary to previous works, we model the underlying structure of the scene as a latent probability distribution, and thereby induce invariance to point set density changes. Both the probabilistic model of the scene and the registration parameters are inferred by minimizing the Kullback-Leibler divergence in an Expectation Maximization based framework. Our density-adaptive registration successfully handles severe density variations commonly encountered in terrestrial Lidar applications. We perform extensive experiments on several challenging real-world Lidar datasets. The results demonstrate that our approach outperforms state-of-the-art probabilistic methods for multi-view registration, without the need for re-sampling.
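
For reference, one generic EM iteration of the underlying mixture-model machinery might look as follows (NumPy sketch; the paper's density-adaptive weighting, variance update and rigid-pose estimation are omitted):

    import numpy as np

    def em_step(points, means, var, weights):
        # points: (N, 3) observed points; means: (K, 3) component centers;
        # var: shared isotropic variance; weights: (K,) mixture weights.
        # E-step: posterior responsibility of each component for each point.
        d2 = ((points[:, None, :] - means[None, :, :]) ** 2).sum(-1)  # (N, K)
        log_p = np.log(weights)[None, :] - d2 / (2.0 * var)
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate component centers and mixture weights.
        nk = resp.sum(axis=0) + 1e-9
        means = (resp.T @ points) / nk[:, None]
        weights = nk / nk.sum()
        return means, weights

In the density-adaptive formulation, the scene is the latent distribution and the registration parameters enter such updates through the transformed point coordinates, so minimizing the KL divergence and estimating the poses happen within the same EM framework.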

Place, publisher, year, edition, pages
IEEE, 2018
Series
IEEE Conference on Computer Vision and Pattern Recognition
National Category
Electrical Engineering, Electronic Engineering, Information Engineering; Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-149774 (URN); 10.1109/CVPR.2018.00403 (DOI); 000457843603101 (ISI); 978-1-5386-6420-9 (ISBN)
Conference
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, United States, 18-22 June, 2018
Note

Funding agencies: EU's Horizon 2020 Programme [644839]; CENIIT grant [18.14]; VR grant: EMC2 [2014-6227]; VR grant [2016-05543]; VR grant: LCMM [2014-5928]

Available from: 2018-07-18 Created: 2018-07-18 Last updated: 2023-04-03. Bibliographically approved.
Järemo-Lawin, F., Danelljan, M., Tosteberg, P., Bhat, G., Khan, F. S. & Felsberg, M. (2017). Deep Projective 3D Semantic Segmentation. In: Michael Felsberg, Anders Heyden & Norbert Krüger (Eds.), Computer Analysis of Images and Patterns: 17th International Conference, CAIP 2017, Ystad, Sweden, August 22-24, 2017, Proceedings, Part I. Paper presented at the 17th International Conference, CAIP 2017, Ystad, Sweden, August 22-24, 2017 (pp. 95-107). Springer.
Deep Projective 3D Semantic Segmentation
2017 (English). In: Computer Analysis of Images and Patterns: 17th International Conference, CAIP 2017, Ystad, Sweden, August 22-24, 2017, Proceedings, Part I / [ed] Michael Felsberg, Anders Heyden and Norbert Krüger, Springer, 2017, p. 95-107. Conference paper, Published paper (Refereed)
Abstract [en]

Semantic segmentation of 3D point clouds is a challenging problem with numerous real-world applications. While deep learning has revolutionized the field of image semantic segmentation, its impact on point cloud data has been limited so far. Recent attempts, based on 3D deep learning approaches (3D-CNNs), have achieved below-expected results. Such methods require voxelizations of the underlying point cloud data, leading to decreased spatial resolution and increased memory consumption. Additionally, 3D-CNNs greatly suffer from the limited availability of annotated datasets.
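
The keywords below ("Multi-stream deep networks") point to a projection-based remedy: render the point cloud into 2D views, segment them with a mature 2D network, and map the per-pixel scores back onto the points. The NumPy fragment sketches only the projection-and-gather step, with hypothetical names and no occlusion handling:

    import numpy as np

    def project_and_gather(points, scores_2d, K_mat, R, t):
        # points: (N, 3) world coordinates; scores_2d: (H, W, num_classes)
        # output of a 2D semantic segmentation network for one virtual view;
        # K_mat: (3, 3) camera intrinsics; R, t: world-to-camera pose.
        cam = points @ R.T + t                       # world -> camera coordinates
        uvw = cam @ K_mat.T                          # camera -> homogeneous pixel coords
        z = np.where(uvw[:, 2] > 0, uvw[:, 2], 1.0)  # avoid division by zero
        u = np.round(uvw[:, 0] / z).astype(int)
        v = np.round(uvw[:, 1] / z).astype(int)
        H, W, num_classes = scores_2d.shape
        valid = (uvw[:, 2] > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
        point_scores = np.zeros((points.shape[0], num_classes))
        point_scores[valid] = scores_2d[v[valid], u[valid]]
        return point_scores

Fusing such per-view score maps over several virtual cameras yields a semantic label for every 3D point without any voxelization.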

Place, publisher, year, edition, pages
Springer, 2017
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 10424
Keywords
Point clouds, Semantic segmentation, Deep learning, Multi-stream deep networks
National Category
Computer graphics and computer vision; Computer Engineering
Identifiers
urn:nbn:se:liu:diva-145374 (URN); 10.1007/978-3-319-64689-3_8 (DOI); 000432085900008 (ISI); 2-s2.0-85028506569 (Scopus ID); 9783319646886 (ISBN); 9783319646893 (ISBN)
Conference
17th International Conference, CAIP 2017, Ystad, Sweden, August 22-24, 2017, Proceedings, Part I
Note

Funding agencies: EU [644839]; Swedish Research Council [2014-6227]; Swedish Foundation for Strategic Research [RIT 15-0097]; VR starting grant [2016-05543]

Available from: 2018-02-26 Created: 2018-02-26 Last updated: 2025-02-01. Bibliographically approved.