liu.se: Search for publications in DiVA
Publications (10 of 187)
Johnander, J., Danelljan, M., Brissman, E., Khan, F. S. & Felsberg, M. (2019). A generative appearance model for end-to-end video object segmentation. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Paper presented at the IEEE Conference on Computer Vision and Pattern Recognition 2019, Long Beach, CA, USA, 15-20 June 2019 (pp. 8945-8954). Institute of Electrical and Electronics Engineers (IEEE)
A generative appearance model for end-to-end video object segmentation
2019 (English). In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Institute of Electrical and Electronics Engineers (IEEE), 2019, p. 8945-8954. Conference paper, Published paper (Refereed)
Abstract [en]

One of the fundamental challenges in video object segmentation is to find an effective representation of the target and background appearance. The best performing approaches resort to extensive fine-tuning of a convolutional neural network for this purpose. Besides being prohibitively expensive, this strategy cannot be truly trained end-to-end since the online fine-tuning procedure is not integrated into the offline training of the network. To address these issues, we propose a network architecture that learns a powerful representation of the target and background appearance in a single forward pass. The introduced appearance module learns a probabilistic generative model of target and background feature distributions. Given a new image, it predicts the posterior class probabilities, providing a highly discriminative cue, which is processed in later network modules. Both the learning and prediction stages of our appearance module are fully differentiable, enabling true end-to-end training of the entire segmentation pipeline. Comprehensive experiments demonstrate the effectiveness of the proposed approach on three video object segmentation benchmarks. We close the gap to approaches based on online fine-tuning on DAVIS17, while operating at 15 FPS on a single GPU. Furthermore, our method outperforms all published approaches on the large-scale YouTube-VOS dataset.
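
A minimal NumPy sketch of the generative idea described above, assuming (for illustration only) a single Gaussian per class with a shared diagonal covariance; the paper's appearance module is a learned, fully differentiable component inside the network, which this stand-alone snippet does not reproduce. The function names and toy data are hypothetical.

```python
import numpy as np

def fit_class_gaussians(features, mask):
    """features: (H, W, D) feature map; mask: (H, W) boolean target mask."""
    fg = features[mask]                      # target features
    bg = features[~mask]                     # background features
    mu_fg, mu_bg = fg.mean(0), bg.mean(0)
    # Shared diagonal covariance: a deliberate simplification of the generative model.
    var = np.concatenate([fg - mu_fg, bg - mu_bg]).var(0) + 1e-6
    return mu_fg, mu_bg, var

def posterior_target_prob(features, model, prior_fg=0.5):
    """Per-pixel posterior P(target | feature) via Bayes' rule."""
    mu_fg, mu_bg, var = model
    log_fg = -0.5 * (((features - mu_fg) ** 2) / var).sum(-1) + np.log(prior_fg)
    log_bg = -0.5 * (((features - mu_bg) ** 2) / var).sum(-1) + np.log(1.0 - prior_fg)
    m = np.maximum(log_fg, log_bg)           # numerically stable two-class softmax
    return np.exp(log_fg - m) / (np.exp(log_fg - m) + np.exp(log_bg - m))

# Toy usage: random features with a shifted square "target" region.
feats = np.random.randn(64, 64, 8)
mask = np.zeros((64, 64), dtype=bool)
mask[20:40, 20:40] = True
feats[mask] += 2.0
probs = posterior_target_prob(feats, fit_class_gaussians(feats, mask))
print(probs[mask].mean(), probs[~mask].mean())   # high inside the target, low outside
```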

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2019
Series
Proceedings - IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR, IEEE Conference on Computer Vision and Pattern Recognition, ISSN 1063-6919, E-ISSN 2575-7075
Keywords
Segmentation; Grouping and Shape; Motion and Tracking
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-161037 (URN); 10.1109/CVPR.2019.00916 (DOI); 9781728132938 (ISBN); 9781728132945 (ISBN)
Conference
IEEE Conference on Computer Vision and Pattern Recognition 2019, Long Beach, CA, USA, 15-20 June 2019
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP); Swedish Foundation for Strategic Research; Swedish Research Council
Available from: 2019-10-17 Created: 2019-10-17 Last updated: 2020-01-22. Bibliographically approved
Danelljan, M., Bhat, G., Khan, F. S. & Felsberg, M. (2019). ATOM: Accurate tracking by overlap maximization. Paper presented at IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, June 16th - June 20th, 2019.
ATOM: Accurate tracking by overlap maximization
2019 (English). Conference paper, Published paper (Refereed)
Abstract [en]

While recent years have witnessed astonishing improvements in visual tracking robustness, the advancements in tracking accuracy have been limited. As the focus has been directed towards the development of powerful classifiers, the problem of accurate target state estimation has been largely overlooked. In fact, most trackers resort to a simple multi-scale search in order to estimate the target bounding box. We argue that this approach is fundamentally limited since target estimation is a complex task, requiring high-level knowledge about the object. We address this problem by proposing a novel tracking architecture, consisting of dedicated target estimation and classification components. High-level knowledge is incorporated into the target estimation through extensive offline learning. Our target estimation component is trained to predict the overlap between the target object and an estimated bounding box. By carefully integrating target-specific information, our approach achieves previously unseen bounding box accuracy. We further introduce a classification component that is trained online to guarantee high discriminative power in the presence of distractors. Our final tracking framework sets a new state-of-the-art on five challenging benchmarks. On the new large-scale TrackingNet dataset, our tracker ATOM achieves a relative gain of 15% over the previous best approach, while running at over 30 FPS. Code and models are available at https://github.com/visionml/pytracking.
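
The overlap-maximization step can be pictured as gradient ascent of a predicted IoU with respect to the box parameters. The toy predictor below uses random weights and made-up names; it only illustrates the optimization loop, not ATOM's actual IoU network, feature modulation, or training.

```python
import torch
import torch.nn as nn

class ToyIoUPredictor(nn.Module):
    """Maps (image feature vector, box) -> predicted IoU in [0, 1]. Purely illustrative."""
    def __init__(self, feat_dim=32):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(feat_dim + 4, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, feat, box):            # feat: (feat_dim,), box: (cx, cy, w, h)
        return torch.sigmoid(self.head(torch.cat([feat, box])))

def refine_box(predictor, feat, init_box, steps=10, lr=0.05):
    """Gradient ascent on the predicted overlap with respect to the box parameters."""
    box = init_box.clone().requires_grad_(True)
    for _ in range(steps):
        iou = predictor(feat, box)
        grad, = torch.autograd.grad(iou.sum(), box)
        box = (box + lr * grad).detach().requires_grad_(True)   # ascend predicted IoU
    return box.detach()

feat = torch.randn(32)
refined = refine_box(ToyIoUPredictor(), feat, torch.tensor([50.0, 50.0, 20.0, 20.0]))
print(refined)
```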

National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-163194 (URN)
Conference
IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, June 16th - June 20th, 2019
Available from: 2020-01-22 Created: 2020-01-22 Last updated: 2020-02-06. Bibliographically approved
Eldesokey, A., Felsberg, M. & Khan, F. S. (2019). Confidence Propagation through CNNs for Guided Sparse Depth Regression. IEEE Transactions on Pattern Analysis and Machine Intelligence
Confidence Propagation through CNNs for Guided Sparse Depth Regression
2019 (English). In: IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN 0162-8828. Article in journal (Refereed), Published
Abstract [en]

Generally, convolutional neural networks (CNNs) process data on a regular grid, e.g. data generated by ordinary cameras. Designing CNNs for sparse and irregularly spaced input data is still an open research problem with numerous applications in autonomous driving, robotics, and surveillance. In this paper, we propose an algebraically-constrained normalized convolution layer for CNNs with highly sparse input that has a smaller number of network parameters compared to related work. We propose novel strategies for determining the confidence from the convolution operation and propagating it to consecutive layers. We also propose an objective function that simultaneously minimizes the data error while maximizing the output confidence. To integrate structural information, we also investigate fusion strategies to combine depth and RGB information in our normalized convolution network framework. In addition, we introduce the use of output confidence as auxiliary information to improve the results. The capabilities of our normalized convolution network framework are demonstrated for the problem of scene depth completion. Comprehensive experiments are performed on the KITTI-Depth and the NYU-Depth-v2 datasets. The results clearly demonstrate that the proposed approach achieves superior performance while requiring only about 1-5% of the number of parameters compared to the state-of-the-art methods.
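
One plausible way to realize the normalized convolution layer with confidence propagation described above is sketched below in PyTorch; the SoftPlus non-negativity constraint, the normalization of the propagated confidence, and all names are assumptions about a reasonable implementation, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalizedConv2d(nn.Module):
    """Confidence-weighted convolution that also outputs a propagated confidence map."""
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.weight = nn.Parameter(0.1 * torch.randn(out_ch, in_ch, kernel_size, kernel_size))
        self.pad = kernel_size // 2

    def forward(self, x, conf):
        w = F.softplus(self.weight)                    # keep filter weights non-negative
        num = F.conv2d(x * conf, w, padding=self.pad)  # confidence-weighted data term
        den = F.conv2d(conf, w, padding=self.pad)      # accumulated confidence
        out = num / (den + 1e-8)
        new_conf = den / (w.sum(dim=(1, 2, 3)).view(1, -1, 1, 1) + 1e-8)
        return out, new_conf                           # confidence is passed to the next layer

# Toy usage: a depth map where only ~5% of the pixels carry valid measurements.
depth = torch.rand(1, 1, 32, 32)
conf = (torch.rand(1, 1, 32, 32) < 0.05).float()
dense, conf_out = NormalizedConv2d(1, 1)(depth * conf, conf)
print(dense.shape, conf_out.mean().item())
```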

National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-161086 (URN); 10.1109/TPAMI.2019.2929170 (DOI)
Available from: 2019-10-21 Created: 2019-10-21 Last updated: 2019-10-25
Danelljan, M., Bhat, G., Gladh, S., Khan, F. S. & Felsberg, M. (2019). Deep motion and appearance cues for visual tracking. Pattern Recognition Letters, 124, 74-81
Deep motion and appearance cues for visual tracking
2019 (English). In: Pattern Recognition Letters, ISSN 0167-8655, E-ISSN 1872-7344, Vol. 124, p. 74-81. Article in journal (Refereed), Published
Abstract [en]

Generic visual tracking is a challenging computer vision problem, with numerous applications. Most existing approaches rely on appearance information by employing either hand-crafted features or deep RGB features extracted from convolutional neural networks. Despite their success, these approaches struggle in cases of ambiguous appearance information, leading to tracking failure. In such cases, we argue that the motion cue provides discriminative and complementary information that can improve tracking performance. In contrast to visual tracking, deep motion features have been successfully applied for action recognition and video classification tasks. Typically, the motion features are learned by training a CNN on optical flow images extracted from large amounts of labeled videos. In this paper, we investigate the impact of deep motion features in a tracking-by-detection framework. We also evaluate the fusion of hand-crafted, deep RGB, and deep motion features and show that they contain complementary information. To the best of our knowledge, we are the first to propose fusing appearance information with deep motion features for visual tracking. Comprehensive experiments clearly demonstrate that our fusion approach with deep motion features outperforms standard methods relying on appearance information alone.
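
At its core, the investigated fusion is a channel-wise combination of appearance and motion feature maps before the detection stage. The sketch below uses random stand-ins for the feature extractors (all names and dimensions are hypothetical) and only illustrates the concatenation step.

```python
import numpy as np

def extract_handcrafted(frame):       # stand-in for hand-crafted appearance features (e.g. HOG)
    return np.random.randn(32, 32, 31)

def extract_deep_rgb(frame):          # stand-in for deep RGB CNN features
    return np.random.randn(32, 32, 512)

def extract_deep_motion(flow_image):  # stand-in for CNN features computed on an optical-flow image
    return np.random.randn(32, 32, 512)

def fuse_features(frame, flow_image):
    """Channel-wise concatenation of complementary appearance and motion cues."""
    feats = [extract_handcrafted(frame), extract_deep_rgb(frame), extract_deep_motion(flow_image)]
    return np.concatenate(feats, axis=-1)

print(fuse_features(frame=None, flow_image=None).shape)   # (32, 32, 1055)
```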

Place, publisher, year, edition, pages
Elsevier, 2019
Keywords
Visual tracking, Deep learning, Optical flow, Discriminative correlation filters
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:liu:diva-148015 (URN); 10.1016/j.patrec.2018.03.009 (DOI); 000469427700008 (); 2-s2.0-85044328745 (Scopus ID)
Note

Funding agencies: Swedish Foundation for Strategic Research; Swedish Research Council [2016-05543]; Wallenberg Autonomous Systems Program; Swedish National Infrastructure for Computing (SNIC); Nvidia

Available from: 2018-05-24 Created: 2018-05-24 Last updated: 2019-06-24. Bibliographically approved
Robinson, A., Järemo-Lawin, F., Danelljan, M. & Felsberg, M. (2019). Discriminative Learning and Target Attention for the 2019 DAVIS Challenge on Video Object Segmentation. In: CVPR 2019 workshops: DAVIS Challenge on Video Object Segmentation. Paper presented at The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Discriminative Learning and Target Attention for the 2019 DAVIS Challenge on Video Object Segmentation
2019 (English). In: CVPR 2019 workshops: DAVIS Challenge on Video Object Segmentation, 2019. Conference paper, Published paper (Refereed)
Abstract [en]

In this work, we address the problem of semi-supervised video object segmentation, where the task is to segment a target object in every image of the video sequence, given a ground truth only in the first frame. To be successful, it is crucial to robustly handle unpredictable target appearance changes and distracting objects in the background. In this work we obtain a robust and efficient representation of the target by integrating a fast and light-weight discriminative target model into a deep segmentation network. Trained during inference, the target model learns to discriminate between the local appearances of target and background image regions. Its predictions are enhanced to accurate segmentation masks in a subsequent refinement stage. To further improve the segmentation performance, we add a new module trained to generate global target attention vectors, given the input mask and image feature maps. The attention vectors add semantic information about the target from a previous frame to the refinement stage, complementing the predictions provided by the target appearance model. Our method is fast and requires no network fine-tuning. We achieve a combined J and F-score of 70.6 on the DAVIS 2019 test-challenge data.
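
A hedged sketch of a "fast and light-weight discriminative target model trained during inference": a single convolutional filter is fit by a few steps of gradient descent on first-frame features and the given mask, and then produces a coarse target score for later frames. The filter size, optimizer, and loss are illustrative assumptions, not the exact model used in the submission.

```python
import torch
import torch.nn.functional as F

def train_target_model(feat, mask, steps=50, lr=0.1):
    """feat: (1, C, H, W) first-frame features; mask: (1, 1, H, W) ground-truth target mask."""
    filt = torch.zeros(1, feat.shape[1], 3, 3, requires_grad=True)
    opt = torch.optim.SGD([filt], lr=lr)
    for _ in range(steps):                       # trained online, during inference
        score = F.conv2d(feat, filt, padding=1)
        loss = F.binary_cross_entropy_with_logits(score, mask)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return filt.detach()

def apply_target_model(filt, feat):
    """Coarse target/background score map, to be enhanced by a refinement stage."""
    return torch.sigmoid(F.conv2d(feat, filt, padding=1))

feat0 = torch.randn(1, 16, 30, 30)
mask0 = torch.zeros(1, 1, 30, 30)
mask0[..., 10:20, 10:20] = 1.0
filt = train_target_model(feat0, mask0)
print(apply_target_model(filt, torch.randn(1, 16, 30, 30)).shape)
```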

Keywords
video object segmentation, computer vision, machine learning
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-163334 (URN)
Conference
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Available from: 2020-02-01 Created: 2020-02-01 Last updated: 2020-02-01
Felsberg, M., Forssén, P.-E., Sintorn, I.-M. & Unger, J. (Eds.). (2019). Image Analysis. Paper presented at 21st Scandinavian Conference, SCIA 2019, Norrköping, Sweden, June 11-13, 2019. Springer
Image Analysis
2019 (English). Conference proceedings (editor) (Refereed)
Abstract [en]

This volume constitutes the refereed proceedings of the 21st Scandinavian Conference on Image Analysis, SCIA 2019, held in Norrköping, Sweden, in June 2019.

The 40 revised papers presented were carefully reviewed and selected from 63 submissions. The contributions are structured in topical sections on Deep convolutional neural networks; Feature extraction and image analysis; Matching, tracking and geometry; and Medical and biomedical image analysis.

Place, publisher, year, edition, pages
Springer, 2019. p. 600
Series
Image Processing, Computer Vision, Pattern Recognition, and Graphics; 11482
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-163196 (URN); 10.1007/978-3-030-20205-7 (DOI); 9783030202040 (ISBN); 9783030202057 (ISBN)
Conference
21st Scandinavian Conference, SCIA 2019, Norrköping, Sweden, June 11-13, 2019
Available from: 2020-01-22 Created: 2020-01-22 Last updated: 2020-02-06. Bibliographically approved
Eldesokey, A., Felsberg, M. & Khan, F. S. (2019). Propagating Confidences through CNNs for Sparse Data Regression. In: British Machine Vision Conference 2018, BMVC 2018. Paper presented at The 29th British Machine Vision Conference (BMVC), Northumbria University, Newcastle upon Tyne, England, UK, 3-6 September, 2018. BMVA Press
Propagating Confidences through CNNs for Sparse Data Regression
2019 (English). In: British Machine Vision Conference 2018, BMVC 2018, BMVA Press, 2019. Conference paper, Published paper (Refereed)
Abstract [en]

In most computer vision applications, convolutional neural networks (CNNs) operate on dense image data generated by ordinary cameras. Designing CNNs for sparse and irregularly spaced input data is still an open problem with numerous applications in autonomous driving, robotics, and surveillance. To tackle this challenging problem, we introduce an algebraically-constrained convolution layer for CNNs with sparse input and demonstrate its capabilities for the scene depth completion task. We propose novel strategies for determining the confidence from the convolution operation and propagating it to consecutive layers. Furthermore, we propose an objective function that simultaneously minimizes the data error while maximizing the output confidence. Comprehensive experiments are performed on the KITTI depth benchmark and the results clearly demonstrate that the proposed approach achieves superior performance while requiring three times fewer parameters than the state-of-the-art methods. Moreover, our approach produces a continuous pixel-wise confidence map enabling information fusion, state inference, and decision support.
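
The joint objective mentioned above, minimizing the data error while maximizing the output confidence, could look roughly like the sketch below; the L1 error, the mean-confidence term, and the weight `lam` are assumptions chosen for illustration rather than the paper's exact loss.

```python
import torch

def depth_completion_loss(pred, conf_out, target, valid_mask, lam=0.1):
    """pred, conf_out, target, valid_mask: (B, 1, H, W); conf_out in [0, 1]."""
    # Data term: error only where ground-truth depth is valid.
    data_err = (valid_mask * (pred - target).abs()).sum() / valid_mask.sum().clamp(min=1)
    # Confidence term: higher propagated output confidence lowers the loss.
    return data_err - lam * conf_out.mean()

pred, target = torch.rand(2, 1, 16, 16), torch.rand(2, 1, 16, 16)
conf_out = torch.rand(2, 1, 16, 16)
valid = (torch.rand(2, 1, 16, 16) < 0.1).float()
print(depth_completion_loss(pred, conf_out, target, valid).item())
```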

Place, publisher, year, edition, pages
BMVA Press, 2019
National Category
Computer Vision and Robotics (Autonomous Systems); Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-149648 (URN)
Conference
The 29th British Machine Vision Conference (BMVC), Northumbria University, Newcastle upon Tyne, England, UK, 3-6 September, 2018
Available from: 2018-07-13 Created: 2018-07-13 Last updated: 2020-02-03. Bibliographically approved
Kristan, M., Matas, J., Leonardis, A., Felsberg, M., Pflugfelder, R., Kamarainen, J.-K., et al. (2019). The seventh visual object tracking VOT2019 challenge results. Paper presented at IEEE International Conference on Computer Vision Workshops.
The seventh visual object tracking VOT2019 challenge results
2019 (English). Conference paper, Published paper (Refereed)
Abstract [en]

The Visual Object Tracking challenge VOT2019 is the seventh annual tracker benchmarking activity organized by the VOT initiative. Results of 81 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The evaluation included the standard VOT and other popular methodologies for short-term tracking analysis as well as the standard VOT methodology for long-term tracking analysis. The VOT2019 challenge was composed of five challenges focusing on different tracking domains: (i) the VOT-ST2019 challenge focused on short-term tracking in RGB, (ii) the VOT-RT2019 challenge focused on "real-time" short-term tracking in RGB, and (iii) VOT-LT2019 focused on long-term tracking, namely coping with target disappearance and reappearance. Two new challenges were introduced: (iv) the VOT-RGBT2019 challenge focused on short-term tracking in RGB and thermal imagery and (v) the VOT-RGBD2019 challenge focused on long-term tracking in RGB and depth imagery. The VOT-ST2019, VOT-RT2019 and VOT-LT2019 datasets were refreshed while new datasets were introduced for VOT-RGBT2019 and VOT-RGBD2019. The VOT toolkit has been updated to support standard short-term and long-term tracking as well as tracking with multi-channel imagery. Performance of the tested trackers typically by far exceeds standard baselines. The source code for most of the trackers is publicly available from the VOT page. The dataset, the evaluation kit and the results are publicly available at the challenge website.

National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-163195 (URN)
Conference
IEEE International Conference on Computer Vision Workshops
Available from: 2020-01-22 Created: 2020-01-22 Last updated: 2020-02-06. Bibliographically approved
Bhat, G., Danelljan, M., Khan, F. S. & Felsberg, M. (2018). Combining Local and Global Models for Robust Re-detection. In: Proceedings of AVSS 2018. 2018 IEEE International Conference on Advanced Video and Signal-based Surveillance, Auckland, New Zealand, 27-30 November 2018. Paper presented at 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 27-30 November, Auckland, New Zealand (pp. 25-30). Institute of Electrical and Electronics Engineers (IEEE)
Combining Local and Global Models for Robust Re-detection
2018 (English). In: Proceedings of AVSS 2018. 2018 IEEE International Conference on Advanced Video and Signal-based Surveillance, Auckland, New Zealand, 27-30 November 2018, Institute of Electrical and Electronics Engineers (IEEE), 2018, p. 25-30. Conference paper, Published paper (Refereed)
Abstract [en]

Discriminative Correlation Filters (DCF) have demonstrated excellent performance for visual tracking. However, these methods still struggle in occlusion and out-of-view scenarios due to the absence of a re-detection component. While such a component requires global knowledge of the scene to ensure robust re-detection of the target, the standard DCF is only trained on the local target neighborhood. In this paper, we augment the state-of-the-art DCF tracking framework with a re-detection component based on a global appearance model. First, we introduce a tracking confidence measure to detect target loss. Next, we propose a hard negative mining strategy to extract background distractor samples, used for training the global model. Finally, we propose a robust re-detection strategy that combines the global and local appearance model predictions. We perform comprehensive experiments on the challenging UAV123 and LTB35 datasets. Our approach shows consistent improvements over the baseline tracker, setting a new state-of-the-art on both datasets.
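
The decision logic outlined above can be sketched as follows: a confidence measure on the local correlation response flags target loss, and candidate detections from the global model are then re-scored jointly with the local model. The threshold, the scoring functions, and the mixing weight are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def is_target_lost(local_response, threshold=0.25):
    """Declare target loss when the peak of the local DCF response is weak."""
    return local_response.max() < threshold

def redetect(candidate_boxes, global_scores, local_score_fn, alpha=0.5):
    """Combine global-model and local-model predictions for each candidate box."""
    combined = [alpha * g + (1.0 - alpha) * local_score_fn(box)
                for box, g in zip(candidate_boxes, global_scores)]
    best = int(np.argmax(combined))
    return candidate_boxes[best], combined[best]

# Toy usage: a weak local response triggers re-detection over three global candidates.
response = 0.2 * np.random.rand(17, 17)
if is_target_lost(response):
    boxes = [(10, 10, 40, 40), (60, 20, 30, 30), (5, 70, 50, 25)]
    box, score = redetect(boxes, global_scores=[0.8, 0.4, 0.6],
                          local_score_fn=lambda b: float(np.random.rand()))
    print("re-detected at", box, "with combined score", round(score, 2))
```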

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2018
National Category
Computer Vision and Robotics (Autonomous Systems); Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-158403 (URN); 10.1109/AVSS.2018.8639159 (DOI); 000468081400005 (); 9781538692943 (ISBN); 9781538692936 (ISBN); 9781538692950 (ISBN)
Conference
15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 27-30 November, Auckland, New Zealand
Note

Funding agencies: SSF (SymbiCloud); VR (EMC2) [2016-05543]; CENIIT grant [18.14]; SNIC; WASP

Available from: 2019-06-28 Created: 2019-06-28 Last updated: 2019-10-30. Bibliographically approved
Holmquist, K., Senel, D. & Felsberg, M. (2018). Computing a Collision-Free Path using the monogenic scale space. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Paper presented at IROS 2018, Madrid, Spain, October 1-5, 2018 (pp. 8097-8102). IEEE
Computing a Collision-Free Path using the monogenic scale space
2018 (English). In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2018, p. 8097-8102. Conference paper, Published paper (Refereed)
Abstract [en]

Mobile robots are used for a variety of purposes and functionalities that require them to move freely in environments containing both static and dynamic obstacles in order to accomplish their tasks. One of the most important capabilities for navigating a mobile robot in such an environment is finding a safe path to a goal position. This paper shows that there exists an accurate solution to the Laplace equation that allows finding a collision-free path, and that it can be efficiently calculated for a rectangular bounded domain, such as a map represented as an image. This is accomplished by the use of the monogenic scale space, resulting in a vector field that describes the attracting and repelling forces from the obstacles and the goal. The method is shown to work in reasonably convex domains and, by use of tessellation of the environment map, in non-convex environments.
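
The underlying idea, a harmonic potential whose gradient yields a collision-free path, can be sketched with a plain Jacobi solver; the paper computes the field via the monogenic scale space, for which the iteration below is only a generic stand-in, and the map, goal, and descent logic are toy assumptions.

```python
import numpy as np

def harmonic_field(occupancy, goal, iters=5000):
    """Solve the Laplace equation on a grid: obstacles held at 1 (repelling), goal at 0 (attracting)."""
    u = np.ones(occupancy.shape, dtype=float)
    for _ in range(iters):
        u[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:])
        u[occupancy] = 1.0        # Dirichlet boundary on obstacles and walls
        u[goal] = 0.0             # Dirichlet boundary at the goal
    return u

def descend(u, start, max_steps=500):
    """Greedy steepest-descent walk towards the potential minimum (the goal)."""
    path, pos = [start], start
    for _ in range(max_steps):
        r, c = pos
        nxt = min([(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)], key=lambda p: u[p])
        if u[nxt] >= u[pos]:      # no lower neighbour: goal reached or field too flat
            break
        path.append(nxt)
        pos = nxt
    return path

occ = np.zeros((40, 40), dtype=bool)
occ[0, :] = occ[-1, :] = occ[:, 0] = occ[:, -1] = True   # bounding walls
occ[10:30, 20] = True                                     # a wall segment to go around
field = harmonic_field(occ, goal=(35, 35))
path = descend(field, start=(5, 5))
print(len(path), path[-1])                                # path length and final cell
```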

Place, publisher, year, edition, pages
IEEE, 2018
Series
International Conference on Intelligent Robots and Systems (IROS), ISSN 2153-0858
National Category
Computer Vision and Robotics (Autonomous Systems); Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-152713 (URN); 10.1109/IROS.2018.8593583 (DOI); 978-1-5386-8094-0 (ISBN); 978-1-5386-8095-7 (ISBN); 978-1-5386-8093-3 (ISBN)
Conference
IROS 2018, Madrid, Spain, October 1-5, 2018
Note

Funding agencies: This work was funded by the European Union's Horizon 2020 Programme under grant agreement 644839 (CENTAURO).

Available from: 2018-11-16 Created: 2018-11-16 Last updated: 2019-10-31
Identifiers
ORCID iD: orcid.org/0000-0002-6096-3648
