Danelljan, Martin
Publications (10 of 28)
Johnander, J., Danelljan, M., Brissman, E., Khan, F. S. & Felsberg, M. (2019). A generative appearance model for end-to-end video object segmentation. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Paper presented at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), Long Beach, CA, USA, 15-20 June 2019 (pp. 8945-8954). Institute of Electrical and Electronics Engineers (IEEE)
A generative appearance model for end-to-end video object segmentation
2019 (English). In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Institute of Electrical and Electronics Engineers (IEEE), 2019, p. 8945-8954. Conference paper, Published paper (Refereed)
Abstract [en]

One of the fundamental challenges in video object segmentation is to find an effective representation of the target and background appearance. The best performing approaches resort to extensive fine-tuning of a convolutional neural network for this purpose. Besides being prohibitively expensive, this strategy cannot be truly trained end-to-end since the online fine-tuning procedure is not integrated into the offline training of the network. To address these issues, we propose a network architecture that learns a powerful representation of the target and background appearance in a single forward pass. The introduced appearance module learns a probabilistic generative model of target and background feature distributions. Given a new image, it predicts the posterior class probabilities, providing a highly discriminative cue, which is processed in later network modules. Both the learning and prediction stages of our appearance module are fully differentiable, enabling true end-to-end training of the entire segmentation pipeline. Comprehensive experiments demonstrate the effectiveness of the proposed approach on three video object segmentation benchmarks. We close the gap to approaches based on online fine-tuning on DAVIS17, while operating at 15 FPS on a single GPU. Furthermore, our method outperforms all published approaches on the large-scale YouTube-VOS dataset.
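
As a rough illustration of predicting posterior class probabilities from a generative appearance model (a simplified sketch under my own assumptions, not the module proposed in the paper): model the target and background feature distributions as diagonal Gaussians estimated from the first-frame mask and apply Bayes' rule per spatial location.

```python
# Simplified sketch (assumptions mine, not the paper's module): per-class diagonal
# Gaussians over backbone features, posterior class probabilities via Bayes' rule.
import torch

def class_posteriors(feats, target_mask, eps=1e-6):
    """feats: (C, H, W) feature map; target_mask: (H, W) first-frame annotation in {0, 1}."""
    C, H, W = feats.shape
    x = feats.reshape(C, -1).t()              # (H*W, C) feature vectors
    m = target_mask.reshape(-1).bool()

    stats = []
    for sel in (m, ~m):                       # target class, then background class
        xs = x[sel]
        mu, var = xs.mean(dim=0), xs.var(dim=0) + eps
        prior = sel.float().mean()
        stats.append((mu, var, prior))

    log_p = []
    for mu, var, prior in stats:
        # Diagonal Gaussian log-likelihood plus log-prior (constants omitted).
        ll = -0.5 * (((x - mu) ** 2 / var) + var.log()).sum(dim=1) + prior.log()
        log_p.append(ll)
    post = torch.softmax(torch.stack(log_p, dim=1), dim=1)   # (H*W, 2) posteriors
    return post[:, 0].reshape(H, W)           # P(target | feature) per location
```

In the paper this cue is produced in a single differentiable forward pass and then refined by later network modules; the sketch only conveys the probabilistic flavor of the appearance model.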

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2019
Series
Proceedings - IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR, IEEE Conference on Computer Vision and Pattern Recognition, ISSN 1063-6919, E-ISSN 2575-7075
Keywords
Segmentation; Grouping and Shape; Motion and Tracking
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-161037 (URN); 10.1109/CVPR.2019.00916 (DOI); 9781728132938 (ISBN); 9781728132945 (ISBN)
Conference
IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), Long Beach, CA, USA, 15-20 June 2019
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP); Swedish Foundation for Strategic Research; Swedish Research Council
Available from: 2019-10-17. Created: 2019-10-17. Last updated: 2020-01-22. Bibliographically approved.
Danelljan, M., Bhat, G., Khan, F. S. & Felsberg, M. (2019). ATOM: Accurate tracking by overlap maximization. Paper presented at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, June 16th - June 20th, 2019.
ATOM: Accurate tracking by overlap maximization
2019 (English). Conference paper, Published paper (Refereed)
Abstract [en]

While recent years have witnessed astonishing improvements in visual tracking robustness, the advancements in tracking accuracy have been limited. As the focus has been directed towards the development of powerful classifiers, the problem of accurate target state estimation has been largely overlooked. In fact, most trackers resort to a simple multi-scale search in order to estimate the target bounding box. We argue that this approach is fundamentally limited since target estimation is a complex task, requiring high-level knowledge about the object. We address this problem by proposing a novel tracking architecture, consisting of dedicated target estimation and classification components. High-level knowledge is incorporated into the target estimation through extensive offline learning. Our target estimation component is trained to predict the overlap between the target object and an estimated bounding box. By carefully integrating target-specific information, our approach achieves previously unseen bounding box accuracy. We further introduce a classification component that is trained online to guarantee high discriminative power in the presence of distractors. Our final tracking framework sets a new state-of-the-art on five challenging benchmarks. On the new large-scale TrackingNet dataset, our tracker ATOM achieves a relative gain of 15% over the previous best approach, while running at over 30 FPS. Code and models are available at https://github.com/visionml/pytracking.
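
A minimal sketch of the overlap-maximization idea (the `iou_predictor` below is a placeholder, not ATOM's implementation; the actual code lives in the linked pytracking repository): treat the predicted IoU as a differentiable function of the box and refine the box by gradient ascent.

```python
# Minimal sketch of IoU-driven box refinement; `iou_predictor` stands in for a
# network trained offline to predict overlap, not ATOM's actual implementation.
import torch

def refine_box(iou_predictor, search_feats, box, steps=5, lr=1.0):
    """box: (4,) tensor [x, y, w, h]; ascend the predicted overlap w.r.t. the box."""
    box = box.clone().detach().requires_grad_(True)
    for _ in range(steps):
        pred_iou = iou_predictor(search_feats, box.unsqueeze(0)).sum()
        pred_iou.backward()
        with torch.no_grad():
            step = torch.cat([box[2:], box[2:]])   # scale the step with box size
            box += lr * step * box.grad
            box.grad.zero_()
    return box.detach()
```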

National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-163194 (URN)
Conference
IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, June 16th - June 20th, 2019
Available from: 2020-01-22. Created: 2020-01-22. Last updated: 2020-02-06. Bibliographically approved.
Danelljan, M., Bhat, G., Gladh, S., Khan, F. S. & Felsberg, M. (2019). Deep motion and appearance cues for visual tracking. Pattern Recognition Letters, 124, 74-81
Deep motion and appearance cues for visual tracking
2019 (English). In: Pattern Recognition Letters, ISSN 0167-8655, E-ISSN 1872-7344, Vol. 124, p. 74-81. Article in journal (Refereed), Published
Abstract [en]

Generic visual tracking is a challenging computer vision problem, with numerous applications. Most existing approaches rely on appearance information by employing either hand-crafted features or deep RGB features extracted from convolutional neural networks. Despite their success, these approaches struggle in the case of ambiguous appearance information, leading to tracking failure. In such cases, we argue that the motion cue provides discriminative and complementary information that can improve tracking performance. In contrast to visual tracking, deep motion features have been successfully applied for action recognition and video classification tasks. Typically, the motion features are learned by training a CNN on optical flow images extracted from large amounts of labeled videos. In this paper, we investigate the impact of deep motion features in a tracking-by-detection framework. We also evaluate the fusion of hand-crafted, deep RGB, and deep motion features and show that they contain complementary information. To the best of our knowledge, we are the first to propose fusing appearance information with deep motion features for visual tracking. Comprehensive experiments clearly demonstrate that our fusion approach with deep motion features outperforms standard methods relying on appearance information alone.
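
For intuition only, a toy late-fusion sketch under my own assumptions (fixed weights, generic response maps; not the paper's fusion scheme): combine detection score maps computed from hand-crafted, deep RGB, and deep motion (optical-flow) features.

```python
# Toy late-fusion sketch (weights and inputs are illustrative, not the paper's method):
# combine per-pixel detection scores from hand-crafted, deep RGB and deep motion features.
import numpy as np

def fuse_responses(resp_hog, resp_rgb, resp_flow, weights=(0.3, 0.4, 0.3)):
    """Each response is an (H, W) score map over the same search region."""
    fused = weights[0] * resp_hog + weights[1] * resp_rgb + weights[2] * resp_flow
    row, col = np.unravel_index(np.argmax(fused), fused.shape)
    return fused, (row, col)       # fused map and the estimated target position
```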

Place, publisher, year, edition, pages
Elsevier, 2019
Keywords
Visual tracking, Deep learning, Optical flow, Discriminative correlation filters
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:liu:diva-148015 (URN); 10.1016/j.patrec.2018.03.009 (DOI); 000469427700008; 2-s2.0-85044328745 (Scopus ID)
Note

Funding agencies: Swedish Foundation for Strategic Research; Swedish Research Council [2016-05543]; Wallenberg Autonomous Systems Program; Swedish National Infrastructure for Computing (SNIC); Nvidia

Available from: 2018-05-24. Created: 2018-05-24. Last updated: 2019-06-24. Bibliographically approved.
Robinson, A., Järemo-Lawin, F., Danelljan, M. & Felsberg, M. (2019). Discriminative Learning and Target Attention for the 2019 DAVIS Challenge on Video Object Segmentation. In: CVPR 2019 workshops: DAVIS Challenge on Video Object Segmentation. Paper presented at The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Discriminative Learning and Target Attention for the 2019 DAVIS Challenge on Video Object Segmentation
2019 (English). In: CVPR 2019 workshops: DAVIS Challenge on Video Object Segmentation, 2019. Conference paper, Published paper (Refereed)
Abstract [en]

In this work, we address the problem of semi-supervised video object segmentation, where the task is to segment a target object in every image of the video sequence, given a ground truth only in the first frame. To be successful, it is crucial to robustly handle unpredictable target appearance changes and distracting objects in the background. We obtain a robust and efficient representation of the target by integrating a fast and light-weight discriminative target model into a deep segmentation network. Trained during inference, the target model learns to discriminate between the local appearances of target and background image regions. Its predictions are enhanced to accurate segmentation masks in a subsequent refinement stage. To further improve the segmentation performance, we add a new module trained to generate global target attention vectors, given the input mask and image feature maps. The attention vectors add semantic information about the target from a previous frame to the refinement stage, complementing the predictions provided by the target appearance model. Our method is fast and requires no network fine-tuning. We achieve a combined J and F-score of 70.6 on the DAVIS 2019 test-challenge data.

Keywords
video object segmentation, computer vision, machine learning
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-163334 (URN)
Conference
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Available from: 2020-02-01. Created: 2020-02-01. Last updated: 2020-02-01.
Bhat, G., Danelljan, M., Khan, F. S. & Felsberg, M. (2018). Combining Local and Global Models for Robust Re-detection. In: Proceedings of AVSS 2018. 2018 IEEE International Conference on Advanced Video and Signal-based Surveillance, Auckland, New Zealand, 27-30 November 2018. Paper presented at the 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 27-30 November, Auckland, New Zealand (pp. 25-30). Institute of Electrical and Electronics Engineers (IEEE)
Combining Local and Global Models for Robust Re-detection
2018 (English). In: Proceedings of AVSS 2018. 2018 IEEE International Conference on Advanced Video and Signal-based Surveillance, Auckland, New Zealand, 27-30 November 2018, Institute of Electrical and Electronics Engineers (IEEE), 2018, p. 25-30. Conference paper, Published paper (Refereed)
Abstract [en]

Discriminative Correlation Filters (DCF) have demonstrated excellent performance for visual tracking. However, these methods still struggle in occlusion and out-of-view scenarios due to the absence of a re-detection component. While such a component requires global knowledge of the scene to ensure robust re-detection of the target, the standard DCF is only trained on the local target neighborhood. In this paper, we augment the state-of-the-art DCF tracking framework with a re-detection component based on a global appearance model. First, we introduce a tracking confidence measure to detect target loss. Next, we propose a hard negative mining strategy to extract background distractor samples, used for training the global model. Finally, we propose a robust re-detection strategy that combines the global and local appearance model predictions. We perform comprehensive experiments on the challenging UAV123 and LTB35 datasets. Our approach shows consistent improvements over the baseline tracker, setting a new state-of-the-art on both datasets.
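
A schematic sketch of the three steps under assumed placeholder APIs (`local_dcf`, `global_model` and the threshold are all hypothetical), not the paper's implementation:

```python
# Schematic sketch only: `local_dcf`, `global_model` and the threshold are hypothetical,
# illustrating confidence-based target-loss detection and combined re-detection.
import numpy as np

def track_frame(local_dcf, global_model, frame, prev_box, conf_thresh=0.25):
    local_scores = local_dcf.score(frame, prev_box)            # search near the previous box
    if local_scores.max() >= conf_thresh:                      # confident: normal tracking
        return local_dcf.locate(local_scores, prev_box), True
    # Target deemed lost: propose candidate regions over the full frame and keep the
    # candidate on which the global and local appearance models agree best.
    candidates = global_model.propose(frame)
    combined = [global_model.score(frame, c) * local_dcf.score(frame, c).max()
                for c in candidates]
    return candidates[int(np.argmax(combined))], False
```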

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2018
National Category
Computer Vision and Robotics (Autonomous Systems); Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-158403 (URN); 10.1109/AVSS.2018.8639159 (DOI); 000468081400005; 9781538692943 (ISBN); 9781538692936 (ISBN); 9781538692950 (ISBN)
Conference
15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 27-30 November, Auckland, New Zealand
Note

Funding Agencies|SSF (SymbiCloud); VR (EMC2) [2016-05543]; CENIIT grant [18.14]; SNIC; WASP

Available from: 2019-06-28. Created: 2019-06-28. Last updated: 2019-10-30. Bibliographically approved.
Järemo Lawin, F., Danelljan, M., Khan, F. S., Forssén, P.-E. & Felsberg, M. (2018). Density Adaptive Point Set Registration. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Paper presented at The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, United States, 18-22 June, 2018 (pp. 3829-3837). IEEE
Density Adaptive Point Set Registration
2018 (English). In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, 2018, p. 3829-3837. Conference paper, Published paper (Refereed)
Abstract [en]

Probabilistic methods for point set registration have demonstrated competitive results in recent years. These techniques estimate a probability distribution model of the point clouds. While such a representation has shown promise, it is highly sensitive to variations in the density of 3D points. This fundamental problem is primarily caused by changes in the sensor location across point sets.    We revisit the foundations of the probabilistic registration paradigm. Contrary to previous works, we model the underlying structure of the scene as a latent probability distribution, and thereby induce invariance to point set density changes. Both the probabilistic model of the scene and the registration parameters are inferred by minimizing the Kullback-Leibler divergence in an Expectation Maximization based framework. Our density-adaptive registration successfully handles severe density variations commonly encountered in terrestrial Lidar applications. We perform extensive experiments on several challenging real-world Lidar datasets. The results demonstrate that our approach outperforms state-of-the-art probabilistic methods for multi-view registration, without the need of re-sampling.
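
In rough notation of my own (not taken from the paper), the probabilistic registration idea referred to here can be summarized as: model the latent scene as a mixture density and infer both its parameters and the rigid transforms by an EM-style minimization of a Kullback-Leibler divergence between observed and modeled point distributions.

```latex
% Illustrative notation only, not copied from the paper.
\begin{align}
  p(x \mid \theta) &= \sum_{k=1}^{K} \pi_{k}\, \mathcal{N}(x \mid \mu_{k}, \Sigma_{k})
    && \text{latent mixture model of the scene} \\
  \min_{\theta,\, \{T_{i}\}} \;&\; \sum_{i} D_{\mathrm{KL}}\!\left( \hat{p}_{i} \,\middle\|\, p_{T_{i}}(\,\cdot \mid \theta) \right)
    && \text{register each point set via a rigid transform } T_{i}
\end{align}
```

Here $\hat{p}_{i}$ denotes the empirical distribution of the $i$-th observed point set and $p_{T_{i}}$ the latent model mapped into that sensor's coordinate frame; the minimization alternates between responsibilities and parameter updates in an Expectation Maximization loop.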

Place, publisher, year, edition, pages
IEEE, 2018
Series
IEEE Conference on Computer Vision and Pattern Recognition
National Category
Electrical Engineering, Electronic Engineering, Information Engineering; Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-149774 (URN); 10.1109/CVPR.2018.00403 (DOI); 000457843603101; 978-1-5386-6420-9 (ISBN)
Conference
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, United States, 18-22 June, 2018
Note

Funding Agencies|EUs Horizon 2020 Programme [644839]; CENIIT grant [18.14]; VR grant: EMC2 [2014-6227]; VR grant [2016-05543]; VR grant: LCMM [2014-5928]

Available from: 2018-07-18. Created: 2018-07-18. Last updated: 2020-02-03. Bibliographically approved.
Danelljan, M. (2018). Learning Convolution Operators for Visual Tracking. (Doctoral dissertation). Linköping: Linköping University Electronic Press
Learning Convolution Operators for Visual Tracking
2018 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Visual tracking is one of the fundamental problems in computer vision. Its numerous applications include robotics, autonomous driving, augmented reality and 3D reconstruction. In essence, visual tracking can be described as the problem of estimating the trajectory of a target in a sequence of images. The target can be any image region or object of interest. While humans excel at this task, requiring little effort to perform accurate and robust visual tracking, it has proven difficult to automate. It has therefore remained one of the most active research topics in computer vision.

In its most general form, no prior knowledge about the object of interest or environment is given, except for the initial target location. This general form of tracking is known as generic visual tracking. The unconstrained nature of this problem makes it particularly difficult, yet applicable to a wider range of scenarios. As no prior knowledge is given, the tracker must learn an appearance model of the target on-the-fly. Cast as a machine learning problem, it imposes several major challenges which are addressed in this thesis.

The main purpose of this thesis is the study and advancement of the so-called Discriminative Correlation Filter (DCF) framework, as it has been shown to be particularly suitable for the tracking application. By utilizing properties of the Fourier transform, a correlation filter is discriminatively learned by efficiently minimizing a least-squares objective. The resulting filter is then applied to a new image in order to estimate the target location.
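
For reference, the standard single-channel DCF formulation this passage alludes to reads as follows (generic textbook form; the notation is not taken from the thesis):

```latex
% Generic single-channel DCF; hats denote DFTs, overlines complex conjugation,
% \odot and the division are element-wise.
\begin{align}
  \varepsilon(f) &= \lVert f \star x - y \rVert^{2} + \lambda \lVert f \rVert^{2}
    && \text{regularized least-squares objective} \\
  \hat{f} &= \frac{\overline{\hat{y}} \odot \hat{x}}{\overline{\hat{x}} \odot \hat{x} + \lambda}
    && \text{closed-form solution in the Fourier domain}
\end{align}
```

Here $x$ is a training sample and $y$ the desired (typically Gaussian) response; a new patch $z$ is scored by the inverse DFT of $\overline{\hat{f}} \odot \hat{z}$, which is what makes both learning and detection efficient.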

This thesis contributes to the advancement of the DCF methodology in several aspects. The main contribution regards the learning of the appearance model: First, the problem of updating the appearance model with new training samples is covered. Efficient update rules and numerical solvers are investigated for this task. Second, the periodic assumption induced by the circular convolution in DCF is countered by proposing a spatial regularization component. Third, an adaptive model of the training set is proposed to alleviate the impact of corrupted or mislabeled training samples. Fourth, a continuous-space formulation of the DCF is introduced, enabling the fusion of multiresolution features and sub-pixel accurate predictions. Finally, the problems of computational complexity and overfitting are addressed by investigating dimensionality reduction techniques.

As a second contribution, different feature representations for tracking are investigated. A particular focus is put on the analysis of color features, which had been largely overlooked in prior tracking research. This thesis also studies the use of deep features in DCF-based tracking. While many vision problems have greatly benefited from the advent of deep learning, it has proven difficult to harvest the power of such representations for tracking. In this thesis it is shown that both shallow and deep layers contribute positively. Furthermore, the problem of fusing their complementary properties is investigated.

The final major contribution of this thesis regards the prediction of the target scale. In many applications, it is essential to track the scale, or size, of the target since it is strongly related to the relative distance. A thorough analysis of how to integrate scale estimation into the DCF framework is performed. A one-dimensional scale filter is proposed, enabling efficient and accurate scale estimation.
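
To make the scale-search idea concrete, a toy sketch of a one-dimensional search over a geometric scale grid is given below (`score_fn` and the grid parameters are placeholders; the thesis proposes a learned scale filter rather than the brute-force loop shown here).

```python
# Toy 1-D scale search over a geometric grid; `score_fn` and the parameters are
# placeholders, not the thesis' learned scale filter.
import numpy as np

def estimate_scale(score_fn, frame, center, base_size, n_scales=17, step=1.02):
    exponents = np.arange(n_scales) - n_scales // 2
    scales = step ** exponents                       # e.g. 1.02 ** {-8, ..., 8}
    scores = [score_fn(frame, center, base_size * s) for s in scales]
    return float(scales[int(np.argmax(scores))])     # relative scale change to apply
```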

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2018. p. 71
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 1926
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-147543 (URN); 10.3384/diss.diva-147543 (DOI); 9789176853320 (ISBN)
Public defence
2018-06-11, Ada Lovelace, B-huset, Campus Valla, Linköping, 13:00 (English)
Available from: 2018-05-03. Created: 2018-04-25. Last updated: 2019-09-26. Bibliographically approved.
Johnander, J., Bhat, G., Danelljan, M., Khan, F. S. & Felsberg, M. (2018). On the Optimization of Advanced DCF-Trackers. In: Laura Leal-Taixé and Stefan Roth (Ed.), Computer Vision – ECCV 2018 Workshops: Munich, Germany, September 8-14, 2018, Proceedings, Part I. Paper presented at the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8-14 September, 2018 (pp. 54-69). Cham: Springer Publishing Company
On the Optimization of Advanced DCF-Trackers
2018 (English). In: Computer Vision – ECCV 2018 Workshops: Munich, Germany, September 8-14, 2018, Proceedings, Part I / [ed] Laura Leal-Taixé and Stefan Roth, Cham: Springer Publishing Company, 2018, p. 54-69. Conference paper, Published paper (Refereed)
Abstract [en]

Trackers based on discriminative correlation filters (DCF) have recently seen widespread success and in this work we dive into their numerical core. DCF-based trackers interleave learning of the target detector and target state inference based on this detector. Whereas the original formulation includes a closed-form solution for the filter learning, recently introduced improvements to the framework no longer have known closed-form solutions. Instead a large-scale linear least squares problem must be solved each time the detector is updated. We analyze the procedure used to optimize the detector and let the popular scheme introduced with ECO serve as a baseline. The ECO implementation is revisited in detail and several mechanisms are provided with alternatives. With comprehensive experiments we show which configurations are superior in terms of tracking capabilities and optimization performance.
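
For context, the linear systems discussed here are typically solved iteratively; a generic Conjugate Gradient solver for a symmetric positive-definite system is sketched below (textbook CG, not the paper's or ECO's implementation).

```python
# Textbook conjugate gradient for A x = b with A symmetric positive definite;
# a generic stand-in for the large linear least-squares solves discussed above,
# not the ECO implementation itself.
import numpy as np

def conjugate_gradient(A_op, b, x0=None, n_iter=50, tol=1e-8):
    """A_op: callable computing the matrix-vector product A @ x."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A_op(x)
    p = r.copy()
    rs_old = r @ r
    for _ in range(n_iter):
        Ap = A_op(p)
        alpha = rs_old / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x
```

In the regularized least-squares setting, `A_op` would apply something of the form A_op(f) = XᴴX f + λ f for the training samples X, computed matrix-free; the exact operator used by the paper's configurations differs and is not reproduced here.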

Place, publisher, year, edition, pages
Cham: Springer Publishing Company, 2018
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 11129
National Category
Engineering and Technology; Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-161036 (URN); 10.1007/978-3-030-11009-3_2 (DOI); 9783030110086 (ISBN); 9783030110093 (ISBN)
Conference
European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8-14 September, 2018
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2019-10-17. Created: 2019-10-17. Last updated: 2019-10-30. Bibliographically approved.
Kristan, M., Leonardis, A., Matas, J., Felsberg, M., Pflugfelder, R., Zajc, L. C., . . . He, Z. (2018). The Sixth Visual Object Tracking VOT2018 Challenge Results. In: Laura Leal-Taixé and Stefan Roth (Ed.), Computer Vision – ECCV 2018 Workshops: Munich, Germany, September 8–14, 2018 Proceedings, Part I. Paper presented at Computer Vision – ECCV 2018 Workshops, Munich, Germany, September 8–14, 2018 (pp. 3-53). Cham: Springer Publishing Company
The Sixth Visual Object Tracking VOT2018 Challenge Results
2018 (English). In: Computer Vision – ECCV 2018 Workshops: Munich, Germany, September 8–14, 2018 Proceedings, Part I / [ed] Laura Leal-Taixé and Stefan Roth, Cham: Springer Publishing Company, 2018, p. 3-53. Conference paper, Published paper (Refereed)
Abstract [en]

The Visual Object Tracking challenge VOT2018 is the sixth annual tracker benchmarking activity organized by the VOT initiative. Results of over eighty trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in the recent years. The evaluation included the standard VOT and other popular methodologies for short-term tracking analysis and a “real-time” experiment simulating a situation where a tracker processes images as if provided by a continuously running sensor. A long-term tracking subchallenge has been introduced to the set of standard VOT sub-challenges. The new subchallenge focuses on long-term tracking properties, namely coping with target disappearance and reappearance. A new dataset has been compiled and a performance evaluation methodology that focuses on long-term tracking capabilities has been adopted. The VOT toolkit has been updated to support both standard short-term and the new long-term tracking subchallenges. Performance of the tested trackers typically by far exceeds standard baselines. The source code for most of the trackers is publicly available from the VOT page. The dataset, the evaluation kit and the results are publicly available at the challenge website (http://votchallenge.net).

Place, publisher, year, edition, pages
Cham: Springer Publishing Company, 2018
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 11129
National Category
Computer Vision and Robotics (Autonomous Systems); Computer Sciences
Identifiers
urn:nbn:se:liu:diva-161343 (URN); 10.1007/978-3-030-11009-3_1 (DOI); 9783030110086 (ISBN); 9783030110093 (ISBN)
Conference
Computer Vision – ECCV 2018 Workshops, Munich, Germany, September 8–14, 2018
Available from: 2019-10-30. Created: 2019-10-30. Last updated: 2020-01-22. Bibliographically approved.
Bhat, G., Johnander, J., Danelljan, M., Khan, F. S. & Felsberg, M. (2018). Unveiling the power of deep tracking. In: Vittorio Ferrari, Martial Hebert, Cristian Sminchisescu and Yair Weiss (Ed.), Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part II. Paper presented at 15th European Conference on Computer Vision (ECCV). Munich, Germany, 8-14 September, 2018 (pp. 493-509). Cham: Springer Publishing Company
Unveiling the power of deep tracking
2018 (English). In: Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part II / [ed] Vittorio Ferrari, Martial Hebert, Cristian Sminchisescu and Yair Weiss, Cham: Springer Publishing Company, 2018, p. 493-509. Conference paper, Published paper (Refereed)
Abstract [en]

In the field of generic object tracking numerous attempts have been made to exploit deep features. Despite all expectations, deep trackers are yet to reach an outstanding level of performance compared to methods solely based on handcrafted features. In this paper, we investigate this key issue and propose an approach to unlock the true potential of deep features for tracking. We systematically study the characteristics of both deep and shallow features, and their relation to tracking accuracy and robustness. We identify the limited data and low spatial resolution as the main challenges, and propose strategies to counter these issues when integrating deep features for tracking. Furthermore, we propose a novel adaptive fusion approach that leverages the complementary properties of deep and shallow features to improve both robustness and accuracy. Extensive experiments are performed on four challenging datasets. On VOT2017, our approach significantly outperforms the top performing tracker from the challenge with a relative gain of >17% in EAO.

Place, publisher, year, edition, pages
Cham: Springer Publishing Company, 2018
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 11206
National Category
Computer Vision and Robotics (Autonomous Systems); Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-161032 (URN); 10.1007/978-3-030-01216-8_30 (DOI); 9783030012151 (ISBN); 9783030012168 (ISBN)
Conference
15th European Conference on Computer Vision (ECCV). Munich, Germany, 8-14 September, 2018
Available from: 2019-10-17. Created: 2019-10-17. Last updated: 2019-10-30. Bibliographically approved.