Danelljan, Martin
Publications (10 of 17)
Danelljan, M., Bhat, G., Gladh, S., Khan, F. S. & Felsberg, M. (2018). Deep motion and appearance cues for visual tracking. Pattern Recognition Letters
Deep motion and appearance cues for visual tracking
2018 (English). In: Pattern Recognition Letters, ISSN 0167-8655, E-ISSN 1872-7344. Article in journal (Refereed). Published
Abstract [en]

Generic visual tracking is a challenging computer vision problem, with numerous applications. Most existing approaches rely on appearance information by employing either hand-crafted features or deep RGB features extracted from convolutional neural networks. Despite their success, these approaches struggle in cases of ambiguous appearance information, leading to tracking failure. In such cases, we argue that the motion cue provides discriminative and complementary information that can improve tracking performance. In contrast to visual tracking, deep motion features have been successfully applied for action recognition and video classification tasks. Typically, the motion features are learned by training a CNN on optical flow images extracted from large amounts of labeled videos. In this paper, we investigate the impact of deep motion features in a tracking-by-detection framework. We also evaluate the fusion of hand-crafted, deep RGB, and deep motion features and show that they contain complementary information. To the best of our knowledge, we are the first to propose fusing appearance information with deep motion features for visual tracking. Comprehensive experiments clearly demonstrate that our fusion approach with deep motion features outperforms standard methods relying on appearance information alone.
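
To make the motion-feature pipeline described above concrete, the sketch below computes dense optical flow between two consecutive frames, encodes it as a three-channel image, and feeds it to a pre-trained flow network. This is a minimal illustration under assumed conventions: flow_cnn is a hypothetical stand-in for such a network, and the (dx, dy, magnitude) encoding is a common choice rather than the paper's exact recipe.

import cv2
import numpy as np

def deep_motion_features(prev_gray, curr_gray, flow_cnn):
    # flow_cnn: hypothetical pre-trained network that takes a 3-channel flow image.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)
    # Encode (dx, dy, magnitude) as an 8-bit image, a common flow-CNN input format.
    img = np.stack([flow[..., 0], flow[..., 1], mag], axis=2)
    img = cv2.normalize(img, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    return flow_cnn(img)

The resulting motion feature maps can then be stacked with hand-crafted and deep RGB feature maps of matching spatial resolution inside the tracking-by-detection framework.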

Place, publisher, year, edition, pages
Elsevier, 2018
Keywords
Visual tracking, Deep learning, Optical flow, Discriminative correlation filters
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:liu:diva-148015 (URN); 10.1016/j.patrec.2018.03.009 (DOI); 2-s2.0-85044328745 (Scopus ID)
Available from: 2018-05-24 Created: 2018-05-24 Last updated: 2018-05-31. Bibliographically approved
Järemo Lawin, F., Danelljan, M., Khan, F. S., Forssén, P.-E. & Felsberg, M. (2018). Density Adaptive Point Set Registration. Paper presented at The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, United States, 18-22 June, 2018.
Density Adaptive Point Set Registration
2018 (English). Conference paper, Published paper (Refereed)
Abstract [en]

Probabilistic methods for point set registration have demonstrated competitive results in recent years. These techniques estimate a probability distribution model of the point clouds. While such a representation has shown promise, it is highly sensitive to variations in the density of 3D points. This fundamental problem is primarily caused by changes in the sensor location across point sets. We revisit the foundations of the probabilistic registration paradigm. In contrast to previous works, we model the underlying structure of the scene as a latent probability distribution, and thereby induce invariance to point set density changes. Both the probabilistic model of the scene and the registration parameters are inferred by minimizing the Kullback-Leibler divergence in an Expectation-Maximization-based framework. Our density-adaptive registration successfully handles severe density variations commonly encountered in terrestrial Lidar applications. We perform extensive experiments on several challenging real-world Lidar datasets. The results demonstrate that our approach outperforms state-of-the-art probabilistic methods for multi-view registration, without the need for re-sampling.
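
As a rough sketch of the probabilistic registration paradigm referred to above (notation assumed here, not taken from the paper), the latent scene can be modelled as a Gaussian mixture and the transformations of the individual point sets estimated jointly with the mixture parameters:

q(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x;\, \mu_k, \Sigma_k), \qquad
\max_{\{\theta_i\},\, \{\pi_k, \mu_k, \Sigma_k\}} \;\; \sum_{i} \sum_{x_j \in X_i} \log q\big(T_{\theta_i}(x_j)\big)

Maximizing this likelihood with EM is, up to a constant, equivalent to minimizing the KL divergence between the empirical distribution of the transformed points and the latent model; the density-adaptive weighting that is the paper's core contribution is not captured by this simplification.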

National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:liu:diva-149774 (URN)
Conference
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, United States, 18-22 June, 2018
Available from: 2018-07-18 Created: 2018-07-18 Last updated: 2018-10-10. Bibliographically approved
Danelljan, M. (2018). Learning Convolution Operators for Visual Tracking. (Doctoral dissertation). Linköping: Linköping University Electronic Press
Learning Convolution Operators for Visual Tracking
2018 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Visual tracking is one of the fundamental problems in computer vision. Its numerous applications include robotics, autonomous driving, augmented reality and 3D reconstruction. In essence, visual tracking can be described as the problem of estimating the trajectory of a target in a sequence of images. The target can be any image region or object of interest. While humans excel at this task, requiring little effort to perform accurate and robust visual tracking, it has proven difficult to automate. It has therefore remained one of the most active research topics in computer vision.

In its most general form, no prior knowledge about the object of interest or environment is given, except for the initial target location. This general form of tracking is known as generic visual tracking. The unconstrained nature of this problem makes it particularly difficult, yet applicable to a wider range of scenarios. As no prior knowledge is given, the tracker must learn an appearance model of the target on-the-fly. Cast as a machine learning problem, it imposes several major challenges which are addressed in this thesis.

The main purpose of this thesis is the study and advancement of the so-called Discriminative Correlation Filter (DCF) framework, as it has proven particularly suitable for the tracking application. By utilizing properties of the Fourier transform, a correlation filter is discriminatively learned by efficiently minimizing a least-squares objective. The resulting filter is then applied to a new image in order to estimate the target location.
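
As an illustration of the learning step sketched above, a minimal single-channel DCF can be written in a few lines of NumPy. This is a toy version under simplifying assumptions (one feature channel, no windowing, no model update); the thesis develops multi-channel, spatially regularized, and continuous-space extensions.

import numpy as np

def learn_dcf(x, y, reg=1e-2):
    # x: training image patch (2-D array); y: desired Gaussian response of the same size.
    # Closed-form least-squares solution, computed element-wise in the Fourier domain.
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    return np.conj(X) * Y / (np.conj(X) * X + reg)

def localize(F, z):
    # Apply the learned filter to a new patch z and return the location of the response peak.
    response = np.real(np.fft.ifft2(F * np.fft.fft2(z)))
    return np.unravel_index(np.argmax(response), response.shape)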

This thesis contributes to the advancement of the DCF methodology in several aspects. The main contribution regards the learning of the appearance model: First, the problem of updating the appearance model with new training samples is covered. Efficient update rules and numerical solvers are investigated for this task. Second, the periodic assumption induced by the circular convolution in DCF is countered by proposing a spatial regularization component. Third, an adaptive model of the training set is proposed to alleviate the impact of corrupted or mislabeled training samples. Fourth, a continuous-space formulation of the DCF is introduced, enabling the fusion of multiresolution features and sub-pixel accurate predictions. Finally, the problems of computational complexity and overfitting are addressed by investigating dimensionality reduction techniques.

As a second contribution, different feature representations for tracking are investigated. A particular focus is placed on the analysis of color features, which had been largely overlooked in prior tracking research. This thesis also studies the use of deep features in DCF-based tracking. While many vision problems have greatly benefited from the advent of deep learning, it has proven difficult to harness the power of such representations for tracking. In this thesis it is shown that both shallow and deep layers contribute positively. Furthermore, the problem of fusing their complementary properties is investigated.

The final major contribution of this thesis regards the prediction of the target scale. In many applications, it is essential to track the scale, or size, of the target since it is strongly related to the relative distance. A thorough analysis of how to integrate scale estimation into the DCF framework is performed. A one-dimensional scale filter is proposed, enabling efficient and accurate scale estimation.
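
A one-dimensional scale filter of this kind can be sketched as follows (a simplified illustration with assumed array shapes, in the spirit of a DSST-style estimator rather than the thesis' exact formulation): features are extracted at a set of candidate scales around the estimated location, and a 1-D correlation filter over the scale axis picks the best one.

import numpy as np

def learn_scale_filter(scale_feats, y, reg=1e-2):
    # scale_feats: (S, d) array, one flattened feature vector per candidate scale.
    # y: (S,) desired 1-D Gaussian response peaked at the current target scale.
    Z = np.fft.fft(scale_feats, axis=0)                 # FFT along the scale dimension
    Y = np.fft.fft(y)[:, None]
    return np.conj(Y) * Z / (np.sum(np.conj(Z) * Z, axis=1, keepdims=True) + reg)

def estimate_scale(H, new_scale_feats, scale_factors):
    Z = np.fft.fft(new_scale_feats, axis=0)
    response = np.real(np.fft.ifft(np.sum(np.conj(H) * Z, axis=1)))
    return scale_factors[int(np.argmax(response))]      # best relative scale change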

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2018. p. 71
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 1926
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-147543 (URN); 10.3384/diss.diva-147543 (DOI); 9789176853320 (ISBN)
Public defence
2018-06-11, Ada Lovelace, B-huset, Campus Valla, Linköping, 13:00 (English)
Available from: 2018-05-03 Created: 2018-04-25 Last updated: 2018-09-19. Bibliographically approved
Johnander, J., Danelljan, M., Khan, F. S. & Felsberg, M. (2017). DCCO: Towards Deformable Continuous Convolution Operators for Visual Tracking. In: Michael Felsberg, Anders Heyden and Norbert Krüger (Ed.), Computer Analysis of Images and Patterns: 17th International Conference, CAIP 2017, Ystad, Sweden, August 22-24, 2017, Proceedings, Part I. Paper presented at 17th International Conference, CAIP 2017, Ystad, Sweden, August 22-24, 2017, Proceedings, Part I (pp. 55-67). Springer, 10424
DCCO: Towards Deformable Continuous Convolution Operators for Visual Tracking
2017 (English). In: Computer Analysis of Images and Patterns: 17th International Conference, CAIP 2017, Ystad, Sweden, August 22-24, 2017, Proceedings, Part I / [ed] Michael Felsberg, Anders Heyden and Norbert Krüger, Springer, 2017, Vol. 10424, p. 55-67. Conference paper, Published paper (Refereed)
Abstract [en]

Discriminative Correlation Filter (DCF) based methods have shown competitive performance on tracking benchmarks in recent years. Generally, DCF based trackers learn a rigid appearance model of the target. However, this reliance on a single rigid appearance model is insufficient in situations where the target undergoes non-rigid transformations. In this paper, we propose a unified formulation for learning a deformable convolution filter. In our framework, the deformable filter is represented as a linear combination of sub-filters. Both the sub-filter coefficients and their relative locations are inferred jointly in our formulation. Experiments are performed on three challenging tracking benchmarks: OTB-2015, TempleColor and VOT2016. Our approach improves the baseline method, leading to performance comparable to state-of-the-art.
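
A compact way to write the deformable-filter idea (with notation assumed here rather than taken from the paper) is to express the detection score as a weighted sum of sub-filter responses evaluated at learned offsets:

S_f(x) \;=\; \sum_{c=1}^{C} \alpha_c \, (f_c \ast z)(x + p_c)

where z denotes the feature map of the search region, and the sub-filter coefficients f_c, the weights \alpha_c, and the offsets p_c are inferred jointly from the training samples.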

Place, publisher, year, edition, pages
Springer, 2017
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 10424
National Category
Computer Vision and Robotics (Autonomous Systems) Computer Engineering
Identifiers
urn:nbn:se:liu:diva-145373 (URN); 10.1007/978-3-319-64689-3_5 (DOI); 000432085900005 (ISI); 9783319646886 (ISBN); 9783319646893 (ISBN)
Conference
17th International Conference, CAIP 2017, Ystad, Sweden, August 22-24, 2017, Proceedings, Part I
Note

Funding agencies: SSF (SymbiCloud); VR (EMC2) [2016-05543]; SNIC; WASP; Nvidia

Available from: 2018-02-26 Created: 2018-02-26 Last updated: 2018-10-16. Bibliographically approved
Järemo-Lawin, F., Danelljan, M., Tosteberg, P., Bhat, G., Khan, F. S. & Felsberg, M. (2017). Deep Projective 3D Semantic Segmentation. In: Michael Felsberg, Anders Heyden and Norbert Krüger (Ed.), Computer Analysis of Images and Patterns: 17th International Conference, CAIP 2017, Ystad, Sweden, August 22-24, 2017, Proceedings, Part I. Paper presented at 17th International Conference, CAIP 2017, Ystad, Sweden, August 22-24, 2017, Proceedings, Part I (pp. 95-107). Springer
Deep Projective 3D Semantic Segmentation
2017 (English). In: Computer Analysis of Images and Patterns: 17th International Conference, CAIP 2017, Ystad, Sweden, August 22-24, 2017, Proceedings, Part I / [ed] Michael Felsberg, Anders Heyden and Norbert Krüger, Springer, 2017, p. 95-107. Conference paper, Published paper (Refereed)
Abstract [en]

Semantic segmentation of 3D point clouds is a challenging problem with numerous real-world applications. While deep learning has revolutionized the field of image semantic segmentation, its impact on point cloud data has been limited so far. Recent attempts, based on 3D deep learning approaches (3D-CNNs), have achieved below-expected results. Such methods require voxelizations of the underlying point cloud data, leading to decreased spatial resolution and increased memory consumption. Additionally, 3D-CNNs greatly suffer from the limited availability of annotated datasets.

Place, publisher, year, edition, pages
Springer, 2017
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 10424
Keywords
Point clouds, Semantic segmentation, Deep learning, Multi-stream deep networks
National Category
Computer Vision and Robotics (Autonomous Systems) Computer Engineering
Identifiers
urn:nbn:se:liu:diva-145374 (URN); 10.1007/978-3-319-64689-3_8 (DOI); 000432085900008 (ISI); 2-s2.0-85028506569 (Scopus ID); 9783319646886 (ISBN); 9783319646893 (ISBN)
Conference
17th International Conference, CAIP 2017, Ystad, Sweden, August 22-24, 2017, Proceedings, Part I
Note

Funding agencies: EU [644839]; Swedish Research Council [2014-6227]; Swedish Foundation for Strategic Research [RIT 15-0097]; VR starting grant [2016-05543]

Available from: 2018-02-26 Created: 2018-02-26 Last updated: 2018-10-10. Bibliographically approved
Danelljan, M., Meneghetti, G., Khan, F. S. & Felsberg, M. (2016). Aligning the Dissimilar: A Probabilistic Feature-Based Point Set Registration Approach. In: Proceedings of the 23rd International Conference on Pattern Recognition (ICPR) 2016. Paper presented at 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4-8 Dec. 2016 (pp. 247-252). IEEE
Aligning the Dissimilar: A Probabilistic Feature-Based Point Set Registration Approach
2016 (English). In: Proceedings of the 23rd International Conference on Pattern Recognition (ICPR) 2016, IEEE, 2016, p. 247-252. Conference paper, Published paper (Refereed)
Abstract [en]

3D-point set registration is an active area of research in computer vision. In recent years, probabilistic registration approaches have demonstrated superior performance for many challenging applications. Generally, these probabilistic approaches rely on the spatial distribution of the 3D-points, and only recently has color information been integrated into such a framework, significantly improving registration accuracy. Beyond local color information, high-dimensional 3D shape features have been successfully employed in many applications such as action recognition and 3D object recognition. In this paper, we propose a probabilistic framework to integrate high-dimensional 3D shape features with color information for point set registration. The 3D shape features are distinctive and provide complementary information beneficial for robust registration. We validate our proposed framework by performing comprehensive experiments on the challenging Stanford Lounge dataset, acquired by an RGB-D sensor, and an outdoor dataset captured by a Lidar sensor. The results clearly demonstrate that our approach is superior in terms of both robustness and accuracy compared to state-of-the-art probabilistic methods.

Place, publisher, year, edition, pages
IEEE, 2016
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-137895 (URN); 10.1109/ICPR.2016.7899641 (DOI); 000406771300044 (ISI); 2-s2.0-85019098777 (Scopus ID); 9781509048472 (ISBN); 9781509048489 (ISBN)
Conference
23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4-8 Dec. 2016
Note

Funding agencies: SSF (VPS); VR (EMC2); Vinnova (iQMatic); EU's Horizon RI program grant [644839]; Wallenberg Autonomous Systems Program; NSC; Nvidia

Available from: 2017-05-31 Created: 2017-05-31 Last updated: 2018-10-08. Bibliographically approved
Häger, G., Bhat, G., Danelljan, M., Khan, F. S., Felsberg, M., Rudol, P. & Doherty, P. (2016). Combining Visual Tracking and Person Detection for Long Term Tracking on a UAV. In: Proceedings of the 12th International Symposium on Advances in Visual Computing. Paper presented at International Symposium on Advances in Visual Computing.
Combining Visual Tracking and Person Detection for Long Term Tracking on a UAV
2016 (English). In: Proceedings of the 12th International Symposium on Advances in Visual Computing, 2016. Conference paper, Published paper (Refereed)
Abstract [en]

Visual object tracking performance has improved significantly in recent years. Most trackers are based on one of two paradigms: online learning of an appearance model or the use of a pre-trained object detector. Methods based on online learning provide high accuracy, but are prone to model drift. Model drift occurs when the tracker fails to correctly estimate the tracked object's position. Methods based on a detector, on the other hand, typically have good long-term robustness but reduced accuracy compared to online methods.

Despite the complementarity of the aforementioned approaches, the problem of fusing them into a single framework is largely unexplored. In this paper, we propose a novel fusion between an online tracker and a pre-trained detector for tracking humans from a UAV. The system operates in real time on a UAV platform. In addition, we present a novel dataset for long-term tracking in a UAV setting that includes scenarios typically not well represented in standard visual tracking datasets.
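
The fusion logic can be sketched roughly as follows (Python pseudocode with hypothetical tracker and detector interfaces, not the paper's implementation): the online tracker runs on every frame, and the pre-trained person detector is consulted to re-initialise it whenever the tracking confidence drops, limiting the effect of model drift.

def track_with_redetection(frames, tracker, detector, conf_threshold=0.3):
    # tracker.update / tracker.reinitialize and detector.detect are assumed interfaces.
    box, confidence = None, 0.0
    for frame in frames:
        if box is not None:
            box, confidence = tracker.update(frame)       # online appearance model
        if box is None or confidence < conf_threshold:
            detections = detector.detect(frame)           # pre-trained person detector
            if detections:
                best = max(detections, key=lambda d: d.score)
                box, confidence = best.box, best.score
                tracker.reinitialize(frame, box)          # recover from drift
        yield box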

National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-137897 (URN); 10.1007/978-3-319-50835-1_50 (DOI); 2-s2.0-85007039301 (Scopus ID); 978-3-319-50834-4 (ISBN); 978-3-319-50835-1 (ISBN)
Conference
International Symposium on Advances in Visual Computing
Available from: 2017-05-31 Created: 2017-05-31 Last updated: 2018-01-13. Bibliographically approved
Gladh, S., Danelljan, M., Khan, F. S. & Felsberg, M. (2016). Deep motion features for visual tracking. In: Proceedings of the 23rd International Conference on Pattern Recognition (ICPR), 2016. Paper presented at The 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4-8 Dec. 2016 (pp. 1243-1248). Institute of Electrical and Electronics Engineers (IEEE)
Deep motion features for visual tracking
2016 (English). In: Proceedings of the 23rd International Conference on Pattern Recognition (ICPR), 2016, Institute of Electrical and Electronics Engineers (IEEE), 2016, p. 1243-1248. Conference paper, Published paper (Refereed)
Abstract [en]

Robust visual tracking is a challenging computer vision problem, with many real-world applications. Most existing approaches employ hand-crafted appearance features, such as HOG or Color Names. Recently, deep RGB features extracted from convolutional neural networks have been successfully applied for tracking. Despite their success, these features only capture appearance information. On the other hand, motion cues provide discriminative and complementary information that can improve tracking performance. In contrast to visual tracking, deep motion features have been successfully applied for action recognition and video classification tasks. Typically, the motion features are learned by training a CNN on optical flow images extracted from large amounts of labeled videos. This paper presents an investigation of the impact of deep motion features in a tracking-by-detection framework. We further show that hand-crafted, deep RGB, and deep motion features contain complementary information. To the best of our knowledge, we are the first to propose fusing appearance information with deep motion features for visual tracking. Comprehensive experiments clearly suggest that our fusion approach with deep motion features outperforms standard methods relying on appearance information alone.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2016
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-137896 (URN); 10.1109/ICPR.2016.7899807 (DOI); 000406771301042 (ISI); 2-s2.0-85019098606 (Scopus ID); 9781509048472 (ISBN); 9781509048489 (ISBN)
Conference
The 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4-8 Dec. 2016
Available from: 2017-05-31 Created: 2017-05-31 Last updated: 2018-10-16. Bibliographically approved
Felsberg, M., Kristan, M., Matas, J., Leonardis, A., Pflugfelder, R., Häger, G., . . . He, Z. (2016). The Thermal Infrared Visual Object Tracking VOT-TIR2016 Challenge Results. In: Hua G., Jégou H. (Ed.), Computer Vision – ECCV 2016 Workshops, ECCV 2016. Paper presented at 14th European Conference on Computer Vision (ECCV) (pp. 824-849). Springer International Publishing AG
The Thermal Infrared Visual Object Tracking VOT-TIR2016 Challenge Results
2016 (English). In: Computer Vision – ECCV 2016 Workshops, ECCV 2016 / [ed] Hua G., Jégou H., Springer International Publishing AG, 2016, p. 824-849. Conference paper, Published paper (Refereed)
Abstract [en]

The Thermal Infrared Visual Object Tracking challenge 2016, VOT-TIR2016, aims at comparing short-term single-object visual trackers that work on thermal infrared (TIR) sequences and do not apply pre-learned models of object appearance. VOT-TIR2016 is the second benchmark on short-term tracking in TIR sequences. Results of 24 trackers are presented. For each participating tracker, a short description is provided in the appendix. The VOT-TIR2016 challenge is similar to the 2015 challenge; the main difference is the introduction of new, more difficult sequences into the dataset. Furthermore, the VOT-TIR2016 evaluation adopted the improvements regarding overlap calculation introduced in VOT2016. Compared to VOT-TIR2015, a significant general improvement of results has been observed, which partly compensates for the more difficult sequences. The dataset, the evaluation kit, and the results are publicly available at the challenge website.

Place, publisher, year, edition, pages
Springer International Publishing AG, 2016
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 9914
Keywords
Performance evaluation; Object tracking; Thermal IR; VOT
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-133773 (URN); 10.1007/978-3-319-48881-3_55 (DOI); 000389501700055 (ISI); 978-3-319-48881-3 (ISBN); 978-3-319-48880-6 (ISBN)
Conference
14th European Conference on Computer Vision (ECCV)
Available from: 2017-01-11 Created: 2017-01-09 Last updated: 2018-10-15
Danelljan, M., Khan, F. S., Felsberg, M., Granström, K., Heintz, F., Rudol, P., . . . Doherty, P. (2015). A Low-Level Active Vision Framework for Collaborative Unmanned Aircraft Systems. In: Lourdes Agapito, Michael M. Bronstein and Carsten Rother (Ed.), COMPUTER VISION - ECCV 2014 WORKSHOPS, PT I. Paper presented at 13th European Conference on Computer Vision (ECCV), Switzerland, September 6-7 and 12 (pp. 223-237). Springer Publishing Company, 8925
A Low-Level Active Vision Framework for Collaborative Unmanned Aircraft Systems
2015 (English). In: COMPUTER VISION - ECCV 2014 WORKSHOPS, PT I / [ed] Lourdes Agapito, Michael M. Bronstein and Carsten Rother, Springer Publishing Company, 2015, Vol. 8925, p. 223-237. Conference paper, Published paper (Refereed)
Abstract [en]

Micro unmanned aerial vehicles are becoming increasingly interesting for aiding and collaborating with human agents in a myriad of applications; in particular, they are useful for monitoring inaccessible or dangerous areas. In order to interact with and monitor humans, these systems need robust, real-time computer vision subsystems that allow them to detect and follow persons.

In this work, we propose a low-level active vision framework to accomplish these challenging tasks. Based on the LinkQuad platform, we present a system study that implements the detection and tracking of people under fully autonomous flight conditions, keeping the vehicle within a certain distance of a person. The framework integrates state-of-the-art methods from visual detection and tracking, Bayesian filtering, and AI-based control. The results from our experiments clearly suggest that the proposed framework performs real-time detection and tracking of persons in complex scenarios.
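
One iteration of such a perceive-filter-act loop might look as follows (a sketch with hypothetical interfaces; the actual system integrates these modules on the LinkQuad platform with task-level AI-based control):

import numpy as np

def active_vision_step(frame, tracker, kalman, controller):
    # tracker, kalman and controller are assumed interfaces, not the paper's code.
    box = tracker.update(frame)                  # visual detection / tracking
    state = kalman.predict()                     # Bayesian filtering: prediction step
    if box is not None:
        state = kalman.correct(np.asarray(box.center, dtype=float))  # measurement update
    return controller.keep_distance(state)       # command that keeps the set distance to the person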

Place, publisher, year, edition, pages
Springer Publishing Company, 2015
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 8925
Keywords
Visual tracking; Visual surveillance; Micro UAV; Active vision
National Category
Computer Vision and Robotics (Autonomous Systems) Computer Sciences
Identifiers
urn:nbn:se:liu:diva-115847 (URN); 10.1007/978-3-319-16178-5_15 (DOI); 000362493800015 (ISI); 978-3-319-16177-8 (ISBN); 978-3-319-16178-5 (ISBN)
Conference
13th European Conference on Computer Vision (ECCV), Switzerland, September 6-7 and 12
Available from: 2015-03-20 Created: 2015-03-20 Last updated: 2018-02-07. Bibliographically approved