ECO: Efficient Convolution Operators for Tracking
Danelljan, Martin
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
Bhat, Goutam
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
Khan, Fahad Shahbaz
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
Felsberg, Michael
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. ORCID iD: 0000-0002-6096-3648
2017 (English). In: Proceedings 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Institute of Electrical and Electronics Engineers (IEEE), 2017, pp. 6931-6939. Conference paper, Published paper (Refereed)
Abstract [en]

In recent years, Discriminative Correlation Filter (DCF) based methods have significantly advanced the state-of-the-art in tracking. However, in the pursuit of ever-increasing tracking performance, their characteristic speed and real-time capability have gradually faded. Further, the increasingly complex models, with a massive number of trainable parameters, have introduced the risk of severe over-fitting. In this work, we tackle the key causes behind the problems of computational complexity and over-fitting, with the aim of simultaneously improving both speed and performance. We revisit the core DCF formulation and introduce: (i) a factorized convolution operator, which drastically reduces the number of parameters in the model; (ii) a compact generative model of the training sample distribution, which significantly reduces memory and time complexity, while providing better diversity of samples; (iii) a conservative model update strategy with improved robustness and reduced complexity. We perform comprehensive experiments on four benchmarks: VOT2016, UAV123, OTB-2015, and Temple-Color. When using expensive deep features, our tracker provides a 20-fold speedup and achieves a 13.0% relative gain in Expected Average Overlap compared to the top ranked method [12] in the VOT2016 challenge. Moreover, our fast variant, using hand-crafted features, operates at 60 Hz on a single CPU, while obtaining 65.0% AUC on OTB-2015.
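
For illustration only, the following sketch (not the authors' implementation; the function name, shapes, and the feature extraction are assumptions) shows the core idea of the factorized convolution operator: the D feature channels are projected onto C << D channels by a learned matrix P before the Fourier-domain correlation, so the filter needs far fewer parameters.

```python
# Hypothetical NumPy sketch of factorized convolution at detection time.
import numpy as np

def detection_scores(x, P, f_hat):
    """x:     (H, W, D) feature map of the search region
       P:     (D, C) learned projection matrix, with C << D
       f_hat: (H, W, C) learned filters in the Fourier domain
       Returns an (H, W) correlation response map."""
    z = x @ P                                        # factorization: D channels -> C channels
    z_hat = np.fft.fft2(z, axes=(0, 1))              # per-channel 2D DFT
    s_hat = np.sum(np.conj(f_hat) * z_hat, axis=2)   # correlate and sum over channels
    return np.real(np.fft.ifft2(s_hat))              # response map in the spatial domain

# The new target position is taken as the argmax of the returned response map.
```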

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2017. pp. 6931-6939
Series
IEEE Conference on Computer Vision and Pattern Recognition, ISSN 1063-6919 ; 2017
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:liu:diva-144284
DOI: 10.1109/CVPR.2017.733
ISI: 000418371407004
ISBN: 9781538604571 (digital)
ISBN: 9781538604588 (print)
OAI: oai:DiVA.org:liu-144284
DiVA, id: diva2:1173604
Conference
30th IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 21-26 July 2017, Honolulu, HI, USA
Note

Funding agencies: SSF (SymbiCloud); VR (EMC2) [2016-05543]; SNIC; WASP; Visual Sweden; Nvidia

Available from: 2018-01-12 Created: 2018-01-12 Last updated: 2019-06-26 Bibliographically approved
Part of thesis
1. Learning Convolution Operators for Visual Tracking
2018 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Visual tracking is one of the fundamental problems in computer vision. Its numerous applications include robotics, autonomous driving, augmented reality and 3D reconstruction. In essence, visual tracking can be described as the problem of estimating the trajectory of a target in a sequence of images. The target can be any image region or object of interest. While humans excel at this task, requiring little effort to perform accurate and robust visual tracking, it has proven difficult to automate. It has therefore remained one of the most active research topics in computer vision.

In its most general form, no prior knowledge about the object of interest or environment is given, except for the initial target location. This general form of tracking is known as generic visual tracking. The unconstrained nature of this problem makes it particularly difficult, yet applicable to a wider range of scenarios. As no prior knowledge is given, the tracker must learn an appearance model of the target on-the-fly. Cast as a machine learning problem, it imposes several major challenges which are addressed in this thesis.

The main purpose of this thesis is the study and advancement of the so-called Discriminative Correlation Filter (DCF) framework, as it has proven to be particularly suitable for the tracking application. By utilizing properties of the Fourier transform, a correlation filter is discriminatively learned by efficiently minimizing a least-squares objective. The resulting filter is then applied to a new image in order to estimate the target location.
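
As a concrete illustration of this formulation, the minimal single-channel sketch below (a MOSSE-style simplification under assumed shapes, not the multi-channel method developed in the thesis) shows how the least-squares solution becomes element-wise in the Fourier domain.

```python
# Hypothetical single-channel correlation filter learned by ridge regression.
# The desired output y is typically a Gaussian function centred on the target.
import numpy as np

def learn_filter(x, y, reg=1e-2):
    """x: (H, W) training patch, y: (H, W) desired response.
       Returns the (conjugated) filter in the Fourier domain."""
    X = np.fft.fft2(x)
    Y = np.fft.fft2(y)
    return (Y * np.conj(X)) / (X * np.conj(X) + reg)   # element-wise least-squares solution

def locate(f_hat, z):
    """Apply the learned filter to a new patch z; return the (row, col) of the peak."""
    response = np.real(np.fft.ifft2(f_hat * np.fft.fft2(z)))
    return np.unravel_index(np.argmax(response), response.shape)
```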

This thesis contributes to the advancement of the DCF methodology in several aspects. The main contribution regards the learning of the appearance model: First, the problem of updating the appearance model with new training samples is covered. Efficient update rules and numerical solvers are investigated for this task. Second, the periodic assumption induced by the circular convolution in DCF is countered by proposing a spatial regularization component. Third, an adaptive model of the training set is proposed to alleviate the impact of corrupted or mislabeled training samples. Fourth, a continuous-space formulation of the DCF is introduced, enabling the fusion of multiresolution features and sub-pixel accurate predictions. Finally, the problems of computational complexity and overfitting are addressed by investigating dimensionality reduction techniques.
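
As a rough, hypothetical sketch of the first point (the learning rate, update interval, and class name are assumptions, and the actual numerical solver is omitted), per-frame training statistics can be accumulated cheaply while the expensive filter optimization is triggered only every few frames.

```python
# Hypothetical sketch of a sparse (conservative) model update schedule.
class SparseUpdater:
    def __init__(self, learning_rate=0.01, n_update=6):
        self.lr = learning_rate      # weight of each new sample in the running average
        self.n_update = n_update     # optimize the filter only every n_update-th frame
        self.frame = 0
        self.data_term = None        # running average of per-frame sample statistics

    def add_sample(self, sample_stats):
        """sample_stats: per-frame training statistics (e.g. feature autocorrelations).
           Returns True when the filter should be re-optimized."""
        if self.data_term is None:
            self.data_term = sample_stats
        else:
            self.data_term = (1 - self.lr) * self.data_term + self.lr * sample_stats
        self.frame += 1
        return self.frame % self.n_update == 0
```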

As a second contribution, different feature representations for tracking are investigated. A particular focus is put on the analysis of color features, which had been largely overlooked in prior tracking research. This thesis also studies the use of deep features in DCF-based tracking. While many vision problems have greatly benefited from the advent of deep learning, it has proven difficult to harness the power of such representations for tracking. In this thesis it is shown that both shallow and deep layers contribute positively. Furthermore, the problem of fusing their complementary properties is investigated.
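
A deliberately simplistic, hypothetical illustration of such a fusion (not the adaptive scheme studied in the thesis; the names and weights are assumptions) is a weighted sum of the correlation responses computed from a shallow and a deep feature layer.

```python
# Hypothetical late fusion of per-layer correlation responses.
import numpy as np

def fuse_responses(resp_shallow, resp_deep, w_shallow=0.3, w_deep=0.7):
    """resp_shallow, resp_deep: (H, W) response maps on a common grid.
       Returns the (row, col) of the peak of the fused response."""
    fused = w_shallow * resp_shallow + w_deep * resp_deep
    return np.unravel_index(np.argmax(fused), fused.shape)
```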

The final major contribution of this thesis regards the prediction of the target scale. In many applications, it is essential to track the scale, or size, of the target since it is strongly related to the relative distance. A thorough analysis of how to integrate scale estimation into the DCF framework is performed. A one-dimensional scale filter is proposed, enabling efficient and accurate scale estimation.
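
A minimal sketch of this idea (assuming precomputed per-scale features and a pre-trained filter; the names and sizes are assumptions, not the thesis' implementation): features are extracted at a set of geometrically spaced scale factors, and a one-dimensional correlation filter over the scale axis scores each candidate.

```python
# Hypothetical sketch of 1D scale filtering.
import numpy as np

def candidate_scales(n_scales=17, step=1.02):
    """Geometrically spaced scale factors centred on the current target size."""
    return step ** (np.arange(n_scales) - n_scales // 2)

def scale_response(Z, h_hat):
    """Z:     (d, S) feature matrix, one column per candidate scale
       h_hat: (d, S) learned scale filter in the Fourier domain (over the scale axis)
       Returns an (S,) response vector; its argmax selects the new scale."""
    Z_hat = np.fft.fft(Z, axis=1)                            # 1D DFT along the scale dimension
    resp = np.fft.ifft(np.sum(np.conj(h_hat) * Z_hat, axis=0))
    return np.real(resp)
```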

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2018. p. 71
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 1926
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:liu:diva-147543
DOI: 10.3384/diss.diva-147543
ISBN: 9789176853320
Public defence
2018-06-11, Ada Lovelace, B-huset, Campus Valla, Linköping, 13:00 (English)
Opponent
Supervisors
Available from: 2018-05-03 Created: 2018-04-25 Last updated: 2019-09-26 Bibliographically approved

Open Access in DiVA

ECO: Efficient Convolution Operators for Tracking (2275 kB), 23 downloads
File information
File name: FULLTEXT02.pdf
File size: 2275 kB
Checksum: SHA-512
74a424ab936a2c80694868f76215a939fd9e6c78d6168b40076dd233d043aee233383b70bf603ffbd576216c6d0719d7ff26e3608a94917d9ca41c1fcf965e64
Type: fulltext
MIME type: application/pdf
