Discriminative correlation filters in robot vision
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
2021 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

In less than ten years, deep neural networks have evolved into all-encompassing tools in multiple areas of science and engineering, due to their almost unreasonable effectiveness in modeling complex real-world relationships. In computer vision in particular, they have taken tasks such as object recognition that were previously considered very difficult and transformed them into everyday practical tools. However, neural networks have to be trained with supercomputers on massive datasets for hours or days, and this limits their ability to adjust to changing conditions.

This thesis explores discriminative correlation filters, originally intended for tracking large objects in video, so-called visual object tracking. Unlike neural networks, these filters are small and can be quickly adapted to changes, with minimal data and computing power. At the same time, they can take advantage of the computing infrastructure developed for neural networks and operate within them.

The main contributions in this thesis demonstrate the versatility and adaptability of correlation filters for various problems, while complementing the capabilities of deep neural networks. In the first problem, it is shown that when adapted to track small regions and points, they outperform the widely used Lucas-Kanade method, both in terms of robustness and precision.

In the second problem, the correlation filters take on a completely new task. Here, they are used to tell different places apart in a 16 by 16 kilometer region of ocean near land. Given only a horizon profile, the coastline silhouette of islands and islets as seen from an ocean vessel, it is demonstrated that discriminative correlation filters can effectively distinguish between locations.

In the third problem, it is shown how correlation filters can be applied to video object segmentation. This is the task of classifying individual pixels as belonging either to a target or the background, given a segmentation mask provided with the first video frame as the only guidance. It is also shown that discriminative correlation filters and deep neural networks complement each other; where the neural network processes the input video in a content-agnostic way, the filters adapt to specific target objects. Together, they form a real-time video object segmentation method.

Finally, the segmentation method is extended beyond binary target/background classification to additionally consider distracting objects. This addresses the fundamental difficulty of coping with objects of similar appearance.
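In their simplest form, discriminative correlation filters have a closed-form solution in the Fourier domain. The following is a minimal MOSSE-style sketch for illustration only, not the exact formulations developed in the thesis; the function names and the single-channel, grayscale setting are assumptions.

```python
import numpy as np

def train_mosse(patches, target, lam=1e-2):
    """Learn a MOSSE-style discriminative correlation filter.

    patches: iterable of 2D training patches (all the same shape)
    target:  desired 2D correlation response (e.g. a centered Gaussian)
    lam:     regularization term, avoids division by near-zero spectra
    """
    A = np.zeros(target.shape, dtype=complex)
    B = np.zeros(target.shape, dtype=complex)
    G = np.fft.fft2(target)
    for p in patches:
        F = np.fft.fft2(p)
        A += G * np.conj(F)   # numerator: desired output vs. input
        B += F * np.conj(F)   # denominator: input spectral energy
    return A / (B + lam)      # closed-form filter in the Fourier domain

def apply_filter(H, patch):
    """Correlate a patch with the filter; the response peak marks the target."""
    resp = np.real(np.fft.ifft2(np.fft.fft2(patch) * H))
    return np.unravel_index(np.argmax(resp), resp.shape)
```

Because training and detection are a handful of FFTs and elementwise operations, the filter can be updated online at every frame, which is the adaptability the abstract contrasts with offline-trained networks.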


Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2021, p. 53
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 2146
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:liu:diva-174939
DOI: 10.3384/diss.diva-174939
ISBN: 9789179296360 (print)
OAI: oai:DiVA.org:liu-174939
DiVA, id: diva2:1545394
Public defence
2021-06-14, Ada Lovelace, B-building, Campus Valla, Linköping, 13:00 (English)
Available from: 2021-05-17. Created: 2021-04-19. Last updated: 2025-02-07. Bibliographically approved.
List of papers
1. Robust Three-View Triangulation Done Fast
2014 (English) In: Proceedings: 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2014, IEEE, 2014, p. 152-157. Conference paper, Published paper (Refereed)
Abstract [en]

Estimating the position of a 3-dimensional world point given its 2-dimensional projections in a set of images is a key component in numerous computer vision systems. There are several methods dealing with this problem, ranging from sub-optimal linear least-squares triangulation in two views to finding the world point that minimizes the L2 reprojection error in three views. The latter leads to the statistically optimal estimate under the assumption of Gaussian noise. In this paper we present a solution to optimal triangulation in three views. The standard approach to the three-view triangulation problem is to find a closed-form solution. In contrast to this, we propose a new method based on an iterative scheme. The method is rigorously tested on both synthetic and real image data with corresponding ground truth, on a midrange desktop PC and a Raspberry Pi, a low-end mobile platform. We are able to improve the precision achieved by the closed-form solvers and reach a speed-up of two orders of magnitude compared to the current state-of-the-art solver. In numbers, this amounts to around 300K triangulations per second on the PC and 30K triangulations per second on the Raspberry Pi.
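The "linear least-squares triangulation in two views" that the abstract uses as a baseline can be sketched with the standard direct linear transformation (DLT) construction. This is an illustrative baseline only, not the iterative three-view solver the paper proposes; names and the two-view restriction are assumptions.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear least-squares (DLT) triangulation from two views.

    P1, P2: 3x4 camera projection matrices
    x1, x2: 2D image points (x, y) observed in each view
    Returns the 3D point minimizing the algebraic error."""
    # Each observation contributes two linear constraints on the
    # homogeneous world point X, stacked into a 4x4 system A X = 0.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector of the smallest
    # singular value, i.e. the (approximate) null space of A.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]   # dehomogenize
```

The algebraic error minimized here differs from the L2 reprojection error, which is why DLT is only statistically sub-optimal, as the abstract notes.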

Place, publisher, year, edition, pages
IEEE, 2014
Series
IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, ISSN 2160-7508
Keywords
Nonlinear optimization; Structure from motion; Three-view Triangulation; Cameras; Computer vision; Conferences; Noise; Polynomials; Robustness; Three-dimensional displays
National Category
Electrical Engineering, Electronic Engineering, Information Engineering Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-111512 (URN)
10.1109/CVPRW.2014.28 (DOI)
000349552300023 ()
978-1-4799-4309-8 (ISBN)
Conference
IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 23-28, Columbus, OH, USA
Available from: 2014-10-20. Created: 2014-10-20. Last updated: 2025-02-01. Bibliographically approved.
2. Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking
2016 (English) In: Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part V / [ed] Bastian Leibe, Jiri Matas, Nicu Sebe and Max Welling, Cham: Springer, 2016, p. 472-488. Conference paper, Published paper (Refereed)
Abstract [en]

Discriminative Correlation Filters (DCF) have demonstrated excellent performance for visual object tracking. The key to their success is the ability to efficiently exploit available negative data by including all shifted versions of a training sample. However, the underlying DCF formulation is restricted to single-resolution feature maps, significantly limiting its potential. In this paper, we go beyond the conventional DCF framework and introduce a novel formulation for training continuous convolution filters. We employ an implicit interpolation model to pose the learning problem in the continuous spatial domain. Our proposed formulation enables efficient integration of multi-resolution deep feature maps, leading to superior results on three object tracking benchmarks: OTB-2015 (+5.1% in mean OP), Temple-Color (+4.6% in mean OP), and VOT2015 (20% relative reduction in failure rate). Additionally, our approach is capable of sub-pixel localization, crucial for the task of accurate feature point tracking. We also demonstrate the effectiveness of our learning formulation in extensive feature point tracking experiments.

Place, publisher, year, edition, pages
Cham: Springer, 2016
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 9909
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-133550 (URN)
10.1007/978-3-319-46454-1_29 (DOI)
000389385400029 ()
9783319464534 (ISBN)
9783319464541 (ISBN)
Conference
14th European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, October 11-14, 2016
Available from: 2016-12-30. Created: 2016-12-29. Last updated: 2025-02-07. Bibliographically approved.
3. GPS-level accurate camera localization with HorizonNet
2020 (English) In: Journal of Field Robotics, ISSN 1556-4959, E-ISSN 1556-4967, Vol. 37, no 6, p. 951-971. Article in journal (Refereed), Published
Abstract [en]

This paper investigates the problem of position estimation for unmanned surface vessels (USVs) operating in coastal areas or in an archipelago. We propose a position estimation method where the horizon line is extracted in a 360-degree panoramic image around the USV. We design a convolutional neural network (CNN) architecture to determine an approximate horizon line in the image and implicitly determine the camera orientation (the pitch and roll angles). The panoramic image is warped to compensate for the camera orientation and to generate an image from an approximately level camera. A second CNN architecture is designed to extract the pixelwise horizon line in the warped image. The extracted horizon line is correlated with digital elevation model data in the Fourier domain using a minimum output sum of squared error correlation filter. Finally, we determine the location of the maximum correlation score over the search area to estimate the position of the USV. Comprehensive experiments are performed in field trials conducted over three days in the archipelago. Our approach provides excellent results by achieving robust position estimates with global positioning system (GPS)-level accuracy in previously unvisited test areas.
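The matching of an extracted horizon line against elevation-model data can be illustrated with a simplified one-dimensional sketch. The function name and the plain cross-correlation score below are assumptions, standing in for the MOSSE correlation filter the paper actually uses; the circular correlation handles an unknown heading as a cyclic shift of the 360-degree profile.

```python
import numpy as np

def best_match(query, candidates):
    """Find the candidate horizon profile best matching the query.

    query:      1D horizon profile observed from the vessel
    candidates: list of 1D profiles rendered from elevation data,
                one per candidate position in the search area
    Returns the index of the best-scoring candidate."""
    Q = np.fft.fft(query - query.mean())
    best, best_score = None, -np.inf
    for idx, cand in enumerate(candidates):
        C = np.fft.fft(cand - cand.mean())
        # Circular cross-correlation via the Fourier domain; the
        # maximum over all shifts makes the score heading-invariant.
        corr = np.real(np.fft.ifft(Q * np.conj(C)))
        score = corr.max()
        if score > best_score:
            best, best_score = idx, score
    return best
```

In the paper's setting, the candidate profiles would be rendered from the digital elevation model over the search grid, and the winning cell gives the position estimate.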

Place, publisher, year, edition, pages
Wiley, 2020
Keywords
GPS-denied operation; localization; marine robotics
National Category
Other Computer and Information Science
Identifiers
urn:nbn:se:liu:diva-163032 (URN)
10.1002/rob.21929 (DOI)
000503992000001 ()
Note

Funding Agencies | Wallenberg AI, Autonomous Systems, and Software Program (WASP) - Knut and Alice Wallenberg Foundation; Swedish Foundation for Strategic Research [RIT 15-0097]; CENIIT grant [18.14]; VR starting grant [2016-05543]

Available from: 2020-01-09. Created: 2020-01-09. Last updated: 2022-10-28.
4. Learning Fast and Robust Target Models for Video Object Segmentation
2020 (English) In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2020, p. 7404-7413, article id 9156406. Conference paper, Published paper (Refereed)
Abstract [en]

Video object segmentation (VOS) is a highly challenging problem since the initial mask, defining the target object, is only given at test-time. The main difficulty is to effectively handle appearance changes and similar background objects, while maintaining accurate segmentation. Most previous approaches fine-tune segmentation networks on the first frame, resulting in impractical frame rates and a risk of overfitting. More recent methods integrate generative target appearance models, but either achieve limited robustness or require large amounts of training data. We propose a novel VOS architecture consisting of two network components. The target appearance model consists of a light-weight module, which is learned during the inference stage using fast optimization techniques to predict a coarse but robust target segmentation. The segmentation model is exclusively trained offline, designed to process the coarse scores into high quality segmentation masks. Our method is fast, easily trainable and remains highly effective in cases of limited training data. We perform extensive experiments on the challenging YouTube-VOS and DAVIS datasets. Our network achieves favorable performance, while operating at higher frame rates than the state of the art. Code and trained models are available at https://github.com/andr345/frtm-vos.

Place, publisher, year, edition, pages
IEEE, 2020
Series
Computer Society Conference on Computer Vision and Pattern Recognition, ISSN 1063-6919, E-ISSN 2575-7075
Keywords
Image segmentation; Robustness; Object segmentation; Adaptation models; Data models; Training; Target tracking
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-168133 (URN)
10.1109/CVPR42600.2020.00743 (DOI)
001309199900006 ()
2-s2.0-85094324768 (Scopus ID)
978-1-7281-7168-5 (ISBN)
Conference
Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13-19 June 2020
Available from: 2020-08-17. Created: 2020-08-17. Last updated: 2025-02-07.
5. Distractor-aware video object segmentation
2021 (English) In: Pattern Recognition. DAGM GCPR 2021, 2021, p. 222-234. Conference paper, Published paper (Refereed)
Abstract [en]

Semi-supervised video object segmentation is a challenging task that aims to segment a target throughout a video sequence given an initial mask at the first frame. Discriminative approaches have demonstrated competitive performance on this task at sensible computational cost. These approaches typically formulate the problem as a one-versus-one classification between the target and the background. However, in reality, a video sequence usually encompasses a target, background, and possibly other distracting objects. Those objects increase the risk of introducing false positives, especially if they share visual similarities with the target. Therefore, it is more effective to separate distractors from the background and handle them independently.

We propose a one-versus-many scheme to address this situation by separating distractors into their own class. This separation allows imposing special attention on challenging regions that are most likely to degrade performance. We demonstrate the effectiveness of this formulation by modifying the learning-what-to-learn method to be distractor-aware. Our proposed approach sets a new state-of-the-art on the DAVIS val dataset, and improves over the baseline on the DAVIS test-dev benchmark by 4.8 percentage points.
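The one-versus-many idea can be illustrated with a toy per-pixel classification. The three-class score layout and function name below are assumptions for illustration, not the paper's actual network output: the point is that a pixel resembling the target can still be suppressed when the distractor class scores even higher.

```python
import numpy as np

def one_vs_many_segment(scores):
    """Binary segmentation from a one-versus-many score map.

    scores: array of shape (3, H, W) holding per-pixel scores for
            class 0 = background, 1 = target, 2 = distractor.
    Each pixel takes its argmax class; distractors are then folded
    back into the background, so the output mask is binary."""
    labels = scores.argmax(axis=0)
    return (labels == 1).astype(np.uint8)   # 1 only where target wins
```

Under a one-versus-one formulation the same distractor pixel would be compared against the background alone and could be misclassified as target, which is the failure mode the explicit distractor class is meant to address.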

Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 13024
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-175117 (URN)10.1007/978-3-030-92659-5_14 (DOI)001500565200014 ()2-s2.0-85124271728 (Scopus ID)978-3-030-92658-8 (ISBN)978-3-030-92659-5 (ISBN)
Conference
German Conference on Pattern Recognition
Available from: 2021-04-19. Created: 2021-04-19. Last updated: 2025-10-10.

Open Access in DiVA

fulltext (8412 kB), 1559 downloads
File information
File name: FULLTEXT01.pdf
File size: 8412 kB
Checksum (SHA-512): 8168eafac74d87dccfa00a89a05902114aa0b83ba91f424ea5ba38072c35ce3131c3ea51d1b8906c06e256964f52bfdd04cbd7a1c8bd5bdbf726a1b3e1f4a312
Type: fulltext. Mimetype: application/pdf
Authority records

Robinson, Andreas
