Publications (10 of 14)
Melnyk, P., Felsberg, M., Wadenbäck, M., Robinson, A. & Le, C. (2024). On Learning Deep O(n)-Equivariant Hyperspheres. In: Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix (Ed.), Proceedings of the 41st International Conference on Machine Learning. Paper presented at 41st International Conference on Machine Learning, Vienna, Austria, 21-27 July 2024 (pp. 35324-35339). PMLR, 235
On Learning Deep O(n)-Equivariant Hyperspheres
2024 (English) In: Proceedings of the 41st International Conference on Machine Learning / [ed] Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix, PMLR, 2024, Vol. 235, p. 35324-35339. Conference paper, Poster (with or without abstract) (Refereed)
Abstract [en]

In this paper, we utilize hyperspheres and regular n-simplexes and propose an approach to learning deep features equivariant under the transformations of nD reflections and rotations, encompassed by the powerful group of O(n). Namely, we propose O(n)-equivariant neurons with spherical decision surfaces that generalize to any dimension n, which we call Deep Equivariant Hyperspheres. We demonstrate how to combine them in a network that directly operates on the basis of the input points and propose an invariant operator based on the relation between two points and a sphere, which, as we show, turns out to be a Gram matrix. Using synthetic and real-world data in nD, we experimentally verify our theoretical contributions and find that our approach is superior to the competing methods on O(n)-equivariant benchmark datasets (classification and regression), demonstrating a favorable speed/performance trade-off. The code is available on GitHub.
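
As a quick illustration of the invariance the proposed operator reduces to, the numpy sketch below (function names are ours, not from the paper's released code) verifies that the Gram matrix of a point set is unchanged under an arbitrary O(n) transformation:

```python
# A minimal numpy sketch of the O(n)-invariance property the paper builds on:
# the Gram matrix of a point set is unchanged by any rotation or reflection.
import numpy as np

def random_orthogonal(n, rng):
    """Sample a random matrix from O(n) via QR decomposition."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))  # fix column signs of the QR factor

def gram(points):
    """Gram matrix of row-vector points: G[i, j] = <x_i, x_j>."""
    return points @ points.T

rng = np.random.default_rng(0)
n = 5                                  # works for any dimension n
X = rng.standard_normal((8, n))        # 8 points in R^n
R = random_orthogonal(n, rng)          # an arbitrary O(n) transformation

# The Gram matrix is O(n)-invariant: (XR)(XR)^T = X R R^T X^T = X X^T.
assert np.allclose(gram(X @ R), gram(X))
```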

Place, publisher, year, edition, pages
PMLR, 2024
Series
Proceedings of Machine Learning Research, ISSN 2640-3498 ; 235
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-206460 (URN)
Conference
41st International Conference on Machine Learning, Vienna, Austria, 21-27 July 2024
Available from: 2024-08-14 Created: 2024-08-14 Last updated: 2025-02-07
Melnyk, P., Robinson, A., Felsberg, M. & Wadenbäck, M. (2024). TetraSphere: A Neural Descriptor for O(3)-Invariant Point Cloud Analysis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2024. Paper presented at IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16-22 June, 2024 (pp. 5620-5630). IEEE Computer Society
TetraSphere: A Neural Descriptor for O(3)-Invariant Point Cloud Analysis
2024 (English) In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2024, IEEE Computer Society, 2024, p. 5620-5630. Conference paper, Published paper (Refereed)
Abstract [en]

In many practical applications, 3D point cloud analysis requires rotation invariance. In this paper, we present a learnable descriptor invariant under 3D rotations and reflections, i.e., the O(3) actions, utilizing the recently introduced steerable 3D spherical neurons and vector neurons. Specifically, we propose an embedding of the 3D spherical neurons into 4D vector neurons, which enables end-to-end training of the model. In our approach, we perform the TetraTransform, an equivariant embedding of the 3D input into 4D constructed from the steerable neurons, and extract deeper O(3)-equivariant features using vector neurons. The integration of the TetraTransform into the VN-DGCNN framework, termed TetraSphere, increases the number of parameters only negligibly, by less than 0.0002%. TetraSphere sets a new state-of-the-art in classifying randomly rotated real-world object scans from the challenging subsets of ScanObjectNN. Additionally, TetraSphere outperforms all equivariant methods on randomly rotated synthetic data: classifying objects from ModelNet40 and segmenting parts of the ShapeNet shapes. Thus, our results reveal the practical value of steerable 3D spherical neurons for learning in 3D Euclidean space.
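
The central O(3)-invariance claim can be probed with a simple test harness like the hypothetical one below. The toy descriptor here (sorted pairwise distances) only stands in for TetraSphere itself, which is available from the authors' release:

```python
# Hypothetical test harness: apply random rotations and reflections to a point
# cloud and check that a descriptor is unchanged under the O(3) actions.
import numpy as np

def random_o3(rng, reflect=False):
    """Sample a random 3x3 orthogonal matrix; optionally force a reflection."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q = q * np.sign(np.diag(r))
    if reflect:
        q[:, 0] *= -1.0  # flip one axis: det becomes -1 (a reflection)
    return q

def toy_descriptor(points):
    """An O(3)-invariant stand-in descriptor: sorted pairwise distances."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return np.sort(d[np.triu_indices(len(points), k=1)])

rng = np.random.default_rng(1)
cloud = rng.standard_normal((64, 3))
for reflect in (False, True):
    g = random_o3(rng, reflect)
    assert np.allclose(toy_descriptor(cloud @ g), toy_descriptor(cloud))
```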

Place, publisher, year, edition, pages
IEEE Computer Society, 2024
Series
IEEE Conference on Computer Vision and Pattern Recognition, ISSN 1063-6919, E-ISSN 2575-7075
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-207318 (URN)
10.1109/CVPR52733.2024.00537 (DOI)
001322555906003 ()
9798350353006 (ISBN)
9798350353013 (ISBN)
Conference
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16-22 June, 2024.
Note

Funding agencies: Wallenberg AI, Autonomous Systems and Software Program (WASP); Swedish Research Council [2022-04266]; strategic research environment ELLIIT

Available from: 2024-09-04 Created: 2024-09-04 Last updated: 2025-02-07
Zhang, Y., Robinson, A., Magnusson, M. & Felsberg, M. (2023). Leveraging Optical Flow Features for Higher Generalization Power in Video Object Segmentation. In: 2023 IEEE International Conference on Image Processing: Proceedings. Paper presented at 2023 IEEE International Conference on Image Processing (ICIP), 8–11 October 2023, Kuala Lumpur, Malaysia (pp. 326-330). IEEE
Leveraging Optical Flow Features for Higher Generalization Power in Video Object Segmentation
2023 (English) In: 2023 IEEE International Conference on Image Processing: Proceedings, IEEE, 2023, p. 326-330. Conference paper, Published paper (Refereed)
Abstract [en]

We propose to leverage optical flow features for higher generalization power in semi-supervised video object segmentation. Optical flow is commonly exploited as additional guidance information in many computer vision tasks. In video object segmentation, however, it has mainly been used in unsupervised settings or to warp or refine previously predicted masks. In contrast, we propose to leverage the optical flow features directly in the target representation. We show that this enriched representation improves the encoder-decoder approach to the segmentation task. A model to extract the combined information from the optical flow and the image is proposed, which is then used as input to the target model and the decoder network. Unlike previous methods, e.g. in tracking, where concatenation is used to integrate information from image data and optical flow, our work exploits a simple yet effective attention mechanism. Experiments on DAVIS 2017 and YouTube-VOS 2019 show that integrating the information extracted from optical flow into the original image branch results in a strong performance gain, especially on unseen classes, which demonstrates its higher generalization power.
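
To make the fusion idea concrete, here is a minimal PyTorch sketch of attention-gated fusion of image and flow features. The module name and layer sizes are illustrative assumptions, not the paper's exact architecture:

```python
# A minimal sketch of fusing image and optical-flow features with an attention
# mechanism instead of plain concatenation.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Predict per-channel attention weights from the concatenated features.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, image_feat, flow_feat):
        a = self.gate(torch.cat([image_feat, flow_feat], dim=1))
        # Blend: attended flow information enriches the image representation.
        return image_feat + a * flow_feat

fuse = AttentionFusion(channels=256)
img = torch.randn(1, 256, 30, 52)   # backbone features of a video frame
flow = torch.randn(1, 256, 30, 52)  # features extracted from optical flow
print(fuse(img, flow).shape)        # torch.Size([1, 256, 30, 52])
```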

Place, publisher, year, edition, pages
IEEE, 2023
Keywords
Optical flow features; Attention mechanism; Semi-supervised VOS; Generalization power
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:liu:diva-199057 (URN)
10.1109/ICIP49359.2023.10222542 (DOI)
001106821000063 ()
9781728198354 (ISBN)
9781728198361 (ISBN)
Conference
2023 IEEE International Conference on Image Processing (ICIP), 8–11 October 2023, Kuala Lumpur, Malaysia
Available from: 2023-11-08 Created: 2023-11-08 Last updated: 2025-09-15
Robinson, A. (2021). Discriminative correlation filters in robot vision. (Doctoral dissertation). Linköping: Linköping University Electronic Press
Discriminative correlation filters in robot vision
2021 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

In less than ten years, deep neural networks have evolved into all-encompassing tools in multiple areas of science and engineering, due to their almost unreasonable effectiveness in modeling complex real-world relationships. In computer vision in particular, they have taken tasks such as object recognition, which were previously considered very difficult, and transformed them into everyday practical tools. However, neural networks have to be trained with supercomputers on massive datasets for hours or days, and this limits their ability to adjust to changing conditions.

This thesis explores discriminative correlation filters, originally intended for tracking large objects in video, so-called visual object tracking. Unlike neural networks, these filters are small and can be quickly adapted to changes, with minimal data and computing power. At the same time, they can take advantage of the computing infrastructure developed for neural networks and operate within them.

The main contributions in this thesis demonstrate the versatility and adaptability of correlation filters for various problems, while complementing the capabilities of deep neural networks. In the first problem, it is shown that when adapted to track small regions and points, they outperform the widely used Lucas-Kanade method, both in terms of robustness and precision.

In the second problem, the correlation filters take on a completely new task. Here, they are used to tell different places apart, in a 16 by 16 kilometer region of ocean near land. Given only a horizon profile - the coastline silhouette of islands and islets as seen from an ocean vessel - it is demonstrated that discriminative correlation filters can effectively distinguish between locations.

In the third problem, it is shown how correlation filters can be applied to video object segmentation. This is the task of classifying individual pixels as belonging either to a target or the background, given a segmentation mask provided with the first video frame as the only guidance. It is also shown that discriminative correlation filters and deep neural networks complement each other; where the neural network processes the input video in a content-agnostic way, the filters adapt to specific target objects. Together, they form a real-time video object segmentation method.

Finally, the segmentation method is extended beyond binary target/background classification to additionally consider distracting objects. This addresses the fundamental difficulty of coping with objects of similar appearance.
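
For readers unfamiliar with discriminative correlation filters, the numpy sketch below shows the classic MOSSE-style construction the thesis builds on: a small, closed-form filter learned in the Fourier domain from a single template and a Gaussian response target. Variable and function names are ours:

```python
# A MOSSE-style discriminative correlation filter: fast to train, tiny, and
# adaptable with minimal data - the kind of model explored in this thesis.
import numpy as np

def gaussian_response(shape, sigma=2.0):
    """Desired response: a Gaussian peak at the patch center."""
    h, w = shape
    y, x = np.mgrid[0:h, 0:w]
    return np.exp(-((y - h // 2) ** 2 + (x - w // 2) ** 2) / (2 * sigma ** 2))

def train_dcf(template, target, reg=1e-2):
    """Ridge-regularized least-squares solution for the filter, in Fourier space."""
    F = np.fft.fft2(template)
    G = np.fft.fft2(target)
    return (G * np.conj(F)) / (F * np.conj(F) + reg)

def detect(dcf, patch):
    """Correlate a search patch with the filter; the peak indicates the target."""
    return np.real(np.fft.ifft2(dcf * np.fft.fft2(patch)))

rng = np.random.default_rng(0)
patch = rng.standard_normal((64, 64))
H = train_dcf(patch, gaussian_response((64, 64)))
response = detect(H, patch)
peak = np.unravel_index(response.argmax(), response.shape)
print(peak)  # close to the center (32, 32) on the training patch itself
```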

Abstract [sv]

In less than ten years, deep neural networks have developed into all-encompassing tools in several scientific and technical fields, owing to their almost unreasonable effectiveness in modeling complex real-world relationships. Within computer vision in particular, they have taken tasks such as object recognition, previously considered very difficult, and turned them into practical everyday tools. However, neural networks must be trained with supercomputers on massive datasets for hours or days, and this limits their ability to adapt to changing conditions.

This thesis investigates discriminative correlation filters, originally intended for tracking large objects in video, so-called visual object tracking. Unlike neural networks, these filters are small and can quickly be adapted to changes, with little data and minimal computing power. At the same time, they can benefit from the infrastructure developed for neural networks and operate within it.

The most important contributions of this thesis demonstrate the versatility and adaptability of correlation filters for various problems, while complementing the capabilities of deep neural networks. In the first problem, it is shown that, when applied to tracking small regions and points, they outperform the widely used Lucas-Kanade method, both in terms of robustness and precision.

In the second problem, the correlation filters are applied to an entirely new task. Here they are used to distinguish between different places in a 16 by 16 kilometer ocean region near land, given only a horizon profile - the coastline silhouette of islands and islets as seen from a vessel.

The third problem shows how correlation filters can be used for segmentation of objects in video. This is the task of classifying individual pixels as belonging either to a target object or to the background, given a segmentation mask provided with the first video frame as the only guidance. It is also shown that discriminative correlation filters and deep neural networks complement each other; where the neural network processes the video in a content-agnostic way, the filters adapt to specific target objects. Together they form a real-time segmentation method.

Finally, the segmentation method is extended beyond binary target/background classification to additionally take distracting objects into account. This addresses the fundamental difficulty of handling objects that resemble each other.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2021. p. 53
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 2146
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-174939 (URN)
10.3384/diss.diva-174939 (DOI)
9789179296360 (ISBN)
Public defence
2021-06-14, Ada Lovelace, B-building, Campus Valla, Linköping, 13:00 (English)
Available from: 2021-05-17 Created: 2021-04-19 Last updated: 2025-02-07. Bibliographically approved
Robinson, A., Eldesokey, A. & Felsberg, M. (2021). Distractor-aware video object segmentation. In: Pattern Recognition. DAGM GCPR 2021. Paper presented at German Conference on Pattern Recognition (pp. 222-234).
Distractor-aware video object segmentation
2021 (English) In: Pattern Recognition. DAGM GCPR 2021, 2021, p. 222-234. Conference paper, Published paper (Refereed)
Abstract [en]

Semi-supervised video object segmentation is a challenging task that aims to segment a target throughout a video sequence, given an initial mask at the first frame. Discriminative approaches have demonstrated competitive performance on this task at reasonable computational complexity. These approaches typically formulate the problem as a one-versus-one classification between the target and the background. However, in reality, a video sequence usually encompasses a target, background, and possibly other distracting objects. Those objects increase the risk of introducing false positives, especially if they share visual similarities with the target. Therefore, it is more effective to separate distractors from the background and handle them independently.

We propose a one-versus-many scheme to address this situation by separating distractors into their own class. This separation allows imposing special attention on challenging regions that are most likely to degrade the performance. We demonstrate the effectiveness of this formulation by modifying the learning-what-to-learn method to be distractor-aware. Our proposed approach sets a new state-of-the-art on the DAVIS val dataset, and improves over the baseline on the DAVIS test-dev benchmark by 4.8 percentage points.
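
A minimal sketch of the one-versus-many idea follows, with illustrative shapes and names: the per-pixel classifier outputs three classes so that distractors receive their own label instead of being lumped into the background:

```python
# Extending a binary target/background pixel classifier to three classes so
# distractors are modeled explicitly. Shapes and names are illustrative.
import torch
import torch.nn as nn

TARGET, DISTRACTOR, BACKGROUND = 0, 1, 2

head = nn.Conv2d(256, 3, kernel_size=3, padding=1)  # 3 logits per pixel

features = torch.randn(2, 256, 30, 52)              # decoder features
labels = torch.randint(0, 3, (2, 30, 52))           # per-pixel class labels

logits = head(features)
loss = nn.functional.cross_entropy(logits, labels)  # one-versus-many training

# At inference, the target mask is the argmax-equals-target test, so pixels
# claimed by the distractor class can no longer leak into the target.
mask = logits.argmax(dim=1) == TARGET
```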

Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 13024
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-175117 (URN)
10.1007/978-3-030-92659-5_14 (DOI)
001500565200014 ()
2-s2.0-85124271728 (Scopus ID)
978-3-030-92658-8 (ISBN)
978-3-030-92659-5 (ISBN)
Conference
German Conference on Pattern Recognition
Available from: 2021-04-19 Created: 2021-04-19 Last updated: 2025-10-10
Robinson, A., Järemo-Lawin, F., Danelljan, M., Khan, F. S. & Felsberg, M. (2020). Learning Fast and Robust Target Models for Video Object Segmentation. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Paper presented at Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13-19 June 2020 (pp. 7404-7413). IEEE, Article ID 9156406.
Learning Fast and Robust Target Models for Video Object Segmentation
2020 (English) In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2020, p. 7404-7413, article id 9156406. Conference paper, Published paper (Refereed)
Abstract [en]

Video object segmentation (VOS) is a highly challenging problem since the initial mask, defining the target object, is only given at test-time. The main difficulty is to effectively handle appearance changes and similar background objects, while maintaining accurate segmentation. Most previous approaches fine-tune segmentation networks on the first frame, resulting in impractical frame-rates and risk of overfitting. More recent methods integrate generative target appearance models, but either achieve limited robustness or require large amounts of training data. We propose a novel VOS architecture consisting of two network components. The target appearance model consists of a light-weight module, which is learned during the inference stage using fast optimization techniques to predict a coarse but robust target segmentation. The segmentation model is exclusively trained offline, designed to process the coarse scores into high quality segmentation masks. Our method is fast, easily trainable and remains highly effective in cases of limited training data. We perform extensive experiments on the challenging YouTube-VOS and DAVIS datasets. Our network achieves favorable performance, while operating at higher frame-rates compared to state-of-the-art. Code and trained models are available at https://github.com/andr345/frtm-vos.
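
The following hedged PyTorch sketch illustrates the core idea of a light-weight target model fitted at inference time with a few optimization steps. The offline-trained refinement network that turns the coarse scores into final masks is omitted, and all names are ours:

```python
# A light-weight linear target model, fitted at inference time on first-frame
# features to produce a coarse target segmentation.
import torch

feat = torch.randn(1, 256, 30, 52)                 # backbone features of frame 0
mask = (torch.rand(1, 1, 30, 52) > 0.7).float()    # the given first-frame mask

w = torch.zeros(1, 256, 1, 1, requires_grad=True)  # the target model: a 1x1 filter
opt = torch.optim.SGD([w], lr=1e-2)

for _ in range(20):                                # "fast optimization": few steps
    score = (feat * w).sum(dim=1, keepdim=True)
    loss = torch.nn.functional.binary_cross_entropy_with_logits(score, mask)
    loss = loss + 1e-3 * (w ** 2).sum()            # ridge term keeps it well-posed
    opt.zero_grad()
    loss.backward()
    opt.step()

coarse = torch.sigmoid((feat * w).sum(dim=1, keepdim=True))  # coarse segmentation
```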

Place, publisher, year, edition, pages
IEEE, 2020
Series
Computer Society Conference on Computer Vision and Pattern Recognition, ISSN 1063-6919, E-ISSN 2575-7075
Keywords
Image segmentation; Robustness; Object segmentation; Adaptation models; Data models; Training; Target tracking
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-168133 (URN)
10.1109/CVPR42600.2020.00743 (DOI)
001309199900006 ()
2-s2.0-85094324768 (Scopus ID)
978-1-7281-7168-5 (ISBN)
Conference
Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13-19 June 2020
Available from: 2020-08-17 Created: 2020-08-17 Last updated: 2025-02-07
Bhat, G., Järemo-Lawin, F., Danelljan, M., Robinson, A., Felsberg, M., Van Gool, L. & Timofte, R. (2020). Learning What to Learn for Video Object Segmentation. In: Vedaldi A., Bischof H., Brox T., Frahm JM (Ed.), Computer Vision: ECCV 2020 Workshop. Paper presented at European Conference on Computer Vision, Glasgow, UK, August 23–28, 2020 (pp. 777-794).
Learning What to Learn for Video Object Segmentation
2020 (English) In: Computer Vision: ECCV 2020 Workshop / [ed] Vedaldi A., Bischof H., Brox T., Frahm JM, 2020, p. 777-794. Conference paper, Published paper (Refereed)
Abstract [en]

Video object segmentation (VOS) is a highly challenging problem, since the target object is only defined by a first-frame reference mask during inference. The problem of how to capture and utilize this limited information to accurately segment the target remains a fundamental research question. We address this by introducing an end-to-end trainable VOS architecture that integrates a differentiable few-shot learner. Our learner is designed to predict a powerful parametric model of the target by minimizing a segmentation error in the first frame. We further go beyond the standard few-shot learning paradigm by learning what our target model should learn in order to maximize segmentation accuracy. We perform extensive experiments on standard benchmarks. Our approach sets a new state-of-the-art on the large-scale YouTube-VOS 2018 dataset by achieving an overall score of 81.5, corresponding to a 2.6% relative improvement over the previous best result. The code and models are available at https://github.com/visionml/pytracking.
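
The sketch below illustrates what a differentiable few-shot learner of this kind can look like: an inner loop fits a small target model to the first frame, and because every step is differentiable, an outer loss can meta-learn what the target model should regress. All module names are illustrative assumptions, not the released pytracking code:

```python
# A differentiable inner optimization loop with a meta-learned label generator.
import torch
import torch.nn as nn

label_gen = nn.Conv2d(1, 1, kernel_size=5, padding=2)  # meta-learned: maps the
                                                       # given mask to inner targets
feat = torch.randn(1, 64, 30, 52)
first_mask = (torch.rand(1, 1, 30, 52) > 0.7).float()

w = torch.zeros(1, 64, 1, 1, requires_grad=True)       # inner target model
inner_targets = label_gen(first_mask)
for _ in range(5):                                     # differentiable inner loop
    score = (feat * w).sum(dim=1, keepdim=True)
    inner_loss = ((score - inner_targets) ** 2).mean()
    (g,) = torch.autograd.grad(inner_loss, w, create_graph=True)
    w = w - 0.5 * g                                    # keep the graph so the outer
                                                       # loss can reach label_gen
outer_loss = ((feat * w).sum(dim=1, keepdim=True) - first_mask).pow(2).mean()
outer_loss.backward()                                  # gradients flow to label_gen
```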

Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 12347
National Category
Engineering and Technology; Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:liu:diva-168716 (URN)
10.1007/978-3-030-58536-5_46 (DOI)
001500572000046 ()
2-s2.0-85097234947 (Scopus ID)
978-3-030-58535-8 (ISBN)
978-3-030-58536-5 (ISBN)
Conference
European Conference on Computer Vision, Glasgow, UK, August 23–28, 2020
Note

Funding agencies: This work was partly supported by the ETH Zürich Fund (OK), a Huawei Technologies Oy (Finland) project, an Amazon AWS grant, Nvidia, ELLIIT Excellence Center, the Wallenberg AI, Autonomous Systems and Software Program (WASP) and the SSF project Symbicloud.

Available from: 2020-08-28 Created: 2020-08-28 Last updated: 2026-02-20
Kristan, M., Leonardis, A., Matas, J., Felsberg, M., Pflugfelder, R., Kämäräinen, J.-K., . . . Ma, Z. (2020). The Eighth Visual Object Tracking VOT2020 Challenge Results. In: Adrien Bartoli; Andrea Fusiello (Ed.), Computer Vision: ECCV 2020 Workshops, Glasgow, UK, August 23–28, 2020. Paper presented at ECCV 2020 European Conference on Computer Vision (pp. 547-601), 12539
The Eighth Visual Object Tracking VOT2020 Challenge Results
2020 (English) In: Computer Vision: ECCV 2020 Workshops, Glasgow, UK, August 23–28, 2020 / [ed] Adrien Bartoli; Andrea Fusiello, 2020, Vol. 12539, p. 547-601. Conference paper, Published paper (Refereed)
Abstract [en]

The Visual Object Tracking challenge VOT2020 is the eighth annual tracker benchmarking activity organized by the VOT initiative. Results of 58 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The VOT2020 challenge was composed of five sub-challenges focusing on different tracking domains: (i) the VOT-ST2020 challenge focused on short-term tracking in RGB, (ii) the VOT-RT2020 challenge focused on “real-time” short-term tracking in RGB, (iii) VOT-LT2020 focused on long-term tracking, namely coping with target disappearance and reappearance, (iv) the VOT-RGBT2020 challenge focused on short-term tracking in RGB and thermal imagery, and (v) the VOT-RGBD2020 challenge focused on long-term tracking in RGB and depth imagery. Only the VOT-ST2020 datasets were refreshed. A significant novelty is the introduction of a new VOT short-term tracking evaluation methodology and of segmentation ground truth in the VOT-ST2020 challenge; bounding boxes will no longer be used in the VOT-ST challenges. A new VOT Python toolkit that implements all these novelties was introduced. The performance of the tested trackers typically far exceeds standard baselines. The source code for most of the trackers is publicly available from the VOT page. The dataset, the evaluation kit and the results are publicly available at the challenge website (http://votchallenge.net).

Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 12539
Keywords
Depth; Long-term trackers; Performance evaluation protocol; RGB; RGBD; RGBT; Short-term trackers; State-of-the-art benchmark; Thermal imagery; Visual object tracking
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-179796 (URN)
10.1007/978-3-030-68238-5_39 (DOI)
2-s2.0-85101374294 (Scopus ID)
9783030682378 (ISBN)
Conference
ECCV 2020 European Conference on Computer Vision
Available from: 2021-10-02 Created: 2021-10-02 Last updated: 2025-02-07
Robinson, A., Järemo-Lawin, F., Danelljan, M. & Felsberg, M. (2019). Discriminative Learning and Target Attention for the 2019 DAVIS Challenge on Video Object Segmentation. In: CVPR 2019 workshops: DAVIS Challenge on Video Object Segmentation. Paper presented at The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Discriminative Learning and Target Attention for the 2019 DAVIS Challenge on Video Object Segmentation
2019 (English)In: CVPR 2019 workshops: DAVIS Challenge on Video Object Segmentation, 2019Conference paper, Published paper (Refereed)
Abstract [en]

In this work, we address the problem of semi-supervised video object segmentation, where the task is to segment a target object in every image of the video sequence, given a ground truth only in the first frame. To be successful, it is crucial to robustly handle unpredictable target appearance changes and distracting objects in the background. We obtain a robust and efficient representation of the target by integrating a fast and light-weight discriminative target model into a deep segmentation network. Trained during inference, the target model learns to discriminate between the local appearances of target and background image regions. Its predictions are enhanced to accurate segmentation masks in a subsequent refinement stage. To further improve the segmentation performance, we add a new module trained to generate global target attention vectors, given the input mask and image feature maps. The attention vectors add semantic information about the target from a previous frame to the refinement stage, complementing the predictions provided by the target appearance model. Our method is fast and requires no network fine-tuning. We achieve a combined J and F-score of 70.6 on the DAVIS 2019 test-challenge data.
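
One simple way to form a global target attention vector from a mask and feature maps is mask-weighted global average pooling, sketched below. The paper trains a dedicated module for this, so the pooling here is only an illustrative baseline:

```python
# Summarize target appearance as one vector per feature channel, using the
# given mask as spatial weights.
import torch

def target_attention_vector(features, mask, eps=1e-6):
    """features: (B, C, H, W), mask: (B, 1, H, W) in [0, 1] -> (B, C)."""
    weighted = (features * mask).sum(dim=(2, 3))
    area = mask.sum(dim=(2, 3)) + eps   # avoid division by zero on empty masks
    return weighted / area              # average feature over the target region

feats = torch.randn(1, 256, 30, 52)
mask = (torch.rand(1, 1, 30, 52) > 0.8).float()
v = target_attention_vector(feats, mask)  # (1, 256): summarizes the target
```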

Keywords
video object segmentation, computer vision, machine learning
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-163334 (URN)
Conference
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Available from: 2020-02-01 Created: 2020-02-01 Last updated: 2025-02-07
Grelsson, B., Robinson, A., Felsberg, M. & Khan, F. S. (2018). HorizonNet for visual terrain navigation. In: Proceedings of 2018 IEEE International Conference on Image Processing, Applications and Systems (IPAS). Paper presented at 2018 IEEE International Conference on Image Processing, Applications and Systems, December 12-14, 2018, Inria, Sophia Antipolis, France (pp. 149-155). Institute of Electrical and Electronics Engineers (IEEE)
HorizonNet for visual terrain navigation
2018 (English) In: Proceedings of 2018 IEEE International Conference on Image Processing, Applications and Systems (IPAS), Institute of Electrical and Electronics Engineers (IEEE), 2018, p. 149-155. Conference paper, Published paper (Refereed)
Abstract [en]

This paper investigates the problem of position estimation of unmanned surface vessels (USVs) operating in coastal areas or in the archipelago. We propose a position estimation method where the horizon line is extracted in a 360 degree panoramic image around the USV. We design a CNN architecture to determine an approximate horizon line in the image and implicitly determine the camera orientation (the pitch and roll angles). The panoramic image is warped to compensate for the camera orientation and to generate an image from an approximately level camera. A second CNN architecture is designed to extract the pixelwise horizon line in the warped image. The extracted horizon line is correlated with digital elevation model (DEM) data in the Fourier domain using a MOSSE correlation filter. Finally, we determine the location of the maximum correlation score over the search area to estimate the position of the USV. Comprehensive experiments are performed in a field trial in the archipelago. Our approach provides promising results by achieving position estimates with GPS-level accuracy.
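
The matching step can be illustrated with a small sketch: a 360-degree horizon profile (one height per azimuth bin) is compared against DEM-rendered candidate profiles by circular cross-correlation in the Fourier domain, and the best-scoring grid position gives the location estimate. The profile rendering and the MOSSE filtering details are omitted; everything below is a hypothetical stand-in:

```python
# Fourier-domain matching of a horizon profile against candidate locations.
import numpy as np

def circular_correlation_peak(observed, candidate):
    """Max circular cross-correlation of two 1D profiles (heading-agnostic)."""
    F = np.fft.fft(observed)
    G = np.fft.fft(candidate)
    return np.real(np.fft.ifft(F * np.conj(G))).max()

rng = np.random.default_rng(0)
observed = rng.standard_normal(360)          # horizon heights, 1 degree per bin

# Hypothetical search: score every candidate position in the DEM grid.
scores = {}
for pos in [(i, j) for i in range(4) for j in range(4)]:
    candidate = rng.standard_normal(360)     # stand-in for a DEM-rendered profile
    scores[pos] = circular_correlation_peak(observed, candidate)

best = max(scores, key=scores.get)           # the position estimate
```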

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2018
Keywords
position estimation, horizon, registration, cnn, convolutional neural networks, cameras, correlation, Global Positioning System, sea measurements, digital elevation model, dem
National Category
Signal Processing; Computer graphics and computer vision; Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-161034 (URN)
10.1109/IPAS.2018.8708868 (DOI)
000471844500026 ()
9781728102474 (ISBN)
9781728102467 (ISBN)
9781728102481 (ISBN)
Conference
2018 IEEE International Conference on Image Processing, Applications and Systems, December 12-14, 2018, Inria, Sophia Antipolis, France
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP); Swedish Foundation for Strategic Research, RIT 15-0097; Swedish Research Council, 2016-05543
Note

Funding agencies: This work was partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation. This work was supported by the Swedish Foundation for Strategic Research (Smart Systems: RIT 15-0097). This research is supported by CENIIT grant (18.14), and VR starting grant (2016-05543).

Available from: 2019-10-17 Created: 2019-10-17 Last updated: 2025-02-01. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-9649-9592