Discriminative Correlation Filters (DCF) have demonstrated excellent performance for visual object tracking. The key to their success is the ability to efficiently exploit available negative data by including all shifted versions of a training sample. However, the underlying DCF formulation is restricted to single-resolution feature maps, significantly limiting its potential. In this paper, we go beyond the conventional DCF framework and introduce a novel formulation for training continuous convolution filters. We employ an implicit interpolation model to pose the learning problem in the continuous spatial domain. Our proposed formulation enables efficient integration of multi-resolution deep feature maps, leading to superior results on three object tracking benchmarks: OTB-2015 (+5.1% in mean OP), Temple-Color (+4.6% in mean OP), and VOT2015 (20% relative reduction in failure rate). Additionally, our approach is capable of sub-pixel localization, crucial for the task of accurate feature point tracking. We also demonstrate the effectiveness of our learning formulation in extensive feature point tracking experiments.
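To make the continuous-domain idea concrete, the following is a minimal Python sketch of an implicit interpolation operator that maps sampled feature channels of different resolutions onto one shared continuous axis via a cubic kernel. The 1-D setting, the specific kernel, and all names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def cubic_kernel(t, a=-0.75):
    """Keys cubic convolution interpolation kernel."""
    t = np.abs(t)
    out = np.zeros_like(t)
    near, far = t <= 1, (t > 1) & (t < 2)
    out[near] = (a + 2) * t[near]**3 - (a + 3) * t[near]**2 + 1
    out[far] = a * (t[far]**3 - 5 * t[far]**2 + 8 * t[far] - 4)
    return out

def interpolate_channel(x, t, T=1.0):
    """Evaluate a sampled channel x (N samples over [0, T)) at continuous
    positions t, following J{x}(t) = sum_n x[n] * b(t - n * T / N)."""
    N = len(x)
    n = np.arange(N)
    d = (t[:, None] - n[None, :] * T / N) * N / T  # offsets in sample units
    return (cubic_kernel(d) * x[None, :]).sum(axis=1)

# Channels of different resolutions now share one continuous domain:
t = np.linspace(0.0, 1.0, 200, endpoint=False)
deep = interpolate_channel(np.random.randn(10), t)     # coarse deep layer
shallow = interpolate_channel(np.random.randn(40), t)  # finer shallow layer
```

Once both channels live in the same continuous domain, they can be fused and a single continuous filter can be learned over them, which is the core of the formulation.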
This paper investigates the problem of position estimation for unmanned surface vessels (USVs) operating in coastal and archipelago areas. We propose a position estimation method where the horizon line is extracted in a 360-degree panoramic image around the USV. We design a CNN architecture to determine an approximate horizon line in the image and implicitly determine the camera orientation (the pitch and roll angles). The panoramic image is warped to compensate for the camera orientation, generating an image from an approximately level camera. A second CNN architecture is designed to extract the pixelwise horizon line in the warped image. The extracted horizon line is correlated with digital elevation model (DEM) data in the Fourier domain using a MOSSE correlation filter. Finally, we determine the location of the maximum correlation score over the search area to estimate the position of the USV. Comprehensive experiments are performed in a field trial in an archipelago. Our approach provides promising results, achieving position estimates with GPS-level accuracy.
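The Fourier-domain matching step can be illustrated with the classical MOSSE closed form, H* = (G ⊙ conj(F)) / (F ⊙ conj(F) + λ). Below is a hedged 1-D Python sketch; the horizon encoding, the roles of observed versus DEM-rendered profiles, and all names are assumptions for illustration, not the paper's pipeline.

```python
import numpy as np

def mosse_filter(f, g, lam=1e-2):
    """Closed-form MOSSE filter in the Fourier domain:
    H* = (G * conj(F)) / (F * conj(F) + lam)."""
    F, G = np.fft.fft(f), np.fft.fft(g)
    return (G * np.conj(F)) / (F * np.conj(F) + lam)

def response(H_conj, f):
    """Spatial correlation response of the filter applied to signal f."""
    return np.real(np.fft.ifft(H_conj * np.fft.fft(f)))

# Hypothetical usage: encode the horizon as one elevation value per
# azimuth degree, train a filter on the observed profile against a
# sharply peaked desired output, then score DEM-rendered profiles.
n = 360
observed = np.random.randn(n)                      # extracted horizon
desired = np.exp(-((np.arange(n) - n // 2) ** 2) / 4.0)  # peaked target
H_conj = mosse_filter(observed, desired)

candidates = np.random.randn(100, n)               # DEM horizon renderings
scores = [response(H_conj, c).max() for c in candidates]
best = int(np.argmax(scores))                      # most likely position
```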
Estimating the position of a 3-dimensional world point given its 2-dimensional projections in a set of images is a key component in numerous computer vision systems. Several methods address this problem, ranging from sub-optimal linear least-squares triangulation in two views to finding the world point that minimizes the L2 reprojection error in three views, which yields the statistically optimal estimate under the assumption of Gaussian noise. In this paper we present a solution to the optimal three-view triangulation problem. The standard approach is to find a closed-form solution; in contrast, we propose a new method based on an iterative scheme. The method is rigorously tested on both synthetic and real image data with corresponding ground truth, on a midrange desktop PC and on a Raspberry Pi, a low-end mobile platform. We are able to improve the precision achieved by the closed-form solvers and reach a speed-up of two orders of magnitude compared to the current state-of-the-art solver. In numbers, this amounts to around 300K triangulations per second on the PC and 30K triangulations per second on the Raspberry Pi.
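As a rough illustration of the iterative idea, the following Python sketch refines a world point with Gauss-Newton steps on the stacked reprojection residuals of three cameras. It is a generic sketch, not the authors' solver; the numerical Jacobian and all names are assumptions made for brevity.

```python
import numpy as np

def reproj_residuals(X, cameras, obs):
    """Stacked 2-D reprojection residuals for a world point X (3,)."""
    r = []
    for P, x in zip(cameras, obs):
        p = P @ np.append(X, 1.0)
        r.extend(p[:2] / p[2] - x)
    return np.array(r)

def triangulate_gn(X0, cameras, obs, iters=10):
    """Gauss-Newton minimization of the L2 reprojection error."""
    X = X0.astype(float)
    eps = 1e-6
    for _ in range(iters):
        r = reproj_residuals(X, cameras, obs)
        J = np.empty((len(r), 3))
        for k in range(3):           # finite-difference Jacobian
            dX = np.zeros(3); dX[k] = eps
            J[:, k] = (reproj_residuals(X + dX, cameras, obs) - r) / eps
        X -= np.linalg.solve(J.T @ J, J.T @ r)
    return X

# Hypothetical usage with three synthetic cameras P = [I | t]:
rng = np.random.default_rng(0)
X_true = np.array([0.5, -0.2, 4.0])
cams = [np.hstack([np.eye(3), rng.normal(size=(3, 1))]) for _ in range(3)]
obs = [(P @ np.append(X_true, 1))[:2] / (P @ np.append(X_true, 1))[2]
       for P in cams]
X_hat = triangulate_gn(X_true + 0.1 * rng.normal(size=3), cams, obs)
```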
The Visual Object Tracking challenge VOT2016 aims at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. Results of 70 trackers are presented, a large number of which have been published at major computer vision conferences and in journals in recent years. The number of tested state-of-the-art trackers makes VOT2016 the largest and most challenging benchmark on short-term tracking to date. For each participating tracker, a short description is provided in the Appendix. VOT2016 goes beyond its predecessors by (i) introducing a new semi-automatic ground-truth bounding box annotation methodology and (ii) extending the evaluation system with the no-reset experiment.
An increasing number of robots and autonomous vehicles are equipped with multiple cameras to achieve surround-view sensing. The estimation of their relative poses, also known as extrinsic parameter calibration, is a challenging problem, particularly in the non-overlapping case. We present a simple and novel extrinsic calibration method based on standard components that compares favorably to existing approaches. We further propose a framework for predicting the performance of different calibration configurations, together with intuitive error metrics, which makes selecting a good camera configuration straightforward. We evaluate on rendered synthetic images and show good results as measured by angular and absolute pose differences, as well as by the reprojection error distributions.
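For reference, the two pose error metrics have standard definitions; the short sketch below uses the common formulas, which are assumed here rather than taken from the paper.

```python
import numpy as np

def angular_pose_difference(R_est, R_gt):
    """Geodesic rotation error in radians: the angle of R_est^T @ R_gt."""
    c = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    return np.arccos(np.clip(c, -1.0, 1.0))

def absolute_pose_difference(t_est, t_gt):
    """Euclidean distance between estimated and ground-truth positions."""
    return float(np.linalg.norm(np.asarray(t_est) - np.asarray(t_gt)))
```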
Recent years have shown great progress in driving assistance systems, approaching autonomous driving step by step. Many approaches, however, rely on lane markers, which limits the systems to larger paved roads and poses problems during winter. In this work we explore an alternative approach to visual road following based on online learning. The system learns the current visual appearance of the road while the vehicle is operated by a human. When driving onto a new type of road, the human driver operates the vehicle for a minute while the system learns; after this training, the human driver can let go of the controls. The present work proposes a novel approach to online perception-action learning for the specific problem of road following, which makes interchangeable use of supervised learning (by demonstration), instantaneous reinforcement learning, and unsupervised learning (self-reinforcement learning). The proposed method, symbiotic online learning of associations and regression (SOLAR), extends previous work on qHebb-learning in three ways: priors are introduced to enforce mode selection and to drive learning towards particular goals, the qHebb-learning method is complemented with a reinforcement variant, and a self-assessment method based on predictive coding is proposed. The SOLAR algorithm is compared to qHebb-learning and deep learning for the task of road following, implemented on a model RC car. The system demonstrates an ability to learn to follow paved and gravel roads outdoors. Further, the system is evaluated in a controlled indoor environment, which provides quantifiable results. The experiments show that the SOLAR algorithm results in autonomous capabilities that go beyond those of existing methods with respect to speed, accuracy, and functionality.
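Purely as an illustrative toy, the sketch below shows how one update rule can serve demonstration, reinforcement, and self-reinforcement by modulating a Hebbian outer-product update with a signed reward. It is an assumption-laden caricature; qHebb and SOLAR use channel representations, mode priors, and predictive-coding self-assessment that are not modeled here, and all names are hypothetical.

```python
import numpy as np

class RewardModulatedHebb:
    """Toy online perception-action learner (illustrative only).
    Hebbian outer-product updates associate feature vectors with action
    vectors; a signed reward scales each update, so demonstration
    (reward = 1), reinforcement (reward in [-1, 1]), and
    self-reinforcement (reward from a self-assessment signal)
    share one rule."""

    def __init__(self, n_features, n_actions, lr=0.1):
        self.W = np.zeros((n_actions, n_features))
        self.lr = lr

    def act(self, phi):
        """Predict an action (e.g. steering) from image features phi."""
        return self.W @ phi

    def update(self, phi, action, reward=1.0):
        """Strengthen the feature-action association, scaled by reward."""
        self.W += self.lr * reward * np.outer(action, phi)
```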