Recently, substantial research has been devoted to Unmanned Aerial Vehicles (UAVs). One of a UAV's most demanding subsystems is vision. The vision subsystem must dynamically combine different algorithms as the UAV's goals and surroundings change. To fully utilize the available hardware, a run time system must be able to vary the quality and the size of the regions the algorithms are applied to, as the number of image processing tasks changes. To allow this, the run time system and the underlying computational model must be integrated. In this paper we present a computational model suitable for integration with a run time system. The computational model is called Image Processing Data Flow Graph (IP-DFG). IP-DFG has been developed for modeling of complex image processing algorithms. IP-DFG is based on data flow graphs, but has been extended with hierarchy and new rules for token consumption, which makes the computational model more flexible and more suitable for human interaction. In this paper we also show that IP-DFGs are suitable for modelling expressions, including data dependent decisions and iterations, which are common in complex image processing algorithms.
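As a rough illustration of the data flow idea that IP-DFG builds on, the following minimal sketch fires a node as soon as all of its inputs carry a token. The names `Node` and `run_graph` are hypothetical, and tokens are simply kept in a dictionary rather than consumed, so the hierarchy and token consumption rules specific to IP-DFG are not modelled here.

```python
from collections import deque


class Node:
    """A graph node that can fire once every one of its inputs holds a token."""

    def __init__(self, name, func, inputs=()):
        self.name = name
        self.func = func
        self.inputs = list(inputs)  # names of the upstream nodes or sources


def run_graph(nodes, sources):
    """Fire nodes in data-driven order; `sources` maps source names to tokens."""
    tokens = dict(sources)
    pending = deque(n for n in nodes if n.name not in tokens)
    stalled = 0  # consecutive nodes that could not fire
    while pending:
        node = pending.popleft()
        if all(i in tokens for i in node.inputs):
            tokens[node.name] = node.func(*(tokens[i] for i in node.inputs))
            stalled = 0
        else:
            pending.append(node)
            stalled += 1
            if stalled > len(pending):
                raise ValueError("graph has inputs that can never be satisfied")
    return tokens


# Toy "image" (a flat list of pixel values) pushed through two operations.
# The nodes are listed out of order on purpose: firing is driven by the data.
graph = [
    Node("count", lambda mask: sum(mask), inputs=["mask"]),
    Node("mask", lambda img, t: [int(p > t) for p in img],
         inputs=["image", "threshold"]),
]
result = run_graph(graph, {"image": [10, 200, 30, 180], "threshold": 100})
print(result["count"])
```

Because execution order follows token availability rather than the order of the node list, a run time system is in principle free to schedule, resize, or replace nodes between firings.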
WITAS will be engaged in goal-directed basic research in the area of intelligent autonomous vehicles and other autonomous systems. In this paper an overview of the project is given together with a presentation of our research interests in the project. The current status of our part in the project is also given.
In this paper we present a system which integrates computer vision and decision-making in an autonomous airborne vehicle that performs traffic surveillance tasks. The main factors that make the integration of vision and decision-making a challenging problem are: the qualitatively different kinds of information at the decision-making and vision levels, the need for integration of dynamically acquired information with a priori knowledge, e.g. GIS information, and the need for close feedback and guidance of the vision module by the decision-making module. Given the complex interaction between the vision module and the decision-making module we propose the adoption of an intermediate structure, called Scene Information Manager, and describe its structure and functionalities.
The purpose of this paper is to provide a broad overview of the WITAS Unmanned Aerial Vehicle Project. The WITAS UAV project is an ambitious, long-term basic research project with the goal of developing technologies and functionalities necessary for the successful deployment of a fully autonomous UAV operating over diverse geographical terrain containing road and traffic networks. The project is multi-disciplinary in nature, requiring many different research competences, and covering a broad spectrum of basic research issues, many of which relate to current topics in artificial intelligence. A number of topics considered are knowledge representation issues, active vision systems and their integration with deliberative/reactive architectures, helicopter modeling and control, ground operator dialogue systems, actual physical platforms, and a number of simulation techniques.
One important problem within the WITAS project is detection of moving objects in aerial images. This paper presents an original method to estimate the displacement between two frames, based on multiscale local polynomial expansions of the images. When the displacement field has been computed, a plane + parallax approach is used to separate moving objects from the camera egomotion.
The WITAS Unmanned Aerial Vehicle Project is a long-term basic research project located at Linköping University (LIU), Sweden. The project is multi-disciplinary in nature and involves cooperation with different departments at LIU, and a number of other universities in Europe, the USA, and South America. In addition to academic cooperation, the project involves collaboration with a number of private companies supplying products and expertise related to simulation tools and models, and the hardware and sensory platforms used for actual flight experimentation with the UAV. Currently, the project is in its second phase with an intended duration from 2000 to 2003.
This paper will begin with a brief overview of the project, but will focus primarily on the computer vision related issues associated with interpreting the operational environment which consists of traffic and road networks and vehicular patterns associated with these networks.
Panorama stitching of sparsely structured scenes is an open research problem. In this setting, feature-based image alignment methods often fail due to a shortage of distinct image features. Instead, direct image alignment methods, such as those based on phase correlation, can be applied. In this paper we investigate correlation-based image alignment techniques for panorama stitching of sparsely structured scenes. We propose a novel image alignment approach based on discriminative correlation filters (DCF), which have recently been successfully applied to visual tracking. Two versions of the proposed DCF-based approach are evaluated on two real and one synthetic panorama dataset of sparsely structured indoor environments. All three datasets consist of images taken on a tripod rotating 360 degrees around the vertical axis through the optical center. We show that the proposed DCF-based methods outperform phase correlation-based approaches on these datasets.
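For orientation, the phase correlation baseline mentioned above can be sketched in a few lines of NumPy. This is a minimal version that only recovers integer circular shifts between two equally sized images; the function name `phase_correlation_shift` is an assumption made for illustration, and the DCF-based approach of the paper is not reproduced here.

```python
import numpy as np


def phase_correlation_shift(reference, moved):
    """Estimate the integer circular shift that maps `reference` onto `moved`."""
    F_ref = np.fft.fft2(reference)
    F_mov = np.fft.fft2(moved)
    cross_power = F_mov * np.conj(F_ref)
    # Discard the magnitude so that only the phase difference remains.
    cross_power /= np.maximum(np.abs(cross_power), 1e-12)
    correlation = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Peaks past the midpoint correspond to negative shifts (wrap-around).
    return tuple(int(p) if p <= s // 2 else int(p) - s
                 for p, s in zip(peak, correlation.shape))


rng = np.random.default_rng(0)
image = rng.random((32, 48))
shifted = np.roll(image, (5, -7), axis=(0, 1))  # 5 rows down, 7 columns left
print(phase_correlation_shift(image, shifted))
```

The random test image is densely textured, which is the favourable case; the sparsely structured scenes targeted by the paper are exactly where such correlation peaks become harder to locate.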
This Chapter presents a vision-based method for unmanned aerial vehicle (UAV) motion estimation that uses as input an image motion field obtained from matches of point-like features. The Chapter enhances vision-based techniques developed for single UAV localization and demonstrates how they can be modified to deal with the problem of multi-UAV relative position estimation. The proposed approach is built upon the assumption that if different UAVs identify, using their cameras, common objects in a scene, the relative pose displacement between the UAVs can be computed from these correspondences under certain assumptions. However, although point-like features are suitable for local UAV motion estimation, finding matches between images collected using different cameras is a difficult task that may be overcome using blob features. Results justify the proposed approach.
This paper describes a method for vision-based unmanned aerial vehicle (UAV) motion estimation from multiple planar homographies. The paper also describes the determination of the relative displacement between different UAVs employing techniques for blob feature extraction and matching. It then presents and shows experimental results of the application of the proposed technique to multi-UAV detection of forest fires.
This report describes a fourth order tensor defined on projective spaces which can be used for the representation of medium-level features, e.g., one or more oriented segments. The tensor has one part which describes what type of local structures are present in a region, and one part which describes where they are located. This information can be used, e.g., to represent multiple orientations, corners, and line-endings. The tensor can be defined for arbitrary signal dimension, but the presentation focuses on the properties of the fourth order tensor for the case of 2D and 3D image data. A method for estimating the proposed tensor representation by means of simple computations directly from the structure tensor is presented. Given a simple matrix representation of the tensor, it can be shown that there is a direct correspondence between the number of oriented segments and the rank of the matrix, provided that the number of segments is three or less. The report also presents techniques for extracting information about the oriented segments which the tensor represents. Finally, it is shown that a small set of coefficients can be computed from the proposed tensor which are invariant to changes of the coordinate system.
A novel and computationally simple method is presented for triangulation of 3D points corresponding to the image coordinates in a pair of stereo images. The image points are described in terms of homogeneous coordinates which are jointly represented as the outer products of these homogeneous coordinates. This paper derives a linear transformation which maps the joint representation directly to the homogeneous representation of the corresponding 3D point in the scene. Compared to other triangulation methods, this approach gives a similar reconstruction error but is numerically faster, since it only requires linear operations. The proposed method is projective invariant in the same way as the optimal method of Hartley and Sturm. The method has a "blind plane": a plane through the camera focal points which cannot be reconstructed by this method. For "forward-looking" camera configurations, however, the blind plane can be placed outside the visible scene and does not constitute a problem.
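To make the idea of a linear map acting on the outer product concrete, here is a hand-derived toy case. It assumes a hypothetical rectified stereo pair P1 = [I | 0] and P2 = [I | t] with t = (1, 0, 0); the 4 x 9 matrix `K` below was worked out for this configuration only and is not the general construction derived in the paper.

```python
import numpy as np

# Hypothetical rectified stereo pair (an assumption made for this sketch).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[1.0], [0.0], [0.0]])])

# K acts on the row-major flattening of the 3 x 3 outer product M = y1 y2^T.
K = np.zeros((4, 9))
K[0, 2] = 1.0                  # X ~ M[0, 2]
K[1, 5] = 1.0                  # Y ~ M[1, 2]
K[2, 8] = 1.0                  # Z ~ M[2, 2]
K[3, 6], K[3, 2] = 1.0, -1.0   # W ~ M[2, 0] - M[0, 2] (the disparity term)


def triangulate(y1, y2):
    """Map homogeneous image points to a 3D point using linear operations only."""
    X = K @ np.outer(y1, y2).ravel()
    return X / X[3]


X_true = np.array([0.5, -0.3, 4.0, 1.0])
# Arbitrary non-zero scalings show that the result is independent of the
# homogeneous scale of the image coordinates.
y1 = 2.0 * (P1 @ X_true)
y2 = -3.0 * (P2 @ X_true)
X_est = triangulate(y1, y2)

# A point with Z = 0 lies in a plane through both camera centres; K maps it
# to the zero vector, which is the "blind plane" of this toy configuration.
X_blind = np.array([1.0, 2.0, 0.0, 1.0])
blind_output = K @ np.outer(P1 @ X_blind, P2 @ X_blind).ravel()
print(X_est, blind_output)
```

In this configuration the blind plane Z = 0 is parallel to the viewing direction and lies outside the field of view of both cameras, in line with the remark about forward-looking setups.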
The paper describes a minimal set of 18 parameters that can represent any trifocal tensor consistent with the internal constraints. 9 parameters describe three orthogonal matrices and 9 parameters describe 10 elements of a sparse tensor T' with 17 elements in well-defined positions equal to zero. Any valid trifocal tensor is then given as some specific T' transformed by the orthogonal matrices in the respective image domain. The paper also describes a simple approach for estimating the three orthogonal matrices in the case of a general 3 x 3 x 3 tensor, i.e., when the internal constraints are not satisfied. This can be used to accomplish a least squares approximation of a general tensor to a tensor that satisfies the internal constraints. This type of constraint enforcement, in turn, can be used to obtain an improved estimate of the trifocal tensor based on the normalized linear algorithm, with the constraint enforcement as a final step. This makes the algorithm more similar to the corresponding algorithm for estimation of the fundamental matrix. An experiment on synthetic data shows that the constraint enforcement of the trifocal tensor produces a significantly better result than without enforcement, expressed by the positions of the epipoles, given that the constraint enforcement is made in normalized image coordinates.
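For comparison, the fundamental matrix counterpart referred to above is usually a single SVD step: the smallest singular value of the linear estimate is set to zero, which gives the closest rank-two matrix in the Frobenius norm. The sketch below shows only that well-known step; the function name `enforce_rank_two` is an assumption, and the trifocal procedure of the paper is not reproduced.

```python
import numpy as np


def enforce_rank_two(F):
    """Return the rank-two matrix closest to F in the Frobenius norm."""
    U, s, Vt = np.linalg.svd(F)
    s[2] = 0.0  # drop the smallest singular value
    return U @ np.diag(s) @ Vt


rng = np.random.default_rng(1)
F_linear = rng.standard_normal((3, 3))   # stand-in for an unconstrained estimate
F_valid = enforce_rank_two(F_linear)
print(np.linalg.det(F_valid))
```

As with the trifocal case described in the abstract, this step is normally carried out in normalized image coordinates before the result is transformed back.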
Triangulation of a 3D point from two or more views can be solved in several ways depending on how perturbations in the image coordinates are dealt with. A common approach is optimal triangulation which minimizes the total L_{2} reprojection error in the images, corresponding to finding a maximum likelihood estimate of the 3D point assuming independent Gaussian noise in the image spaces. Computational approaches for optimal triangulation have been published for the stereo case and, recently, also for the three-view case. In short, they solve an independent optimization problem for each 3D point, using relatively complex computations such as finding roots of high order polynomials or matrix decompositions. This paper discusses three-view triangulation and reports the following results: (1) the 3D point can be computed as a multi-linear mapping (tensor) applied on the homogeneous image coordinates, (2) the set of triangulation tensors forms a 7-dimensional space determined by the camera matrices, (3) given a set of corresponding 3D/2D calibration points, the 3D residual L_{1} errors can be optimized over the elements in the 7-dimensional space, (4) using the resulting tensor as initial value, the error can be further reduced by tuning the tensor in a two-step iterative process, (5) the 3D residual L_{1} error for a set of evaluation points which lie close to the calibration set is comparable to the three-view optimal method. In summary, three-view triangulation can be done by first performing an optimization of the triangulation tensor and, once this is done, triangulation can be made with 3D residual error at the same level as the optimal method, but at a much lower computational cost. This makes the proposed method attractive for real-time three-view triangulation of large data sets provided that the necessary calibration process can be performed.
This paper presents a method for triangulation of 3D points given their projections in two images. Recent results show that the triangulation mapping can be represented as a linear operator K applied to the outer product of corresponding homogeneous image coordinates, leading to a triangulation of very low computational complexity. K can be determined from the camera matrices, together with a so-called blind plane, but we show here that it can be further refined by a process similar to Gold Standard methods for camera matrix estimation. In particular it is demonstrated that K can be adjusted to minimize the Euclidean L_{1} residual 3D error, bringing it down to the same level as the optimal triangulation by Hartley and Sturm. The resulting K optimally fits a set of 2D+2D+3D data where the error is measured in the 3D space. Assuming that this calibration set is representative for a particular application, where later only the 2D points are known, this K can be used for triangulation of 3D points in an optimal way, which in addition is very efficient since the optimization need only be made once for the point set. The refinement of K is made by iteratively reducing errors in the 3D and 2D domains, respectively. Experiments on real data suggest that very few iterations are needed to accomplish useful results.
This chapter is on Fourier methods, with a particular emphasis on definitions and theorems essential to the understanding of filtering procedures in multi-dimensional spaces. This is a central issue in computer vision.
This thesis presents a signal representation in terms of operators. The signal is assumed to be an element of a vector space and subject to transformations of operators. The operators form continuous groups, so-called Lie groups. The representation can be used for signals in general, in particular if spatial relations are undefined, and it does not require a basis of the signal space to be useful.
Special attention is given to orthogonal operator groups which are generated by anti-Hermitian operators by means of the exponential mapping. It is shown that the eigensystem of the group generator is strongly related to properties of the corresponding operator group. For one-parameter orthogonal operator groups, a phase concept is introduced. This phase can for instance be used to distinguish between spatially even and odd signals and, therefore, corresponds to the usual phase for multi-dimensional signals.
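The relation between a generator and its one-parameter group can be checked numerically. The sketch below assumes the simplest real case, a 2 x 2 antisymmetric generator `H`, and computes exp(tH) from the eigensystem of the Hermitian matrix iH; the function name `group_element` is hypothetical.

```python
import numpy as np


def group_element(H, t):
    """exp(t H) for a real antisymmetric H, built from the eigensystem of i H."""
    # i H is Hermitian, so eigh yields real eigenvalues w and a unitary V
    # with H = V diag(-i w) V^H.
    w, V = np.linalg.eigh(1j * H)
    return (V @ np.diag(np.exp(-1j * t * w)) @ V.conj().T).real


H = np.array([[0.0, -1.0],
              [1.0, 0.0]])   # generator of rotations in the plane
R = group_element(H, 0.3)
print(R)
```

The eigenvalues of the generator, here plus and minus i, set the rate at which the parameter t turns each eigenvector's phase, which is one way to read the connection between the eigensystem and the phase concept described above.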
Given one operator group that represents the variation of the signal and one operator group that represents the variation of a corresponding feature descriptor, an equivariant mapping maps the signal to the descriptor such that the two operator groups correspond. Sufficient conditions are derived for a general mapping to be equivariant with respect to a pair of operator groups. These conditions are expressed in terms of the generators of the two operator groups. As a special case, second order homogeneous mappings are considered, and examples of how second order mappings can be used to obtain different types of feature descriptors are presented, in particular for operator groups that are homomorphic to rotations in two and three dimensions, respectively. A generalization of directed quadrature filters is made. All feature extraction algorithms that are presented are discussed in terms of phase invariance.
Simple procedures that estimate group generators which correspond to one-parameter groups are derived and tested on an example. The resulting generator is evaluated by using its eigensystem in implementations of two feature extraction algorithms. It is shown that the resulting feature descriptor has good accuracy with respect to the corresponding feature value, even in the presence of signal noise.
The topic of this report is signal representation in the context of hierarchical image processing. An overview of hierarchical processing systems is included as well as a presentation of various approaches to signal representation, feature representation and feature extraction. It is claimed that image hierarchies based on feature extraction, so-called feature hierarchies, demand a signal representation other than the standard spatial or linear representation used today. A new representation, the operator representation, is developed. It is based on an interpretation of features in terms of signal transformations. This representation has no references to any spatial ordering of the signal elements and also gives an explicit representation of signal features. Using the operator representation, a generalization of the standard phase concept in image processing is introduced. Based on the operator representation, two algorithms for extraction of feature values are presented. Both have the capability of generating phase invariant feature descriptors. It is claimed that the operator representation in conjunction with some appropriate feature extraction algorithm is well suited as a general framework for defining multi-level feature hierarchies. The report includes an appendix containing the mathematical details necessary to comprehend the presentation.
A single-view matching constraint is described which represents a necessary condition which 6 points in an image must satisfy if they are the images of 6 known 3D points under an arbitrary projective transformation. Similar to the well-known matching constraints for two or more views, represented by fundamental matrices or trifocal tensors, single-view matching constraints are represented by tensors, and when these are multiplied with the homogeneous image coordinates the result vanishes when the condition is satisfied. More precisely, they are represented by 6-th order tensors on ℝ^{3} which can be computed in a simple manner from the camera projection matrix and the 6 3D points. The single-view matching constraints can be used for finding correspondences between detected 2D feature points and known 3D points, e.g., on an object, which are observed from arbitrary views. Consequently, this type of constraint can be said to be a representation of 3D shape (in the form of a point set) which is invariant to projective transformations when projected onto a 2D image.
In this article we describe a set of canonical transformations of the image spaces that make the description of three-view geometry very simple. The transformations depend on the three-view geometry, and the canonically transformed trifocal tensor T' takes the form of a sparse array where 17 elements in well-defined positions are zero. It has a linear relation to the camera matrices and to two of the fundamental matrices, a third order relation to the third fundamental matrix, a second order relation to the other two trifocal tensors, and first order relations to the 10 three-view all-point matching constraints. In this canonical form, it is also simple to determine if the corresponding camera configuration is degenerate or co-linear. An important property of the three canonical transformations of the image spaces is that they are in SO(3). The 9 parameters needed to determine these transformations and the 9 parameters that determine the elements of T' together provide a minimal parameterization of the tensor. It does not have the problems with multiple maps or multiple solutions that other parameterizations have, and is therefore simple to use. It also provides an implicit representation of the trifocal internal constraints: the sparse canonical representation of the trifocal tensor can be determined if and only if it is consistent with its internal constraints. In the non-ideal case, the canonical transformation can be determined by solving a minimization problem, and a simple algorithm for determining the solution is provided. This allows us to extend the standard linear method for estimation of the trifocal tensor to include a constraint enforcement as a final step, similar to the constraint enforcement of the fundamental matrix.
Experimental evaluation of this extended linear estimation method shows that it significantly reduces the geometric error of the resulting tensor, but on average the algebraic estimation method is even better. For a small percentage of cases, however, the extended linear method gives a smaller geometric error, implying that it can be used as a complement to the algebraic method for these cases.
The structure tensor has been used mainly for representation of local orientation in spaces of arbitrary dimensions, where the eigenvectors represent the orientation and the corresponding eigenvalues indicate the type of structure which is represented. Apart from being local, the structure tensor may be referred to as "object centered" since it describes the corresponding structure relative to a local reference system. This paper proposes that the basic properties of the structure tensor can be extended to a tensor defined in a projective space rather than in a local Euclidean space. The result, the "projective tensor", is symmetric in the same way as the structure tensor, and also uses the eigensystem to carry the relevant information. However, instead of orientation, the projective tensor represents geometrical primitives such as points, lines, and planes (depending on the dimensionality of the underlying space). Furthermore, this representation has the useful property of mapping the operation of forming the affine hull of points and lines to tensor summation, e.g., the sum of two projective tensors which represent two points amounts to a projective tensor that represents the line which passes through the two points, etc. The projective tensor may be referred to as "view centered" since each tensor, which still may be defined on a local scale, represents a geometric primitive relative to a global image based reference system. This implies that two such tensors may be combined, e.g., using summation, in a meaningful way over large regions.
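The summation property can be illustrated for 2D points. The sketch assumes that a point is represented by the outer product of its homogeneous coordinates with themselves, and that the line is read out as the null vector of the summed tensor; both choices are assumptions made for illustration, and the paper's exact construction may differ.

```python
import numpy as np


def point_tensor(x, y):
    """Rank-one symmetric tensor built from the homogeneous coordinates of a 2D point."""
    p = np.array([x, y, 1.0])
    return np.outer(p, p)


p1 = np.array([1.0, 2.0, 1.0])
p2 = np.array([4.0, 3.0, 1.0])
T = point_tensor(1.0, 2.0) + point_tensor(4.0, 3.0)   # rank two

# eigh sorts eigenvalues in ascending order, so column 0 spans the null space.
eigvals, eigvecs = np.linalg.eigh(T)
line = eigvecs[:, 0]   # dual homogeneous coordinates of the line through both points
print(line @ p1, line @ p2)
```

Up to scale and sign, `line` equals the cross product of the two homogeneous point vectors, which is the usual way of forming the line through two points.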
This article presents a computationally efficient approach to the triangulation of 3D points from their projections in two views. The homogeneous coordinates of a 3D point are given as a multi-linear mapping on its homogeneous image coordinates, a computation of low computational complexity. The multi-linear mapping is a tensor, and an element of a projective space, that can be computed directly from the camera matrices and some parameters. These parameters imply that the tensor is not unique: for a given camera pair the subspace K of triangulation tensors is six-dimensional. The triangulation tensor is 3D projective covariant and satisfies a set of internal constraints. Reconstruction of 3D points using the proposed tensor is studied for the non-ideal case, when the image coordinates are perturbed by noise and the epipolar constraint is not satisfied exactly. A particular tensor of K is then the optimal choice for a simple reduction of 3D errors, and we present a computationally efficient approach for determining this tensor. This approach implies that normalizing image coordinate transformations are important for obtaining small 3D errors.
In addition to computing the tensor from the cameras, we also investigate how it can be further optimized relative to error measures in the 3D and 2D spaces. This optimization is evaluated for sets of real 3D + 2D + 2D data by comparing the reconstruction to some of the triangulation methods found in the literature, in particular the so-called optimal method that minimizes 2D L_{2} errors. The general conclusion is that, depending on the choice of error measure and the optimization implementation, it is possible to find a tensor that produces smaller 3D errors (both L_{1} and L_{2}) but slightly larger 2D errors than the optimal method does. Alternatively, we may find a tensor that gives approximately comparable results to the optimal method in terms of both 3D and 2D errors. This means that the proposed tensor based method of triangulation is both computationally efficient and can be calibrated to produce small reconstruction or reprojection errors for a given data set.
This paper presents an overview of the basic and applied research carried out by the Computer Vision Laboratory, Linköping University, in the WITAS UAV Project. This work includes customizing and redesigning vision methods to fit the particular needs and restrictions imposed by the UAV platform, e.g., for low-level vision, motion estimation, navigation, and tracking. It also includes a new learning structure for association of perception-action activations, and a runtime system for implementation and execution of vision algorithms. The paper also contains a brief introduction to the WITAS UAV Project.
A runtime system for implementation of image processing operations is presented. It is designed for working in a flexible and distributed environment related to the software architecture of a newly developed UAV system. The software architecture can be characterized at a coarse scale as a layered system, with a deliberative layer at the top, a reactive layer in the middle, and a processing layer at the bottom. At a finer scale each of the three layers is decomposed into sets of modules which communicate using CORBA, allowing system development and deployment on the UAV to be made in a highly flexible way. Image processing takes place in a dedicated module located in the processing layer, and is the main focus of the paper. This module has been designed as a runtime system for data flow graphs, allowing various processing operations to be created online and on demand by the higher levels of the system. The runtime system is implemented in Java, which allows development and deployment to be made on a wide range of hardware/software configurations. Optimizations for particular hardware platforms have been made using Java's native interface.
The paper gives a short presentation of three existing methods for estimation of orientation tensors: the so-called structure tensor, quadrature filter based techniques, and techniques based on approximating a local polynomial model. All three methods can be used for estimating an orientation tensor which in the 3D case can be used for motion estimation. The methods are based on rather different approaches in terms of the underlying signal models. However, they produce more or less similar results, which indicates that there should be a common framework for estimation of the tensors. Such a framework is proposed, in terms of a second order mapping from signal to tensor with additional conditions on the mapping. It is also shown that the three methods in principle fall into this framework.
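Of the three methods, the gradient-based structure tensor is the easiest to sketch. The example below assumes a synthetic simple signal, i.e. an image that varies along one direction `n` only, and checks that the averaged outer product of the gradient is a rank-one tensor oriented along that direction. The function name `structure_tensor` and the plain mean used as averaging filter are choices made for this sketch.

```python
import numpy as np


def structure_tensor(image):
    """Average the outer product of the image gradient over the patch interior."""
    gy, gx = np.gradient(image)   # derivatives along rows (y) and columns (x)
    # Skip the border, where np.gradient falls back to one-sided differences.
    gx, gy = gx[1:-1, 1:-1], gy[1:-1, 1:-1]
    return np.array([[np.mean(gx * gx), np.mean(gx * gy)],
                     [np.mean(gx * gy), np.mean(gy * gy)]])


# Simple signal: a cosine wave that varies only along the unit direction n.
n = np.array([np.cos(0.6), np.sin(0.6)])
yy, xx = np.mgrid[0:64, 0:64]
image = np.cos(0.3 * (n[0] * xx + n[1] * yy))

T = structure_tensor(image)
eigvals, eigvecs = np.linalg.eigh(T)   # ascending eigenvalues
dominant = eigvecs[:, -1]              # estimated orientation, defined up to sign
print(eigvals, dominant)
```

The small deviation of `dominant` from `n` comes from the finite-difference approximation of the derivative, which illustrates why the choice of filters matters when the methods are compared.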
Tensors have become a popular tool for representation of local orientation and can be used also for estimation of velocity. A number of computational approaches have been presented for tensor estimation which, however, are difficult to analyze or compare since there has been no common framework in which analysis or comparisons can be made. In this article, we propose such a framework based on second-order filters and show how it applies to three different methods for tensor estimation. The framework contains a few conditions on the filters which are sufficient to obtain correctly oriented rank one tensors for the case of simple signals. It also allows the derivation of explicit expressions for the variation of the tensor across oriented structures which, e.g., can be used to formulate conditions for phase invariance. (c) 2005 Elsevier B.V. All rights reserved.
This report defines the rank complement of a diagonalizable matrix (i.e. a matrix which can be brought to a diagonal form by means of a change of basis) as the interchange of the range and the null space. Given a diagonalizable matrix A there is in general no unique matrix Ac which has a range equal to the null space of A and a null space equal to the range of A; only matrices of full rank have a unique rank complement, namely the zero matrix. Consequently, the rank complement operation is not a distinct operation, but rather a characterization of any operation which makes an interchange of the range and the null space. One particular rank complement operation is introduced here, which eventually leads to an implementation of rank complement operations in terms of polynomials in A. The main result is that for each possible rank r of A there is a polynomial in A which evaluates to a matrix Ac which is a rank complement of A. The report provides explicit expressions for matrix polynomials which compute a rank complement of a symmetric matrix. These results are then generalized to the case of diagonalizable matrices. Finally, a Matlab function is described that implements a rank complement operation based on the results derived.
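For a symmetric matrix, one rank complement is easy to write down: the orthogonal projector onto the null space. The sketch below obtains it from an eigendecomposition rather than from the matrix polynomials derived in the report, and it is in Python rather than Matlab; the function name `rank_complement_symmetric` and the tolerance `tol` are assumptions.

```python
import numpy as np


def rank_complement_symmetric(A, tol=1e-10):
    """Projector onto the null space of a symmetric A (one possible rank complement)."""
    eigvals, eigvecs = np.linalg.eigh(A)
    null_vectors = eigvecs[:, np.abs(eigvals) < tol]
    # For a full-rank A there are no null vectors and the product is the zero matrix.
    return null_vectors @ null_vectors.T


v = np.array([1.0, 2.0, 2.0])
A = np.outer(v, v)                  # symmetric, rank one
Ac = rank_complement_symmetric(A)   # rank two, range equal to the null space of A
print(np.linalg.matrix_rank(Ac, tol=1e-9))
```

Since the range of a symmetric matrix is the orthogonal complement of its null space, the null space of `Ac` coincides with the range of `A`, so the two subspaces are indeed interchanged.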
A robust, fast and general method for estimation of object properties is proposed. It is based on a representation of these properties in terms of channels. Each channel represents a particular value of a property, resembling the activity of biological neurons. Furthermore, each processing unit, corresponding to an artificial neuron, is a linear perceptron which operates on outer products of input data. This implies a more complex space of invariances than in the case of first order characteristics without abandoning linear theory. In general, the specific function of each processing unit has to be learned and a fast and simple learning rule is presented. The channel representation, the processing structure and the learning rule have been tested on stereo image data showing a cube with various 3D positions and orientations. The system was able to learn a channel representation for the horizontal position, the depth, and the orientation of the cube, each property invariant to the other two.
This paper presents an algorithm for estimation of local curvature from gradients of a tensor field that represents local orientation. The algorithm is based on an operator representation of the orientation tensor, which means that change of local orientation corresponds to a rotation of the eigenvectors of the tensor. The resulting curvature descriptor is a vector that points in the direction of the image in which the local orientation rotates anti-clockwise and the norm of the vector is the inverse of the radius of curvature. Two coefficients are defined that relate the change of local orientation with either curves or radial patterns.
The tensor representation has proven a successful tool as a means to describe local multi-dimensional orientation. In this respect, the tensor representation is a map from the local orientation to a second order tensor. This paper investigates how variations of the orientation are mapped to variation of the tensor, thereby giving an explicit equivariance relation. The results may be used in order to design tensor based algorithms for extraction of image features defined in terms of local variations of the orientation, e.g., multi-dimensional curvature or circular symmetries. It is assumed that the variation of the local orientation can be described in terms of an orthogonal transformation group. Under this assumption a corresponding orthogonal transformation group, acting on the tensor, is constructed. Several correspondences between the two groups are demonstrated.
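A small numerical check of such an equivariance relation, assuming the simplest orientation tensor T(x) = x x^T / (x^T x) and rotations in the plane: rotating the orientation vector by R corresponds to the linear map T -> R T R^T on the tensor. The function names are hypothetical and the paper's own tensor construction may differ.

```python
import numpy as np


def orientation_tensor(x):
    """Outer-product orientation tensor, independent of the sign and length of x."""
    x = np.asarray(x, dtype=float)
    return np.outer(x, x) / (x @ x)


def rotation(angle):
    """Rotation of the plane by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])


x = np.array([2.0, 1.0])
R = rotation(0.7)
T_of_rotated = orientation_tensor(R @ x)       # rotate first, then map to a tensor
rotated_T = R @ orientation_tensor(x) @ R.T    # map first, then act on the tensor
print(np.allclose(T_of_rotated, rotated_T))
```

The induced map on the tensor preserves the Frobenius norm, so it is itself an orthogonal transformation on the space of tensors, in line with the correspondence between the two groups described above.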
The paper describes a method for extracting point features from an image, corresponding to corners and crossings of lines. The method is based on a fourth order tensor representation which can describe the parameters of a local pair of line segments. By considering the rank of the tensor, it is possible to find points which correspond to corners, crossings or junctions. These points can then be further analyzed to provide detailed information about the configuration of the segments. The proposed method is intended for features which can be used for estimation of position and pose of 3D objects, e.g., for the purpose of grasping.
This paper describes a method for extracting point features from an image, corresponding to corners and crossings of lines. The method is based on a local estimation of a 6 x 6 tensor which describes the parameters of a pair of line segments. By considering the rank of the tensor, it is possible to find points of interest. These points can then be further analyzed to provide detailed information about the configuration of the segments. The proposed method is intended for features which can be used for estimation of position and pose of 3D objects, e.g., for the purpose of grasping.
This document is an addendum to the main text in "A local geometry-based descriptor for 3D data applied to object pose estimation" by Fredrik Viksten and Klas Nordberg. The addendum gives proofs for propositions stated in the main document. It also details how to extract information from the fourth order tensor referred to as S_{22} in the main document.
A novel method for estimating a second order scene tensor is described, and results from applying the method to a synthetic image sequence are shown. It is shown that the tensors can be used to represent basic geometrical entities. The report ends with a short discussion of the work needed to extend the tensorial description given here into a framework for pose estimation.
This paper presents a novel representation for 3D shapes in terms of planar surface patches and their boundaries. The representation is based on a tensor formalism similar to the usual orientation tensor but extends this concept by using projective spaces and a fourth order tensor, even though the practical computations can be made in normal matrix algebra. This paper also discusses the possibility of estimating the proposed representation from motion fields that are generated by a calibrated camera moving in the scene. One method based on 3D spatio-temporal orientation tensors is presented and results from this method are included.
We propose a method for segmenting an arbitrary number of moving objects using the geometry of 6 points in 2D images to infer motion consistency. This geometry allows us to determine whether or not observations of 6 points over several frames are consistent with a rigid 3D motion. The matching between observations of the 6 points and an estimated model of their configuration in 3D space is quantified in terms of a geometric error derived from distances between the points and 6 corresponding lines in the image. This leads to a simple motion inconsistency score, based on the geometric errors of 6 points, that in the ideal case should be zero when the motion of the points can be explained by a rigid 3D motion. Initial point clusters are determined in the spatial domain and merged in the motion trajectory domain based on this score. Each point is then assigned to the cluster that gives the lowest score. Our algorithm has been tested with real image sequences from the Hopkins155 database with very good results, competing with the state of the art methods, particularly for degenerate motion sequences. In contrast to the motion segmentation methods based on multi-body factorization, which assume an affine camera model, the proposed method allows the mapping from 3D space to the 2D image to be fully projective.
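The final assignment step described above can be sketched as a per-point minimum over cluster scores. The score table `scores` and the function name are hypothetical. How the motion inconsistency score itself is computed is not reproduced here, so the values are simply taken as given.

```python
def assign_to_clusters(scores):
    """Return, for each point, the index of the cluster with the lowest score.

    scores[i][k] is assumed to hold the motion inconsistency score of
    point i evaluated against cluster k (hypothetical input layout).
    Ties are resolved in favour of the lowest cluster index.
    """
    return [min(range(len(row)), key=row.__getitem__) for row in scores]
```

With three points and two clusters, `assign_to_clusters([[0.1, 2.0], [3.0, 0.5], [0.0, 0.0]])` assigns the first and third points to cluster 0 and the second point to cluster 1.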
We present Lambda Twist, a novel P3P solver which is accurate, fast and robust. Current state-of-the-art P3P solvers find all roots of a quartic and discard geometrically invalid and duplicate solutions in a post-processing step. Instead of solving a quartic, the proposed P3P solver exploits the underlying elliptic equations, which can be solved by a fast and numerically accurate diagonalization. This diagonalization requires a single real root of a cubic, which is then used to find the up to four P3P solutions. Unlike the direct quartic solvers, our method never computes geometrically invalid or duplicate solutions.
Extensive evaluation on synthetic data shows that the new solver has better numerical accuracy and is faster than the state-of-the-art P3P implementations. An implementation and benchmark are available on GitHub.
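The reason a single real root is always available is that every real cubic has at least one. The sketch below finds such a root for a monic cubic x^3 + b x^2 + c x + d by plain bisection. Bisection is chosen here only for clarity and is not claimed to be the root-finding scheme of the published solver. The function name and the use of the Cauchy bound to bracket the root are assumptions of this example.

```python
def real_root_of_cubic(b, c, d, iterations=200):
    """Return one real root of x^3 + b*x^2 + c*x + d using bisection."""
    def f(x):
        # Horner evaluation of the monic cubic.
        return ((x + b) * x + c) * x + d

    # Cauchy bound: every root satisfies |x| < 1 + max(|b|, |c|, |d|),
    # so f is negative at -bound and positive at +bound.
    bound = 1.0 + max(abs(b), abs(c), abs(d))
    lo, hi = -bound, bound
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid   # a sign change lies in [lo, mid]
        else:
            lo = mid   # a sign change lies in [mid, hi]
    return 0.5 * (lo + hi)
```

For a cubic with three real roots, such as x^3 - 6x^2 + 11x - 6, the function returns one of them. Which one depends on the bracketing, which is sufficient when only a single real root is needed.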
An open issue in multiple view geometry and structure from motion, applied to real life scenarios, is the sparsity of the matched key-points and of the reconstructed point cloud. We present an approach that can significantly improve the density of measured displacement vectors in a sparse matching or tracking setting, exploiting the partial information of the motion field provided by linear oriented image patches (edgels). Our approach assumes that the epipolar geometry of an image pair has already been computed, either in an earlier feature-based matching step, or by a robustified differential tracker. We exploit key-points of a lower order, edgels, which cannot provide a unique 2D matching, but can be employed if a constraint on the motion is already given. We present a method to extract edgels that can be effectively tracked given a known camera motion scenario, and show how a constrained version of the Lucas-Kanade tracking procedure can efficiently exploit epipolar geometry to reduce the classical KLT optimization to a 1D search problem. The potential of the proposed methods is shown by experiments performed on real driving sequences.
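The reduction to a 1D search can be illustrated as follows: given a fundamental matrix F, a point in the first image maps to an epipolar line in the second image, and candidate matches are described by a single parameter t along that line. The sketch uses brute-force sampling of t rather than the iterative Lucas-Kanade update, so it shows the parameterization only and not the tracker from the paper. All function names and the `cost` callback are hypothetical.

```python
import math

def epipolar_line(F, x, y):
    """Line (a, b, c), with a*u + b*v + c = 0, given by F applied to (x, y, 1)."""
    p = (x, y, 1.0)
    return tuple(sum(F[i][j] * p[j] for j in range(3)) for i in range(3))

def point_on_line(line, t):
    """Point at signed distance t from the line's closest point to the origin."""
    a, b, c = line
    n2 = a * a + b * b
    if n2 == 0.0:
        raise ValueError("degenerate line: a and b are both zero")
    n = math.sqrt(n2)
    u0, v0 = -a * c / n2, -b * c / n2      # foot of the perpendicular from the origin
    return (u0 + t * b / n, v0 - t * a / n)  # move along the unit direction (b, -a)/n

def search_along_line(line, cost, t_values):
    """Return the t in t_values whose point on the line minimises cost(u, v)."""
    return min(t_values, key=lambda t: cost(*point_on_line(line, t)))
```

For a purely horizontal camera translation, with F = [[0, 0, 0], [0, 0, -1], [0, 1, 0]], the point (3, 5) maps to the line v = 5, so every candidate returned by `point_on_line` stays on the same image row and only its horizontal position varies with t.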
To summarize, the VISATEC project was initiated to combine the specific scientific competencies of the research groups at CAU and LiU, together with the industrial view on vision applications, in order to develop novel, more robust algorithms for object localization and recognition. This goal was achieved by a two-fold strategy, whereby on the one hand more robust basic algorithms were developed and on the other hand a method for the combination of these algorithms was devised. In particular, the latter confirmed the consortium’s belief that an appropriate combination of a number of basic algorithms will lead to more robust results than a single method could do.
However, the multi-cue integration is just one of many algorithms that were developed in the VISATEC project. All developed algorithms are described in some detail in the remainder of this report. An overview of the respective publications can be found in the appendix.
Despite some difficulties that were encountered on the way, we as a consortium feel that the VISATEC project was a success. That this is not only our opinion is reflected in the outcome of the final review. We believe that the work done during the three years of the project not only furthered our understanding of the matter, but also added to the knowledge within the scientific community and showed new possibilities for industrial vision applications.
This paper describes a novel compact representation of local features called the tensor doublet. The representation generates a four dimensional feature vector which is significantly less complex than other approaches, such as Lowe's 128 dimensional feature vector. Despite its low dimensionality, we demonstrate here that the tensor doublet can be used for pose estimation, where the system is trained for an object and evaluated on images with cluttered background and occlusion.
We present a novel local descriptor for range data that can describe one or more planes or lines in a local region. It is possible to recover the geometry of the described local region and extract the size, position and orientation of each local plane or line-like structure from the descriptor. This gives the descriptor a property that other popular local descriptors for range data, such as spin images or point signatures, do not have. The estimation of the descriptor is dependent on estimation of surface normals but does not depend on the specific normal estimation method used. It is shown that it is possible to extract how many planar surface regions the descriptor represents and that this could be used as a point-of-interest detector.