Channel-Coded Feature Maps for Computer Vision and Machine Learning
2008 (English)Doctoral thesis, monograph (Other academic)
This thesis is about channel-coded feature maps applied in view-based object recognition, tracking, and machine learning. A channel-coded feature map is a soft histogram of joint spatial pixel positions and image feature values. Typical useful features include local orientation and color. Using these features, each channel measures the co-occurrence of a certain orientation and color at a certain position in an image or image patch. Channel-coded feature maps can be seen as a generalization of the SIFT descriptor with the options of including more features and replacing the linear interpolation between bins by a more general basis function.
The general idea of channel coding originates from a model of how information might be represented in the human brain. For example, different neurons tend to be sensitive to different orientations of local structures in the visual input. The sensitivity profiles tend to be smooth such that one neuron is maximally activated by a certain orientation, with a gradually decaying activity as the input is rotated.
This thesis extends previous work on using channel-coding ideas within computer vision and machine learning. By differentiating the channel-coded feature maps with respect to transformations of the underlying image, a method for image registration and tracking is constructed. By using piecewise polynomial basis functions, the channel coding can be computed more efficiently, and a general encoding method for N-dimensional feature spaces is presented.
Furthermore, I argue for using channel-coded feature maps in view-based pose estimation, where a continuous pose parameter is estimated from a query image given a number of training views with known pose. The optimization of position, rotation and scale of the object in the image plane is then included in the optimization problem, leading to a simultaneous tracking and pose estimation algorithm. Apart from objects and poses, the thesis examines the use of channel coding in connection with Bayesian networks. The goal here is to avoid the hard discretizations usually required when Markov random fields are used on intrinsically continuous signals like depth for stereo vision or color values in image restoration.
Channel coding has previously been used to design machine learning algorithms that are robust to outliers, ambiguities, and discontinuities in the training data. This is obtained by finding a linear mapping between channel-coded input and output values. This thesis extends this method with an incremental version and identifies and analyzes a key feature of the method -- that it is able to handle a learning situation where the correspondence structure between the input and output space is not completely known. In contrast to a traditional supervised learning setting, the training examples are groups of unordered input-output points, where the correspondence structure within each group is unknown. This behavior is studied theoretically and the effect of outliers and convergence properties are analyzed.
All presented methods have been evaluated experimentally. The work has been conducted within the cognitive systems research project COSPAL funded by EC FP6, and much of the contents has been put to use in the final COSPAL demonstrator system.
Place, publisher, year, edition, pages
Institutionen för systemteknik , 2008. , 155 p.
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 1160
computer vision, machine learning, object recognition, pose estimation
Computer Vision and Robotics (Autonomous Systems)
IdentifiersURN: urn:nbn:se:liu:diva-11040ISBN: 978-91-7393-988-1OAI: oai:DiVA.org:liu-11040DiVA: diva2:17496
2008-03-28, Glashuset, Hus B, Campus Valla, Linköpings Universitet, Linköping, 13:15 (English)
Wörgötter, Florentin, Professor