Channel-Coded Feature Maps for Computer Vision and Machine Learning
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, The Institute of Technology.
2008 (English). Doctoral thesis, monograph (Other academic)
Abstract [en]

This thesis is about channel-coded feature maps applied in view-based object recognition, tracking, and machine learning. A channel-coded feature map is a soft histogram of joint spatial pixel positions and image feature values. Typical useful features include local orientation and color. Using these features, each channel measures the co-occurrence of a certain orientation and color at a certain position in an image or image patch. Channel-coded feature maps can be seen as a generalization of the SIFT descriptor with the options of including more features and replacing the linear interpolation between bins by a more general basis function.

The general idea of channel coding originates from a model of how information might be represented in the human brain. For example, different neurons tend to be sensitive to different orientations of local structures in the visual input. The sensitivity profiles tend to be smooth such that one neuron is maximally activated by a certain orientation, with a gradually decaying activity as the input is rotated.
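The smooth, overlapping sensitivity profiles described above can be illustrated with a minimal sketch. It assumes a cos² basis function, unit channel spacing, and a kernel width of 1.5; these choices, and the name channel_encode, are illustrative assumptions and are not taken from the abstract.

```python
import numpy as np

def channel_encode(x, centers, width=1.5):
    """Encode a scalar x as soft channel activations (assumed cos^2 kernel)."""
    d = np.abs(x - np.asarray(centers, dtype=float))
    # Each channel responds maximally at its center and decays smoothly
    # to zero at distance `width`.
    return np.where(d < width, np.cos(np.pi * d / (2.0 * width)) ** 2, 0.0)

centers = np.arange(0.0, 10.0)   # hypothetical channel centers at 0, 1, ..., 9
c = channel_encode(4.2, centers)
print(np.round(c, 3))
```

With this spacing and width, exactly three neighbouring channels respond to any interior value, and the channel closest to the input is the most active one.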

This thesis extends previous work on using channel-coding ideas within computer vision and machine learning. By differentiating the channel-coded feature maps with respect to transformations of the underlying image, a method for image registration and tracking is constructed. By using piecewise polynomial basis functions, the channel coding can be computed more efficiently, and a general encoding method for N-dimensional feature spaces is presented.

Furthermore, I argue for using channel-coded feature maps in view-based pose estimation, where a continuous pose parameter is estimated from a query image given a number of training views with known pose. The optimization of position, rotation and scale of the object in the image plane is then included in the optimization problem, leading to a simultaneous tracking and pose estimation algorithm. Apart from objects and poses, the thesis examines the use of channel coding in connection with Bayesian networks. The goal here is to avoid the hard discretizations usually required when Markov random fields are used on intrinsically continuous signals like depth for stereo vision or color values in image restoration.

Channel coding has previously been used to design machine learning algorithms that are robust to outliers, ambiguities, and discontinuities in the training data. This robustness is obtained by finding a linear mapping between channel-coded input and output values. This thesis extends the method with an incremental version, and identifies and analyzes a key feature of the method: it can handle a learning situation where the correspondence structure between the input and output space is not completely known. In contrast to a traditional supervised learning setting, the training examples are groups of unordered input-output points, where the correspondence structure within each group is unknown. This behavior is studied theoretically, and the effect of outliers and the convergence properties are analyzed.

All presented methods have been evaluated experimentally. The work has been conducted within the cognitive systems research project COSPAL, funded by EC FP6, and much of the content has been put to use in the final COSPAL demonstrator system.
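As a rough illustration of a linear mapping between channel-coded input and output values, the sketch below fits such a mapping with ordinary least squares on a toy function. The cos² kernel, the unit channel spacing, the plain least-squares fit, the decoding step, and all names are assumptions made for illustration; the thesis may use a different kernel, optimization method, and decoding.

```python
import numpy as np

def encode(x, centers, width=1.5):
    """Channel-encode scalars with an assumed cos^2 kernel."""
    d = np.abs(np.asarray(x, dtype=float)[..., None] - centers)
    return np.where(d < width, np.cos(np.pi * d / (2.0 * width)) ** 2, 0.0)

def decode(u, centers):
    """Local decoding around the strongest channel.

    Exact for noise-free cos^2 encodings with unit spacing and width 1.5.
    """
    k = int(np.clip(np.argmax(u), 1, len(centers) - 2))
    offsets = np.array([-1.0, 0.0, 1.0])
    z = np.sum(u[k - 1:k + 2] * np.exp(2j * np.pi * offsets / 3.0))
    return float(centers[k] + 3.0 * np.angle(z) / (2.0 * np.pi))

centers = np.arange(-1.0, 12.0)      # padded so values in [0, 10] are interior
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, 500)      # toy training inputs
y = 10.0 - x                         # toy target function

A = encode(x, centers)               # channel-coded inputs, shape (500, 13)
U = encode(y, centers)               # channel-coded outputs, shape (500, 13)
C, *_ = np.linalg.lstsq(A, U, rcond=None)   # linear map with A @ C ~ U

y_hat = decode(encode(3.3, centers) @ C, centers)
print(y_hat)
```

Because the mapping acts on channel vectors rather than on raw values, the predicted channel vector can in principle carry several peaks, which is what allows this kind of method to represent ambiguous or discontinuous outputs.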

Place, publisher, year, edition, pages
Institutionen för systemteknik, 2008. 155 p.
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 1160
Keyword [en]
computer vision, machine learning, object recognition, pose estimation
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:liu:diva-11040
ISBN: 978-91-7393-988-1 (print)
OAI: oai:DiVA.org:liu-11040
DiVA: diva2:17496
Public defence
2008-03-28, Glashuset, Hus B, Campus Valla, Linköpings Universitet, Linköping, 13:15 (English)
Available from: 2008-02-20 Created: 2008-02-20 Last updated: 2016-05-04

Author
Jonsson, Erik
