Teaching Stereo Perception to YOUR Robot
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, The Institute of Technology.
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, The Institute of Technology. ORCID iD: 0000-0002-5698-5983
2012 (English) Conference paper, Poster (with or without abstract) (Other academic)
Abstract [en]

This paper describes a method for generation of dense stereo ground-truth using a consumer depth sensor such as the Microsoft Kinect. Such ground-truth allows adaptation of stereo algorithms to a specific setting. The method uses a novel residual weighting based on error propagation from image plane measurements to 3D. We use this ground-truth in wide-angle stereo learning by automatically tuning a novel extension of the best-first-propagation (BFP) dense correspondence algorithm. We extend BFP by adding a coarse-to-fine scheme, and a structure measure that limits propagation along linear structures and flat areas. The tuned correspondence algorithm is evaluated in terms of accuracy, robustness, and ability to generalise. Both the tuning cost function, and the evaluation are designed to balance the accuracy-robustness trade-off inherent in patch-based methods such as BFP.
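
As a rough illustration of what error propagation from image-plane measurements to 3D can look like, the sketch below uses the textbook rectified-stereo relation Z = f·b/d, propagates disparity noise to depth to first order, and uses the inverse variance as a residual weight. This is not the weighting derived in the paper: the rectified pinhole model, the function names, and the numeric values are all assumptions chosen for illustration.

```python
# Illustrative sketch only: assumes a rectified pinhole stereo pair, which the
# abstract does not specify. All names and values here are hypothetical.

def depth_from_disparity(d, f, b):
    """Depth Z = f * b / d (f: focal length in pixels, b: baseline, d: disparity in pixels)."""
    return f * b / d

def depth_sigma(d, f, b, sigma_d):
    """First-order propagation: sigma_Z = |dZ/dd| * sigma_d = f * b / d**2 * sigma_d."""
    return f * b / (d * d) * sigma_d

def weighted_residual_cost(z_est, z_ref, sigmas):
    """Inverse-variance weighted mean of squared depth residuals."""
    weights = [1.0 / (s * s) for s in sigmas]
    squared = [(e - r) ** 2 for e, r in zip(z_est, z_ref)]
    return sum(w * q for w, q in zip(weights, squared)) / sum(weights)

if __name__ == "__main__":
    f, b, sigma_d = 500.0, 0.1, 0.5          # hypothetical camera and noise values
    disparities = [50.0, 10.0]               # near point, far point
    z_ref = [depth_from_disparity(d, f, b) for d in disparities]
    sigmas = [depth_sigma(d, f, b, sigma_d) for d in disparities]
    z_est = [z_ref[0], z_ref[1] + 0.5]       # pretend the far point is off by 0.5
    print(z_ref, sigmas, weighted_residual_cost(z_est, z_ref, sigmas))
```

Because depth uncertainty grows with the square of depth under this model, residuals at distant points are down-weighted heavily, which is the general motivation for weighting residuals by propagated error.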

Place, publisher, year, edition, pages
University of Surrey, UK, 2012. 1-12 p.
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:liu:diva-81312
DOI: 10.5244/C.26.29
ISI: 000346356200026
ISBN: 1-901725-46-4 (print)
OAI: oai:DiVA.org:liu-81312
DiVA: diva2:551483
Conference
British Machine Vision Conference (BMVC12), Surrey, UK, 3-7 September
Available from: 2012-09-11 Created: 2012-09-11 Last updated: 2016-12-06 Bibliographically approved
In thesis
1. Components of Embodied Visual Object Recognition: Object Perception and Learning on a Robotic Platform
2013 (English) Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

Object recognition is a skill we as humans often take for granted. Due to our formidable object learning, recognition and generalisation skills, it is sometimes hard to see the multitude of obstacles that need to be overcome in order to replicate this skill in an artificial system. Object recognition is also one of the classical areas of computer vision, and many ways of approaching the problem have been proposed. Recently, visually capable robots and autonomous vehicles have increased the focus on embodied recognition systems and active visual search. These applications demand that systems can learn and adapt to their surroundings, and arrive at decisions in a reasonable amount of time, while maintaining high object recognition performance. Active visual search also means that mechanisms for attention and gaze control are integral to the object recognition procedure. This thesis describes work done on the components necessary for creating an embodied recognition system, specifically in the areas of decision uncertainty estimation, object segmentation from multiple cues, adaptation of stereo vision to a specific platform and setting, and the implementation of the system itself. Contributions include the evaluation of methods and measures for predicting the potential uncertainty reduction that can be obtained from additional views of an object, allowing for adaptive target observations. Also, in order to separate a specific object from other parts of a scene, it is often necessary to combine multiple cues such as colour and depth in order to obtain satisfactory results. Therefore, a method for combining these using channel coding has been evaluated. Finally, in order to make use of three-dimensional spatial structure in recognition, a novel stereo vision algorithm extension along with a framework for automatic stereo tuning have also been investigated. 
All of these components have been tested and evaluated on a purpose-built embodied recognition platform known as Eddie the Embodied.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2013. 64 p.
Series
Linköping Studies in Science and Technology. Thesis, ISSN 0280-7971 ; 1607
Keyword
computer vision, object recognition, stereo vision, classification
National Category
Signal Processing; Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-93812 (URN)
978-91-7519-564-3 (ISBN)
Presentation
2013-08-16, Visionen, Hus B, Campus Valla, Linköpings universitet, Linköping, 13:15 (English)
Projects
Embodied Visual Object Recognition
Funder
Swedish Research Council
Available from: 2013-07-09 Created: 2013-06-10 Last updated: 2015-12-10 Bibliographically approved
2. Embodied Visual Object Recognition
2017 (English) Doctoral thesis, comprehensive summary (Other academic)
Alternative title [sv]
Förkroppsligad objektigenkänning
Abstract [en]

Object recognition is a skill we as humans often take for granted. Due to our formidable object learning, recognition and generalisation skills, it is sometimes hard to see the multitude of obstacles that need to be overcome in order to replicate this skill in an artificial system. Object recognition is also one of the classical areas of computer vision, and many ways of approaching the problem have been proposed. Recently, visually capable robots and autonomous vehicles have increased the focus on embodied recognition systems and active visual search. These applications demand that systems can learn and adapt to their surroundings, and arrive at decisions in a reasonable amount of time, while maintaining high object recognition performance. This is especially challenging due to the high dimensionality of image data. In cases where end-to-end learning from pixels to output is needed, mechanisms designed to make inputs tractable are often necessary for less computationally capable embodied systems. Active visual search also means that mechanisms for attention and gaze control are integral to the object recognition procedure. Therefore, the way in which attention mechanisms should be introduced into feature extraction and estimation algorithms must be carefully considered when constructing a recognition system. This thesis describes work done on the components necessary for creating an embodied recognition system, specifically in the areas of decision uncertainty estimation, object segmentation from multiple cues, adaptation of stereo vision to a specific platform and setting, problem-specific feature selection, efficient estimator training and attentional modulation in convolutional neural networks. Contributions include the evaluation of methods and measures for predicting the potential uncertainty reduction that can be obtained from additional views of an object, allowing for adaptive target observations.
Also, in order to separate a specific object from other parts of a scene, it is often necessary to combine multiple cues such as colour and depth in order to obtain satisfactory results. Therefore, a method for combining these using channel coding has been evaluated. In order to make use of three-dimensional spatial structure in recognition, a novel stereo vision algorithm extension along with a framework for automatic stereo tuning have also been investigated. Feature selection and efficient discriminant sampling for decision tree-based estimators have also been implemented. Finally, attentional multi-layer modulation of convolutional neural networks for recognition in cluttered scenes has been evaluated. Several of these components have been tested and evaluated on a purpose-built embodied recognition platform known as Eddie the Embodied.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2017. 89 p.
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 1811
Keyword
object recognition, machine learning, computer vision
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-132762 (URN)
10.3384/diss.diva-132762 (DOI)
9789176856260 (ISBN)
Public defence
2017-01-20, Visionen, B-huset, Campus Valla, Linköping, 13:00 (English)
Projects
Embodied Visual Object Recognition; FaceTrack
Funder
Swedish Research Council, 2008-4509
VINNOVA, 2013-00439
EU, FP7, Seventh Framework Programme, 247947
Linköpings universitet, LiU-foass
Available from: 2016-12-06 Created: 2016-11-23 Last updated: 2016-12-06 Bibliographically approved

Open Access in DiVA

Teaching Stereo Perception to YOUR Robot (9319 kB), 796 downloads
File information
File name: FULLTEXT01.pdf
File size: 9319 kB
Checksum (SHA-512): e38338431d20d5d2b2d8c4baef2118399c2a01e4139846a35ed3dbcaae52de2f0a09b07760ca99acb7d958d7f5c8cb8e80ab0ae2f25f12f53d62b0ba235d90c1
Type: fulltext
Mimetype: application/pdf

Authority records

Wallenberg, Marcus; Forssén, Per-Erik
