Embodied Visual Object Recognition
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
2017 (English) Doctoral thesis, comprehensive summary (Other academic)
Alternative title
Förkroppsligad objektigenkänning (Swedish)
Abstract [en]

Object recognition is a skill we as humans often take for granted. Due to our formidable object learning, recognition and generalisation skills, it is sometimes hard to see the multitude of obstacles that must be overcome in order to replicate this skill in an artificial system. Object recognition is also one of the classical areas of computer vision, and many ways of approaching the problem have been proposed. Recently, visually capable robots and autonomous vehicles have increased the focus on embodied recognition systems and active visual search. These applications demand that systems can learn and adapt to their surroundings, and arrive at decisions in a reasonable amount of time, while maintaining high object recognition performance. This is especially challenging due to the high dimensionality of image data. In cases where end-to-end learning from pixels to output is needed, mechanisms designed to make inputs tractable are often necessary for less computationally capable embodied systems.

Active visual search also means that mechanisms for attention and gaze control are integral to the object recognition procedure. Therefore, the way in which attention mechanisms should be introduced into feature extraction and estimation algorithms must be carefully considered when constructing a recognition system.

This thesis describes work done on the components necessary for creating an embodied recognition system, specifically in the areas of decision uncertainty estimation, object segmentation from multiple cues, adaptation of stereo vision to a specific platform and setting, problem-specific feature selection, efficient estimator training, and attentional modulation in convolutional neural networks. Contributions include the evaluation of methods and measures for predicting the potential uncertainty reduction that can be obtained from additional views of an object, allowing for adaptive target observations.

Also, separating a specific object from other parts of a scene often requires combining multiple cues, such as colour and depth, to obtain satisfactory results. Therefore, a method for combining these using channel coding has been evaluated. In order to make use of three-dimensional spatial structure in recognition, a novel stereo vision algorithm extension along with a framework for automatic stereo tuning have also been investigated. Feature selection and efficient discriminant sampling for decision tree-based estimators have also been implemented. Finally, attentional multi-layer modulation of convolutional neural networks for recognition in cluttered scenes has been evaluated. Several of these components have been tested and evaluated on a purpose-built embodied recognition platform known as Eddie the Embodied.
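The last of these components, attentional modulation of a convolutional network, can in its simplest form be a multiplicative spatial reweighting of feature maps, applied at one or more layers. The sketch below is a generic illustration in Python/NumPy; the thesis's actual modulation scheme, layer choices and parameters are not reproduced here.

```python
import numpy as np

def modulate(feature_maps, attention_mask):
    """Multiplicative spatial modulation of convolutional feature maps.

    Each channel is reweighted by a shared spatial attention mask, which
    is one generic way attention can be injected at several layers of a
    CNN to suppress clutter. (Illustrative sketch only.)

    feature_maps: array of shape (C, H, W)
    attention_mask: array of shape (H, W), values in [0, 1]
    """
    return feature_maps * attention_mask[None, :, :]

# Regions with mask value 0 are suppressed, mask value 1 passes through:
fm = np.ones((2, 2, 2))
mask = np.array([[1.0, 0.0],
                 [0.5, 1.0]])
out = modulate(fm, mask)
```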

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2017. 89 p.
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 1811
Keyword [en]
object recognition, machine learning, computer vision
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:liu:diva-132762
DOI: 10.3384/diss.diva-132762
ISBN: 9789176856260 (print)
OAI: oai:DiVA.org:liu-132762
DiVA: diva2:1049161
Public defence
2017-01-20, Visionen, B-huset, Campus Valla, Linköping, 13:00 (English)
Projects
Embodied Visual Object Recognition; FaceTrack
Funder
Swedish Research Council, 2008-4509
VINNOVA, 2013-00439
EU, FP7, Seventh Framework Programme, 247947
Linköpings universitet, LiU-foass
Available from: 2016-12-06 Created: 2016-11-23 Last updated: 2016-12-06. Bibliographically approved
List of papers
1. A Research Platform for Embodied Visual Object Recognition
2010 (English) In: Proceedings of SSBA 2010 Symposium on Image Analysis / [ed] Hendriks Luengo and Milan Gavrilovic, 2010, 137-140 p. Conference paper, Published paper (Other academic)
Abstract [en]

We present in this paper a research platform for development and evaluation of embodied visual object recognition strategies. The platform uses a stereoscopic peripheral-foveal camera system and a fast pan-tilt unit to perform saliency-based visual search. This is combined with a classification framework based on the bag-of-features paradigm with the aim of targeting, classifying and recognising objects. Interaction with the system is done via typed commands and speech synthesis. We also report the current classification performance of the system.
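The bag-of-features paradigm mentioned above represents an image by quantising its local descriptors against a codebook of visual words and forming a normalised word histogram. A minimal sketch follows; the platform's actual descriptors, codebook size and training procedure are not shown here.

```python
import numpy as np

def bag_of_features(descriptors, codebook):
    """Build a bag-of-features histogram (generic sketch).

    Each local descriptor is assigned to its nearest codebook word and
    the image is represented by the normalised word-count histogram.

    descriptors: array of shape (N, D)
    codebook:    array of shape (K, D)
    """
    # Distance from every descriptor to every codebook word.
    dists = np.linalg.norm(
        descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = np.argmin(dists, axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# Two of three toy descriptors fall closest to word 1:
cb = np.array([[0.0], [1.0]])
desc = np.array([[0.1], [0.9], [0.95]])
h = bag_of_features(desc, cb)
```

Histograms of this kind are what the classification framework compares against stored class prototypes.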

Series
Centre for Image Analysis Report Series, ISSN 1100-6641 ; 34
National Category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-70769 (URN)
Conference
SSBA 2010, Uppsala, Sweden, 11-12 March 2010
Available from: 2011-09-16 Created: 2011-09-16 Last updated: 2016-11-23. Bibliographically approved
2. Embodied Object Recognition using Adaptive Target Observations
2010 (English) In: Cognitive Computation, ISSN 1866-9956, E-ISSN 1866-9964, Vol. 2, no. 4, 316-325 p. Article in journal (Refereed) Published
Abstract [en]

In this paper, we study object recognition in the embodied setting. More specifically, we study the problem of whether the recognition system will benefit from acquiring another observation of the object under study, or whether it is time to give up, and report the observed object as unknown. We describe the hardware and software of a system that implements recognition and object permanence as two nested perception-action cycles. We have collected three data sets of observation sequences that allow us to perform controlled evaluation of the system behavior. Our recognition system uses a KNN classifier with bag-of-features prototypes. For this classifier, we have designed and compared three different uncertainty measures for target observation. These measures allow the system to (a) decide whether to continue to observe an object or to move on, and to (b) decide whether the observed object is previously seen or novel. The system is able to successfully reject all novel objects as “unknown”, while still recognizing most of the previously seen objects.
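One simple uncertainty measure in this spirit is the entropy of the class votes among the k nearest neighbours: a unanimous vote is certain, a split vote is not. This is an illustrative stand-in only; the paper designs and compares three specific measures of its own, which are not reproduced here.

```python
import numpy as np

def knn_vote_entropy(neighbour_labels):
    """Uncertainty of a KNN decision as the entropy (in bits) of the
    class votes among the k nearest neighbours.

    High entropy could trigger another observation of the object;
    persistently high entropy could flag the object as novel/"unknown".
    (Hypothetical measure, not one of the paper's three.)
    """
    labels, counts = np.unique(neighbour_labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# A unanimous vote is certain, a fully split vote is maximally uncertain:
certain = knn_vote_entropy(["cup", "cup", "cup"])
uncertain = knn_vote_entropy(["cup", "ball", "box"])
```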

Place, publisher, year, edition, pages
Springer, 2010
Keyword
Object recognition, Attention, Visual search, Fixation, Object permanence
National Category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-63344 (URN)
10.1007/s12559-010-9079-7 (DOI)
000292777400011 ()
Note

The original publication is available at www.springerlink.com: Marcus Wallenberg and Per-Erik Forssén, Embodied Object Recognition using Adaptive Target Observations, 2010, Cognitive Computation, (2), 4, 316-325. http://dx.doi.org/10.1007/s12559-010-9079-7 Copyright: Springer Science Business Media http://www.springerlink.com/

Available from: 2010-12-16 Created: 2010-12-16 Last updated: 2016-12-06. Bibliographically approved
3. Channel Coding for Joint Colour and Depth Segmentation
2011 (English) In: Proceedings of Pattern Recognition: 33rd DAGM Symposium, Frankfurt/Main, Germany, August 31 - September 2 / [ed] Rudolf Mester and Michael Felsberg, Springer, 2011, 306-315 p. Conference paper, Published paper (Refereed)
Abstract [en]

Segmentation is an important preprocessing step in many applications. Compared to colour segmentation, fusion of colour and depth greatly improves the segmentation result. Such a fusion is easy to do by stacking measurements in different value dimensions, but there are better ways. In this paper we perform fusion using the channel representation, and demonstrate how a state-of-the-art segmentation algorithm can be modified to use channel values as inputs. We evaluate segmentation results on data collected using the Microsoft Kinect peripheral for Xbox 360, using the superparamagnetic clustering algorithm. Our experiments show that depth gradients are more useful than depth values for segmentation, and that channel coding both colour and depth gradients makes tuned parameter settings generalise better to novel images.
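The channel representation encodes a value as soft activations of a set of overlapping, localised basis functions, so that fusing cues amounts to concatenating their channel vectors rather than stacking raw values. The sketch below uses cos² kernels, one common kernel choice; the number of channels, kernel width and other parameters are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def channel_encode(x, n_channels=8, lo=0.0, hi=1.0):
    """Encode a scalar in [lo, hi] as soft channel activations using
    overlapping cos^2 basis functions (illustrative sketch)."""
    centers = np.linspace(lo, hi, n_channels)
    width = (hi - lo) / (n_channels - 1) * 1.5  # kernels overlap
    d = np.abs(x - centers)
    # cos^2 kernel with compact support of radius `width`.
    return np.where(d < width,
                    np.cos(np.pi * d / (2 * width)) ** 2,
                    0.0)

# Fusing colour and depth-gradient cues is then just concatenation:
colour_ch = channel_encode(0.3)
depth_grad_ch = channel_encode(0.7)
fused = np.concatenate([colour_ch, depth_grad_ch])
```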

Place, publisher, year, edition, pages
Springer, 2011
Series
Lecture Notes in Computer Science, ISSN 0302-9743 (print), 1611-3349 (online) ; 6835
National Category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-70708 (URN)
10.1007/978-3-642-23123-0_31 (DOI)
978-3-642-23122-3 (ISBN)
Conference
33rd DAGM Symposium, Frankfurt 29 August - 2 September 2011
Available from: 2011-09-15 Created: 2011-09-15 Last updated: 2016-12-06. Bibliographically approved
4. Teaching Stereo Perception to YOUR Robot
2012 (English) Conference paper, Poster (with or without abstract) (Other academic)
Abstract [en]

This paper describes a method for generation of dense stereo ground-truth using a consumer depth sensor such as the Microsoft Kinect. Such ground-truth allows adaptation of stereo algorithms to a specific setting. The method uses a novel residual weighting based on error propagation from image plane measurements to 3D. We use this ground-truth in wide-angle stereo learning by automatically tuning a novel extension of the best-first-propagation (BFP) dense correspondence algorithm. We extend BFP by adding a coarse-to-fine scheme, and a structure measure that limits propagation along linear structures and flat areas. The tuned correspondence algorithm is evaluated in terms of accuracy, robustness, and ability to generalise. Both the tuning cost function, and the evaluation are designed to balance the accuracy-robustness trade-off inherent in patch-based methods such as BFP.
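The idea of weighting residuals by error propagated from the image plane to 3D can be illustrated with the standard rectified stereo model Z = fB/d: first-order propagation of disparity noise gives the depth uncertainty, and residuals are divided by it so that noisy, distant points count less. This is a generic sketch; the paper's own sensor model and residual weighting are not reproduced, and the focal length, baseline and noise level below are illustrative.

```python
import numpy as np

def depth_sigma(d, f, B, sigma_d):
    """First-order propagation of disparity noise to depth.

    Z = f*B/d  =>  |dZ/dd| = f*B/d^2, so sigma_Z ~= f*B/d^2 * sigma_d.
    d: disparity (px), f: focal length (px), B: baseline (m),
    sigma_d: disparity noise std (px). (Illustrative model.)
    """
    return f * B / d**2 * sigma_d

def weighted_residual(z_est, z_gt, sigma_z):
    """Normalise a depth residual by its propagated uncertainty, so near
    (low-noise) points dominate a tuning cost over distant ones."""
    return (z_est - z_gt) / sigma_z

# Illustrative numbers: f = 500 px, B = 0.1 m, d = 10 px, sigma_d = 0.5 px
s = depth_sigma(10.0, 500.0, 0.1, 0.5)   # 0.25 m of depth uncertainty
r = weighted_residual(2.0, 1.5, s)       # residual in units of sigma
```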

Place, publisher, year, edition, pages
University of Surrey, UK, 2012
National Category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-81312 (URN)
10.5244/C.26.29 (DOI)
000346356200026 ()
1-901725-46-4 (ISBN)
Conference
British Machine Vision Conference (BMVC12), Surrey, UK, 3-7 September
Available from: 2012-09-11 Created: 2012-09-11 Last updated: 2016-12-06. Bibliographically approved

Open Access in DiVA

Embodied Visual Object Recognition (2833 kB), 157 downloads
File information
File name: FULLTEXT01.pdf
File size: 2833 kB
Checksum (SHA-512): 911ab5e4aad4ea971e3da8417bd73cbfe0da2ada6e14df1938c94a742a4eb7373f6200c1cd9c00373fa82fe87d32e91f08e8970107032235103a56d5bdb4dc4f
Type: fulltext
Mimetype: application/pdf
Cover (omslag) (3371 kB), 20 downloads
File information
File name: COVER01.pdf
File size: 3371 kB
Checksum (SHA-512): 3e2ba0dc561a8943de17f12cdaeb73771374ff9853517dbd19fdb0d5de4f6efa538c011282d0635af74ecf81e35e490387c8a4828e6377564db2c9cf5a2593f5
Type: cover
Mimetype: application/pdf

Other links

Publisher's full text

Search in DiVA

By author/editor
Wallenberg, Marcus
By organisation
Computer Vision; Faculty of Science & Engineering
Computer Vision and Robotics (Autonomous Systems)

Search outside of DiVA

Google
Google Scholar
Total: 157 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 1579 hits