Recognizing Actions Through Action-Specific Person Detection
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
Comp Vis Centre Barcelona, Spain.
Comp Vis Centre Barcelona, Spain.
Comp Vis Centre Barcelona, Spain.
2015 (English). In: IEEE Transactions on Image Processing, ISSN 1057-7149, E-ISSN 1941-0042, Vol. 24, no. 11, pp. 4422-4432. Article in journal (Refereed), Published.
Abstract [en]

Action recognition in still images is a challenging problem in computer vision. To facilitate comparative evaluation independently of person detection, the standard evaluation protocol for action recognition uses an oracle person detector to obtain perfect bounding box information at both training and test time. The assumption is that, in practice, a general person detector will provide candidate bounding boxes for action recognition. In this paper, we argue that this paradigm is suboptimal and that action class labels should already be considered during the detection stage. Motivated by the observation that body pose is strongly conditioned on action class, we show that: 1) existing state-of-the-art generic person detectors are not adequate for proposing candidate bounding boxes for action classification; 2) due to limited training examples, direct training of action-specific person detectors is also inadequate; and 3) using only a small number of labeled action examples, transfer learning can adapt an existing detector to propose higher-quality bounding boxes for subsequent action classification. To the best of our knowledge, we are the first to investigate transfer learning for the task of action-specific person detection in still images. We perform extensive experiments on two benchmark data sets: 1) Stanford-40 and 2) PASCAL VOC 2012. For the action detection task (i.e., both person localization and classification of the action performed), our approach outperforms methods based on general person detection by 5.7% mean average precision (MAP) on Stanford-40 and 2.1% MAP on PASCAL VOC 2012. Our approach also significantly outperforms the state of the art with a MAP of 45.4% on Stanford-40 and 31.4% on PASCAL VOC 2012. We also evaluate our action detection approach for the task of action classification (i.e., recognizing actions without localizing them). For this task, our approach, without using any ground-truth person localization at test time, outperforms state-of-the-art methods on both data sets, even though those methods do use person locations.

Keyword [en]
Action recognition; transfer learning; deep features
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
URN: urn:nbn:se:liu:diva-121419; DOI: 10.1109/TIP.2015.2465147; ISI: 000360408800004; PubMedID: 26259079; OAI: diva2:855148

Funding Agencies: Svalbard Science Forum through the Collaborative Unmanned Aircraft Systems Project; VR through the ETT Project; Strategic Area for ICT Research ELLIIT; CADICS; Academy of Finland [255745, 251170]; Data to Intelligence DIGILE SHOK Project [TIN2013-41751, TIN2014-52072-P]; Spanish Morocco Economic Competitiveness Project [TRA2014-57088-C2-1-R]; Spanish Ministry of Science through the Spanish DGT Project [SPIP2014-01352]; Generalitat de Catalunya Project [2014-SGR-1506, 2014-SGR-221]; MICINN through Ramon y Cajal Fellowship; Chinese Scholarship Council [2011611023]

Available from: 2015-09-18 Created: 2015-09-18 Last updated: 2015-09-18

By author/editor
Khan, Fahad