Recognizing Actions Through Action-Specific Person Detection
2015 (English). In: IEEE Transactions on Image Processing, ISSN 1057-7149, E-ISSN 1941-0042, Vol. 24, no. 11, pp. 4422-4432. Article in journal (Refereed), Published.
Action recognition in still images is a challenging problem in computer vision. To facilitate comparative evaluation independently of person detection, the standard evaluation protocol for action recognition uses an oracle person detector to obtain perfect bounding box information at both training and test time. The assumption is that, in practice, a general person detector will provide candidate bounding boxes for action recognition. In this paper, we argue that this paradigm is suboptimal and that action class labels should already be considered during the detection stage. Motivated by the observation that body pose is strongly conditioned on action class, we show that: 1) existing state-of-the-art generic person detectors are not adequate for proposing candidate bounding boxes for action classification; 2) due to limited training examples, directly training action-specific person detectors is also inadequate; and 3) using only a small number of labeled action examples, transfer learning can adapt an existing detector to propose higher-quality bounding boxes for subsequent action classification. To the best of our knowledge, we are the first to investigate transfer learning for the task of action-specific person detection in still images. We perform extensive experiments on two benchmark data sets: 1) Stanford-40 and 2) PASCAL VOC 2012. For the action detection task (i.e., both person localization and classification of the action performed), our approach outperforms methods based on general person detection by 5.7% mean average precision (MAP) on Stanford-40 and 2.1% MAP on PASCAL VOC 2012. Our approach also significantly outperforms the state of the art, with a MAP of 45.4% on Stanford-40 and 31.4% on PASCAL VOC 2012. We also evaluate our action detection approach for the task of action classification (i.e., recognizing actions without localizing them). For this task, our approach outperforms state-of-the-art methods on both data sets without using any ground-truth person localization at test time, whereas those methods do use person locations.
Place, publisher, year, edition, pages
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 2015. Vol. 24, no. 11, pp. 4422-4432.
Keywords: Action recognition; transfer learning; deep features
National Category: Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
URN: urn:nbn:se:liu:diva-121419
DOI: 10.1109/TIP.2015.2465147
ISI: 000360408800004
PubMedID: 26259079
OAI: oai:DiVA.org:liu-121419
DiVA: diva2:855148
Funding Agencies: Svalbard Science Forum through the Collaborative Unmanned Aircraft Systems Project; VR through the ETT Project; Strategic Area for ICT Research ELLIIT; CADICS; Academy of Finland [255745, 251170]; Data to Intelligence DIGILE SHOK Project [TIN2013-41751, TIN2014-52072-P]; Spanish Morocco Economic Competitiveness Project [TRA2014-57088-C2-1-R]; Spanish Ministry of Science through the Spanish DGT Project [SPIP2014-01352]; Generalitat de Catalunya Project [2014-SGR-1506, 2014-SGR-221]; MICINN through Ramon y Cajal Fellowship; Chinese Scholarship Council