Deep Semantic Pyramids for Human Attributes and Action Recognition
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
Department of Information and Computer Science, Aalto University School of Science, Aalto, Finland.
Computer Vision Center, CS Department, Universitet Autonoma de Barcelona, Barcelona, Spain.
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. ORCID iD: 0000-0002-6096-3648
2015 (English). In: Image Analysis: 19th Scandinavian Conference, SCIA 2015, Copenhagen, Denmark, June 15-17, 2015. Proceedings / [ed] Paulsen, Rasmus R., Pedersen, Kim S., Springer, 2015, Vol. 9127, pp. 341-353. Conference paper (Refereed)
Abstract [en]

Describing persons and their actions is a challenging problem due to variations in pose, scale and viewpoint in real-world images. Recently, the semantic pyramids approach [1] for pose normalization has been shown to provide excellent results for gender and action recognition. The performance of the semantic pyramids approach relies on robust image description and is therefore limited by its use of shallow local features. In the context of object recognition [2] and object detection [3], convolutional neural networks (CNNs), or deep features, have been shown to improve performance over conventional shallow features.

We propose deep semantic pyramids for human attributes and action recognition. The method works by constructing spatial pyramids based on CNN features of different part locations. These pyramids are then combined to obtain a single semantic representation. We validate our approach on the Berkeley and 27 Human Attributes datasets for attribute classification. For action recognition, we perform experiments on two challenging datasets: Willow and PASCAL VOC 2010. The proposed deep semantic pyramids provide significant gains of 17.2%, 13.9%, 24.3% and 22.6% over the standard shallow semantic pyramids on the Berkeley, 27 Human Attributes, Willow and PASCAL VOC 2010 datasets, respectively. Our results also show that deep semantic pyramids outperform conventional CNNs based on the full bounding box of the person. Finally, we compare our approach with state-of-the-art methods and show a gain in performance over the best methods in the literature.
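
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: build a spatial pyramid over the CNN feature map of each body part, then concatenate the per-part pyramids into one vector. The part names (`full_body`, `upper_body`, `face`), the max-pooling operator, the pyramid levels, and both function names are assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, List, Sequence

# A CNN feature map stored as nested lists: fmap[row][col][channel].
FeatureMap = List[List[List[float]]]


def spatial_pyramid(fmap: FeatureMap, levels: Sequence[int] = (1, 2)) -> List[float]:
    """Max-pool each channel over an n x n grid for every n in `levels`.

    Max pooling is an assumption; the abstract does not name the operator.
    """
    height, width, channels = len(fmap), len(fmap[0]), len(fmap[0][0])
    descriptor: List[float] = []
    for n in levels:
        if n > height or n > width:
            raise ValueError("pyramid level is finer than the feature map")
        for i in range(n):
            for j in range(n):
                # Integer bounds of grid cell (i, j); never empty because n <= size.
                rows = range(i * height // n, (i + 1) * height // n)
                cols = range(j * width // n, (j + 1) * width // n)
                for k in range(channels):
                    descriptor.append(
                        max(fmap[r][c][k] for r in rows for c in cols)
                    )
    return descriptor


def deep_semantic_pyramid(
    part_maps: Dict[str, FeatureMap],
    part_order: Sequence[str] = ("full_body", "upper_body", "face"),  # hypothetical parts
    levels: Sequence[int] = (1, 2),
) -> List[float]:
    """Concatenate the per-part pyramids in a fixed order into one vector."""
    representation: List[float] = []
    for name in part_order:
        representation.extend(spatial_pyramid(part_maps[name], levels))
    return representation


if __name__ == "__main__":
    # Toy 2 x 2 feature map with a single channel, reused for every part.
    toy = [[[1.0], [2.0]], [[3.0], [4.0]]]
    parts = {"full_body": toy, "upper_body": toy, "face": toy}
    print(len(deep_semantic_pyramid(parts)))
```

With three parts, levels 1 and 2 (five grid cells in total), and one channel, the toy example yields a 15-dimensional vector. In practice the feature maps would come from a pretrained CNN applied to each detected part region, and the resulting vector would be passed to a classifier.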

Place, publisher, year, edition, pages
Springer, 2015. Vol. 9127, 341-353 p.
Lecture Notes in Computer Science, ISSN 0302-9743 (print), 1611-3349 (online) ; 9127
Keyword [en]
Action recognition, Human attributes, Semantic pyramids
National Category
Robotics; Computer Systems
URN: urn:nbn:se:liu:diva-121606
DOI: 10.1007/978-3-319-19665-7
ISBN: 978-3-319-19665-7 (E-book)
ISBN: 978-3-319-19664-0 (Print)
OAI: diva2:857230
19th Scandinavian Conference on Image Analysis (SCIA), Copenhagen, Denmark, June 15-17, 2015
Available from: 2015-09-28. Created: 2015-09-28. Last updated: 2016-05-04. Bibliographically approved.

By author/editor
Khan, Fahad Shahbaz; Felsberg, Michael