Multi-stream Convolutional Networks for Indoor Scene Recognition
Aalto University, Finland; Inception Institute of Artificial Intelligence, United Arab Emirates.
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Inception Institute of Artificial Intelligence, United Arab Emirates.
Aalto University, Finland.
United Arab Emirates University, United Arab Emirates.
2019 (English). In: Computer Analysis of Images and Patterns, CAIP 2019, Part I, Springer International Publishing AG, 2019, Vol. 11678, p. 196-208. Conference paper, Published paper (Refereed)
Abstract [en]

Convolutional neural networks (CNNs) have recently achieved outstanding results for various vision tasks, including indoor scene understanding. The de facto practice employed by state-of-the-art indoor scene recognition approaches is to use RGB pixel values as input to CNN models trained on large amounts of labeled data (ImageNet or Places). Here, we investigate CNN architectures that augment RGB images with estimated depth and texture information, as multiple streams, for monocular indoor scene recognition. First, we exploit recent advances in depth estimation from monocular images and use the estimated depth information to train a CNN model that learns deep depth features. Second, we train a CNN model to exploit the success of Local Binary Patterns (LBP), using mapped coded images with explicit LBP encoding to capture the texture information available in indoor scenes. We further investigate different fusion strategies for combining the learned depth and texture streams with the traditional RGB stream. Comprehensive experiments are performed on three indoor scene classification benchmarks: MIT-67, OCIS and SUN-397. The proposed multi-stream network significantly outperforms the standard RGB network, achieving absolute gains of 9.3%, 4.7% and 7.3% on the MIT-67, OCIS and SUN-397 datasets, respectively.
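As context for the texture stream described in the abstract, the following is a minimal sketch of the basic 3x3 Local Binary Pattern operator, which produces the kind of per-pixel texture code that an LBP-coded image is built from. This is an illustrative assumption using the standard 8-neighbour LBP; the paper's "mapped coded images with explicit LBP encoding" may use a different radius, neighbourhood, or mapping, and the function name here is hypothetical.

```python
import numpy as np

def lbp_code_image(gray):
    """Compute the basic 8-neighbour LBP code for each interior pixel
    of a 2-D grayscale image. Each neighbour that is >= the centre
    pixel sets one bit, yielding a texture code in [0, 255]."""
    gray = gray.astype(np.int32)
    center = gray[1:-1, 1:-1]
    # Offsets of the 8 neighbours, clockwise from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = gray[1 + dy: gray.shape[0] - 1 + dy,
                         1 + dx: gray.shape[1] - 1 + dx]
        code |= (neighbour >= center).astype(np.int32) << bit
    return code.astype(np.uint8)

# A bright isolated centre pixel: no neighbour reaches it, so code = 0.
spot = np.zeros((3, 3), dtype=np.uint8)
spot[1, 1] = 5
print(lbp_code_image(spot)[0, 0])   # prints 0

# A flat region: every neighbour equals the centre, all 8 bits set.
flat = np.full((3, 3), 7, dtype=np.uint8)
print(lbp_code_image(flat)[0, 0])   # prints 255
```

In a multi-stream setup such as the one the abstract describes, an image of these codes would be fed to a separate CNN alongside the RGB and estimated-depth streams, with the stream outputs combined by a fusion strategy (e.g. score averaging or feature concatenation).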

Place, publisher, year, edition, pages
Springer International Publishing AG, 2019. Vol. 11678, p. 196-208
Series
Lecture Notes in Computer Science, ISSN 0302-9743
Keywords [en]
Scene recognition; Depth features; Texture features
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:liu:diva-168856
DOI: 10.1007/978-3-030-29888-3_16
ISI: 000558153800016
ISBN: 978-3-030-29888-3 (electronic)
ISBN: 978-3-030-29887-6 (print)
OAI: oai:DiVA.org:liu-168856
DiVA, id: diva2:1466223
Conference
18th International Conference on Computer Analysis of Images and Patterns (CAIP)
Note

Funding Agencies: Academy of Finland [313988]; European Union (EU) [780069]

Available from: 2020-09-11. Created: 2020-09-11. Last updated: 2025-02-07.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text

Search in DiVA

By author/editor: Khan, Fahad
By organisation: Computer Vision; Faculty of Science & Engineering