A generative appearance model for end-to-end video object segmentation
Johnander, Joakim: Linköping University, Department of Electrical Engineering, Computer Vision; Linköping University, Faculty of Science & Engineering; Zenuity, Sweden.
Danelljan, Martin: Linköping University, Department of Electrical Engineering, Computer Vision; Linköping University, Faculty of Science & Engineering; ETH Zurich, Switzerland.
Brissman, Emil: Linköping University, Department of Electrical Engineering, Computer Vision; Linköping University, Faculty of Science & Engineering; Saab, Sweden.
Khan, Fahad Shahbaz: Linköping University, Department of Electrical Engineering, Computer Vision; Linköping University, Faculty of Science & Engineering; IIAI, UAE.
Felsberg, Michael: Linköping University, Department of Electrical Engineering, Computer Vision; Linköping University, Faculty of Science & Engineering.
2019 (English). In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Institute of Electrical and Electronics Engineers (IEEE), 2019, pp. 8945-8954. Conference paper, published paper (refereed).
Abstract [en]

One of the fundamental challenges in video object segmentation is to find an effective representation of the target and background appearance. The best performing approaches resort to extensive fine-tuning of a convolutional neural network for this purpose. Besides being prohibitively expensive, this strategy cannot be truly trained end-to-end since the online fine-tuning procedure is not integrated into the offline training of the network. To address these issues, we propose a network architecture that learns a powerful representation of the target and background appearance in a single forward pass. The introduced appearance module learns a probabilistic generative model of target and background feature distributions. Given a new image, it predicts the posterior class probabilities, providing a highly discriminative cue, which is processed in later network modules. Both the learning and prediction stages of our appearance module are fully differentiable, enabling true end-to-end training of the entire segmentation pipeline. Comprehensive experiments demonstrate the effectiveness of the proposed approach on three video object segmentation benchmarks. We close the gap to approaches based on online fine-tuning on DAVIS17, while operating at 15 FPS on a single GPU. Furthermore, our method outperforms all published approaches on the large-scale YouTube-VOS dataset.
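The appearance module described in the abstract learns generative models of target and background feature distributions and converts them into posterior class probabilities. A minimal sketch of that idea, assuming diagonal Gaussian class-conditional densities and plain Bayes' rule over two classes (the paper's actual module is a learned, fully differentiable network component; all function names and the toy data below are illustrative):

```python
import numpy as np

def fit_gaussian(feats):
    """Fit a diagonal Gaussian to a set of feature vectors (N x D)."""
    mean = feats.mean(axis=0)
    var = feats.var(axis=0) + 1e-6  # small variance floor for numerical stability
    return mean, var

def log_likelihood(feats, mean, var):
    """Per-vector log-density under a diagonal Gaussian (N x D -> N)."""
    return -0.5 * (np.log(2 * np.pi * var) + (feats - mean) ** 2 / var).sum(axis=1)

def posterior_target_prob(feats, target_model, background_model, prior=0.5):
    """Posterior P(target | feature) via Bayes' rule, evaluated in log space."""
    log_t = log_likelihood(feats, *target_model) + np.log(prior)
    log_b = log_likelihood(feats, *background_model) + np.log(1 - prior)
    m = np.maximum(log_t, log_b)          # subtract max before exp to avoid overflow
    pt, pb = np.exp(log_t - m), np.exp(log_b - m)
    return pt / (pt + pb)                 # two-class softmax

# Toy example: well-separated 2-D features standing in for deep features.
rng = np.random.default_rng(0)
target_feats = rng.normal(loc=2.0, scale=0.5, size=(100, 2))
background_feats = rng.normal(loc=-2.0, scale=0.5, size=(100, 2))

t_model = fit_gaussian(target_feats)
b_model = fit_gaussian(background_feats)
probs = posterior_target_prob(np.vstack([target_feats, background_feats]), t_model, b_model)
# First 100 rows are target features, so their posteriors should be near 1,
# and the background posteriors near 0.
print(f"mean target posterior: {probs[:100].mean():.3f}")
```

The posterior map plays the role of the "highly discriminative cue" mentioned in the abstract; in the paper it is passed on to subsequent network modules rather than thresholded directly, and both the fitting and the prediction steps are differentiable so the whole pipeline can be trained end to end.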

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2019, pp. 8945-8954.
Series
Proceedings - IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR, IEEE Conference on Computer Vision and Pattern Recognition, ISSN 1063-6919, E-ISSN 2575-7075
Keywords [en]
Segmentation; Grouping and Shape; Motion and Tracking
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:liu:diva-161037
DOI: 10.1109/CVPR.2019.00916
ISBN: 9781728132938 (electronic)
ISBN: 9781728132945 (print)
OAI: oai:DiVA.org:liu-161037
DiVA id: diva2:1361997
Conference
IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), Long Beach, CA, USA, 15-20 June 2019
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Swedish Foundation for Strategic Research
Swedish Research Council
Available from: 2019-10-17. Created: 2019-10-17. Last updated: 2020-01-22. Bibliographically approved.

Open Access in DiVA

fulltext (2100 kB)
File name: FULLTEXT01.pdf
File size: 2100 kB
Checksum (SHA-512): bfa5f781e534e3a6a9938d945ac3083e30924678f93fd063f0ff7092f8e396ec3f559845c2663df3d0a723013b3e80e391ef7706b58257ecd09276b4d749b45a
Type: fulltext
Mimetype: application/pdf

Authority records
Johnander, Joakim; Danelljan, Martin; Brissman, Emil; Khan, Fahad Shahbaz; Felsberg, Michael

