D2-Net: Weakly-Supervised Action Localization via Discriminative Embeddings and Denoised Activations
Inception Institute of Artificial Intelligence, UAE.
Mohamed Bin Zayed University of AI, UAE.
Monash University, Australia.
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Mohamed Bin Zayed University of AI, UAE.
2021 (English), Other (Other academic)
Abstract [en]

This work proposes a weakly-supervised temporal action localization framework, called D2-Net, which strives to temporally localize actions using video-level supervision. Our main contribution is the introduction of a novel loss formulation, which jointly enhances the discriminability of latent embeddings and the robustness of the output temporal class activations with respect to foreground-background noise caused by weak supervision. The proposed formulation comprises a discriminative and a denoising loss term for enhancing temporal action localization. The discriminative term incorporates a classification loss and utilizes a top-down attention mechanism to enhance the separability of latent foreground-background embeddings. The denoising loss term explicitly addresses the foreground-background noise in class activations by simultaneously maximizing intra-video and inter-video mutual information using a bottom-up attention mechanism. As a result, activations in the foreground regions are emphasized whereas those in the background regions are suppressed, leading to more robust predictions. Comprehensive experiments are performed on multiple benchmarks, including THUMOS14 and ActivityNet1.2. Our D2-Net performs favorably in comparison to existing methods on all datasets, achieving gains as high as 2.3% in terms of mAP at IoU=0.5 on THUMOS14.
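The abstract describes a two-term objective: a video-level classification loss over temporally pooled class activations, plus a denoising term that separates attention-weighted foreground activations from background ones. The sketch below illustrates that general structure in NumPy; the function name, top-k pooling ratio, and the margin-style stand-in for the paper's mutual-information objective are all illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def d2net_style_loss(cas, labels, fg_attn, alpha=1.0):
    """Illustrative sketch of a D2-Net-style objective (not the paper's exact loss).

    cas:     (B, T, C) temporal class activation scores (logits)
    labels:  (B, C)    multi-hot video-level action labels
    fg_attn: (B, T)    bottom-up foreground attention weights in [0, 1]
    """
    B, T, C = cas.shape

    # Discriminative term: top-k temporal pooling into video-level logits,
    # followed by a multi-label binary cross-entropy classification loss.
    k = max(1, T // 8)  # assumed pooling ratio, for illustration only
    topk = np.sort(cas, axis=1)[:, -k:, :]            # (B, k, C)
    video_logits = topk.mean(axis=1)                  # (B, C)
    probs = 1.0 / (1.0 + np.exp(-video_logits))
    eps = 1e-7
    cls_loss = -np.mean(labels * np.log(probs + eps)
                        + (1.0 - labels) * np.log(1.0 - probs + eps))

    # Denoising term (stand-in for the mutual-information objective):
    # emphasize attention-weighted foreground activations and suppress
    # background-weighted ones via a margin between the two averages.
    sig = 1.0 / (1.0 + np.exp(-cas))
    act = sig.max(axis=2)                             # (B, T) per-snippet actionness
    fg = (fg_attn * act).sum(axis=1) / (fg_attn.sum(axis=1) + 1e-6)
    bg = ((1.0 - fg_attn) * act).sum(axis=1) / ((1.0 - fg_attn).sum(axis=1) + 1e-6)
    denoise_loss = np.maximum(0.0, 1.0 - fg + bg).mean()

    return cls_loss + alpha * denoise_loss
```

The two terms mirror the abstract's structure: the first enforces discriminability via classification, the second pushes foreground and background activations apart so that background noise is suppressed.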

Place, publisher, year, pages
2021.
Series
arXiv.org ; 2012.06440
Identifiers
URN: urn:nbn:se:liu:diva-179907
OAI: oai:DiVA.org:liu-179907
DiVA id: diva2:1600816
Note

ICCV 2021

Available from: 2021-10-05 Created: 2021-10-05 Last updated: 2021-10-12

Open Access in DiVA

No full text in DiVA

Other links

Link to full text in Arxiv

Authority records

Khan, Fahad Shahbaz
