D2-Net: Weakly-Supervised Action Localization via Discriminative Embeddings and Denoised Activations
Incept Inst Artificial Intelligence, U Arab Emirates.
Mohamed Bin Zayed Univ AI, U Arab Emirates.
Monash Univ, Australia.
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Mohamed Bin Zayed Univ AI, U Arab Emirates.
2021 (English). In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), IEEE, 2021, pp. 13588-13597. Conference paper, Published paper (Refereed)
Abstract [en]

This work proposes a weakly-supervised temporal action localization framework, called D2-Net, which strives to temporally localize actions using video-level supervision. Our main contribution is the introduction of a novel loss formulation, which jointly enhances the discriminability of latent embeddings and robustness of the output temporal class activations with respect to foreground-background noise caused by weak supervision. The proposed formulation comprises a discriminative and a denoising loss term for enhancing temporal action localization. The discriminative term incorporates a classification loss and utilizes a top-down attention mechanism to enhance the separability of latent foreground-background embeddings. The denoising loss term explicitly addresses the foreground-background noise in class activations by simultaneously maximizing intra-video and inter-video mutual information using a bottom-up attention mechanism. As a result, activations in the foreground regions are emphasized whereas those in the background regions are suppressed, thereby leading to more robust predictions. Comprehensive experiments are performed on multiple benchmarks, including THUMOS14 and ActivityNet1.2. Our D2-Net performs favorably in comparison to the existing methods on all datasets, achieving gains as high as 2.3% in terms of mAP at IoU=0.5 on THUMOS14. Source code is available at https://github.com/naraysa/D2-Net.
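The loss formulation described in the abstract can be sketched in simplified form. The following is an illustrative sketch only, not the paper's actual implementation: the top-down and bottom-up attention mechanisms and the mutual-information terms are replaced with plain stand-ins (a given `attention` weight vector and a squared-error denoising penalty), and all function and variable names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def d2net_style_loss(activations, attention, video_label, lam=0.5):
    """Illustrative combination of a discriminative and a denoising term.

    activations: (T, C) temporal class activations for one video
    attention:   (T,) foreground attention weights in [0, 1]
    video_label: int, video-level class label (the only supervision)
    lam:         weight balancing the two terms (hypothetical)
    """
    # Discriminative term: attention-weighted temporal pooling yields a
    # video-level class score; cross-entropy against the video-level label.
    w = attention / (attention.sum() + 1e-8)
    video_scores = (w[:, None] * activations).sum(axis=0)  # (C,)
    probs = softmax(video_scores)
    cls_loss = -np.log(probs[video_label] + 1e-8)

    # Denoising term (stand-in for the mutual-information objectives):
    # pull foreground-class activations toward the attention profile, so
    # activations are emphasized where attention is high and suppressed
    # where it is low.
    fg_act = activations[:, video_label]
    denoise_loss = np.mean((fg_act - attention * fg_act.max()) ** 2)

    return cls_loss + lam * denoise_loss
```

In this sketch the classification term plays the role of the discriminative loss, while the squared-error term crudely mimics the effect of the denoising loss (boosting foreground activations, suppressing background ones); the paper's actual terms operate on latent embeddings and on intra-/inter-video mutual information.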

Place, publisher, year, edition, pages
IEEE, 2021. pp. 13588-13597
National Category
Signal Processing
Identifiers
URN: urn:nbn:se:liu:diva-186523
DOI: 10.1109/ICCV48922.2021.01335
ISI: 000798743203076
ISBN: 9781665428125 (digital)
OAI: oai:DiVA.org:liu-186523
DiVA, id: diva2:1679001
Conference
18th IEEE/CVF International Conference on Computer Vision (ICCV), online, October 11-17, 2021
Note

Funding Agencies|ARC DECRA Fellowship [DE200101100]; NSF CAREER [1149783]; VR [2016-05543]

Available from: 2022-06-30 Created: 2022-06-30 Last updated: 2022-06-30

Open Access in DiVA

No full text available in DiVA

Other links

Publisher's full text

Search further in DiVA

By author/editor
Khan, Fahad
By organisation
Computer Vision; Faculty of Science & Engineering
Signal Processing
