Composed Video Retrieval via Enriched Context and Discriminative Embeddings
Mohamed bin Zayed Univ AI, U Arab Emirates.
Mohamed bin Zayed Univ AI, U Arab Emirates.
Mohamed bin Zayed Univ AI, U Arab Emirates; Aalto Univ, Finland.
Mohamed bin Zayed Univ AI, U Arab Emirates; Australian Natl Univ, Australia.
2024 (English). In: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, 2024, p. 26886-26896. Conference paper, Published paper (Refereed)
Abstract [en]

Composed video retrieval (CoVR) is a recently emerged and challenging problem in computer vision that integrates modification text with visual queries to enable more sophisticated video search in large databases. Existing works predominantly rely on visual queries combined with modification text to distinguish relevant videos. However, this strategy struggles to fully preserve the rich query-specific context in the retrieved target videos, and it represents the target video with a visual embedding alone. We introduce a novel CoVR framework that leverages detailed language descriptions to explicitly encode query-specific contextual information, and that learns discriminative vision-only, text-only, and vision-text embeddings for better alignment, so that matching target videos are retrieved accurately. The proposed framework can be flexibly applied to both composed video retrieval (CoVR) and composed image retrieval (CoIR). Experiments on three datasets show that our approach achieves state-of-the-art performance on both CoVR and zero-shot CoIR, with gains of up to about 7% in recall@K at K=1. Our code and the detailed language descriptions for the WebVid-CoVR dataset are available at https://github.com/OmkarThawakar/composed-video-retrieval.
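The abstract reports results as recall@K. As a hedged illustration, and not the authors' evaluation code, the sketch below computes recall@K from a query-by-gallery similarity matrix (for example, similarities between composed query embeddings and candidate target video embeddings). The function name `recall_at_k`, the list-of-lists input layout, and the one-target-per-query assumption are all illustrative choices.

```python
from typing import Sequence


def recall_at_k(similarity: Sequence[Sequence[float]],
                targets: Sequence[int],
                k: int = 1) -> float:
    """Fraction of queries whose ground-truth target is ranked in the top k.

    Assumption (illustrative): similarity[i][j] scores query i against
    gallery item j, and targets[i] is the index of query i's single
    correct gallery item.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    if len(similarity) != len(targets) or not targets:
        raise ValueError("need one non-empty target list matching the queries")
    hits = 0
    for scores, target in zip(similarity, targets):
        # Rank gallery indices by descending score; ties keep index order.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(targets)
```

With three queries and three gallery videos, `recall_at_k([[0.9, 0.1, 0.3], [0.2, 0.4, 0.8], [0.5, 0.6, 0.1]], [0, 1, 1], k=1)` gives 2/3, because the second query ranks its target only second. Raising `k` to 2 gives 1.0.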

Place, publisher, year, edition, pages
IEEE Computer Society, 2024, p. 26886-26896
Series
IEEE Conference on Computer Vision and Pattern Recognition, ISSN 1063-6919, E-ISSN 2575-7075
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:liu:diva-212441
DOI: 10.1109/CVPR52733.2024.02540
ISI: 001344387503026
Scopus ID: 2-s2.0-85200815267
ISBN: 9798350353006 (electronic)
ISBN: 9798350353013 (print)
OAI: oai:DiVA.org:liu-212441
DiVA, id: diva2:1945919
Conference
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, June 16-22, 2024
Note

Funding agencies: Swedish Research Council [2022-06725, 2022-04266]; Knut and Alice Wallenberg Foundation at the NSC.

Available from: 2025-03-19. Created: 2025-03-19. Last updated: 2025-03-19.

Open Access in DiVA

No full text in DiVA

Authors (Linköping University): Felsberg, Michael; Khan, Fahad
Organisation: Computer Vision; Faculty of Science & Engineering