Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning
Huazhong Univ Sci & Technol HUST, Peoples R China; Alibaba Grp, Peoples R China.
Mohamed bin Zayed Univ AI, U Arab Emirates.
Huazhong Univ Sci & Technol HUST, Peoples R China.
Univ Sydney, Australia.
2024 (English) In: 2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), IEEE COMPUTER SOC, 2024, p. 23627-23637
Conference paper, Published paper (Refereed)
Abstract [en]

Generative zero-shot learning (ZSL) learns a generator to synthesize visual samples for unseen classes, which is an effective way to advance ZSL. However, existing generative methods condition the generator on Gaussian noise and a predefined semantic prototype, so the generator is optimized only on specific seen classes rather than characterizing each visual instance, which results in poor generalization (e.g., overfitting to seen classes). To address this issue, we propose a novel Visual-Augmented Dynamic Semantic prototype method (termed VADS) that helps the generator learn an accurate semantic-visual mapping by fully incorporating visual-augmented knowledge into the semantic conditions. In detail, VADS consists of two modules: (1) the Visual-aware Domain Knowledge Learning module (VDKL) learns the local bias and global prior of the visual features (referred to as domain visual knowledge), which replace pure Gaussian noise to provide richer prior noise information; (2) the Vision-Oriented Semantic Updation module (VOSU) updates the semantic prototype according to the visual representations of the samples. Finally, we concatenate their outputs into a dynamic semantic prototype, which serves as the condition of the generator. Extensive experiments demonstrate that VADS achieves superior CZSL and GZSL performance on three prominent datasets and outperforms other state-of-the-art methods, with average increases of 6.4%, 5.9% and 4.2% on SUN, CUB and AWA2, respectively.
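
The data flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no architectural details, so the single linear layers, the additive form of the domain knowledge, the residual prototype update, all dimensions, and every function and parameter name below are assumptions made purely for illustration. The only element taken from the abstract is the overall shape: a VDKL-like output stands in for pure Gaussian noise, a VOSU-like step updates the class prototype from the visual features, and the two are concatenated into the generator condition.

```python
import random


def init_linear(rng, in_dim, out_dim, scale=0.1):
    """Random weights (out_dim rows of in_dim values) and a zero bias."""
    weights = [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def linear(x, layer):
    """Dense layer y = W x + b on plain Python lists."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_params(visual_dim, semantic_dim, knowledge_dim, seed=0):
    """Hypothetical parameters; in a real model these would be learned."""
    rng = random.Random(seed)
    return {
        "vdkl_local": init_linear(rng, visual_dim, knowledge_dim),
        "vdkl_global_prior": [rng.gauss(0.0, 1.0) for _ in range(knowledge_dim)],
        "vosu": init_linear(rng, visual_dim, semantic_dim),
    }


def dynamic_semantic_prototype(visual_feat, class_prototype, params):
    """Build a per-instance generator condition from one visual feature vector."""
    # VDKL stand-in (assumed form): instance-specific local bias plus a
    # shared global prior, used in place of pure Gaussian noise.
    local_bias = linear(visual_feat, params["vdkl_local"])
    domain_knowledge = [l + g for l, g in zip(local_bias, params["vdkl_global_prior"])]

    # VOSU stand-in (assumed form): residual update of the predefined
    # class prototype driven by the visual features.
    update = linear(visual_feat, params["vosu"])
    updated_prototype = [s + u for s, u in zip(class_prototype, update)]

    # Concatenate both outputs into the dynamic semantic prototype.
    return domain_knowledge + updated_prototype


# Two instances of the same class share one predefined prototype
# but receive different conditions.
params = init_params(visual_dim=3, semantic_dim=5, knowledge_dim=4)
prototype = [0.2, 0.0, 1.0, 0.5, 0.0]
cond_a = dynamic_semantic_prototype([1.0, 0.0, 0.0], prototype, params)
cond_b = dynamic_semantic_prototype([0.0, 1.0, 0.0], prototype, params)
```

In this sketch the condition varies with each visual instance even though the class prototype is fixed, which is the contrast with a static prototype plus Gaussian noise that the abstract draws.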

Place, publisher, year, edition, pages
IEEE COMPUTER SOC, 2024. p. 23627-23637
Series
IEEE Conference on Computer Vision and Pattern Recognition, ISSN 1063-6919, E-ISSN 2575-7075
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:liu:diva-211621
DOI: 10.1109/CVPR52733.2024.02230
ISI: 001342515507002
Scopus ID: 2-s2.0-85207272708
ISBN: 9798350353006 (electronic)
ISBN: 9798350353013 (print)
OAI: oai:DiVA.org:liu-211621
DiVA, id: diva2:1937058
Conference
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, June 16-22, 2024
Note

Funding agencies: National Key R&D Program of China [2022YFC3301000]

Available from: 2025-02-12 Created: 2025-02-12 Last updated: 2025-02-12

Open Access in DiVA

No full text in DiVA


Search in DiVA

By author/editor
Khan, Fahad
By organisation
Computer Vision, Faculty of Science & Engineering
Computer graphics and computer vision
