VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning
Northwestern Polytech Univ, Peoples R China.
Mohamed bin Zayed Univ Artificial Intelligence, U Arab Emirates.
Natl Univ Singapore, Singapore.
Northwestern Polytech Univ, Peoples R China.
2024 (English). In: 2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), IEEE COMPUTER SOC, 2024, p. 17169-17180. Conference paper, Published paper (Refereed)
Abstract [en]

Salient object detection (SOD) and camouflaged object detection (COD) are related yet distinct binary mapping tasks. These tasks involve multiple modalities, sharing commonalities and unique cues. Existing research often employs intricate task-specific specialist models, potentially leading to redundancy and suboptimal results. We introduce VSCode, a generalist model with novel 2D prompt learning, to jointly address four SOD tasks and three COD tasks. We utilize VST as the foundation model and introduce 2D prompts within the encoder-decoder architecture to learn domain and task-specific knowledge on two separate dimensions. A prompt discrimination loss helps disentangle peculiarities to benefit model optimization. VSCode outperforms state-of-the-art methods across six tasks on 26 datasets and exhibits zero-shot generalization to unseen tasks by combining 2D prompts, such as RGB-D COD. Source code is available at https://github.com/Sssssuperior/VSCode.

Place, publisher, year, edition, pages
IEEE COMPUTER SOC, 2024. p. 17169-17180
Series
IEEE Conference on Computer Vision and Pattern Recognition, ISSN 1063-6919, E-ISSN 2575-7075
National Category
Computer Systems
Identifiers
URN: urn:nbn:se:liu:diva-211622
DOI: 10.1109/CVPR52733.2024.01625
ISI: 001342515500017
Scopus ID: 2-s2.0-85201758870
ISBN: 9798350353006 (digital)
ISBN: 9798350353013 (print)
OAI: oai:DiVA.org:liu-211622
DiVA, id: diva2:1937065
Conference
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, June 16-22, 2024
Note

Funding agencies: Key R&D Program of Shaanxi Province [2021ZDLGY01-08]; National Natural Science Foundation of China [62136007, U20B2065, 62036005, 62322605]; Key Research and Development Program of Jiangsu Province [BE2021093]; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center Project [21KT008]; MBZUAI-WIS Joint Program for AI Research [P008, P009]

Available from: 2025-02-12. Created: 2025-02-12. Last updated: 2025-02-12.

Open Access in DiVA

No full text in DiVA


Search in DiVA

By author/editor
Khan, Fahad
By organisation
Computer Vision; Faculty of Science & Engineering
Computer Systems