liu.seSearch for publications in DiVA
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • oxford
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
AudioViewer: Learning to Visualize Sounds
University of British Columbia.
University of British Columbia.
University of British Columbia.
University of British Columbia.
Show others and affiliations
2023 (English)In: 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Institute of Electrical and Electronics Engineers (IEEE), 2023, p. 2205-2215Conference paper, Published paper (Refereed)
Abstract [en]

A long-standing goal in the field of sensory substitution is enabling sound perception for deaf and hard of hearing (DHH) people by visualizing audio content. Different from existing models that translate to hand sign language, between speech and text, or text and images, we target immediate and low-level audio to video translation that applies to generic environment sounds as well as human speech. Since such a substitution is artificial, with-out labels for supervised learning, our core contribution is to build a mapping from audio to video that learns from unpaired examples via high-level constraints. For speech, we additionally disentangle content from style, such as gender and dialect. Qualitative and quantitative results, including a human study, demonstrate that our unpaired translation approach maintains important audio features in the generated video and that videos of faces and numbers are well suited for visualizing high-dimensional audio features that can be parsed by humans to match and distinguish between sounds and words. Project website: https://chunjinsong.github.io/audioviewer

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023. p. 2205-2215
Series
IEEE Workshop on Applications of Computer Vision (WACV), ISSN 2472-6737, E-ISSN 2642-9381
National Category
Natural Language Processing
Identifiers
URN: urn:nbn:se:liu:diva-209826DOI: 10.1109/wacv56688.2023.00224ISBN: 9781665493468 (electronic)ISBN: 9781665493475 (print)OAI: oai:DiVA.org:liu-209826DiVA, id: diva2:1913277
Conference
2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 02-07 January, 2023
Available from: 2024-11-14 Created: 2024-11-14 Last updated: 2025-02-07

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text

Authority records

Wandt, Bastian

Search in DiVA

By author/editor
Wandt, Bastian
Natural Language Processing

Search outside of DiVA

GoogleGoogle Scholar

doi
isbn
urn-nbn

Altmetric score

doi
isbn
urn-nbn
Total: 42 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • oxford
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf