Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models
Mohamed Bin Zayed Univ AI, U Arab Emirates.
Mohamed Bin Zayed Univ AI, U Arab Emirates.
Mohamed Bin Zayed Univ AI, U Arab Emirates; Australian Natl Univ, Australia.
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Mohamed Bin Zayed Univ AI, U Arab Emirates.
2024 (English) In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, Vol. 1: Long Papers, Association for Computational Linguistics (ACL), 2024, p. 12585-12602
Conference paper, Published paper (Refereed)
Abstract [en]

Conversation agents fueled by Large Language Models (LLMs) are providing a new way to interact with visual data. While there have been initial attempts at image-based conversation models, this work addresses the under-explored field of video-based conversation by introducing Video-ChatGPT, a multimodal model that merges a video-adapted visual encoder with an LLM. The resulting model is capable of understanding and generating detailed conversations about videos. We introduce a new dataset of 100,000 video-instruction pairs used to train Video-ChatGPT, acquired via a manual and semi-automated pipeline that is easily scalable and robust to label noise. We also develop a quantitative evaluation framework for video-based dialogue models to objectively analyze their strengths and weaknesses. Code: https://github.com/mbzuai-oryx/Video-ChatGPT.

Place, publisher, year, edition, pages
Association for Computational Linguistics (ACL), 2024. p. 12585-12602
National Category
Other Computer and Information Science
Identifiers
URN: urn:nbn:se:liu:diva-212058
ISI: 001391776303041
ISBN: 9798891760943 (print)
OAI: oai:DiVA.org:liu-212058
DiVA, id: diva2:1942649
Conference
62nd Annual Meeting of the Association for Computational Linguistics (ACL) / Student Research Workshop (SRW), Bangkok, Thailand, August 11-16, 2024
Available from: 2025-03-06 Created: 2025-03-06 Last updated: 2025-03-06

By author/editor
Khan, Fahad
