Multimodal User Enjoyment Detection in Human-Robot Conversation: The Power of Large Language Models
KTH Royal Institute of Technology, Sweden.
KTH Royal Institute of Technology, Sweden.
KTH Royal Institute of Technology, Sweden.
Linköping University, Faculty of Arts and Sciences. Linköping University, Department of Computer and Information Science, Human-Centered Systems. ORCID iD: 0000-0002-7556-5079
2024 (English). In: Proceedings of the 26th International Conference on Multimodal Interaction (ICMI 2024), Association for Computing Machinery, 2024, pp. 469-478. Conference paper, Published paper (Refereed)
Abstract [en]

Enjoyment is a crucial yet complex indicator of positive user experience in Human-Robot Interaction (HRI). While manual enjoyment annotation is feasible, developing reliable automatic detection methods remains a challenge. This paper investigates a multimodal approach to automatic enjoyment annotation for HRI conversations, leveraging large language models (LLMs), visual, audio, and temporal cues. Our findings demonstrate that both text-only and multimodal LLMs with carefully designed prompts can achieve performance comparable to human annotators in detecting user enjoyment. Furthermore, results reveal a stronger alignment between LLM-based annotations and user self-reports of enjoyment compared to human annotators. While multimodal supervised learning techniques did not improve all of our performance metrics, they could successfully replicate human annotators and highlighted the importance of visual and audio cues in detecting subtle shifts in enjoyment. This research demonstrates the potential of LLMs for real-time enjoyment detection, paving the way for adaptive companion robots that can dynamically enhance user experiences.

Place, publisher, year, edition, pages
Association for Computing Machinery, 2024, pp. 469-478
Keywords [en]
User Enjoyment; Affect Recognition; Human-Robot Interaction; Large Language Models; Multimodal; Older Adults
National Category
Robotics and automation
Identifiers
URN: urn:nbn:se:liu:diva-212858
DOI: 10.1145/3678957.3685729
ISI: 001433669800051
Scopus ID: 2-s2.0-85212589337
ISBN: 9798400704628 (print)
OAI: oai:DiVA.org:liu-212858
DiVA id: diva2:1950686
Conference
26th International Conference on Multimodal Interaction (ICMI 2024), San José, Costa Rica, November 4-8, 2024
Note

Funding Agencies|KTH Digital Futures (Sweden); Swedish Research Council [2021-05803]

Available from: 2025-04-08. Created: 2025-04-08. Last updated: 2026-03-04

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Thunberg, Sofia
