Human Ratings Do Not Reflect Downstream Utility: A Study of Free-Text Explanations for Model Predictions
Kunz, Jenny (ORCID iD: 0009-0006-1001-0546). Linköping University, Department of Computer and Information Science, Artificial Intelligence and Integrated Computer Systems (NLP). Linköping University, Faculty of Science & Engineering.
Jirénius, Martin. Linköping University, Department of Computer and Information Science. Linköping University, Faculty of Science & Engineering.
Holmström, Oskar. Linköping University, Department of Computer and Information Science, Artificial Intelligence and Integrated Computer Systems (NLP). Linköping University, Faculty of Science & Engineering.
Kuhlmann, Marco (ORCID iD: 0000-0002-2492-9872). Linköping University, Department of Computer and Information Science, Artificial Intelligence and Integrated Computer Systems (NLP). Linköping University, Faculty of Science & Engineering.
2022 (English). In: Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, 2022, Vol. 5, p. 164-177, article id 2022.blackboxnlp-1.14. Conference paper, Published paper (Refereed).
Abstract [en]

Models able to generate free-text rationales that explain their output have been proposed as an important step towards interpretable NLP for “reasoning” tasks such as natural language inference and commonsense question answering. However, the relative merits of different architectures and types of rationales are not well understood and hard to measure. In this paper, we contribute two insights to this line of research: First, we find that models trained on gold explanations learn to rely on these but, in the case of the more challenging question answering data set we use, fail when given generated explanations at test time. However, additional fine-tuning on generated explanations teaches the model to distinguish between reliable and unreliable information in explanations. Second, we compare explanations by a generation-only model to those generated by a self-rationalizing model and find that, while the former score higher in terms of validity, factual correctness, and similarity to gold explanations, they are not more useful for downstream classification. We observe that the self-rationalizing model is prone to hallucination, which is punished by most metrics but may add useful context for the classification step.
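
The abstract contrasts two architectures: a pipeline in which a generation-only model produces a free-text explanation that a separate classification step then consumes, and a self-rationalizing model that emits label and explanation jointly. The sketch below is a rough illustration of that difference, not the authors' code; it assumes a T5-style sequence-to-sequence model from the Hugging Face transformers library, and the checkpoint name, prompt formats, and example question are all illustrative assumptions.

```python
# Minimal sketch (not the paper's implementation) of the two setups
# compared in the abstract. The checkpoint, prompt formats, and example
# question are illustrative assumptions.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

def generate(prompt: str) -> str:
    # Encode the prompt and greedily decode one output sequence.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

question = "Where would you find a seahorse? (a) ocean (b) garage (c) attic"

# Pipeline: a generation-only model first produces an explanation, and a
# second pass classifies conditioned on it. Fine-tuning this second pass
# on generated (rather than gold) explanations is what the abstract
# reports as teaching the model to handle unreliable rationales.
explanation = generate(f"explain: {question}")
label = generate(f"answer: {question} explanation: {explanation}")

# Self-rationalizing: a single pass emits label and explanation jointly.
joint_output = generate(f"answer and explain: {question}")
```

Under this framing, the paper's central question is whether explanations that humans and automatic metrics rate higher (here, the pipeline's) actually make the downstream classification step more accurate; the abstract reports that they do not.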

Place, publisher, year, edition, pages
2022. Vol. 5, p. 164-177, article id 2022.blackboxnlp-1.14
Keywords [en]
Large Language Models, Neural Networks, Transformers, Interpretability, Explainability
National Category
Language Technology (Computational Linguistics); Computer Sciences
Identifiers
URN: urn:nbn:se:liu:diva-195615
OAI: oai:DiVA.org:liu-195615
DiVA, id: diva2:1773126
Conference
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, December 8, 2022
Available from: 2023-06-22. Created: 2023-06-22. Last updated: 2023-06-28. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

https://aclanthology.org/2022.blackboxnlp-1.14
