Classifier Probes May Just Learn from Linear Context Features
Linköping University, Department of Computer and Information Science. Linköping University, Faculty of Science & Engineering. (Natural Language Processing Group)
Linköping University, Department of Computer and Information Science, Human-Centered systems. Linköping University, Department of Computer and Information Science, Artificial Intelligence and Integrated Computer Systems. (Natural Language Processing Group) ORCID iD: 0000-0002-2492-9872
2020 (English). In: Proceedings of the 28th International Conference on Computational Linguistics, 2020, Vol. 28, p. 5136-5146, article id 450. Conference paper, Published paper (Refereed)
Abstract [en]

Classifiers trained on auxiliary probing tasks are a popular tool to analyze the representations learned by neural sentence encoders such as BERT and ELMo. While many authors are aware of the difficulty of distinguishing between “extracting the linguistic structure encoded in the representations” and “learning the probing task,” the validity of probing methods calls for further research. Using a neighboring word identity prediction task, we show that the token embeddings learned by neural sentence encoders contain a significant amount of information about the exact linear context of the token, and hypothesize that, with such information, learning standard probing tasks may be feasible even without additional linguistic structure. We develop this hypothesis into a framework in which analysis efforts can be scrutinized and argue that, with current models and baselines, conclusions that representations contain linguistic structure are not well-founded. Current probing methodology, such as restricting the classifier’s expressiveness or using strong baselines, can help to better estimate the complexity of learning, but cannot build a foundation for speculations about the nature of the linguistic structure encoded in the learned representations.
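
The neighboring word identity prediction task mentioned in the abstract can be illustrated with a small toy sketch. This is not the authors' experimental setup: the vocabulary, the `toy_embed` "encoder" (which simply concatenates a token's one-hot vector with a scaled copy of its right neighbor's one-hot vector), and the perceptron probe are all assumptions made for illustration. The point is only that a purely linear probe can recover a neighbor's identity when the token vector carries linear context information, and falls to roughly chance level when it does not.

```python
import random

VOCAB = ["the", "cat", "dog", "sat", "ran", "fast"]  # toy vocabulary (assumption)
INDEX = {word: i for i, word in enumerate(VOCAB)}


def one_hot(i, n):
    return [1.0 if j == i else 0.0 for j in range(n)]


def toy_embed(tokens, leak):
    """Hypothetical 'encoder': own one-hot, concatenated with `leak` times
    the one-hot of the right neighbor. Returns (vector, neighbor id) pairs."""
    n = len(VOCAB)
    pairs = []
    for pos in range(len(tokens) - 1):
        own = one_hot(INDEX[tokens[pos]], n)
        right = one_hot(INDEX[tokens[pos + 1]], n)
        pairs.append((own + [leak * v for v in right], INDEX[tokens[pos + 1]]))
    return pairs


def make_data(rng, n_sentences, leak, length=6):
    """Random token sequences, so a token's identity says nothing about its neighbor."""
    data = []
    for _ in range(n_sentences):
        tokens = [rng.choice(VOCAB) for _ in range(length)]
        data.extend(toy_embed(tokens, leak))
    return data


def predict(weights, x):
    scores = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


def train_linear_probe(data, n_classes, epochs=50):
    """Multiclass perceptron, used here as a purely linear probe."""
    weights = [[0.0] * len(data[0][0]) for _ in range(n_classes)]
    for _ in range(epochs):
        for x, y in data:
            guess = predict(weights, x)
            if guess != y:
                for j, v in enumerate(x):
                    weights[y][j] += v
                    weights[guess][j] -= v
    return weights


def accuracy(weights, data):
    return sum(predict(weights, x) == y for x, y in data) / len(data)


def run_demo(seed=0):
    """Return probe test accuracy (with leaked context, without leaked context)."""
    results = []
    for leak in (1.0, 0.0):
        rng = random.Random(seed)
        train = make_data(rng, 100, leak)
        test = make_data(rng, 40, leak)
        probe = train_linear_probe(train, len(VOCAB))
        results.append(accuracy(probe, test))
    return tuple(results)


if __name__ == "__main__":
    with_leak, without_leak = run_demo()
    print(f"probe accuracy with leaked context:    {with_leak:.2f}")
    print(f"probe accuracy without leaked context: {without_leak:.2f}")
```

Because the toy sentences are random, any probe success here comes from the leaked context feature alone, not from linguistic structure. That is the kind of confound the abstract warns about, in a deliberately exaggerated form.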

Place, publisher, year, edition, pages
2020. Vol. 28, p. 5136-5146, article id 450
Keywords [en]
Natural Language Processing, Machine Learning, Neural Language Representations
National Category
Language Technology (Computational Linguistics); Computer Sciences
Identifiers
URN: urn:nbn:se:liu:diva-175384; DOI: 10.18653/v1/2020.coling-main.450; OAI: oai:DiVA.org:liu-175384; DiVA, id: diva2:1548430
Conference
International Conference on Computational Linguistics (COLING), Barcelona, Spain (Online), December 8–13, 2020
Available from: 2021-04-30. Created: 2021-04-30. Last updated: 2024-04-02. Bibliographically approved
In thesis
1. Understanding Large Language Models: Towards Rigorous and Targeted Interpretability Using Probing Classifiers and Self-Rationalisation
2024 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Large language models (LLMs) have become the base of many natural language processing (NLP) systems due to their performance and easy adaptability to various tasks. However, much about their inner workings is still unknown. LLMs have many millions or billions of parameters, and large parts of their training happen in a self-supervised fashion: They simply learn to predict the next word, or missing words, in a sequence. This is effective for picking up a wide range of linguistic, factual and relational information, but it also means that it is not obvious what exactly is learned and how it is represented within the LLM.

In this thesis, I present our work on methods contributing to better understanding LLMs. The work can be grouped into two approaches. The first lies within the field of interpretability, which is concerned with understanding the internal workings of the LLMs. Specifically, we analyse and refine a tool called probing classifiers that inspects the intermediate representations of LLMs, focusing on what roles the various layers of the neural model play. This helps us to get a global understanding of how information is structured in the model. I present our work on assessing and improving the probing methodologies. We developed a framework to clarify the limitations of past methods, showing that all common controls are insufficient. Based on this, we proposed more restrictive probing setups by creating artificial distribution shifts. We developed new metrics for the evaluation of probing classifiers that move the focus from the overall information that the layer contains to differences in information content across the LLM. 

The second approach is concerned with explainability, specifically with self-rationalising models that generate free-text explanations along with their predictions. This is an instance of local understandability: We obtain justifications for individual predictions. In this setup, however, the generation of the explanations is just as opaque as the generation of the predictions. Therefore, our work in this field focuses on better understanding the properties of the generated explanations. We evaluate the downstream performance of a classifier with explanations generated by different model pipelines and compare it to human ratings of the explanations. Our results indicate that the properties that increase the downstream performance differ from those that humans appreciate when evaluating an explanation. Finally, we annotate explanations generated by an LLM for properties that human explanations typically have and discuss the effects those properties have on different user groups. 

While a detailed understanding of the inner workings of LLMs is still unfeasible, I argue that the techniques and analyses presented in this work can help to better understand LLMs, the linguistic knowledge they encode and their decision-making process. Together with knowledge about the models’ architecture, training data and training objective, such techniques can help us develop a robust high-level understanding of LLMs that can guide decisions on their deployment and potential improvements. 

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2024. p. 81
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 2364
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:liu:diva-201985 (URN); 10.3384/9789180754712 (DOI); 9789180754705 (ISBN); 9789180754712 (ISBN)
Public defence
2024-04-18, Ada Lovelace, B-building, Campus Valla, Linköping, 14:00 (English)
Opponent
Supervisors
Available from: 2024-04-02. Created: 2024-04-02. Last updated: 2024-04-02. Bibliographically approved

Open Access in DiVA

fulltext (223 kB), 99 downloads
File information
File name: FULLTEXT01.pdf; File size: 223 kB; Checksum (SHA-512):
5d5bb1e6ea9778846e3497bdbba4191db8fc929308abd8305595d6c43532148b8474bccdbab820375db77c49f39108dea37e5675c41b3ade2bd536fa39e2c56c
Type: fulltext; Mimetype: application/pdf

Authority records

Kuhlmann, Marco

Search in DiVA

By author/editor
Kunz, Jenny; Kuhlmann, Marco
Total: 99 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.
Total: 133 hits