Where Does Linguistic Information Emerge in Neural Language Models?: Measuring Gains and Contributions across Layers
Linköping University, Department of Computer and Information Science, Artificial Intelligence and Integrated Computer Systems. Linköping University, Faculty of Science & Engineering. (Natural Language Processing Group)
Linköping University, Department of Computer and Information Science, Artificial Intelligence and Integrated Computer Systems. Linköping University, Faculty of Science & Engineering. (Natural Language Processing Group) ORCID iD: 0000-0002-2492-9872
2022 (English). In: Proceedings of the 29th International Conference on Computational Linguistics / [ed] Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na, 2022, p. 4664-4676, article id 1.413. Conference paper, Published paper (Refereed)
Abstract [en]

Probing studies have extensively explored where in neural language models linguistic information is located. The standard approach to interpreting the results of a probing classifier is to focus on the layers whose representations give the highest performance on the probing task. We propose an alternative method that asks where the task-relevant information emerges in the model. Our framework consists of a family of metrics that explicitly model local information gain relative to the previous layer and each layer’s contribution to the model’s overall performance. We apply the new metrics to two pairs of syntactic probing tasks with different degrees of complexity and find that the metrics confirm the expected ordering only for one of the pairs. Our local metrics show a massive dominance of the first layers, indicating that the features that contribute the most to our probing tasks are not as high-level as global metrics suggest.
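The abstract contrasts a global view (per-layer probing accuracy) with local metrics that capture each layer's gain over the previous one and its contribution to overall performance. The following Python sketch is only a rough, hypothetical illustration of that distinction, not the paper's implementation or its exact metric definitions: it probes each layer with a linear classifier and derives gain and contribution scores from the resulting accuracies.

```python
# Hypothetical illustration of global vs. local probing metrics
# (not the paper's implementation; the exact definitions are in the paper).
# We train one linear probe per layer, then look at each layer's gain over
# the previous layer and its share of the total improvement.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def probe_layers(layer_reprs, labels, seed=0):
    """Return one test accuracy per layer.

    layer_reprs: list of (n_examples, hidden_dim) arrays, one per layer,
                 e.g. a transformer's hidden states (index 0 = embeddings).
    labels:      length-n_examples array with the probing-task labels.
    """
    accuracies = []
    for reprs in layer_reprs:
        X_tr, X_te, y_tr, y_te = train_test_split(
            reprs, labels, test_size=0.2, random_state=seed)
        probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        accuracies.append(probe.score(X_te, y_te))
    return np.array(accuracies)

def local_gains(accuracies):
    """Accuracy gain of each layer over the previous one (layer 0 gets 0)."""
    return np.diff(accuracies, prepend=accuracies[0])

def contributions(accuracies):
    """Each layer's share of the total improvement (negative gains clipped)."""
    gains = np.clip(local_gains(accuracies), 0.0, None)
    total = gains.sum()
    return gains / total if total > 0 else gains

# With accuracies like [0.60, 0.82, 0.84, 0.85], a global view highlights the
# upper layers, while the local view attributes most of the gain to layer 1.
```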

Place, publisher, year, edition, pages
2022. p. 4664-4676, article id 1.413
Keywords [en]
NLP, AI, Language Technology, Computational Linguistics, Machine Learning
National Category
Identifiers
URN: urn:nbn:se:liu:diva-191000 | OAI: oai:DiVA.org:liu-191000 | DiVA, id: diva2:1725881
Conference
COLING, October 12–17, 2022
Available from: 2023-01-12 Created: 2023-01-12 Last updated: 2024-05-23 Bibliographically approved
In thesis
1. Understanding Large Language Models: Towards Rigorous and Targeted Interpretability Using Probing Classifiers and Self-Rationalisation
2024 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Large language models (LLMs) have become the base of many natural language processing (NLP) systems due to their performance and easy adaptability to various tasks. However, much about their inner workings is still unknown. LLMs have many millions or billions of parameters, and large parts of their training happen in a self-supervised fashion: They simply learn to predict the next word, or missing words, in a sequence. This is effective for picking up a wide range of linguistic, factual and relational information, but it implies that it is not trivial what exactly is learned, and how it is represented within the LLM. 
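As a side note on the self-supervised objective mentioned above, the toy PyTorch sketch below illustrates next-token prediction with cross-entropy loss. The model and data are purely illustrative; real LLMs stack transformer layers over much larger vocabularies, but the objective has the same form.

```python
# Toy illustration of the self-supervised objective described above:
# the model predicts each next token and is trained with cross-entropy over
# the vocabulary. The toy model (embedding + linear layer) and the random
# "sentence" are placeholders; real LLMs use transformer layers instead.
import torch
import torch.nn as nn

vocab_size, dim = 100, 16
toy_lm = nn.Sequential(nn.Embedding(vocab_size, dim),
                       nn.Linear(dim, vocab_size))

tokens = torch.randint(0, vocab_size, (1, 12))   # one toy token sequence
logits = toy_lm(tokens[:, :-1])                  # predict each next token (here from the current token only)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size),              # (positions, vocab)
    tokens[:, 1:].reshape(-1))                   # gold next tokens
loss.backward()                                  # standard gradient-based training step
```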

In this thesis, I present our work on methods contributing to better understanding LLMs. The work can be grouped into two approaches. The first lies within the field of interpretability, which is concerned with understanding the internal workings of the LLMs. Specifically, we analyse and refine a tool called probing classifiers that inspects the intermediate representations of LLMs, focusing on what roles the various layers of the neural model play. This helps us to get a global understanding of how information is structured in the model. I present our work on assessing and improving the probing methodologies. We developed a framework to clarify the limitations of past methods, showing that all common controls are insufficient. Based on this, we proposed more restrictive probing setups by creating artificial distribution shifts. We developed new metrics for the evaluation of probing classifiers that move the focus from the overall information that the layer contains to differences in information content across the LLM. 
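One hypothetical way to realise the artificial distribution shifts described above is to split the probing data so that word types seen during probe training never appear at test time, which makes it harder for a probe to succeed by memorising lexical identity. The sketch below is an illustration under that assumption, not the concrete setups used in the thesis papers; the field names are illustrative only.

```python
# Hypothetical sketch of a more restrictive probing setup via an artificial
# distribution shift: instead of a random train/test split, all examples whose
# target word belongs to a held-out set go to the test set, so the probe
# cannot rely on memorised word identity.
from sklearn.linear_model import LogisticRegression

def lexical_shift_split(examples, held_out_words):
    """examples: list of dicts with 'repr' (feature vector), 'label', and
    'word' (the surface form the representation belongs to)."""
    train = [ex for ex in examples if ex["word"] not in held_out_words]
    test = [ex for ex in examples if ex["word"] in held_out_words]
    return train, test

def run_probe(train, test):
    probe = LogisticRegression(max_iter=1000).fit(
        [ex["repr"] for ex in train], [ex["label"] for ex in train])
    # High accuracy under this shift is harder to explain away as
    # memorisation of lexical identity than under a random split.
    return probe.score([ex["repr"] for ex in test],
                       [ex["label"] for ex in test])
```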

The second approach is concerned with explainability, specifically with self-rationalising models that generate free-text explanations along with their predictions. This is an instance of local understandability: We obtain justifications for individual predictions. In this setup, however, the generation of the explanations is just as opaque as the generation of the predictions. Therefore, our work in this field focuses on better understanding the properties of the generated explanations. We evaluate the downstream performance of a classifier with explanations generated by different model pipelines and compare it to human ratings of the explanations. Our results indicate that the properties that increase the downstream performance differ from those that humans appreciate when evaluating an explanation. Finally, we annotate explanations generated by an LLM for properties that human explanations typically have and discuss the effects those properties have on different user groups. 
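To make the downstream evaluation concrete, the sketch below shows one hypothetical way to measure it: a simple classifier is trained on inputs concatenated with explanations from a given pipeline, and its held-out accuracy is compared across explanation sources. The actual pipelines and models used in the thesis differ, and the "[EXPL]" separator is an arbitrary choice for this illustration.

```python
# Hypothetical sketch of the downstream evaluation: a bag-of-words classifier
# sees the original input concatenated with an explanation from a given
# pipeline, and held-out accuracy is compared across explanation sources.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

def downstream_accuracy(inputs, explanations, labels, seed=0):
    """Higher accuracy suggests the explanations carry task-relevant signal."""
    texts = [f"{x} [EXPL] {e}" for x, e in zip(inputs, explanations)]
    X_tr, X_te, y_tr, y_te = train_test_split(
        texts, labels, test_size=0.2, random_state=seed)
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    return clf.fit(X_tr, y_tr).score(X_te, y_te)

# Comparing downstream_accuracy(...) across pipelines gives one ranking of the
# explanations; human ratings of the same explanations give another, and as
# noted above the two need not agree.
```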

While a detailed understanding of the inner workings of LLMs is still unfeasible, I argue that the techniques and analyses presented in this work can help to better understand LLMs, the linguistic knowledge they encode and their decision-making process. Together with knowledge about the models’ architecture, training data and training objective, such techniques can help us develop a robust high-level understanding of LLMs that can guide decisions on their deployment and potential improvements. 

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2024. p. 81
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 2364
National Category
Identifiers
URN: urn:nbn:se:liu:diva-201985 | DOI: 10.3384/9789180754712 | ISBN: 9789180754705 | ISBN: 9789180754712
Public defence
2024-04-18, Ada Lovelace, B-building, Campus Valla, Linköping, 14:00 (English)
Opponent
Supervisors
Available from: 2024-04-02 Created: 2024-04-02 Last updated: 2024-04-02 Bibliographically approved

Open Access in DiVA

fulltext (962 kB), 5 downloads
File information
File: FULLTEXT02.pdf | File size: 962 kB | Checksum: SHA-512
dacf2c5900725b2c51095892c471359265c593be7f251d1016d0b1dc29e2d8164c3503591a7bca43c0df522ee98e6f877223f5a6bfb553ce34ca9ffb2b81b2a4
Type: fulltext | Mimetype: application/pdf

Other links

Publisher's full text

Person

Kunz, Jenny; Kuhlmann, Marco

Total: 6 downloads
The number of downloads is the sum of all downloads of all full texts. This may, for example, include earlier versions that are no longer available.

Total: 167 hits