Test Harder Than You Train: Probing with Extrapolation Splits
Kunz, Jenny. Linköping University, Department of Computer and Information Science, Artificial Intelligence and Integrated Computer Systems; Linköping University, Faculty of Science and Engineering. (Natural Language Processing Group)
Kuhlmann, Marco. Linköping University, Department of Computer and Information Science, Artificial Intelligence and Integrated Computer Systems; Linköping University, Faculty of Science and Engineering. (Natural Language Processing Group) ORCID iD: 0000-0002-2492-9872
2021 (English). In: Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP / [ed] Jasmijn Bastings, Yonatan Belinkov, Emmanuel Dupoux, Mario Giulianelli, Dieuwke Hupkes, Yuval Pinter, Hassan Sajjad. Punta Cana, Dominican Republic, 2021, Vol. 5, pp. 15-25, article id 2. Conference paper, Published paper (Refereed)
Abstract [en]

Previous work on probing word representations for linguistic knowledge has focused on interpolation tasks. In this paper, we instead analyse probes in an extrapolation setting, where the inputs at test time are deliberately chosen to be ‘harder’ than the training examples. We argue that such an analysis can shed further light on the open question of whether probes actually decode linguistic knowledge, or merely learn the diagnostic task from shallow features. To quantify the hardness of an example, we consider scoring functions based on linguistic, statistical, and learning-related criteria, all of which are applicable to a broad range of NLP tasks. We discuss the relative merits of these criteria in the context of two syntactic probing tasks, part-of-speech tagging and syntactic dependency labelling. From our theoretical and experimental analysis, we conclude that distance-based and hard statistical criteria show the clearest differences between interpolation and extrapolation settings, while at the same time being transparent, intuitive, and easy to control.
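To make the setup concrete, here is a minimal Python sketch of an extrapolation split; it is an illustration under assumed details, not the paper's code. A hardness scoring function ranks the examples, the easiest form the training set, and the hardest are held out for testing; sentence length stands in for one simple statistical criterion.

```python
# Minimal sketch of an extrapolation split (illustration, not the paper's code):
# rank examples by a hardness score, train on the easy ones, test on the hard ones.

def extrapolation_split(examples, score, train_fraction=0.75):
    """Sort examples from easy to hard and hold out the hardest for testing."""
    ranked = sorted(examples, key=score)
    cut = int(len(ranked) * train_fraction)
    return ranked[:cut], ranked[cut:]  # (easy training set, hard test set)

# Toy data: tokenised sentences. Length is a stand-in hardness criterion; the
# paper considers linguistic, statistical and learning-related criteria.
sentences = [
    "Dogs bark .".split(),
    "Cats sleep .".split(),
    "The old dog barked loudly .".split(),
    "The dog that the cat chased barked loudly .".split(),
]
train, test = extrapolation_split(sentences, score=len)
print(len(train), "training examples /", len(test), "held-out hard examples")
```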

Place, publisher, year, edition, pages
Punta Cana, Dominican Republic, 2021. Vol. 5, pp. 15-25, article id 2
Keywords [en]
Natural Language Processing, Neural Language Models, Interpretability, Probing, BERT, Extrapolation
National subject category
Language Technology (Computational Linguistics); Computer Sciences
Identifiers
URN: urn:nbn:se:liu:diva-182166
DOI: 10.18653/v1/2021.blackboxnlp-1.2
OAI: oai:DiVA.org:liu-182166
DiVA, id: diva2:1625785
Conference
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, November 11, 2021
Available from: 2022-01-10. Created: 2022-01-10. Last updated: 2024-04-02. Bibliographically reviewed.
Part of thesis
1. Understanding Large Language Models: Towards Rigorous and Targeted Interpretability Using Probing Classifiers and Self-Rationalisation
2024 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Large language models (LLMs) have become the foundation of many natural language processing (NLP) systems due to their performance and their easy adaptability to various tasks. However, much about their inner workings is still unknown. LLMs have many millions or billions of parameters, and large parts of their training happen in a self-supervised fashion: they simply learn to predict the next word, or missing words, in a sequence. This is effective for picking up a wide range of linguistic, factual and relational information, but it means that it is not trivial to determine what exactly is learned and how it is represented within the LLM.

In this thesis, I present our work on methods that contribute to a better understanding of LLMs. The work can be grouped into two approaches. The first lies within the field of interpretability, which is concerned with understanding the internal workings of LLMs. Specifically, we analyse and refine probing classifiers, a tool that inspects the intermediate representations of LLMs, focusing on the roles that the various layers of the neural model play. This helps us build a global understanding of how information is structured in the model. I present our work on assessing and improving probing methodology: we developed a framework that clarifies the limitations of past methods, showing that all common controls are insufficient, and on this basis we proposed more restrictive probing setups that create artificial distribution shifts. We also developed new metrics for the evaluation of probing classifiers that move the focus from the overall information a layer contains to differences in information content across the LLM.
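As a concrete illustration of the probing-classifier tool, the sketch below freezes a language model, extracts hidden states from every layer, and fits a simple classifier per layer; comparing layer-wise scores hints at where information is represented. The model choice, mean-pooling and toy sentence-level labels are illustrative assumptions, not the thesis setup.

```python
# Hedged sketch of layer-wise probing (assumed setup, not the thesis code):
# extract frozen hidden states per layer and fit a small classifier on each.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

sentences = ["The dog barks .", "A cat sleeps .", "Dogs bark .", "Cats sleep ."]
labels = [0, 1, 0, 1]  # toy labels; real probes target e.g. POS or dependency labels

with torch.no_grad():
    enc = tok(sentences, return_tensors="pt", padding=True)
    hidden = model(**enc).hidden_states  # embedding layer + one tensor per layer

for i, layer in enumerate(hidden):
    X = layer.mean(dim=1).numpy()  # mean-pool over tokens -> one vector per sentence
    probe = LogisticRegression(max_iter=1000).fit(X, labels)
    print(f"layer {i:2d}: training accuracy {probe.score(X, labels):.2f}")
```

In practice the probe is of course evaluated on held-out (or, as above, deliberately shifted) data; training accuracy is printed here only to keep the sketch short.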

The second approach is concerned with explainability, specifically with self-rationalising models that generate free-text explanations along with their predictions. This is an instance of local understandability: We obtain justifications for individual predictions. In this setup, however, the generation of the explanations is just as opaque as the generation of the predictions. Therefore, our work in this field focuses on better understanding the properties of the generated explanations. We evaluate the downstream performance of a classifier with explanations generated by different model pipelines and compare it to human ratings of the explanations. Our results indicate that the properties that increase the downstream performance differ from those that humans appreciate when evaluating an explanation. Finally, we annotate explanations generated by an LLM for properties that human explanations typically have and discuss the effects those properties have on different user groups. 
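The downstream evaluation can be pictured with a toy sketch: train one classifier on the input alone and one on the input concatenated with a generated explanation, then compare their scores. The data, the [EXPL] separator and the bag-of-words classifier are illustrative assumptions, not the thesis pipeline.

```python
# Toy sketch of downstream evaluation of explanations (assumed setup):
# does appending a generated explanation to the input help a simple classifier?
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

inputs = ["the movie was great", "the movie was bad", "i loved it", "i hated it"]
explanations = ["praises the film", "criticises the film",
                "expresses affection", "expresses dislike"]  # stand-ins for model output
labels = [1, 0, 1, 0]

with_expl = [f"{x} [EXPL] {e}" for x, e in zip(inputs, explanations)]

for name, X in [("input only", inputs), ("input + explanation", with_expl)]:
    clf = make_pipeline(CountVectorizer(), LogisticRegression()).fit(X, labels)
    print(f"{name}: training accuracy {clf.score(X, labels):.2f}")
# A real evaluation uses held-out data and explanations from different pipelines.
```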

While a detailed understanding of the inner workings of LLMs is still infeasible, I argue that the techniques and analyses presented in this work can help to better understand LLMs, the linguistic knowledge they encode and their decision-making process. Together with knowledge about the models’ architecture, training data and training objective, such techniques can help us develop a robust high-level understanding of LLMs that can guide decisions on their deployment and potential improvements.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2024. p. 81
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 2364
National subject category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:liu:diva-201985
DOI: 10.3384/9789180754712
ISBN: 9789180754705
ISBN: 9789180754712
Public defence
2024-04-18, Ada Lovelace, B-building, Campus Valla, Linköping, 14:00 (English)
Opponent
Supervisors
Available from: 2024-04-02. Created: 2024-04-02. Last updated: 2024-04-02. Bibliographically reviewed.

Open Access in DiVA

Full text (290 kB), 89 downloads
File information
File name: FULLTEXT01.pdf. File size: 290 kB. Checksum: SHA-512
3e7ef46c336041281d811ed00ed2fd8b88f10f1db98fcfa1075f57cb410cbf37d831fc08ffaa6a126430b80097f71e53aa86b8dac11b8b35821a0f061669166e
Type: full text. MIME type: application/pdf

Other links

Publisher's full text

Person

Kunz, Jenny; Kuhlmann, Marco

Search further in DiVA

By the author/editor
Kunz, Jenny; Kuhlmann, Marco
By the organisation
Artificial Intelligence and Integrated Computer Systems; Faculty of Science and Engineering
Language Technology (Computational Linguistics); Computer Sciences

Search further outside DiVA

Google; Google Scholar
Total: 89 downloads
The number of downloads is the sum of downloads for all full texts. It may include, for example, earlier versions that are no longer available.
