On the Generalization Ability of Retrieval-Enhanced Transformers
Chalmers University of Technology, Sweden; Recorded Future.
Linköping University, Department of Computer and Information Science, Artificial Intelligence and Integrated Computer Systems. Linköping University, Faculty of Science and Engineering. ORCID iD: 0000-0002-5633-5307
Chalmers University of Technology, Sweden; University of Gothenburg, Sweden.
Linköping University, Department of Computer and Information Science, Artificial Intelligence and Integrated Computer Systems. Linköping University, Faculty of Science and Engineering. ORCID iD: 0000-0002-2492-9872
2023 (English) In: Findings of the Association for Computational Linguistics: EACL 2023 / [ed] Andreas Vlachos, Isabelle Augenstein, Association for Computational Linguistics, 2023, pp. 1485-1493. Conference paper, Published paper (Refereed)
Abstract [en]

Recent work on the Retrieval-Enhanced Transformer (Retro) model has shown that offloading memory from trainable weights to a retrieval database can significantly improve language modeling and match the performance of non-retrieval models that are an order of magnitude larger in size. It has been suggested that at least some of this performance gain is due to non-trivial generalization based on both model weights and retrieval. In this paper, we try to better understand the relative contributions of these two components. We find that the performance gains from retrieval largely originate from overlapping tokens between the database and the test data, suggesting less non-trivial generalization than previously assumed. More generally, our results point to the challenges of evaluating the generalization of retrieval-augmented language models such as Retro, as even limited token overlap may significantly decrease test-time loss. We release our code and model at https://github.com/TobiasNorlund/retro
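The token-overlap effect described in the abstract can be illustrated with a minimal sketch. The function, the whitespace tokenization, and the example strings are all illustrative assumptions, not the authors' released code; the released code at the linked repository implements the actual analysis.

```python
# Minimal sketch: measure lexical overlap between a test continuation and a
# retrieved neighbor, as a proxy for the overlap effect described above.
def token_overlap(test_tokens, neighbor_tokens, n=1):
    """Fraction of the test segment's n-grams that also occur in the neighbor."""
    def ngrams(tokens):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    test_ngrams = ngrams(test_tokens)
    if not test_ngrams:
        return 0.0
    return len(test_ngrams & ngrams(neighbor_tokens)) / len(test_ngrams)

test = "the cat sat on the mat".split()
neighbor = "a cat sat on a chair".split()
print(token_overlap(test, neighbor, n=1))  # unigram overlap
print(token_overlap(test, neighbor, n=2))  # bigram overlap
```

High values of such a statistic on test data would indicate that retrieval gains come from near-verbatim copying rather than non-trivial generalization.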

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2023. pp. 1485-1493
National subject category
Natural Language Processing
Identifiers
URN: urn:nbn:se:liu:diva-195609
DOI: 10.18653/v1/2023.findings-eacl.109
ISI: 001181085100107
Scopus ID: 2-s2.0-85159856506
ISBN: 9781959429470 (print)
OAI: oai:DiVA.org:liu-195609
DiVA id: diva2:1773065
Conference
EACL 2023, May 2-6, 2023, Dubrovnik, Croatia
Note

Funding agencies: Wallenberg AI, Autonomous Systems and Software Program (WASP), Knut and Alice Wallenberg Foundation; Swedish Research Council [2022-06725]

Available from: 2023-06-22 Created: 2023-06-22 Last updated: 2026-02-20 Bibliographically approved
Part of thesis
1. Toward Understanding and Enhancing the Training and Evaluation of Language Models: A Study on Vision, Instruction Tuning, and Retrieval Augmentation
2026 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

This dissertation advances two complementary aims in the study of large language models: (i) understanding their inner workings and (ii) improving their training and evaluation. It does so through three lines of inquiry: integrating visual signals into language modeling, instruction tuning for English and a low-resource language (Swedish), and retrieval augmentation.

First, to study multimodal grounding, pretrained masked language models are exposed to tokenized video alongside aligned text, enabling analysis of how visual context influences next token prediction. Using the psycholinguistically motivated notion of imageability as an interpretable probe, the work shows that video grounding strengthens representations for concrete, highly imageable words, with the effect most consistent in a smaller model. For less imageable words, gains are mixed, and larger models exhibit increased reliance on visual context. These findings indicate that visual grounding benefits are not uniform; they depend on lexical properties and model capacity, and imageability offers a principled lens on what video–language models internalize.

Second, the thesis develops a practical path for instruction tuning in Swedish by translating existing English instruction corpora and finetuning models of varying size and pretraining exposure. Substantial zero-shot gains demonstrate that translated synthetic instructions can substitute for costly native resources. Complementing this, the work assesses automatic evaluation for instruction-following systems using Pairwise Accuracy as a meta-evaluation criterion. It finds that reliability is task- and length-dependent: ROUGE-L is a competitive, low-cost proxy for short, format-constrained outputs; BERTScore is comparatively stronger for longer, free-form answers; and LLM-as-a-judge aligns well with human judgments primarily when provided with reference answers. Cross-lingual analyses highlight that Swedish outputs exacerbate surface-matching weaknesses and no-reference biases, refining guidance on when human assessment remains necessary.
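Pairwise Accuracy as a meta-evaluation criterion can be sketched as follows; the scores and ratings are invented for illustration, and ties are counted as disagreement, which is one of several possible conventions.

```python
# Minimal sketch of Pairwise Accuracy: the fraction of output pairs on which
# an automatic metric agrees with human preference about which output is
# better. Score values below are illustrative assumptions.
from itertools import combinations

def pairwise_accuracy(metric_scores, human_scores):
    """Agreement between metric and human rankings over all pairs of outputs."""
    pairs = list(combinations(range(len(metric_scores)), 2))
    agree = sum(
        1 for i, j in pairs
        if (metric_scores[i] - metric_scores[j]) * (human_scores[i] - human_scores[j]) > 0
    )
    return agree / len(pairs)

metric = [0.71, 0.42, 0.65, 0.30]  # e.g. per-output ROUGE-L (assumed values)
human = [4.5, 2.0, 4.0, 3.0]       # human quality ratings (assumed values)
print(pairwise_accuracy(metric, human))
```

A value near 1.0 means the metric orders outputs the same way humans do; the thesis finds this reliability varies with task and output length.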

Third, the dissertation analyzes retrieval augmentation through a RETRO-style model. It shows that perplexity reductions concentrate on tokens with lexical overlap between inputs and retrieved neighbors, revealing a dominant surface-level “copy mode.” Leveraging this, surface-focused retrieval (e.g., BM25) is used to replace the dense retrieval mechanism during inference, which reduces perplexity further within this architecture, while lightweight hybrids (semantic pre-filtering with BM25 re-ranking) recover additional gains at minimal cost. The findings also demonstrate that during pretraining, performance improves sharply once input–neighbor overlap crosses a threshold; deliberately increasing overlap with targeted paraphrases can cut training time by about 40% without degrading downstream short-answer QA, though with a modest increase in eventual perplexity.
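The surface-focused BM25 retrieval mentioned above can be sketched with a minimal Okapi BM25 scorer; the toy corpus, whitespace tokenization, and parameter defaults are assumptions for illustration, whereas the dissertation's experiments operate over a RETRO-style retrieval database.

```python
# Minimal sketch of Okapi BM25 scoring as a surface-level retriever.
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each whitespace-tokenized document in docs against the query."""
    tokenized = [d.split() for d in docs]
    avgdl = sum(len(d) for d in tokenized) / len(tokenized)
    n = len(docs)
    df = Counter(t for d in tokenized for t in set(d))  # document frequencies
    scores = []
    for d in tokenized:
        tf = Counter(d)
        s = 0.0
        for term in query.split():
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            s += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            )
        scores.append(s)
    return scores

docs = [
    "retrieval augmented language models",
    "dense vector retrieval",
    "cooking pasta at home",
]
print(bm25_scores("retrieval language models", docs))
```

The hybrid variant described above would use a dense retriever to pre-filter candidates and a scorer like this to re-rank them, combining semantic recall with surface-level precision.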

Overall, the thesis clarifies what signals large language models actually exploit and provides actionable recommendations for data curation, model selection, metric choice, and training strategies.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2026. p. 173
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 2502
National subject category
Natural Language Processing
Identifiers
URN: urn:nbn:se:liu:diva-221398
DOI: 10.3384/9789181184440
ISBN: 9789181184433
ISBN: 9789181184440
Public defence
2026-03-27, Ada Lovelace, B-huset, Campus Valla, Linköping, 13:15 (English)
Available from: 2026-02-20 Created: 2026-02-20 Last updated: 2026-03-05

Open Access in DiVA

fulltext (554 kB)
File name: FULLTEXT02.pdf
File size: 554 kB
Checksum (SHA-512): a0bd4da5f9c9f57e7c8786a3a7b2b0037dae3333c205f1a2e9f2022d41f18fa8b3bb9f409c05354eeceabdac42e90090c568bd3a3a8f6e219b7569f9863d258f
Type: fulltext
Mimetype: application/pdf
