liu.se: Search for publications in DiVA
1 - 4 of 4
  • 1.
    Holmström, Oskar
    et al.
    Linköping University, Department of Computer and Information Science, Artificial Intelligence and Integrated Computer Systems. Linköping University, Faculty of Science & Engineering.
    Doostmohammadi, Ehsan
    Linköping University, Department of Computer and Information Science, Artificial Intelligence and Integrated Computer Systems. Linköping University, Faculty of Science & Engineering.
    Making Instruction Finetuning Accessible to Non-English Languages: A Case Study on Swedish Models. 2023. In: Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), 2023, p. 634-642. Conference paper (Refereed)
    Abstract [en]

    In recent years, instruction finetuning models have received increased attention due to their remarkable zero-shot and generalization capabilities. However, the widespread implementation of these models has been limited to the English language, largely due to the costs and challenges associated with creating instruction datasets. To overcome this, automatic instruction generation has been proposed as a resourceful alternative. We see this as an opportunity for the adoption of instruction finetuning for other languages. In this paper we explore the viability of instruction finetuning for Swedish. We translate a dataset of generated instructions from English to Swedish, using it to finetune both Swedish and non-Swedish models. Results indicate that the use of translated instructions significantly improves the models’ zero-shot performance, even on unseen data, while staying competitive with strong baselines ten times in size. We see this paper as a first step and a proof of concept that instruction finetuning for Swedish is within reach, through resourceful means, and that there exist several directions for further improvements.

  • 2.
    Holmström, Oskar
    et al.
    Linköping University, Department of Computer and Information Science, Artificial Intelligence and Integrated Computer Systems. Linköping University, Faculty of Science & Engineering.
    Kunz, Jenny
    Linköping University, Department of Computer and Information Science, Artificial Intelligence and Integrated Computer Systems. Linköping University, Faculty of Science & Engineering.
    Kuhlmann, Marco
    Linköping University, Department of Computer and Information Science, Human-Centered systems. Linköping University, Faculty of Science & Engineering. Linköping University, Department of Computer and Information Science, Artificial Intelligence and Integrated Computer Systems.
    Bridging the Resource Gap: Exploring the Efficacy of English and Multilingual LLMs for Swedish. 2023. In: Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023), Tórshavn, the Faroe Islands, 2023, p. 92-110. Conference paper (Refereed)
    Abstract [en]

    Large language models (LLMs) have substantially improved natural language processing (NLP) performance, but training these models from scratch is resource-intensive and challenging for smaller languages. With this paper, we want to initiate a discussion on the necessity of language-specific pre-training of LLMs. We propose how the “one model-many models” conceptual framework for task transfer can be applied to language transfer and explore this approach by evaluating the performance of non-Swedish monolingual and multilingual models on tasks in Swedish. Our findings demonstrate that LLMs exposed to limited Swedish during training can be highly capable and transfer competencies from English off-the-shelf, including emergent abilities such as mathematical reasoning, while at the same time showing distinct culturally adapted behaviour. Our results suggest that there are resourceful alternatives to language-specific pre-training when creating useful LLMs for small languages.

  • 3.
    Kunz, Jenny
    et al.
    Linköping University, Department of Computer and Information Science, Artificial Intelligence and Integrated Computer Systems. Linköping University, Faculty of Science & Engineering.
    Holmström, Oskar
    Linköping University, Department of Computer and Information Science, Artificial Intelligence and Integrated Computer Systems. Linköping University, Faculty of Science & Engineering.
    The Impact of Language Adapters in Cross-Lingual Transfer for NLU. 2024. Conference paper (Refereed)
    Abstract [en]

    Modular deep learning has been proposed for the efficient adaption of pre-trained models to new tasks, domains and languages. In particular, combining language adapters with task adapters has shown potential where no supervised data exists for a language. In this paper, we explore the role of language adapters in zero-shot cross-lingual transfer for natural language understanding (NLU) benchmarks. We study the effect of including a target-language adapter in detailed ablation studies with two multilingual models and three multilingual datasets. Our results show that the effect of target-language adapters is highly inconsistent across tasks, languages and models. Retaining the source-language adapter instead often leads to an equivalent, and sometimes to a better, performance. Removing the language adapter after training has only a weak negative effect, indicating that the language adapters do not have a strong impact on the predictions.

  • 4.
    Kunz, Jenny
    et al.
    Linköping University, Department of Computer and Information Science, Artificial Intelligence and Integrated Computer Systems. Linköping University, Faculty of Science & Engineering.
    Jirénius, Martin
    Linköping University, Department of Computer and Information Science. Linköping University, Faculty of Science & Engineering.
    Holmström, Oskar
    Linköping University, Department of Computer and Information Science, Artificial Intelligence and Integrated Computer Systems. Linköping University, Faculty of Science & Engineering.
    Kuhlmann, Marco
    Linköping University, Department of Computer and Information Science, Artificial Intelligence and Integrated Computer Systems. Linköping University, Faculty of Science & Engineering.
    Human Ratings Do Not Reflect Downstream Utility: A Study of Free-Text Explanations for Model Predictions. 2022. In: Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, 2022, Vol. 5, p. 164-177, article id 2022.blackboxnlp-1.14. Conference paper (Refereed)
    Abstract [en]

    Models able to generate free-text rationales that explain their output have been proposed as an important step towards interpretable NLP for “reasoning” tasks such as natural language inference and commonsense question answering. However, the relative merits of different architectures and types of rationales are not well understood and hard to measure. In this paper, we contribute two insights to this line of research: First, we find that models trained on gold explanations learn to rely on these but, in the case of the more challenging question answering data set we use, fail when given generated explanations at test time. However, additional fine-tuning on generated explanations teaches the model to distinguish between reliable and unreliable information in explanations. Second, we compare explanations by a generation-only model to those generated by a self-rationalizing model and find that, while the former score higher in terms of validity, factual correctness, and similarity to gold explanations, they are not more useful for downstream classification. We observe that the self-rationalizing model is prone to hallucination, which is punished by most metrics but may add useful context for the classification step.
