Holmström, Oskar
Publications (4 of 4)
Kunz, J. & Holmström, O. (2024). The Impact of Language Adapters in Cross-Lingual Transfer for NLU. In: Proceedings of the 1st Workshop on Modular and Open Multilingual NLP (MOOMIN 2024) (pp. 24-43).
The Impact of Language Adapters in Cross-Lingual Transfer for NLU
2024 (English) Conference paper, Published paper (Refereed)
Abstract [en]

Modular deep learning has been proposed for the efficient adaptation of pre-trained models to new tasks, domains and languages. In particular, combining language adapters with task adapters has shown potential where no supervised data exists for a language. In this paper, we explore the role of language adapters in zero-shot cross-lingual transfer for natural language understanding (NLU) benchmarks. We study the effect of including a target-language adapter in detailed ablation studies with two multilingual models and three multilingual datasets. Our results show that the effect of target-language adapters is highly inconsistent across tasks, languages and models. Retaining the source-language adapter instead often leads to an equivalent, and sometimes to a better, performance. Removing the language adapter after training has only a weak negative effect, indicating that the language adapters do not have a strong impact on the predictions.
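
The ablation setting described above can be made concrete with a small, hypothetical sketch (not the authors' code): a bottleneck adapter module added on top of a frozen encoder layer, a target-language adapter stacked under a task adapter, and the variant where the language adapter is dropped. All names and sizes below are illustrative.

# Hypothetical sketch (not the authors' code): bottleneck adapters as used in
# modular cross-lingual transfer, with a language adapter stacked under a task adapter.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    def __init__(self, hidden_size: int, bottleneck_size: int = 48):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.up = nn.Linear(bottleneck_size, hidden_size)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen model's representation intact.
        return x + self.up(self.act(self.down(x)))

hidden = torch.randn(2, 16, 768)        # (batch, tokens, hidden) from a frozen encoder layer
lang_adapter = BottleneckAdapter(768)   # e.g. trained on unlabeled target-language text
task_adapter = BottleneckAdapter(768)   # trained on source-language task data

# Zero-shot transfer: stack the language adapter, then the task adapter.
out_with_lang = task_adapter(lang_adapter(hidden))
# Ablation studied in the paper: remove (or swap) the language adapter.
out_without_lang = task_adapter(hidden)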

Keywords
Large Language Models, LLMs, Adapters, NLP
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:liu:diva-201912 (URN)
Conference
Proceedings of the 1st Workshop on Modular and Open Multilingual NLP (MOOMIN 2024)
Available from: 2024-03-26 Created: 2024-03-26 Last updated: 2024-03-26
Holmström, O., Kunz, J. & Kuhlmann, M. (2023). Bridging the Resource Gap: Exploring the Efficacy of English and Multilingual LLMs for Swedish. In: Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023). Paper presented at the RESOURCEFUL workshop at NoDaLiDa (pp. 92-110). Tórshavn, the Faroe Islands.
Bridging the Resource Gap: Exploring the Efficacy of English and Multilingual LLMs for Swedish
2023 (English) In: Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023), Tórshavn, the Faroe Islands, 2023, pp. 92-110. Conference paper, Published paper (Refereed)
Abstract [en]

Large language models (LLMs) have substantially improved natural language processing (NLP) performance, but training these models from scratch is resource-intensive and challenging for smaller languages. With this paper, we want to initiate a discussion on the necessity of language-specific pre-training of LLMs. We propose how the “one model-many models” conceptual framework for task transfer can be applied to language transfer and explore this approach by evaluating the performance of non-Swedish monolingual and multilingual models on tasks in Swedish. Our findings demonstrate that LLMs exposed to limited Swedish during training can be highly capable and transfer competencies from English off-the-shelf, including emergent abilities such as mathematical reasoning, while at the same time showing distinct culturally adapted behaviour. Our results suggest that there are resourceful alternatives to language-specific pre-training when creating useful LLMs for small languages.
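
As a purely illustrative sketch of the kind of off-the-shelf evaluation discussed here (not the paper's actual evaluation harness), the snippet below prompts a pre-trained causal language model with a Swedish zero-shot classification example using the Hugging Face transformers library; the model name and prompt are placeholders.

# Hypothetical sketch (not the paper's harness): zero-shot prompting of an off-the-shelf
# causal LM on a Swedish example. Model name and prompt are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the paper compares English-centric and multilingual LLMs
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = (
    "Recension: Maten var fantastisk och personalen mycket trevlig.\n"
    "Sentiment (positiv eller negativ):"
)
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=3, do_sample=False,
                        pad_token_id=tokenizer.eos_token_id)
# Print only the newly generated continuation.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))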

Place, publisher, year, edition, pages
Tórshavn, the Faroe Islands, 2023
Keywords
NLP, Natural Language Processing, language model, GPT, monolingual, multilingual, cross-lingual, one model-many models
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:liu:diva-196545 (URN)
Conference
RESOURCEFUL workshop at NoDaLiDa
Funder
CUGS (National Graduate School in Computer Science)
Available from: 2023-08-11 Created: 2023-08-11 Last updated: 2023-08-11
Holmström, O. & Doostmohammadi, E. (2023). Making Instruction Finetuning Accessible to Non-English Languages: A Case Study on Swedish Models. In: Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa). Paper presented at NoDaLiDa (pp. 634-642).
Making Instruction Finetuning Accessible to Non-English Languages: A Case Study on Swedish Models
2023 (English) In: Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), 2023, pp. 634-642. Conference paper, Published paper (Refereed)
Abstract [en]

In recent years, instruction finetuning models have received increased attention due to their remarkable zero-shot and generalization capabilities. However, the widespread implementation of these models has been limited to the English language, largely due to the costs and challenges associated with creating instruction datasets. To overcome this, automatic instruction generation has been proposed as a resourceful alternative. We see this as an opportunity for the adoption of instruction finetuning for other languages. In this paper we explore the viability of instruction finetuning for Swedish. We translate a dataset of generated instructions from English to Swedish, using it to finetune both Swedish and non-Swedish models. Results indicate that the use of translated instructions significantly improves the models’ zero-shot performance, even on unseen data, while staying competitive with strong baselines ten times their size. We see this paper as a first step and a proof of concept that instruction finetuning for Swedish is within reach, through resourceful means, and that there exist several directions for further improvements.
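
A minimal, hypothetical sketch of the general recipe described above (not the paper's training code): finetune a causal language model on instruction-response pairs that have been translated into Swedish, using the standard next-token objective. The base model, example data, prompt format, and hyperparameters below are placeholders.

# Hypothetical sketch (not the paper's code): instruction finetuning on translated
# Swedish instruction-response pairs with a standard causal-LM objective.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder for a Swedish or non-Swedish base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

examples = [  # a translated instruction dataset would be loaded here
    {"instruction": "Sammanfatta texten i en mening.",
     "input": "Regnet fortsatte hela dagen och gatorna översvämmades.",
     "output": "Det regnade så mycket att gatorna svämmade över."},
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for ex in examples:
    prompt = f"### Instruktion:\n{ex['instruction']}\n\n### Indata:\n{ex['input']}\n\n### Svar:\n"
    text = prompt + ex["output"] + tokenizer.eos_token
    batch = tokenizer(text, return_tensors="pt")
    # Causal-LM objective: the labels are the input ids themselves (shifted internally).
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()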

Keywords
NLP, natural language processing, language models, gpt, instruction tuning, instruction finetuning, multilingual, zero-shot
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:liu:diva-196546 (URN)
Conference
NoDaLiDa
Funder
CUGS (National Graduate School in Computer Science); Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2023-08-11 Created: 2023-08-11 Last updated: 2023-08-11
Kunz, J., Jirénius, M., Holmström, O. & Kuhlmann, M. (2022). Human Ratings Do Not Reflect Downstream Utility: A Study of Free-Text Explanations for Model Predictions. In: Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP. Paper presented at the BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, December 8, 2022 (pp. 164-177). Vol. 5, Article ID 2022.blackboxnlp-1.14.
Human Ratings Do Not Reflect Downstream Utility: A Study of Free-Text Explanations for Model Predictions
2022 (English) In: Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, 2022, Vol. 5, pp. 164-177, article id 2022.blackboxnlp-1.14. Conference paper, Published paper (Refereed)
Abstract [en]

Models able to generate free-text rationales that explain their output have been proposed as an important step towards interpretable NLP for “reasoning” tasks such as natural language inference and commonsense question answering. However, the relative merits of different architectures and types of rationales are not well understood and hard to measure. In this paper, we contribute two insights to this line of research: First, we find that models trained on gold explanations learn to rely on these but, in the case of the more challenging question answering data set we use, fail when given generated explanations at test time. However, additional fine-tuning on generated explanations teaches the model to distinguish between reliable and unreliable information in explanations. Second, we compare explanations by a generation-only model to those generated by a self-rationalizing model and find that, while the former score higher in terms of validity, factual correctness, and similarity to gold explanations, they are not more useful for downstream classification. We observe that the self-rationalizing model is prone to hallucination, which is punished by most metrics but may add useful context for the classification step.
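
The two architectures compared above can be summarized with a small illustrative sketch (not the authors' setup): a self-rationalizing model that produces the label and the explanation jointly, versus a pipeline in which a generation-only model first produces a rationale that a downstream classifier then conditions on. All strings below are invented placeholders.

# Hypothetical sketch (not the authors' setup). A self-rationalizing model maps the
# input directly to "label + explanation" (I -> OR); the alternative is a pipeline in
# which a generation-only model produces a rationale (I -> R) and a separate classifier
# conditions on it (IR -> O). Strings are invented placeholders.
question = "Where would you store a loaf of bread to keep it fresh?"
gold_answer = "bread box"

# Self-rationalizing formulation: one model, one target string.
self_rat_input = f"explain and answer: {question}"
self_rat_target = f"{gold_answer} because a bread box keeps air out and keeps bread fresh"

# Pipeline formulation: rationale generation followed by classification.
rationale_input = f"explain: {question}"
generated_rationale = "a bread box keeps air out and keeps bread fresh"  # output of model 1
classifier_input = f"answer: {question} rationale: {generated_rationale}"
classifier_target = gold_answer

print(self_rat_input, "->", self_rat_target)
print(classifier_input, "->", classifier_target)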

Keywords
Large Language Models, Neural Networks, Transformers, Interpretability, Explainability
National Category
Language Technology (Computational Linguistics); Computer Sciences
Identifiers
urn:nbn:se:liu:diva-195615 (URN)
Conference
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, December 8, 2022
Available from: 2023-06-22 Created: 2023-06-22 Last updated: 2024-04-02 Bibliographically approved