Kuhlmann, Marco, Professor (ORCID iD: orcid.org/0000-0002-2492-9872)
Publications (10 of 48)
Holmström, O., Kunz, J. & Kuhlmann, M. (2023). Bridging the Resource Gap: Exploring the Efficacy of English and Multilingual LLMs for Swedish. In: Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023). Paper presented at the RESOURCEFUL workshop at NoDaLiDa (pp. 92-110). Tórshavn, the Faroe Islands.
2023 (English). In: Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023), Tórshavn, the Faroe Islands, 2023, p. 92-110. Conference paper, Published paper (Refereed).
Abstract [en]

Large language models (LLMs) have substantially improved natural language processing (NLP) performance, but training these models from scratch is resource-intensive and challenging for smaller languages. With this paper, we want to initiate a discussion on the necessity of language-specific pre-training of LLMs. We propose how the “one model-many models” conceptual framework for task transfer can be applied to language transfer and explore this approach by evaluating non-Swedish monolingual and multilingual models on tasks in Swedish. Our findings demonstrate that LLMs exposed to limited Swedish during training can be highly capable and transfer competencies from English off the shelf, including emergent abilities such as mathematical reasoning, while at the same time showing distinct, culturally adapted behaviour. Our results suggest that there are resourceful alternatives to language-specific pre-training when creating useful LLMs for small languages.
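The cross-lingual evaluation described in this abstract is easy to reproduce at toy scale. Below is a minimal sketch, assuming the HuggingFace transformers library: prompt an English-centric model with a Swedish question and inspect the continuation. The model name, prompt, and decoding settings are illustrative stand-ins, not the paper's actual experimental setup.

```python
# Minimal sketch: probe an English-centric LM with a Swedish prompt.
# Model, prompt, and decoding settings are illustrative; the paper's
# models, tasks, and metrics are not reproduced here.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # English-centric model

prompt = "Fråga: Vad är huvudstaden i Sverige? Svar:"  # Swedish QA-style prompt
output = generator(prompt, max_new_tokens=10, do_sample=False)
print(output[0]["generated_text"])
```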

Place, publisher, year, edition, pages
Tórshavn, the Faroe Islands, 2023
Keywords
NLP, Natural Language Processing, language model, GPT, monolingual, multilingual, cross-lingual, one model-many models
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:liu:diva-196545 (URN)
Conference
RESOURCEFUL workshop at NoDaLiDa
Funder
CUGS (National Graduate School in Computer Science)
Available from: 2023-08-11 Created: 2023-08-11 Last updated: 2023-08-11
Norlund, T., Doostmohammadi, E., Johansson, R. & Kuhlmann, M. (2023). On the Generalization Ability of Retrieval-Enhanced Transformers. In: Findings of the Association for Computational Linguistics. Paper presented at EACL 2023, May 2-6, 2023 (pp. 1485-1493). Association for Computational Linguistics.
2023 (English). In: Findings of the Association for Computational Linguistics, Association for Computational Linguistics, 2023, p. 1485-1493. Conference paper, Published paper (Refereed).
Abstract [en]

Recent work on the Retrieval-Enhanced Transformer (Retro) model has shown that offloading memory from trainable weights to a retrieval database can significantly improve language modeling and match the performance of non-retrieval models that are an order of magnitude larger in size. It has been suggested that at least some of this performance gain is due to non-trivial generalization based on both model weights and retrieval. In this paper, we try to better understand the relative contributions of these two components. We find that the performance gains from retrieval largely originate from overlapping tokens between the database and the test data, suggesting less non-trivial generalization than previously assumed. More generally, our results point to the challenges of evaluating the generalization of retrieval-augmented language models such as Retro, as even limited token overlap may significantly decrease test-time loss. We release our code and model at https://github.com/TobiasNorlund/retro.
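The overlap analysis sketched in this abstract can be illustrated with plain n-gram sets: measure what fraction of a test chunk's n-grams also occur in the retrieval database. The toy data and the choice of n below are illustrative; the paper's exact overlap measure may differ.

```python
# Minimal sketch of a token-overlap analysis between a retrieval
# database and test data, using n-gram set intersection.
def ngrams(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap_ratio(test_tokens, database_tokens, n=8):
    """Fraction of the test chunk's n-grams that also occur in the database."""
    test_set = ngrams(test_tokens, n)
    if not test_set:
        return 0.0
    return len(test_set & ngrams(database_tokens, n)) / len(test_set)

db = "the quick brown fox jumps over the lazy dog".split()
test = "the quick brown fox sleeps all day".split()
print(overlap_ratio(test, db, n=3))  # 0.4: two of five test trigrams occur in the database
```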

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2023
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:liu:diva-195609 (URN); 001181085100107 (ISI); 9781959429470 (ISBN)
Conference
EACL 2023, May 2-6, 2023
Note

Funding agencies: Wallenberg AI, Autonomous Systems and Software Program (WASP) - Knut and Alice Wallenberg Foundation; Swedish Research Council [2022-06725]

Available from: 2023-06-22 Created: 2023-06-22 Last updated: 2024-04-23. Bibliographically approved.
Doostmohammadi, E., Norlund, T., Kuhlmann, M. & Johansson, R. (2023). Surface-Based Retrieval Reduces Perplexity of Retrieval-Augmented Language Models. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Paper presented at the 61st Annual Meeting of the Association for Computational Linguistics (ACL), Toronto, Canada, July 9-14, 2023 (pp. 521-529). Association for Computational Linguistics.
2023 (English). In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Association for Computational Linguistics, 2023, p. 521-529. Conference paper, Published paper (Refereed).
Abstract [en]

Augmenting language models with a retrieval mechanism has been shown to significantly improve their performance while keeping the number of parameters low. Retrieval-augmented models commonly rely on a semantic retrieval mechanism based on the similarity between dense representations of the query chunk and potential neighbors. In this paper, we study the state-of-the-art Retro model and observe that its performance gain is better explained by surface-level similarities, such as token overlap. Inspired by this, we replace the semantic retrieval in Retro with a surface-level method based on BM25, obtaining a significant reduction in perplexity. As full BM25 retrieval can be computationally costly for large datasets, we also apply it in a re-ranking scenario, gaining part of the perplexity reduction with minimal computational overhead.
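As an illustration of the surface-level alternative, the sketch below ranks a toy corpus with BM25 via the rank_bm25 package (pip install rank-bm25). The corpus, query, and whitespace tokenization are placeholders; the paper applies BM25 at the scale of Retro's chunk database.

```python
# Minimal sketch of surface-level retrieval with BM25.
from rank_bm25 import BM25Okapi

corpus = [
    "retrieval augmented language models use neighbor chunks",
    "semantic retrieval relies on dense vector similarity",
    "bm25 ranks documents by token overlap statistics",
]
bm25 = BM25Okapi([doc.split() for doc in corpus])

query = "token overlap retrieval".split()
print(bm25.get_scores(query))              # one relevance score per document
print(bm25.get_top_n(query, corpus, n=1))  # best surface-level match
```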

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2023
National Category
Applied Mechanics
Identifiers
urn:nbn:se:liu:diva-196564 (URN); 001181088800045 (ISI); 9781959429715 (ISBN)
Conference
61st Annual Meeting of the Association for Computational Linguistics (ACL), Toronto, Canada, July 9-14, 2023
Note

Funding agencies: Wallenberg AI, Autonomous Systems and Software Program (WASP) - Knut and Alice Wallenberg Foundation; Alvis - Swedish Research Council [2022-06725]; Alice Wallenberg Foundation at the National Supercomputer Center

Available from: 2023-08-14 Created: 2023-08-14 Last updated: 2024-04-23
Kunz, J., Jirénius, M., Holmström, O. & Kuhlmann, M. (2022). Human Ratings Do Not Reflect Downstream Utility: A Study of Free-Text Explanations for Model Predictions. In: Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP. Paper presented at the BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, December 8, 2022 (pp. 164-177), Vol. 5, Article ID 2022.blackboxnlp-1.14.
2022 (English). In: Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, 2022, Vol. 5, p. 164-177, article id 2022.blackboxnlp-1.14. Conference paper, Published paper (Refereed).
Abstract [en]

Models able to generate free-text rationales that explain their output have been proposed as an important step towards interpretable NLP for “reasoning” tasks such as natural language inference and commonsense question answering. However, the relative merits of different architectures and types of rationales are not well understood and hard to measure. In this paper, we contribute two insights to this line of research: First, we find that models trained on gold explanations learn to rely on these but, in the case of the more challenging question answering data set we use, fail when given generated explanations at test time. However, additional fine-tuning on generated explanations teaches the model to distinguish between reliable and unreliable information in explanations. Second, we compare explanations by a generation-only model to those generated by a self-rationalizing model and find that, while the former score higher in terms of validity, factual correctness, and similarity to gold explanations, they are not more useful for downstream classification. We observe that the self-rationalizing model is prone to hallucination, which is punished by most metrics but may add useful context for the classification step.
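The two architectures compared in this abstract can be sketched as two input-output conventions. In the stub below, a generation-only pipeline first produces an explanation that a second prediction step consumes, while a self-rationalizing model emits label and explanation in one pass. The generate function, prompt formats, and the "because" output marker are hypothetical placeholders, not the paper's actual templates.

```python
# Minimal sketch contrasting a generation-only (pipeline) setup with a
# self-rationalizing setup; generate() is a stand-in for a real LM call.
def generate(prompt: str) -> str:
    return "stub label because stub explanation"  # placeholder LM output

def pipeline_setup(question: str) -> tuple[str, str]:
    explanation = generate(f"Explain: {question}")
    label = generate(f"Question: {question}\nExplanation: {explanation}\nAnswer:")
    return label, explanation

def self_rationalizing_setup(question: str) -> tuple[str, str]:
    output = generate(f"Question: {question}\nAnswer with explanation:")
    label, _, explanation = output.partition(" because ")  # split joint output
    return label.strip(), explanation.strip()

print(self_rationalizing_setup("Can fish climb trees?"))
```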

Keywords
Large Language Models, Neural Networks, Transformers, Interpretability, Explainability
National Category
Language Technology (Computational Linguistics); Computer Sciences
Identifiers
urn:nbn:se:liu:diva-195615 (URN)
Conference
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, December 8, 2022
Available from: 2023-06-22 Created: 2023-06-22 Last updated: 2024-04-02. Bibliographically approved.
Doostmohammadi, E. & Kuhlmann, M. (2022). On the Effects of Video Grounding on Language Models. In: Proceedings of the First Workshop on Performance and Interpretability Evaluations of Multimodal, Multipurpose, Massive-Scale Models. Paper presented at the First Workshop on Performance and Interpretability Evaluations of Multimodal, Multipurpose, Massive-Scale Models.
2022 (English). In: Proceedings of the First Workshop on Performance and Interpretability Evaluations of Multimodal, Multipurpose, Massive-Scale Models, 2022. Conference paper, Oral presentation with published abstract (Other academic).
Abstract [en]

Transformer-based models trained on text and vision modalities try to improve the performance on multimodal downstream tasks or tackle the problem of lack of grounding, e.g., addressing issues like models’ insufficient commonsense knowledge. While it is more straightforward to evaluate the effects of such models on multimodal tasks, such as visual question answering or image captioning, it is not as well understood how these tasks affect the model itself and its internal linguistic representations. In this work, we experiment with language models grounded in videos and measure the models’ performance on predicting masked words chosen based on their imageability. The results show that the smaller model benefits from video grounding in predicting highly imageable words, while the results for the larger model seem harder to interpret.
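The evaluation can be approximated with an off-the-shelf masked language model: mask words whose imageability exceeds a threshold and inspect the model's top predictions. The toy lexicon, threshold, model name, and sentence below are illustrative; the paper compares video-grounded models against text-only baselines using proper imageability ratings.

```python
# Minimal sketch: mask highly imageable words and check whether a
# masked LM recovers them. Lexicon and threshold are toy stand-ins.
from transformers import pipeline

imageability = {"dog": 6.1, "idea": 2.3, "ball": 6.0, "truth": 2.0}  # toy ratings

fill = pipeline("fill-mask", model="bert-base-uncased")
sentence = "the dog chased the ball across the yard"

for word in sentence.split():
    if imageability.get(word, 0.0) > 5.0:  # highly imageable words only
        masked = sentence.replace(word, fill.tokenizer.mask_token, 1)
        top = fill(masked)[0]  # highest-probability candidate
        print(word, "->", top["token_str"], f"(p={top['score']:.2f})")
```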

National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:liu:diva-198261 (URN)
Conference
First Workshop on Performance and Interpretability Evaluations of Multimodal, Multipurpose, Massive-Scale Models
Available from: 2023-10-02 Created: 2023-10-02 Last updated: 2023-10-13. Bibliographically approved.
Kunz, J. & Kuhlmann, M. (2022). Where Does Linguistic Information Emerge in Neural Language Models?: Measuring Gains and Contributions across Layers. In: Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na (Ed.), Proceedings of the 29th International Conference on Computational Linguistics. Paper presented at COLING, October 12–17, 2022 (pp. 4664-4676), Article ID 1.413.
2022 (English). In: Proceedings of the 29th International Conference on Computational Linguistics / [ed] Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na, 2022, p. 4664-4676, article id 1.413. Conference paper, Published paper (Refereed).
Abstract [en]

Probing studies have extensively explored where in neural language models linguistic information is located. The standard approach to interpreting the results of a probing classifier is to focus on the layers whose representations give the highest performance on the probing task. We propose an alternative method that asks where the task-relevant information emerges in the model. Our framework consists of a family of metrics that explicitly model local information gain relative to the previous layer and each layer’s contribution to the model’s overall performance. We apply the new metrics to two pairs of syntactic probing tasks with different degrees of complexity and find that the metrics confirm the expected ordering only for one of the pairs. Our local metrics show a massive dominance of the first layers, indicating that the features that contribute the most to our probing tasks are not as high-level as global metrics suggest.
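The local part of the framework can be illustrated as a difference series over per-layer probing scores: the gain at layer l is the probe accuracy at layer l minus the accuracy at layer l-1. In the sketch below, random matrices stand in for layer representations, and the paper's contribution metrics are not reproduced.

```python
# Minimal sketch of layer-wise probing with a local "gain" metric
# (accuracy difference between consecutive layers).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, dim, n_layers = 500, 32, 4
layers = [rng.normal(size=(n, dim)) for _ in range(n_layers)]  # stand-in reps
y = rng.integers(0, 2, size=n)                                 # stand-in labels

accs = []
for X in layers:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    accs.append(LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te))

gains = [accs[0]] + [accs[i] - accs[i - 1] for i in range(1, n_layers)]
print("per-layer accuracy:", accs)
print("local gains:", gains)
```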

Keywords
NLP, AI, Language Technology, Computational Linguistics, Machine Learning
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:liu:diva-191000 (URN)
Conference
COLING, October 12–17, 2022
Available from: 2023-01-12 Created: 2023-01-12 Last updated: 2024-05-23. Bibliographically approved.
Kunz, J. & Kuhlmann, M. (2021). Test Harder Than You Train: Probing with Extrapolation Splits. In: Jasmijn Bastings, Yonatan Belinkov, Emmanuel Dupoux, Mario Giulianelli, Dieuwke Hupkes, Yuval Pinter, Hassan Sajjad (Ed.), Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP. Paper presented at the BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, November 11, 2021 (pp. 15-25). Punta Cana, Dominican Republic, Vol. 5, Article ID 2.
2021 (English). In: Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP / [ed] Jasmijn Bastings, Yonatan Belinkov, Emmanuel Dupoux, Mario Giulianelli, Dieuwke Hupkes, Yuval Pinter, Hassan Sajjad, Punta Cana, Dominican Republic, 2021, Vol. 5, p. 15-25, article id 2. Conference paper, Published paper (Refereed).
Abstract [en]

Previous work on probing word representations for linguistic knowledge has focused on interpolation tasks. In this paper, we instead analyse probes in an extrapolation setting, where the inputs at test time are deliberately chosen to be ‘harder’ than the training examples. We argue that such an analysis can shed further light on the open question whether probes actually decode linguistic knowledge, or merely learn the diagnostic task from shallow features. To quantify the hardness of an example, we consider scoring functions based on linguistic, statistical, and learning-related criteria, all of which are applicable to a broad range of NLP tasks. We discuss the relative merits of these criteria in the context of two syntactic probing tasks, part-of-speech tagging and syntactic dependency labelling. From our theoretical and experimental analysis, we conclude that distance-based and hard statistical criteria show the clearest differences between interpolation and extrapolation settings, while at the same time being transparent, intuitive, and easy to control.
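The protocol can be sketched as a split on a hardness score: train the probe on easy examples and evaluate it on hard ones. In the sketch below, the representations, labels, and the distance-like hardness score are synthetic stand-ins for real probing data.

```python
# Minimal sketch of an extrapolation split: train on easy examples
# (low hardness score), test on hard ones (high hardness score).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 16))            # stand-in representations
y = rng.integers(0, 2, size=1000)          # stand-in probe labels
hardness = rng.integers(1, 10, size=1000)  # e.g. head-dependent distance

threshold = np.median(hardness)
easy, hard = hardness <= threshold, hardness > threshold

probe = LogisticRegression(max_iter=1000).fit(X[easy], y[easy])
print("interpolation (easy) accuracy:", probe.score(X[easy], y[easy]))
print("extrapolation (hard) accuracy:", probe.score(X[hard], y[hard]))
```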

Place, publisher, year, edition, pages
Punta Cana, Dominican Republic, 2021
Keywords
Natural Language Processing, Neural Language Models, Interpretability, Probing, BERT, Extrapolation
National Category
Language Technology (Computational Linguistics); Computer Sciences
Identifiers
urn:nbn:se:liu:diva-182166 (URN); 10.18653/v1/2021.blackboxnlp-1.2 (DOI)
Conference
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, November 11, 2021
Available from: 2022-01-10 Created: 2022-01-10 Last updated: 2024-04-02. Bibliographically approved.
Kunz, J. & Kuhlmann, M. (2020). Classifier Probes May Just Learn from Linear Context Features. In: Proceedings of the 28th International Conference on Computational Linguistics. Paper presented at the International Conference on Computational Linguistics (COLING), Barcelona, Spain (Online), December 8–13, 2020 (pp. 5136-5146), Vol. 28, Article ID 450.
2020 (English). In: Proceedings of the 28th International Conference on Computational Linguistics, 2020, Vol. 28, p. 5136-5146, article id 450. Conference paper, Published paper (Refereed).
Abstract [en]

Classifiers trained on auxiliary probing tasks are a popular tool to analyze the representations learned by neural sentence encoders such as BERT and ELMo. While many authors are aware of the difficulty to distinguish between “extracting the linguistic structure encoded in the representations” and “learning the probing task,” the validity of probing methods calls for further research. Using a neighboring word identity prediction task, we show that the token embeddings learned by neural sentence encoders contain a significant amount of information about the exact linear context of the token, and hypothesize that, with such information, learning standard probing tasks may be feasible even without additional linguistic structure. We develop this hypothesis into a framework in which analysis efforts can be scrutinized and argue that, with current models and baselines, conclusions that representations contain linguistic structure are not well-founded. Current probing methodology, such as restricting the classifier’s expressiveness or using strong baselines, can help to better estimate the complexity of learning, but not build a foundation for speculations about the nature of the linguistic structure encoded in the learned representations.
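The diagnostic task used here, neighboring word identity prediction, can be sketched as a classifier from a token's embedding to the identity of the adjacent word. Random vectors below stand in for real encoder output; an actual study would use BERT/ELMo token embeddings and a realistic vocabulary.

```python
# Minimal sketch of a neighboring-word-identity probe on stand-in embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_tokens, dim, vocab = 2000, 64, 20
embeddings = rng.normal(size=(n_tokens, dim))        # stand-in token embeddings
neighbor_id = rng.integers(0, vocab, size=n_tokens)  # identity of right neighbor

X_tr, X_te, y_tr, y_te = train_test_split(embeddings, neighbor_id, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("neighbor-identity accuracy:", probe.score(X_te, y_te))  # ~chance on random data
```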

Keywords
Natural Language Processing, Machine Learning, Neural Language Representations
National Category
Language Technology (Computational Linguistics); Computer Sciences
Identifiers
urn:nbn:se:liu:diva-175384 (URN); 10.18653/v1/2020.coling-main.450 (DOI)
Conference
International Conference on Computational Linguistics (COLING), Barcelona, Spain (Online), December 8–13, 2020
Available from: 2021-04-30 Created: 2021-04-30 Last updated: 2024-04-02. Bibliographically approved.
Kurtz, R., Oepen, S. & Kuhlmann, M. (2020). End-to-End Negation Resolution as Graph Parsing. In: Bouma, Gosse; Matsumoto, Yuji; Oepen, Stephan; Sagae, Kenji; Seddah, Djamé; Sun, Weiwei; Søgaard, Anders; Tsarfaty, Reut; Zeman, Dan (Ed.), Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies. Paper presented at the 16th International Conference on Parsing Technologies (IWPT) - Shared Task on Parsing into Enhanced Universal Dependencies (pp. 14-24). Association for Computational Linguistics.
2020 (English). In: Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies / [ed] Bouma, Gosse; Matsumoto, Yuji; Oepen, Stephan; Sagae, Kenji; Seddah, Djamé; Sun, Weiwei; Søgaard, Anders; Tsarfaty, Reut; Zeman, Dan, Association for Computational Linguistics, 2020, p. 14-24. Conference paper, Published paper (Refereed).
Abstract [en]

We present a neural end-to-end architecture for negation resolution based on a formulation of the task as a graph parsing problem. Our approach allows for the straightforward inclusion of many types of graph-structured features without the need for representation-specific heuristics. In our experiments, we specifically gauge the usefulness of syntactic information for negation resolution. Despite the conceptual simplicity of our architecture, we achieve state-of-the-art results on the Conan Doyle benchmark dataset, including a new top result for our best model.
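One plausible way to encode the task as a graph, sketched below, attaches labelled edges from a negation cue to every token in its scope, with a self-loop marking the cue itself. These edge conventions are an illustrative assumption, not necessarily the representation used in the paper.

```python
# Minimal sketch: encode a negation instance as labelled graph edges.
def negation_graph(cue_index, scope_indices):
    """Return (head, dependent, label) edges for one cue-scope annotation."""
    edges = [(cue_index, cue_index, "cue")]  # self-loop marks the cue token
    edges += [(cue_index, i, "scope") for i in scope_indices if i != cue_index]
    return edges

tokens = ["Holmes", "did", "not", "say", "a", "word"]
print(negation_graph(cue_index=2, scope_indices=[0, 1, 3, 4, 5]))
```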

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2020
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-167051 (URN); 10.18653/v1/2020.iwpt-1.3 (DOI); 000563425200003 (ISI); 978-1-952148-11-8 (ISBN)
Conference
16th International Conference on Parsing Technologies (IWPT) - Shared Task on Parsing into Enhanced Universal Dependencies
Available from: 2020-06-25 Created: 2020-06-25 Last updated: 2020-12-02
Kurtz, R., Roxbo, D. & Kuhlmann, M. (2019). Improving Semantic Dependency Parsing with Syntactic Features. In: Proceedings of the First NLPL Workshop on Deep Learning for Natural Language Processing. Paper presented at the First NLPL Workshop on Deep Learning for Natural Language Processing, Turku, Finland, 30 September 2019 (pp. 12-21). Linköping University Electronic Press.
2019 (English). In: Proceedings of the First NLPL Workshop on Deep Learning for Natural Language Processing, Linköping University Electronic Press, 2019, p. 12-21. Conference paper, Published paper (Refereed).
Abstract [en]

We extend a state-of-the-art deep neural architecture for semantic dependency parsing with features defined over syntactic dependency trees. Our empirical results show that only gold-standard syntactic information leads to consistent improvements in semantic parsing accuracy, and that the magnitude of these improvements varies with the specific combination of the syntactic and the semantic representation used. In contrast, automatically predicted syntax does not seem to help semantic parsing. Our error analysis suggests that there is a significant overlap between syntactic and semantic representations.
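The kind of feature injection described here can be sketched as reading simple indicator features off a syntactic dependency tree for each candidate semantic edge. The specific features and the toy tree below are illustrative, not the paper's actual feature set.

```python
# Minimal sketch: syntactic-tree features for a candidate semantic edge i -> j.
def syntactic_features(heads, deprels, i, j):
    return {
        "pred_is_syntactic_head": heads[j] == i,   # predicate governs dependent
        "dep_syntactic_relation": deprels[j],      # dependent's relation label
        "share_syntactic_head": heads[i] == heads[j],
    }

# Toy tree for "Holmes did not say a word" (head index per token, 0-based; -1 = root)
heads = [3, 3, 3, -1, 5, 3]
deprels = ["nsubj", "aux", "neg", "root", "det", "obj"]
print(syntactic_features(heads, deprels, i=3, j=5))
```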

Place, publisher, year, edition, pages
Linköping University Electronic Press, 2019
Series
Linköping Electronic Conference Proceedings, ISSN 1650-3686, E-ISSN 1650-3740
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-167050 (URN); 978-91-7929-999-6 (ISBN)
Conference
First NLPL Workshop on Deep Learning for Natural Language Processing, Turku, Finland, 30 September 2019
Available from: 2020-06-25 Created: 2020-06-25 Last updated: 2024-01-29
Projects
Accurate and efficient non-projective dependency parsing [2008-00296_VR]; Uppsala University
Identifiers
ORCID iD: orcid.org/0000-0002-2492-9872
