Enabling Natural Language Interaction With Decentralized Data Sources Using Large Language Models
Linköping University, Department of Computer and Information Science.
Linköping University. (IDA)
2025 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
Möjliggörande av interaktion på naturligt språk med decentraliserade datakällor med hjälp av stora språkmodeller (Swedish)
Abstract [en]

Applications powered by Large Language Models (LLMs) can lower the barrier to querying decentralized data, where using SPARQL remains difficult for non-technical users. This thesis explores how LLMs can be leveraged to facilitate natural language (NL) interaction with decentralized data systems. A bi-directional NL to SPARQL translation pipeline was designed and implemented, then integrated with a simple user interface (UI). The pipeline was evaluated on a gold-standard dataset of 21 competency questions derived from the Onto-DESIDE project, with three independent runs per setting to account for LLM variability. For NL to SPARQL, the metrics included execution success and result accuracy (identical, underfetching, overfetching, or both underfetching and overfetching). For SPARQL to NL, the pipeline was rated on Clarity and Faithfulness using a Likert scale, comparing zero-shot and one-shot prompting strategies. Across the runs, the LLM consistently produced syntactically valid SPARQL (execution success >95% for both zero-shot and one-shot), yet semantic accuracy was low: only around a fifth of the LLM-generated SPARQL queries matched the gold-standard results, with frequent overfetching and/or underfetching. Surprisingly, one-shot prompting did not improve semantic accuracy for NL to SPARQL translation. However, SPARQL to NL explanations were clear and highly faithful, and one-shot prompting further improved clarity and reduced variability. UNION and FILTER NOT EXISTS were identified as categories where the LLM performed worse in the zero-shot approach. Limitations of the thesis include ambiguity in the NL questions, the LLM's tendency to return URIs rather than labels, and the non-production-ready approach of including ontologies in the prompts. The findings indicate that the SPARQL to NL module is immediately useful for explaining SPARQL queries, whereas the NL to SPARQL module is not yet reliable for unsupervised use. Future work includes refining prompt-engineering strategies, developing dynamic context management for ontology retrieval, and implementing a post-processing validation layer to improve performance.
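
The four result-accuracy categories named in the abstract can be illustrated with a small sketch. This is not the thesis's evaluation code: it assumes each query result is a collection of hashable rows (tuples of bindings) and compares them as sets, ignoring row order and duplicates. The names `ResultMatch` and `classify_result` are hypothetical.

```python
from enum import Enum


class ResultMatch(Enum):
    """Hypothetical labels mirroring the categories listed in the abstract."""
    IDENTICAL = "identical"
    UNDERFETCHING = "underfetching"
    OVERFETCHING = "overfetching"
    BOTH = "both underfetching and overfetching"


def classify_result(gold_rows, generated_rows):
    """Compare generated query results with gold-standard results.

    Assumption: rows are hashable tuples and are compared as sets,
    so ordering and duplicate rows are ignored.
    """
    gold = set(gold_rows)
    generated = set(generated_rows)
    missing = gold - generated   # gold rows the generated query failed to return
    extra = generated - gold     # rows returned that are not in the gold result
    if not missing and not extra:
        return ResultMatch.IDENTICAL
    if missing and extra:
        return ResultMatch.BOTH
    if missing:
        return ResultMatch.UNDERFETCHING
    return ResultMatch.OVERFETCHING
```

Under these assumptions, a generated query that returns every gold row plus additional rows would be labelled as overfetching, and one that returns only a subset of the gold rows as underfetching.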

Place, publisher, year, edition, pages
2025. , p. 74
Keywords [en]
Large Language Models, LLMs, NL to SPARQL, SPARQL to NL, Solid Pods, Decentralized Data Access, Semantic Web, RDF Data
National Category
Natural Language Processing; Computer and Information Sciences
Identifiers
URN: urn:nbn:se:liu:diva-219592
ISRN: LIU-IDA/LITH-EX-A--25/102--SE
OAI: oai:DiVA.org:liu-219592
DiVA, id: diva2:2015002
Subject / course
Computer Engineering
Presentation
2025-09-30, Alan Turing, LIU, Linköping, 08:00 (Swedish)
Supervisors
Examiners
Available from: 2025-11-24 Created: 2025-11-19 Last updated: 2025-11-24
Bibliographically approved

Open Access in DiVA

fulltext (656 kB), 23 downloads
File information
File name: FULLTEXT01.pdf
File size: 656 kB
Checksum (SHA-512): 2bf62d71e42e15ee5f622511cfabbe5cba7bb6da89894e70ebb7fcbd0cf72335110867f014cf4b05ab2d8c229b3767d8a5a95441dba533fa46541f6047f80cfa
Type: fulltext
Mimetype: application/pdf

Authors
Birgersson, Erik; Andarzig, Kiana