Evaluating the Accuracy of GUI Testing Using Multi-Modal Large Language Models
Linköping University, Department of Computer and Information Science.
Linköping University, Department of Computer and Information Science.
2026 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
Utvärdering av noggrannheten i GUI-testning med multimodala stora språkmodeller (Swedish)
Abstract [en]

Recent advances in Multi-modal Large Language Models (M-LLMs) have created new opportunities for automating Graphical User Interface (GUI) testing through screenshot-based interaction guided by natural-language instructions. This thesis investigates how accurately these models can execute GUI test actions, how sensitive they are to controlled variations in GUI layout and instruction wording, how reliable their final verdicts are, and how model size affects execution efficiency.

To study this, a proof-of-concept pipeline called GUIOracle was implemented. The pipeline combines specification, interaction, and verification stages to execute natural-language GUI test instructions using screenshot capture, GUI parsing, local multi-modal model inference, and automated final-state assessment. The approach was evaluated in two environments: Tacsi, an industrially relevant simulator at Saab, and OpenScope, a controlled GUI environment used for comparative experimentation. Five Qwen 3.5 model sizes, from 0.8B to 27B parameters, were included in the evaluation, although the 27B model was tested only in OpenScope.

The results show that GUI action accuracy, scenario success, and verification-stage reliability improved clearly with model size. The smallest models were often unreliable and frequently timed out, whereas the larger models were better at executing correct GUI actions and completing scenarios successfully. Controlled GUI layout changes generally had a stronger negative effect than instruction wording changes, indicating that the approach was more sensitive to layout variation than to simplified phrasing. Runtime analysis further showed that smaller models were not automatically the most practical, since weaker action selection often led to longer interaction traces. The 27B model achieved the highest action accuracy, but the 9B model provided the best balance between correctness and execution efficiency. These findings suggest that local M-LLM-based GUI testing has promising potential in controlled environments, although the approach is best viewed as a complement to existing testing methods rather than as a complete replacement.
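
As a rough illustration of the specification, interaction, and verification stages described in the abstract, the following minimal sketch runs one natural-language test instruction against a simulated GUI. It is not the GUIOracle implementation: every name here (`GuiTestSpec`, `Result`, `run_test`, `stub_model`, `stub_gui`) is hypothetical, the GUI state is a plain dictionary standing in for a parsed screenshot, and the model is replaced by a hard-coded stub. The step budget is an assumed way of modelling the timeouts mentioned above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for a parsed GUI screenshot.
State = Dict[str, str]


@dataclass
class GuiTestSpec:
    """Specification stage: the instruction plus the expected final state."""
    instruction: str
    expected: State
    max_steps: int = 5  # assumed step budget; exhausting it counts as a timeout


@dataclass
class Result:
    verdict: str  # "pass", "fail", or "timeout"
    trace: List[str] = field(default_factory=list)


def run_test(
    spec: GuiTestSpec,
    state: State,
    choose_action: Callable[[str, State], str],
    apply_action: Callable[[State, str], State],
) -> Result:
    """Run the interaction loop, then assess the final state."""
    trace: List[str] = []
    # Interaction stage: ask the model for one action per step.
    for _ in range(spec.max_steps):
        action = choose_action(spec.instruction, state)
        if action == "done":
            break
        trace.append(action)
        state = apply_action(state, action)
    else:
        # The loop ended without the model signalling completion.
        return Result("timeout", trace)
    # Verification stage: compare the final state with the expectation.
    passed = all(state.get(k) == v for k, v in spec.expected.items())
    return Result("pass" if passed else "fail", trace)


def stub_model(instruction: str, state: State) -> str:
    """Hard-coded stand-in for a multi-modal model choosing the next action."""
    return "done" if state.get("runway") == "selected" else "click:runway"


def stub_gui(state: State, action: str) -> State:
    """Hard-coded stand-in for the GUI reacting to an action."""
    if action == "click:runway":
        return {**state, "runway": "selected"}
    return state


spec = GuiTestSpec("Select the runway", expected={"runway": "selected"})
result = run_test(spec, {"runway": "idle"}, stub_model, stub_gui)
```

In this sketch a model that never signals completion uses up the whole step budget and is reported as a timeout, with a trace as long as the budget. That is one simple way a weaker action selector can end up with a longer interaction trace than a stronger one.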

Place, publisher, year, edition, pages
2026. p. 83
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:liu:diva-224405
ISRN: LIU-IDA/LITH-EX-A--26/020--SE
OAI: oai:DiVA.org:liu-224405
DiVA, id: diva2:2064667
External cooperation
Saab
Supervisors
Examiners
Available from: 2026-06-02
Created: 2026-06-02
Last updated: 2026-06-02
Bibliographically approved

Open Access in DiVA

fulltext (10200 kB), 39 downloads
File information
File name: FULLTEXT01.pdf
File size: 10200 kB
Checksum (SHA-512): bd7e8b4753364dc5324654364e8623693d7fedf4b879fe08c9034859d3998091dd3e2109f885939dd5f5c88a2dcf263fbca597581647084c15c61abeb4449d03
Type: fulltext
Mimetype: application/pdf

Authors
Lindeborg, Andreas; Ödquist, Klara
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 85 hits