Using Similarity Network Analysis to Improve Text Similarity Calculations
Witschard, Daniel. Linnaeus University, Sweden. ORCID iD: 0000-0001-6150-0787
Kucher, Kostiantyn. Linköping University, Department of Science and Technology, Media and Information Technology. Linköping University, Faculty of Science & Engineering. (iVis, INV) ORCID iD: 0000-0002-1907-7820
Jusufi, Ilir. Blekinge Institute of Technology, Sweden. ORCID iD: 0000-0001-6745-4398
Kerren, Andreas. Linköping University, Department of Science and Technology, Media and Information Technology. Linköping University, Faculty of Science & Engineering. Linnaeus University, Department of Computer Science and Media Technology (DM). (iVis, INV) ORCID iD: 0000-0002-0519-2537
2025 (English). In: Applied Network Science, E-ISSN 2364-8228, Vol. 10, article id 8. Article in journal (Refereed). Published
Abstract [en]

Similarity-based analysis is a powerful and intuitive tool for exploring large data sets, for instance, for revealing patterns by grouping items by similarity or for recommending items based on selected samples. However, similarity is an abstract and subjective property, which makes it hard to evaluate by a purely computational approach. Furthermore, there are usually several possible computational models that could be applied to the data, each with its own strengths and weaknesses. With this in mind, we aim to extend the research frontier regarding what impact the choice of a computational model may have on the results. In this paper, we target the scope of embedding-based similarity calculations on text documents and seek to answer the research question: "How can a better understanding of the continuous similarity distribution captured by different models lead to better similarity calculations on document sets?" We propose a new and generic methodology based on similarity network comparison. Based on this approach, we have developed a computational pipeline together with a prototype visual analytics tool that allows the user to easily assess the level of model agreement or disagreement. To demonstrate the potential of our method and to show its application to real-world scenarios, we apply it in an experimental setup using three state-of-the-art text embedding models and three different text corpora. In view of the surprisingly low level of model agreement regarding the data, we also discuss strategies for handling model disagreement.
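
As a rough illustration of what comparing similarity networks across embedding models can look like, the following minimal sketch builds one nearest-neighbour network per model and measures how many edges the two networks share. It is not the authors' pipeline: the choice of cosine similarity, the k-nearest-neighbour construction, the Jaccard overlap as agreement score, and all names (`knn_edges`, `edge_agreement`, the toy vectors `model_a` and `model_b`) are assumptions made only for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_edges(embeddings, k):
    """Undirected edge set linking each document to its k most similar documents."""
    n = len(embeddings)
    edges = set()
    for i in range(n):
        # Rank all other documents by similarity to document i, most similar first.
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: cosine(embeddings[i], embeddings[j]),
            reverse=True,
        )
        for j in ranked[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def edge_agreement(edges_a, edges_b):
    """Jaccard overlap of two edge sets: 1.0 means identical networks, 0.0 disjoint."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 1.0


# Toy embeddings of the same four documents from two hypothetical models.
model_a = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
model_b = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]

net_a = knn_edges(model_a, k=1)
net_b = knn_edges(model_b, k=1)
print("agreement:", edge_agreement(net_a, net_b))
```

In this toy case the two models pair the documents differently, so the networks share no edges. A low overlap of this kind is one simple way to picture the model disagreement the abstract refers to.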

Place, publisher, year, edition, pages
Springer Nature, 2025. Vol. 10, article id 8
Keywords [en]
Embeddings, Text Similarity Calculations, Similarity Networks, Visual Analytics
National Category
Computer Sciences; Human Computer Interaction
Identifiers
URN: urn:nbn:se:liu:diva-212473
DOI: 10.1007/s41109-025-00699-7
ISI: 001467943200001
Scopus ID: 2-s2.0-105000480934
OAI: oai:DiVA.org:liu-212473
DiVA, id: diva2:1945789
Funder
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications
Note

This work was partially supported through the ELLIIT environment for strategic research in Sweden. The work of Ilir Jusufi was supported in part by the Knowledge Foundation, Sweden, through the project "Rekryteringar 21, Universitetslektor i spelteknik" under Contract 20210077.

Available from: 2025-03-19 Created: 2025-03-19 Last updated: 2025-05-20

Open Access in DiVA

fulltext (2709 kB), 51 downloads
File information
File name: FULLTEXT01.pdf
File size: 2709 kB
Checksum (SHA-512): 0adf3f14d49f92c25e12693b56f4a855a4e029a51e82f0e25a037c999d7749a21523ef39aed5c0145e12c57f38beee51d5530e4f83e1403ee2c2388920348b04
Type: fulltext
Mimetype: application/pdf

Authors
Witschard, Daniel; Kucher, Kostiantyn; Jusufi, Ilir; Kerren, Andreas