Exploring Similarity Patterns in a Large Scientific Corpus
Linnaeus University, Sweden. ORCID iD: 0000-0001-6150-0787
Blekinge Institute of Technology, Sweden. ORCID iD: 0000-0001-6745-4398
Linköping University, Department of Science and Technology, Media and Information Technology. Linköping University, Faculty of Science & Engineering. (iVis, INV) ORCID iD: 0000-0002-1907-7820
Linköping University, Department of Science and Technology, Media and Information Technology. Linköping University, Faculty of Science & Engineering. Linnaeus University, Department of Computer Science and Media Technology (DM). (iVis, INV) ORCID iD: 0000-0002-0519-2537
2025 (English). In: PLOS ONE, E-ISSN 1932-6203, Vol. 20, no 4, article id e0321114. Article in journal (Refereed), Published
Abstract [en]

Similarity-based analysis is a common and intuitive tool for exploring large data sets. For instance, grouping data items by their level of similarity with respect to one or several chosen aspects can reveal patterns and relations in the intrinsic structure of the data and thus provide important insights for the sense-making process. Existing analytical methods (such as clustering and dimensionality reduction) tend to target questions such as "Which objects are similar?". They are not necessarily well suited to answering questions such as "How does the result change if we change the similarity criteria?" or "How are the items linked together by the similarity relations?", so they do not unlock the full potential of similarity-based analysis, and here we see a gap to fill. In this paper, we propose that the concept of similarity can be regarded as both (1) a relation between items and (2) a property in its own right, with a specific distribution over the data set. Based on this approach, we developed an embedding-based computational pipeline together with a prototype visual analytics tool that allows the user to perform similarity-based exploration of a large set of scientific publications. To demonstrate the potential of our method, we present two use cases and discuss the strengths and limitations of our approach.
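
The two views of similarity described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the toy vectors, the use of cosine similarity, the threshold values, and the function names `similarity_links` and `similarity_profile` are all assumptions made purely for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_links(embeddings, threshold):
    """Similarity as a relation: item pairs whose similarity meets the threshold."""
    links = []
    for a, b in combinations(sorted(embeddings), 2):
        score = cosine(embeddings[a], embeddings[b])
        if score >= threshold:
            links.append((a, b, score))
    return links


def similarity_profile(embeddings):
    """Similarity as a property: mean similarity of each item to all other items."""
    profile = {}
    for a in embeddings:
        others = [cosine(embeddings[a], embeddings[b]) for b in embeddings if b != a]
        profile[a] = sum(others) / len(others)
    return profile


# Hypothetical 3-dimensional "embeddings"; real text embeddings would have
# hundreds of dimensions and come from a trained model.
embeddings = {
    "paper_a": [1.0, 0.0, 0.0],
    "paper_b": [1.0, 1.0, 0.0],
    "paper_c": [0.0, 0.0, 1.0],
}

# Changing the criterion (here only the threshold) changes which items are linked.
links_loose = similarity_links(embeddings, threshold=0.0)
links_strict = similarity_links(embeddings, threshold=0.5)

# Per-item values whose distribution over the data set can itself be explored.
profile = similarity_profile(embeddings)
```

Comparing `links_loose` with `links_strict` corresponds to the question of how the result changes when the similarity criteria change, while `profile` treats similarity as a per-item quantity with a distribution over the set.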

Place, publisher, year, edition, pages
Public Library of Science (PLoS), 2025. Vol. 20, no 4, article id e0321114
Keywords [en]
Visual Text Analytics, Text Mining, Text Embedding, Network Embedding, Similarity Calculations
National Category
Computer Sciences; Human Computer Interaction
Identifiers
URN: urn:nbn:se:liu:diva-212471
DOI: 10.1371/journal.pone.0321114
ISI: 001488705600008
PubMedID: 40258065
Scopus ID: 2-s2.0-105003254126
OAI: oai:DiVA.org:liu-212471
DiVA, id: diva2:1945780
Funder
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications
Note

This work was partially supported through the ELLIIT environment for strategic research in Sweden. The work of Ilir Jusufi was supported in part by the Knowledge Foundation, Sweden, through the project "Rekryteringar 21, Universitetslektor i spelteknik" under Contract 20210077.

Available from: 2025-03-19 Created: 2025-03-19 Last updated: 2025-05-28

Open Access in DiVA

Full text (6016 kB), 21 downloads
File information
File name: FULLTEXT01.pdf
File size: 6016 kB
Checksum (SHA-512): 5601d751f4d6960d7414ba8454c598e4b7e3b80323fd46e887c99bec3c75d8302e3d6812052f7a299e1ded85b9ae6d8dd5d5f95c7a0237c7a9dd172af31fbe54
Type: fulltext
Mimetype: application/pdf


Authority records

Kucher, Kostiantyn; Kerren, Andreas

Search in DiVA

By author/editor
Witschard, Daniel; Jusufi, Ilir; Kucher, Kostiantyn; Kerren, Andreas