A Natural Language Processing Model for Automated Organization and Analysis of Intangible Cultural Heritage
2024 (English) In: Journal of Organizational and End User Computing, ISSN 1546-2234, E-ISSN 1546-5012, Vol. 36, no 1, article id 349736
Article in journal (Refereed) Published
Abstract [en]
This paper investigates text similarity methods in natural language processing (NLP), improves upon the Word Mover's Distance (WMD), and develops the SWC-WMD distance, which forms the basis of a clustering method for long intangible cultural heritage (ICH) texts. Clustering experiments were conducted on a constructed dataset of long ICH texts using the WMD, SWC-WMD, and TFIDF-WMD distances. The impact of the number of feature words and of the choice of distance on the clustering results was assessed using accuracy and F1 scores. The results show that the SWC-WMD distance improves the accuracy and F1 scores of ICH long-text clustering compared with the other two distances, demonstrating the effectiveness of the proposed method.
Place, publisher, year, edition, pages
IGI GLOBAL, 2024. Vol. 36, no 1, article id 349736
Keywords [en]
Intangible Cultural Heritage; Natural Language Processing; Text Clustering; Word Mover's Distance; Word2vec
National Category
Information Systems, Social aspects
Identifiers
URN: urn:nbn:se:liu:diva-207265
DOI: 10.4018/JOEUC.349736
ISI: 001293789600025
OAI: oai:DiVA.org:liu-207265
DiVA, id: diva2:1895430
Note
Funding Agencies: China Academic Degrees & Graduate Education Development Center [ZT-231053013]; Hunan Provincial Department of Education [23A0747]; Hunan Province [HNJG-2022-0069]; Philosophy and Social Sciences in Hunan Province: Digital Application Laboratory of Red Cultural Resources, Xiangtan University
2024-09-05 2024-09-05 2024-09-05