Topic modelling applied to a second language: A language adaption and tool evaluation study
The Institute for Language and Folklore, Sweden. ORCID iD: 0000-0001-6164-7762
The Institute for Language and Folklore, Sweden.
Linnaeus University, Department of Computer Science and Media Technology (DM). ORCID iD: 0000-0002-1907-7820
Linnaeus University, Department of Computer Science and Media Technology (DM). ORCID iD: 0000-0002-0519-2537
2020 (English). In: Selected Papers from the CLARIN Annual Conference 2019 / [ed] Kiril Simov and Maria Eskevich, Linköping University Electronic Press, 2020, pp. 145-156, article id 17. Conference paper, published paper (peer-reviewed).
Abstract [en]

The Topics2Themes tool, which enables text analysis on the output of topic modelling, was originally developed for the English language. In this study, we explored and evaluated the adaptations required for applying the tool to Japanese texts. That is, we adapted Topics2Themes to a language that is very different from the one for which the tool was originally developed. Because Japanese does not use white space to indicate word boundaries, the texts had to be pre-tokenised, with white space inserted to mark the token segmentation, before Topics2Themes could be applied to them. Topics2Themes was also extended with word translations and phonetic readings to support users who are second-language speakers of Japanese. To evaluate the adaptation to a second language, as well as the reading support, we applied the tool to a corpus consisting of short Japanese texts. Twelve different topics were automatically identified, and a total of 183 texts representative of the twelve topics were extracted. A learner of Japanese carried out a manual analysis of these representative texts and identified 35 recurring, fine-grained themes.
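The abstract does not say which tokeniser was used for the pre-tokenisation step. With that caveat, the general idea of inserting white space at token boundaries, and of attaching a phonetic reading and a translation to each token, can be illustrated as a minimal sketch using greedy longest-match segmentation over a tiny hypothetical lexicon. `LEXICON`, `segment`, `pretokenise` and `gloss` are illustrative names invented here, not part of Topics2Themes; a practical pipeline would more likely rely on a dedicated Japanese morphological analyser and a full dictionary.

```python
# Minimal sketch: greedy longest-match segmentation of Japanese text, followed by
# white-space insertion and a reading/translation lookup.
# ASSUMPTION: LEXICON is a tiny hypothetical dictionary for illustration only; it is
# NOT the tokeniser or dictionary used with Topics2Themes in the paper.
LEXICON: dict[str, tuple[str, str]] = {
    "東京": ("とうきょう", "Tokyo"),
    "に": ("に", "to (particle)"),
    "行き": ("いき", "going"),
    "行": ("ぎょう", "line"),          # shorter entry that longest-match should skip over
    "ます": ("ます", "polite verb suffix"),
}
MAX_LEN = max(len(word) for word in LEXICON)


def segment(text: str) -> list[str]:
    """Split text into tokens, preferring the longest lexicon entry at each position.

    Characters not covered by the lexicon become single-character tokens.
    """
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shorter ones.
        for length in range(min(MAX_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in LEXICON or length == 1:
                tokens.append(candidate)
                i += length
                break
    return tokens


def pretokenise(text: str) -> str:
    """Insert white space between tokens so that a whitespace-based tool can read them."""
    return " ".join(segment(text))


def gloss(tokens: list[str]) -> list[tuple[str, str, str]]:
    """Attach (reading, translation) to each token; unknown tokens get "?" placeholders."""
    return [(token, *LEXICON.get(token, ("?", "?"))) for token in tokens]


if __name__ == "__main__":
    sentence = "東京に行きます"
    print(pretokenise(sentence))
    for token, reading, translation in gloss(segment(sentence)):
        print(token, reading, translation)
```

For example, `pretokenise("東京に行きます")` yields `"東京 に 行き ます"`: at the fourth character the two-character entry `行き` wins over the shorter `行`, which is the longest-match preference. Characters missing from the lexicon fall back to single-character tokens, and `gloss` marks them with `"?"`.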

Place, publisher, year, edition, pages
Linköping University Electronic Press, 2020. pp. 145-156, article id 17
Series
Linköping Electronic Conference Proceedings, ISSN 1650-3686, E-ISSN 1650-3740; 172
Keywords [en]
Topic Models, Visualization, Japanese, Text Mining, Visual Text Analysis
National subject category
Natural Language Processing and Computational Linguistics; Human-Computer Interaction (Interaction Design)
Research subject
Computer Science, Information and Software Visualization
Identifiers
URN: urn:nbn:se:liu:diva-189513
DOI: 10.3384/ecp2020172017
ISBN: 978-91-7929-807-4 (digital)
OAI: oai:DiVA.org:liu-189513
DiVA, id: diva2:1705903
Conference
CLARIN Annual Conference 2019, 30 September - 2 October 2019, Leipzig, Germany
Research funder
Swedish Research Council (Vetenskapsrådet), 2017-00626
Available from: 2022-10-24 Created: 2022-10-24 Last updated: 2025-02-01

Authors (as listed in the record): Skeppstedt, Maria; Kucher, Kostiantyn; Kerren, Andreas
