Automatic Text Simplification via Synonym Replacement
Linköping University, Department of Computer and Information Science. Linköping University, Faculty of Arts and Sciences.
2012 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
Automatiskt textförenkling genom synonymutbyte (Swedish)
Abstract [en]

In this study, automatic lexical simplification via synonym replacement in Swedish was investigated using three strategies for choosing alternative synonyms: word frequency, word length, and level of synonymy. These strategies were evaluated in terms of standardized readability metrics for Swedish, average word length, and proportion of long words, and in relation to the ratio of errors (type A) and the number of replacements. The effect of replacements on different genres of text was also examined. The results show that replacement based on word frequency and word length can improve readability in terms of established metrics for Swedish texts for all genres, but that the risk of introducing errors is high. Attempts were made to identify criteria thresholds that would decrease the ratio of errors, but no general thresholds could be identified. In a final experiment, word frequency and level of synonymy were combined using predefined thresholds. When more than one word passed the thresholds, either word frequency or level of synonymy was prioritized. The combined strategy was significantly better than word frequency alone when looking at all texts and prioritizing level of synonymy. For the newspaper texts, both prioritizing frequency and prioritizing level of synonymy were significantly better. The results indicate that synonym replacement on a one-to-one word level is very likely to produce errors. Automatic lexical simplification should therefore not be regarded as a trivial task, which is too often the case in the research literature. In order to evaluate the true quality of the texts, it would be valuable to take the specific reader into account. A simplified text that contains some errors, or that fails to appreciate subtle differences in terminology, can still be very useful if the original text is too difficult for the unassisted reader to comprehend.
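The combined strategy described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the threshold parameters, and all synonymy levels and frequency counts below are hypothetical values chosen for illustration. The sketch filters candidate synonyms by a synonymy-level threshold and by requiring a higher frequency than the original word, then picks one candidate by prioritizing either frequency or level of synonymy.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon: word -> [(synonym, synonymy level)].
# The levels are invented for illustration, not taken from SynLex.
SYNONYMS: Dict[str, List[Tuple[str, float]]] = {
    "erhålla": [("få", 3.5), ("motta", 4.5)],
}

# Hypothetical corpus frequency counts (also invented).
FREQ: Dict[str, int] = {"erhålla": 120, "få": 9500, "motta": 300}


def choose_synonym(
    word: str,
    synonyms: Dict[str, List[Tuple[str, float]]],
    freq: Dict[str, int],
    min_level: float = 3.0,
    prioritize: str = "frequency",
) -> Optional[str]:
    """Return a replacement for `word`, or None if no candidate qualifies."""
    word_freq = freq.get(word, 0)
    # Keep candidates that pass both thresholds: a sufficient synonymy
    # level and a strictly higher frequency than the original word.
    candidates = [
        (syn, level)
        for syn, level in synonyms.get(word, [])
        if level >= min_level and freq.get(syn, 0) > word_freq
    ]
    if not candidates:
        return None
    if prioritize == "frequency":
        return max(candidates, key=lambda c: freq.get(c[0], 0))[0]
    if prioritize == "synonymy":
        return max(candidates, key=lambda c: c[1])[0]
    raise ValueError("prioritize must be 'frequency' or 'synonymy'")


def simplify(tokens: List[str], prioritize: str = "frequency") -> List[str]:
    """Replace each token one-to-one where a qualifying synonym exists."""
    return [
        choose_synonym(t, SYNONYMS, FREQ, prioritize=prioritize) or t
        for t in tokens
    ]


print(simplify(["hon", "ville", "erhålla", "hjälp"], prioritize="frequency"))
print(simplify(["hon", "ville", "erhålla", "hjälp"], prioritize="synonymy"))
```

Because the replacement is one-to-one and ignores context and inflection, a sketch like this can easily produce the kind of errors the abstract warns about.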

Place, publisher, year, edition, pages
2012. p. 68
Keywords [en]
Lexical simplification, synonym replacement, SynLex
National Category
General Language Studies and Linguistics
Identifiers
URN: urn:nbn:se:liu:diva-84637
ISRN: LIU-IDA/KOGVET-A--12/014--SE
OAI: oai:DiVA.org:liu-84637
DiVA, id: diva2:560901
Subject / course
Cognitive science
Uppsok
Humanities, Theology
Available from: 2012-10-16 Created: 2012-10-16 Last updated: 2022-09-09 Bibliographically approved

Open Access in DiVA

Automatic Text Simplification via Synonym Replacement (1048 kB), 3911 downloads
File information
File name: FULLTEXT01.pdf
File size: 1048 kB
Checksum SHA-512: 1367119d3d95e224ce793a49f8554e4ce49a65c2a2cd74fcecc942fccbf78119b5262f4f0c07bff49b1967cba1696d999b32779d4262b3b3b54d892a4e08a744
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Keskisärkkä, Robin