Processing of Swedish Compounds for Phrase-Based Statistical Machine Translation
2008 (English)
In: Proceedings of the 12th European Association for Machine Translation Conference, Hamburg, Germany: HITEC e.V., 2008, 182-191 p.
Conference paper (Refereed)
We investigated the effects of processing Swedish compounds for phrase-based SMT between Swedish and English. Compounds were split in a pre-processing step using an unsupervised empirical method. After translation into Swedish, compounds were merged using a novel merging algorithm. We investigated two ways of handling compound parts: marking them as compound parts or normalizing them to a canonical form. We found that compound splitting improved translation into Swedish according to automatic metrics. For translation into English, the results were not consistent across automatic metrics. However, an error analysis of compound translation showed a small improvement in the systems that used splitting. The number of untranslated words in the English output was reduced by 50%.
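The abstract does not specify the splitting method or the merging algorithm, so the following is only a minimal sketch of the general split, mark, and merge pipeline, not the paper's method. It assumes a simple corpus-frequency heuristic for binary splits (a split is accepted when the geometric mean of the part frequencies exceeds the frequency of the whole word) and a purely marker-based merge. The marker `@@`, the function names, and the toy frequency table are all hypothetical.

```python
from math import sqrt

MARK = "@@"  # hypothetical symbol appended to non-final compound parts


def split_compound(word, freq, min_len=3):
    """Return the best binary split of `word`, or [word] if none scores higher.

    Assumed heuristic: compare the whole word's corpus frequency with the
    geometric mean of the frequencies of the two candidate parts.
    """
    best_parts = [word]
    best_score = float(freq.get(word, 0))
    for i in range(min_len, len(word) - min_len + 1):
        left, right = word[:i], word[i:]
        score = sqrt(freq.get(left, 0) * freq.get(right, 0))
        if score > best_score:
            best_parts, best_score = [left, right], score
    return best_parts


def mark_parts(parts):
    """Mark every part except the last so that merging can find them later."""
    return [p + MARK for p in parts[:-1]] + [parts[-1]]


def merge_marked(tokens):
    """Join each marked token with the token that follows it."""
    merged, prefix = [], ""
    for tok in tokens:
        if tok.endswith(MARK):
            prefix += tok[: -len(MARK)]
        else:
            merged.append(prefix + tok)
            prefix = ""
    if prefix:  # a marked part at the end of the sentence is kept as a word
        merged.append(prefix)
    return merged


# Toy frequencies (invented for illustration only).
freq = {"barn": 50, "bok": 40, "barnbok": 2}
parts = mark_parts(split_compound("barnbok", freq))
sentence = merge_marked(["en"] + parts)
```

With the toy table, "barnbok" is split into "barn" and "bok", the first part is marked, and merging restores the original compound. The alternative mentioned in the abstract, normalizing parts to a canonical form instead of marking them, would need a different merging strategy because the merge step could no longer rely on a marker.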
Place, publisher, year, edition, pages
Hamburg, Germany: HITEC e.V., 2008. 182-191 p.
Keywords
computational linguistics, statistical machine translation
National Category
Computer Science
Identifiers
URN: urn:nbn:se:liu:diva-44126
Local ID: 75720
ISBN: 978-300025770-4
OAI: oai:DiVA.org:liu-44126
DiVA: diva2:264987
12th European Machine Translation Conference, 22-23 September 2008, Hamburg, Germany