German Compounds in Factored Statistical Machine Translation
2008 (English) In: -, Berlin, Germany: Springer, 2008, 464-475 p. Conference paper (Refereed)
An empirical method for splitting German compounds is explored by varying it in a number of ways and investigating the consequences for factored statistical machine translation between English and German in both directions. Compound splitting is incorporated into translation as a preprocessing step, performed on the training data and on German translation input. For translation into German, compounds are merged based on part-of-speech in a postprocessing step. Compound parts are marked to distinguish them from ordinary words. Translation quality is improved in both translation directions, and the number of untranslated words in the English output is reduced. Different versions of the splitting algorithm perform best in the two translation directions.
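The pipeline described in the abstract (split, mark the parts, merge by part-of-speech) can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a frequency-based splitter that picks the segmentation with the highest geometric mean of part frequencies, which is one common empirical approach; the abstract does not say which scoring the paper uses. The frequency table, the "-PART" tag suffix, the minimum part length, and all function names are hypothetical choices made for illustration.

```python
from math import prod

# Toy corpus frequencies; hypothetical values, for illustration only.
FREQ = {"haus": 500, "tür": 200, "haustür": 5,
        "aktion": 300, "plan": 400, "aktionsplan": 10}
MIN_PART_LEN = 3        # assumed lower bound on the length of a compound part
PART_SUFFIX = "-PART"   # hypothetical tag suffix marking non-final parts


def segmentations(word, freq):
    """All ways to cut `word` into known parts, plus the unsplit word."""
    results = [[word]]
    for i in range(MIN_PART_LEN, len(word) - MIN_PART_LEN + 1):
        head, rest = word[:i], word[i:]
        heads = [head] if head in freq else []
        # Allow a linking "s" that is dropped from the part (Aktion-s-plan).
        if head.endswith("s") and head[:-1] in freq:
            heads.append(head[:-1])
        for h in heads:
            for tail in segmentations(rest, freq):
                if all(p in freq for p in tail):
                    results.append([h] + tail)
    return results


def score(parts, freq):
    """Geometric mean of the part frequencies (0 for unknown parts)."""
    return prod(freq.get(p, 0) for p in parts) ** (1 / len(parts))


def split_and_mark(word, pos, freq=FREQ):
    """Split one tagged word; non-final parts get a marked POS tag."""
    best = max(segmentations(word.lower(), freq),
               key=lambda parts: score(parts, freq))
    return [(p, pos + PART_SUFFIX) for p in best[:-1]] + [(best[-1], pos)]


def merge(tagged):
    """Join marked parts with the next word if its POS matches theirs."""
    out, pending, pending_pos = [], "", None
    for token, tag in tagged:
        if tag.endswith(PART_SUFFIX):
            pending += token
            pending_pos = tag[:-len(PART_SUFFIX)]
        elif pending and tag == pending_pos:
            out.append(pending + token)
            pending, pending_pos = "", None
        else:
            if pending:
                out.append(pending)  # no matching head: leave parts unmerged
            out.append(token)
            pending, pending_pos = "", None
    if pending:
        out.append(pending)
    return out


if __name__ == "__main__":
    marked = split_and_mark("Haustür", "N")
    print(marked)
    print(merge(marked))
```

In this sketch the linking "s" removed at split time is not restored on merging, and casing is lowercased for simplicity; a real system would need to handle both.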
Place, publisher, year, edition, pages
Berlin, Germany: Springer, 2008. 464-475 p.
Keywords: machine translation, compounds
National Category: Computer Science
Identifiers: URN: urn:nbn:se:liu:diva-44110; DOI: 10.1007/978-3-540-85287-2_44; Local ID: 75561; OAI: oai:DiVA.org:liu-44110; DiVA: diva2:264971
6th International Conference on Natural Language Processing GoTAL, 2008