Clustered Word Classes for Preordering in Statistical Machine Translation
2012 (English) In: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics, 2012, 28-34 p. Conference paper (Refereed)
Clustered word classes have been used in connection with statistical machine translation, for instance for improving word alignments. In this work we investigate whether clustered word classes can be used in a preordering strategy, where the source language is reordered prior to training and translation. Part-of-speech tagging has previously been used successfully for learning reordering rules that can be applied before training and translation. We show that word clusters can be used for learning such rules, yielding a significant improvement over a baseline and performing only slightly worse than standard POS tags on an English–German translation task. We also show the usefulness of the approach for the less-resourced language Haitian Creole, for translation into English, where the suggested approach is significantly better than the baseline.
Place, publisher, year, edition, pages
Association for Computational Linguistics, 2012. 28-34 p.
Keywords
Statistical machine translation, reordering, clustering, unsupervised learning
National Category
Language Technology (Computational Linguistics); Computer Science
Identifiers
URN: urn:nbn:se:liu:diva-76706; OAI: oai:DiVA.org:liu-76706; DiVA: diva2:516094
The 13th Conference of the European Chapter of the Association for Computational Linguistics April 24, Avignon, France