Productive Generation of Compound Words in Statistical Machine Translation
2011 (English). In: Proceedings of the Sixth Workshop on Statistical Machine Translation (WMT 2011): Chris Callison-Burch, Philipp Koehn, Christof Monz, Omar F. Zaidan, 2011, p. 250-260. Conference paper (Refereed)
In many languages, compounding is highly productive. A common practice for reducing data sparsity is to split compounds in the training data. When this is done, the system risks translating the components into non-consecutive positions or in the wrong order. Furthermore, a post-processing step of compound merging is required to reconstruct compound words in the output. We present a method for increasing the chances that components that should be merged are translated into contiguous positions and in the right order. We also propose new heuristic methods for merging components that outperform all known methods, and a learning-based method that is about as accurate as the heuristic method, is better at producing novel compounds, and can operate with no background linguistic resources.
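The split-then-merge pipeline described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the marker symbol, the vocabulary-based splitting rule, and the function names `split_compound` and `merge_components` are all assumptions made for illustration only.

```python
MARKER = "#"  # hypothetical symbol marking a non-final compound part


def split_compound(word, vocabulary, min_len=3):
    """Split a word into two known parts if possible and mark the first part.

    Toy rule (an assumption, not the paper's splitter): take the first
    split point where both halves are in the vocabulary.
    """
    for i in range(min_len, len(word) - min_len + 1):
        head, tail = word[:i], word[i:]
        if head in vocabulary and tail in vocabulary:
            return [head + MARKER, tail]
    return [word]


def merge_components(tokens):
    """Naive merging: join each marked token with the token that follows it."""
    merged, buffer = [], ""
    for token in tokens:
        if token.endswith(MARKER):
            # Strip the marker and hold the part until the next token arrives.
            buffer += token[: -len(MARKER)]
        else:
            merged.append(buffer + token)
            buffer = ""
    if buffer:
        # A marked part with nothing after it is kept as a word of its own.
        merged.append(buffer)
    return merged


vocabulary = {"regen", "schirm"}
parts = split_compound("regenschirm", vocabulary)
restored = merge_components(["der"] + parts)
```

With this toy vocabulary, `parts` holds the marked components and `restored` contains the rebuilt compound. If the components were output in non-consecutive positions, for example `["regen#", "der", "schirm"]`, this naive merge would attach the marked part to the wrong word, which is the kind of risk the abstract refers to.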
Place, publisher, year, edition, pages
2011. 250-260 p.
Machine translation, compounds, CRF
Language Technology (Computational Linguistics); Computer Science
Identifiers: URN: urn:nbn:se:liu:diva-70128; OAI: oai:DiVA.org:liu-70128; DiVA: diva2:435706
The Sixth Workshop on Statistical Machine Translation (WMT 2011)