Productive Generation of Compound Words in Statistical Machine Translation
Linköping University, Department of Computer and Information Science, NLPLAB - Natural Language Processing Laboratory. Linköping University, The Institute of Technology.
Xerox Research Centre Europe.
2011 (English). In: Proceedings of the Sixth Workshop on Statistical Machine Translation (WMT 2011) / [ed] Chris Callison-Burch, Philipp Koehn, Christof Monz, Omar F. Zaidan, 2011, p. 250-260. Conference paper (Refereed)
Abstract [en]

In many languages, compounding is highly productive. A common way to reduce data sparsity is to split compounds in the training data. When this is done, the system risks translating components into non-consecutive positions or in the wrong order. Furthermore, a post-processing step of compound merging is required to reconstruct compound words in the output. We present a method for increasing the chances that components that should be merged are translated into contiguous positions and in the right order. We also propose new heuristic methods for merging components that outperform all known methods, and a learning-based method that has accuracy similar to that of the heuristic method, is better at producing novel compounds, and can operate with no background linguistic resources.
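The split-then-merge pipeline described in the abstract can be pictured with a minimal sketch. It is not the method from the paper: the `@@` marker, the function names `split_compound` and `merge_compounds`, and the vocabulary-lookup splitting rule are all assumptions made for illustration only.

```python
from typing import List, Set

MARK = "@@"  # hypothetical marker appended to non-final compound parts


def split_compound(word: str, vocab: Set[str], min_len: int = 3) -> List[str]:
    """Split `word` into two known parts if possible, marking the first part.

    The parts are returned lowercased; an unsplittable word is returned as-is.
    """
    lower = word.lower()
    for i in range(min_len, len(lower) - min_len + 1):
        head, tail = lower[:i], lower[i:]
        if head in vocab and tail in vocab:
            return [head + MARK, tail]
    return [word]


def merge_compounds(tokens: List[str]) -> List[str]:
    """Naively join each marked token with whatever token follows it."""
    merged: List[str] = []
    prefix = ""
    for tok in tokens:
        if tok.endswith(MARK):
            prefix += tok[: -len(MARK)]  # accumulate marked parts
        else:
            merged.append(prefix + tok)
            prefix = ""
    if prefix:  # dangling marked part at the end: keep it as its own word
        merged.append(prefix)
    return merged
```

With the toy vocabulary `{"sjuk", "hus"}`, `split_compound("sjukhus", ...)` yields a marked first part followed by `hus`, and `merge_compounds` joins the two again when they are adjacent. If another token lands between the parts, this naive merger glues the marked part onto the wrong word, which is the kind of ordering problem the abstract refers to.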

Place, publisher, year, edition, pages
2011. 250-260 p.
Keyword [en]
Machine translation, compounds, CRF
National Category
Language Technology (Computational Linguistics); Computer Science
URN: urn:nbn:se:liu:diva-70128
OAI: diva2:435706
The Sixth Workshop on Statistical Machine Translation (WMT 2011)
Available from: 2011-08-19 Created: 2011-08-19

By author/editor
Stymne, Sara