Productive Generation of Compound Words in Statistical Machine Translation
Linköping University, Department of Computer and Information Science, NLPLAB - Natural Language Processing Laboratory. Linköping University, The Institute of Technology.
Xerox Research Centre Europe.
2011 (English). In: Proceedings of the Sixth Workshop on Statistical Machine Translation (WMT 2011): Chris Callison-Burch, Philipp Koehn, Christof Monz, Omar F. Zaidan, 2011, pp. 250-260. Conference paper, Published paper (Refereed)
Abstract [en]

In many languages, compounding is highly productive. A common way to reduce data sparsity is to split compounds in the training data. When this is done, the system risks translating components into non-consecutive positions or in the wrong order. Furthermore, a post-processing step of compound merging is required to reconstruct compound words in the output. We present a method for increasing the chances that components that should be merged are translated into contiguous positions and in the right order. We also propose new heuristic methods for merging components that outperform all known methods, and a learning-based method that has accuracy similar to that of the heuristic method, is better at producing novel compounds, and can operate with no background linguistic resources.
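
The split-then-merge pipeline described in the abstract can be illustrated with a toy sketch. This is not the paper's method: the `#` marker, the function names `split_compound` and `merge_compounds`, and the rule "attach every marked token to whatever token follows it" are assumptions chosen only to show why merging is a separate post-processing step and why component order and adjacency matter.

```python
MARK = "#"  # hypothetical marker appended to non-final compound parts


def split_compound(word, parts):
    """Return the marked components of `word` for a given segmentation.

    The segmentation is supplied by the caller; finding it is out of scope here.
    """
    if "".join(parts) != word:
        raise ValueError("parts must concatenate to the original word")
    return [p + MARK for p in parts[:-1]] + [parts[-1]]


def merge_compounds(tokens):
    """Naively glue each run of marked tokens onto the next token."""
    out, pending = [], ""
    for tok in tokens:
        if tok.endswith(MARK):
            # Marked part: strip the marker and hold it for the next token.
            pending += tok[: -len(MARK)]
        else:
            out.append(pending + tok)
            pending = ""
    if pending:
        # A marked part with nothing after it is emitted as its own word.
        out.append(pending)
    return out


parts = split_compound("Verkehrszeichen", ["Verkehrs", "zeichen"])
print(parts)                                        # ['Verkehrs#', 'zeichen']
print(merge_compounds(["das"] + parts))             # ['das', 'Verkehrszeichen']
print(merge_compounds(["Verkehrs#", "das", "zeichen"]))  # ['Verkehrsdas', 'zeichen']
```

The last call shows the failure mode the abstract mentions: when the components are not output in contiguous positions, a naive merger produces a wrong word, which is why both better component placement and more careful merging are of interest.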

Place, publisher, year, edition, pages
2011. 250-260 p.
Keyword [en]
Machine translation, compounds, CRF
National Category
Language Technology (Computational Linguistics); Computer Science
Identifiers
URN: urn:nbn:se:liu:diva-70128
OAI: oai:DiVA.org:liu-70128
DiVA: diva2:435706
Conference
The Sixth Workshop on Statistical Machine Translation (WMT 2011)
Available from: 2011-08-19 Created: 2011-08-19

Open Access in DiVA

No full text

Other links

http://aclweb.org/anthology-new/W/W11/W11-2129.pdf

Authority records BETA

Stymne, Sara
