Processing of Swedish Compounds for Phrase-Based Statistical Machine Translation
Linköping University, Department of Computer and Information Science, NLPLAB - Natural Language Processing Laboratory. Linköping University, The Institute of Technology.
Linköping University, Department of Computer and Information Science, NLPLAB - Natural Language Processing Laboratory. Linköping University, The Institute of Technology.
2008 (English). In: Proceedings of the 12th European Association for Machine Translation Conference, Hamburg, Germany: HITEC e.V., 2008, pp. 182-191. Conference paper, Published paper (Refereed)
Abstract [en]

We investigated the effects of processing Swedish compounds for phrase-based SMT between Swedish and English. Compounds were split in a pre-processing step using an unsupervised empirical method. After translation into Swedish, compounds were merged using a novel merging algorithm. We investigated two ways of handling compound parts: marking them as compound parts or normalizing them to a canonical form. We found that compound splitting improved translation into Swedish according to automatic metrics. For translation into English, the results were not consistent across automatic metrics. However, error analysis of compound translation showed a small improvement in the systems that used splitting. The number of untranslated words in the English output was reduced by 50%.
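
The abstract does not say which unsupervised splitting method was used. As a rough illustration only, the sketch below shows one common corpus-driven approach: a word is split when the geometric mean of the corpus frequencies of its parts exceeds the frequency of the whole word. The function name `split_compound`, the `min_len` parameter, and the toy frequency table are assumptions made for this example, and only two-part splits without linking elements are considered.

```python
from math import prod


def split_compound(word, freq, min_len=3):
    """Return the best-scoring segmentation of `word` as a list of parts.

    `freq` maps word forms to corpus counts (hypothetical data here).
    Only binary splits are tried, and linking elements are ignored.
    """
    def score(parts):
        # Geometric mean of part frequencies; an unseen part gives 0.
        return prod(freq.get(p, 0) for p in parts) ** (1.0 / len(parts))

    best = [word]
    best_score = score(best)
    # Try every split point that leaves at least `min_len` characters per part.
    for i in range(min_len, len(word) - min_len + 1):
        parts = [word[:i], word[i:]]
        s = score(parts)
        if s > best_score:
            best, best_score = parts, s
    return best


# Toy counts, invented for illustration.
counts = {"bok": 100, "hylla": 50, "bokhylla": 5}
print(split_compound("bokhylla", counts))  # ['bok', 'hylla']
```

A word stays unsplit when no candidate split scores higher than the word itself, which keeps frequent lexicalised compounds intact.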

Place, publisher, year, edition, pages
Hamburg, Germany: HITEC e.V., 2008. pp. 182-191.
Keyword [en]
computational linguistics, statistical machine translation
National Category
Computer Science
Identifiers
URN: urn:nbn:se:liu:diva-44126
Local ID: 75720
ISBN: 978-300025770-4 (print)
OAI: oai:DiVA.org:liu-44126
DiVA: diva2:264987
Conference
12th European Machine Translation Conference, 22-23 September 2008, Hamburg, Germany
Available from: 2009-10-10. Created: 2009-10-10. Last updated: 2012-09-14. Bibliographically approved.
In thesis
1. Compound Processing for Phrase-Based Statistical Machine Translation
2009 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

In this thesis I explore how compound processing can be used to improve phrase-based statistical machine translation (PBSMT) between English and German/Swedish. Both German and Swedish generally use closed compounds, which are written as one word without spaces or other indicators of word boundaries. Compounding is both common and productive, which makes it problematic for PBSMT, mainly due to sparse data problems.

The adopted strategy for compound processing is to split compounds into their component parts before training and translation. For translation into Swedish and German, the parts are merged after translation. I investigate the effect of different splitting algorithms for translation between English and German, and of different merging algorithms for German. I also apply these methods to a different language pair, English-Swedish. Overall, the studies show that compound processing is useful, especially for translation from English into German or Swedish. There are also improvements for translation into English, such as a reduction of unknown words.

I show that for translation between English and German different splitting algorithms work best for different translation directions. I also design and evaluate a novel merging algorithm based on part-of-speech matching, which outperforms previous methods for compound merging, showing the need for information that is carried through the translation process, rather than only external knowledge sources such as word lists. Most of the methods for compound processing were originally developed for German. I show that these methods can be applied to Swedish as well, with similar results.
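
The record gives no details of the part-of-speech-based merging algorithm. The following is a minimal sketch of the general idea, under stated assumptions: compound parts are marked in the translation output with a part-of-speech tag carrying a `-PART` suffix, and a marked part is merged with the next word only when that word's tag matches. The token format, the marker, and the function `merge_compounds` are hypothetical and are not taken from the thesis.

```python
def merge_compounds(tokens, marker="-PART"):
    """Merge marked compound parts with a following word of matching POS.

    `tokens` is a list of (word, pos) pairs. A compound part carries a tag
    such as 'N-PART' (an assumed convention for this sketch).
    """
    out = []
    pending = ""         # compound parts collected so far
    pending_pos = None   # POS the collected parts expect of their head
    for word, pos in tokens:
        if pos.endswith(marker):
            base = pos[: -len(marker)]
            if pending and base != pending_pos:
                # Tag mismatch between parts: emit what we have as a word.
                out.append(pending)
                pending = ""
            pending += word
            pending_pos = base
        elif pending and pos == pending_pos:
            # Head word matches the expected POS: merge into one compound.
            out.append(pending + word)
            pending, pending_pos = "", None
        else:
            if pending:
                # No matching head: leave the part as a separate word.
                out.append(pending)
                pending, pending_pos = "", None
            out.append(word)
    if pending:
        out.append(pending)
    return out


tagged = [("en", "DT"), ("bok", "N-PART"), ("hylla", "N")]
print(merge_compounds(tagged))  # ['en', 'bokhylla']
```

Because the decision relies on tags carried through translation, the sketch needs no external word list, which is the contrast the abstract draws with earlier merging methods.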

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2009. 64 p.
Series
Linköping Studies in Science and Technology. Thesis, ISSN 0280-7971 ; 1421
Keyword
Machine translation, compounds, factored translation, statistical machine translation, computational linguistics
National Category
Language Technology (Computational Linguistics); Computer Science
Identifiers
URN: urn:nbn:se:liu:diva-51416
ISBN: 978-91-7393-501-2
Presentation
2009-12-18, Alan Turing, Hus E, Campus Valla, Linköpings universitet, Linköping, 13:15 (English)
Available from: 2009-12-07. Created: 2009-10-30. Last updated: 2009-12-07. Bibliographically approved.

Open Access in DiVA

No full text

Authority records

Stymne, Sara; Holmqvist, Maria
