Word Alignment by Re-using Parallel Phrases
Linköping University, Department of Computer and Information Science, NLPLAB - Natural Language Processing Laboratory. Linköping University, The Institute of Technology.
2008 (English). Licentiate thesis, monograph (Other academic)
Abstract [en]

In this thesis we present the idea of using parallel phrases for word alignment. Each parallel phrase is extracted from a set of manual word alignments and contains a number of source and target words together with their alignments. If a parallel phrase matches a new sentence pair, its word alignments can be applied to that sentence pair. Using phrases for word alignment has several advantages. First, longer text segments include more context and are more likely to produce correct word alignments than shorter segments or single words. More importantly, longer phrases make it possible to generalise the words in a phrase by replacing them with parts of speech or other grammatical information. In this way, the extracted phrases can cover words and phrases beyond those present in the original set of manually aligned sentences. We present experiments with phrase-based word alignment on three types of English–Swedish parallel corpora: a software manual, a novel and proceedings of the European Parliament. To find a balance between improved coverage and high alignment accuracy, we investigated different properties of generalised phrases to identify which types of phrases are likely to produce accurate alignments on new data. Finally, we compared phrase-based word alignment to state-of-the-art statistical alignment, with encouraging results. We show that phrase-based word alignments can be used to enhance statistical word alignment. To evaluate word alignments, an English–Swedish reference set for the Europarl corpus was constructed. The guidelines for producing this reference alignment are presented in the thesis.
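The matching step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the `ParallelPhrase` class, the `<TAG>` placeholder notation, the tag names and the example sentences are all assumptions made for illustration. A phrase holds source tokens, target tokens and the links between them; a token is either a literal word or a part-of-speech placeholder, and when both sides match a new sentence pair the stored links are copied over at the matched offsets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelPhrase:
    """Hypothetical container: tokens are literal words or POS placeholders like '<NOUN>'."""
    source: tuple
    target: tuple
    links: frozenset  # (source index, target index) pairs, relative to the phrase start


def token_matches(pattern, word, tag):
    # A placeholder such as '<ADJ>' matches any word carrying that POS tag;
    # anything else must match the lower-cased word form exactly.
    if pattern.startswith("<") and pattern.endswith(">"):
        return pattern[1:-1] == tag
    return pattern == word.lower()


def find_matches(pattern, words, tags):
    # Return every start position where the whole pattern matches contiguously.
    n = len(pattern)
    return [
        start
        for start in range(len(words) - n + 1)
        if all(token_matches(p, words[start + k], tags[start + k])
               for k, p in enumerate(pattern))
    ]


def apply_phrase(phrase, src_words, src_tags, tgt_words, tgt_tags):
    # If both sides of the phrase match, copy its links to absolute positions.
    links = set()
    for s in find_matches(phrase.source, src_words, src_tags):
        for t in find_matches(phrase.target, tgt_words, tgt_tags):
            links.update((s + i, t + j) for i, j in phrase.links)
    return links


# A generalised phrase: the determiner is kept as a word, the rest as POS tags.
phrase = ParallelPhrase(
    source=("the", "<ADJ>", "<NOUN>"),
    target=("den", "<ADJ>", "<NOUN>"),
    links=frozenset({(0, 0), (1, 1), (2, 2)}),
)

src_words = ["she", "sees", "the", "old", "car"]
src_tags = ["PRON", "VERB", "DET", "ADJ", "NOUN"]
tgt_words = ["hon", "ser", "den", "gamla", "bilen"]
tgt_tags = ["PRON", "VERB", "DET", "ADJ", "NOUN"]

links = apply_phrase(phrase, src_words, src_tags, tgt_words, tgt_tags)
print(sorted(links))  # [(2, 2), (3, 3), (4, 4)]
```

Because the adjective and noun are generalised to part-of-speech tags, the phrase aligns "old car" with "gamla bilen" even if neither word occurred in the manually aligned data. This sketch applies every match; choosing which generalised phrases are accurate enough to apply is the question the thesis investigates.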

Place, publisher, year, edition, pages
Linköping: LIU-tryck, 2008. 95 p.
Linköping Studies in Science and Technology. Thesis, ISSN 0280-7971 ; 1392
Keyword [en]
computational linguistics
National Category
Language Technology (Computational Linguistics)
URN: urn:nbn:se:liu:diva-15463
ISBN: 978-91-7393-728-3
OAI: diva2:117485
John von Neumann, Hus B, IDA (English)
Available from: 2008-11-24. Created: 2008-11-11. Last updated: 2009-03-11. Bibliographically approved.

By author/editor
Holmqvist, Maria