Word Alignment by Re-using Parallel Phrases
Linköping University, Department of Computer and Information Science, NLPLAB - Natural Language Processing Laboratory. Linköping University, The Institute of Technology.
2008 (English). Licentiate thesis, monograph (Other academic)
Abstract [en]

In this thesis we present the idea of using parallel phrases for word alignment. Each parallel phrase is extracted from a set of manual word alignments and contains a number of source and target words and their corresponding alignments. If a parallel phrase matches a new sentence pair, its word alignments can be applied to the new sentence. Using phrases for word alignment has several advantages. First, longer text segments include more context and are more likely to produce correct word alignments than shorter segments or single words. More importantly, longer phrases make it possible to generalize the words in a phrase by replacing them with parts of speech or other grammatical information. In this way, the extracted phrases can cover words and phrases beyond those present in the original set of manually aligned sentences. We present experiments with phrase-based word alignment on three types of English–Swedish parallel corpora: a software manual, a novel and proceedings of the European Parliament. To balance improved coverage against high alignment accuracy, we investigate different properties of generalized phrases to identify which types of phrases are likely to produce accurate alignments on new data. Finally, we compare phrase-based word alignment to state-of-the-art statistical alignment with encouraging results, and we show that phrase-based word alignments can be used to enhance statistical word alignment. To evaluate word alignments, an English–Swedish reference set for the Europarl corpus was constructed; the guidelines for producing this reference alignment are presented in the thesis.
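The matching step described in the abstract (a parallel phrase, possibly generalized with part-of-speech slots, is matched against a new sentence pair and its internal links are projected onto it) can be illustrated with a small sketch. This is not the thesis's implementation: the `Elem` and `ParallelPhrase` representation, the toy tagset (PN, VB, PP, DT, NN), the example phrase, and the rule that a phrase is applied only when it matches exactly once on each side are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Elem:
    """One position in a parallel phrase: a literal word or a generalized POS slot."""
    value: str            # lowercase word form, or a POS tag when is_pos is True
    is_pos: bool = False  # True: match any token carrying this POS tag

    def matches(self, word: str, tag: str) -> bool:
        return (tag == self.value) if self.is_pos else (word.lower() == self.value)


@dataclass(frozen=True)
class ParallelPhrase:
    source: tuple     # tuple of Elem on the source side
    target: tuple     # tuple of Elem on the target side
    links: frozenset  # (source index, target index) pairs, relative to the phrase


def find_spans(pattern, tokens):
    """Return every start offset where pattern matches a contiguous run of (word, tag) tokens."""
    n = len(pattern)
    return [
        start
        for start in range(len(tokens) - n + 1)
        if all(e.matches(w, t) for e, (w, t) in zip(pattern, tokens[start:start + n]))
    ]


def apply_phrases(phrases, src, tgt):
    """Project phrase-internal links onto sentence positions.

    Hypothetical rule of this sketch: a phrase is applied only when it matches
    exactly once on each side, so ambiguous matches propose no links.
    """
    links = set()
    for ph in phrases:
        s_hits, t_hits = find_spans(ph.source, src), find_spans(ph.target, tgt)
        if len(s_hits) == 1 and len(t_hits) == 1:
            s, t = s_hits[0], t_hits[0]
            links |= {(s + i, t + j) for i, j in ph.links}
    return links


# Hypothetical generalized phrase "in the <NN>" <-> "i <NN>": Swedish marks
# definiteness with a noun suffix, so both "the" and the noun align to the Swedish noun.
phrase = ParallelPhrase(
    source=(Elem("in"), Elem("the"), Elem("NN", is_pos=True)),
    target=(Elem("i"), Elem("NN", is_pos=True)),
    links=frozenset({(0, 0), (1, 1), (2, 1)}),
)

src = [("She", "PN"), ("lives", "VB"), ("in", "PP"), ("the", "DT"), ("village", "NN")]
tgt = [("Hon", "PN"), ("bor", "VB"), ("i", "PP"), ("byn", "NN")]

print(sorted(apply_phrases([phrase], src, tgt)))  # [(2, 2), (3, 3), (4, 3)]
```

Because the noun position is a POS slot rather than a literal word, the phrase aligns "village" with "byn" even though neither word needs to occur in the manually aligned data, which is the coverage gain the abstract describes. The exactly-once rule is one conservative way to trade coverage for accuracy; the thesis investigates which phrase properties actually predict accurate alignments.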

Place, publisher, year, edition, pages
Linköping: LIU-tryck, 2008, 95 p.
Series
Linköping Studies in Science and Technology. Thesis, ISSN 0280-7971 ; 1392
Keyword [en]
computational linguistics
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:liu:diva-15463
ISBN: 978-91-7393-728-3 (print)
OAI: oai:DiVA.org:liu-15462
DiVA: diva2:117485
Presentation
John von Neumann, Hus B, IDA (English)
Available from: 2008-11-24. Created: 2008-11-11. Last updated: 2009-03-11. Bibliographically approved.

Open Access in DiVA
Full text: FULLTEXT01.pdf (application/pdf, 573 kB), SHA-512: e19a66e2c82f00b426a95e4dbb0a38875c469141ea576f77049daee23f42409d5b94c978673093c52b0dccf88b8811bbdcf1fbda661b9902847979c4e26e9f51
Cover: COVER01.pdf (application/pdf, 29 kB), SHA-512: 8475586a5530980e768b2984727d8954d6462d5fa3f9185ba83386fd49fb97468d9db630bab85d00526e140dc67b1eac855f14b577783fdeb38ba718ab3c3b65

Author
Holmqvist, Maria
