Evaluation of two word alignment systems
Linköping University, Department of Computer and Information Science.
2004 (English). Independent thesis, Basic level (professional degree). Student thesis.
Abstract [en]

This project evaluates two systems that generate word alignments on English-Swedish data: the Giza++ system, which can generate a variety of statistical translation models, and the I*Trix system, developed at IDA/NLPLab, which generates word pairs with frequencies.

This paper describes the file formats of the two systems, how they are run, and how they differ. The evaluation considers a variety of parameters, such as corpus size, characteristics of the corpus, and the effect of linguistic knowledge. The paper closes with the conclusions of the evaluation. In general, Giza++ performs better on large corpora, while I*Trix performs better on small corpora. I*Trix performs better in particular on corpora with a high statistical ratio or with special resources.

Place, publisher, year, edition, pages
Institutionen för datavetenskap, 2004. 55 p.
Keyword [en]
Datalogi, Word alignment, Giza++, I*Trix, Parallel corpora, Statistical ratio, Evaluation, I*Eval, Gold standard.
Keyword [sv]
Datalogi
National Category
Computer Science
Identifiers
URN: urn:nbn:se:liu:diva-2215
ISRN: LITH-IDA-EX--04/019--SE
OAI: oai:DiVA.org:liu-2215
DiVA: diva2:19545
Uppsok
teknik
Available from: 2004-04-07 Created: 2004-04-07

Open Access in DiVA

fulltext (351 kB), 745 downloads
File information
File name: FULLTEXT01.pdf
File size: 351 kB
Checksum (SHA-1): 5fbe9b0a7e958558b087ab7c85593628bc23ff8ea0d6b843e54d6867157d0ab02e951e69
Type: fulltext
Mimetype: application/pdf

Total: 745 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 344 hits