Publications
Author:
Wang, Xiaoyang (Linköping University, Department of Computer and Information Science)
Title:
Evaluation of two word alignment systems
Department:
Linköping University, Department of Computer and Information Science
Publication type:
Student thesis
Language:
English
Publisher:
Institutionen för datavetenskap
Level:
Independent thesis Basic level (professional degree)
Pages:
55
Year of publ.:
2004
URI:
urn:nbn:se:liu:diva-2215
Permanent link:
http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-2215
ISRN:
LITH-IDA-EX--04/019--SE
Subject category:
Computer science
Undergraduate subject:
Computer science (20-credit final thesis, D level)
Uppsok:
teknik
Keywords(en) :
Computer science, Word alignment, Giza++, I*Trix, Parallel corpora, Statistical ratio, Evaluation, I*Eval, Gold standard
Keywords(sv) :
Datalogi
Abstract(en) :

This project evaluates two systems that generate word alignments on English-Swedish data. The systems are Giza++, which can generate a variety of statistical translation models, and I*Trix, a system developed at IDA/NLPLab that generates word pairs with frequencies.

This paper describes the file formats of the two systems, how they are run, and how they differ. The evaluation considers a range of parameters, such as corpus size, corpus characteristics, and the effect of linguistic knowledge. The paper closes with the conclusions of the evaluation. In general, Giza++ performs better on large corpora, while I*Trix performs better on small corpora. I*Trix performs especially well on corpora with a high statistical ratio or with special resources.

Available from:
2004-04-07
Created:
2004-04-07
FILE INFORMATION
File size:
351 kb
Mimetype:
application/pdf
Type:
fulltext