Alignment-based profiling of Europarl data in an English-Swedish parallel corpus
2010 (English). In: Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10) / [ed] Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner and Daniel Tapias, Paris, France: European Language Resources Association (ELRA), 2010, 3398-3404 p. Conference paper (Refereed)
This paper profiles the Europarl part of an English-Swedish parallel corpus and compares it with three other subcorpora of the same parallel corpus. We first describe our method for comparison, which is based on alignments at both the token level and the structural level. Although two of the other subcorpora contain fiction, the Europarl part is found to have the highest proportion of many types of restructurings, including additions, deletions and long-distance reorderings. We explain this by the fact that the majority of Europarl segments are parallel translations.
Place, publisher, year, edition, pages
Paris, France: European Language Resources Association (ELRA), 2010. 3398-3404 p.
Keywords
parallel corpora, profiling, translation, English, Swedish
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:liu:diva-60039; ISBN: 2-9517408-6-7; OAI: oai:DiVA.org:liu-60039; DiVA: diva2:354794