Graph Similarity, Parallel Texts, and Automatic Bilingual Lexicon Acquisition
Linköping University, Department of Mathematics.
2008 (English). Independent thesis, Basic level (professional degree), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

In this master's thesis we present a graph-theoretical method for automatic bilingual lexicon acquisition from parallel texts. We analyze the concept of graph similarity and give an interpretation of the parallel texts that connects it to the vector space model. We represent the parallel texts by a directed, tripartite graph and use the corresponding adjacency matrix, $A$, to compute the similarity of the graph. By solving the eigenvalue problem $\rho S = A S A^{T} + A^{T} S A$ we obtain the self-similarity matrix $S$ and the Perron root $\rho$. A rank-$k$ approximation of the self-similarity matrix is computed by implementations of the singular value decomposition and the non-negative matrix factorization algorithm GD-CLS. We construct an algorithm to extract the bilingual lexicon from the self-similarity matrix and apply a statistical model to estimate the precision (the correctness) of the translations in the bilingual lexicon. The best result, a precision of about 80 %, is achieved with an application of the vector space model. This compares favorably with the precision of about 60 % found in the literature.
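As an illustration of the eigenvalue problem stated in the abstract, the following is a minimal sketch in Python with NumPy, not the thesis implementation. It assumes a small dense adjacency matrix and rewrites $\rho S = A S A^{T} + A^{T} S A$ as an ordinary symmetric eigenproblem through the Kronecker identity $\mathrm{vec}(A S A^{T}) = (A \otimes A)\,\mathrm{vec}(S)$. The function names `self_similarity` and `rank_k_approx` and the three-node path graph are hypothetical choices made for this example only. The rank-$k$ step uses a truncated SVD; the GD-CLS factorization is not shown.

```python
import numpy as np


def self_similarity(A):
    """Return (rho, S) with rho * S = A S A^T + A^T S A and ||S||_F = 1.

    Illustrative only: builds a dense n^2 x n^2 matrix, so it is
    practical just for toy graphs.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    # vec(A S A^T) = (A kron A) vec(S); likewise for A^T S A.
    # K is symmetric because (A kron A)^T = A^T kron A^T.
    K = np.kron(A, A) + np.kron(A.T, A.T)
    eigvals, eigvecs = np.linalg.eigh(K)  # eigenvalues in ascending order
    rho = eigvals[-1]                     # largest eigenvalue (Perron root)
    S = eigvecs[:, -1].reshape(n, n)
    if S.sum() < 0:                       # eigenvector sign is arbitrary
        S = -S
    return rho, S / np.linalg.norm(S)


def rank_k_approx(S, k):
    """Best rank-k approximation of S in the Frobenius norm, via SVD."""
    U, s, Vt = np.linalg.svd(S)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


# Hypothetical toy graph: the directed path 1 -> 2 -> 3.
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]])
rho, S = self_similarity(A)
S1 = rank_k_approx(S, 1)
residual = np.linalg.norm(rho * S - (A @ S @ A.T + A.T @ S @ A))
```

For this toy graph the largest eigenvalue is simple, so `S` is determined up to scaling. When the largest eigenvalue is repeated, the returned eigenvector is only one of several valid choices. The dense Kronecker matrix grows as $n^{2} \times n^{2}$, so a graph of realistic size would call for an iterative eigensolver that applies the map $S \mapsto A S A^{T} + A^{T} S A$ directly.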

Place, publisher, year, edition, pages
Matematiska institutionen, 2008. 111 p.
Keyword [en]
Parallel texts, graph similarity, bilingual lexicon, SVD, ARPACK, NMF, OpenMP, text mining
National Category
Computational Mathematics
URN: urn:nbn:se:liu:diva-11550
ISRN: LITH-MAT-EX--08/03--SE
OAI: diva2:17980
Subject / course
Scientific Computing
Physics, Chemistry, Mathematics
Available from: 2008-05-08
Created: 2008-05-08
Last updated: 2012-04-24
Bibliographically approved

Open Access in DiVA

fulltext (2371 kB), 928 downloads
File information
File name: FULLTEXT01.pdf
File size: 2371 kB
Checksum: MD5
Type: fulltext
Mimetype: application/pdf
