Graph Similarity, Parallel Texts, and Automatic Bilingual Lexicon Acquisition
Independent thesis Basic level (professional degree), 20 credits / 30 HE credits
Student thesis
In this master's thesis we present a graph-theoretical method for automatic bilingual lexicon acquisition from parallel texts. We analyze the concept of graph similarity and give an interpretation of the parallel texts in terms of the vector space model. We represent the parallel texts by a directed tripartite graph and use the corresponding adjacency matrix, $A$, to compute the similarity of the graph. By solving the eigenvalue problem $\rho S = A S A^{T} + A^{T} S A$ we obtain the self-similarity matrix $S$ and the Perron root $\rho$. A rank-$k$ approximation of the self-similarity matrix is computed with implementations of the singular value decomposition and the non-negative matrix factorization algorithm GD-CLS. We construct an algorithm to extract the bilingual lexicon from the self-similarity matrix and apply a statistical model to estimate the precision (the correctness) of the translations in the bilingual lexicon. The best result, a precision of about 80 %, is achieved with an application of the vector space model. This compares favorably with the precision of about 60 % reported in the literature.
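The eigenvalue problem in the abstract can be illustrated with a small numerical sketch. The code below is not the thesis implementation (the keywords point to ARPACK and OpenMP); it swaps in a plain power iteration in NumPy, and the names `self_similarity`, `rank_k` and the toy path graph are hypothetical. The operator $S \mapsto A S A^{T} + A^{T} S A$ is applied twice per step, so the returned matrix satisfies $\rho^{2} S = M(M(S))$; whether it also satisfies the equation exactly as stated depends on the graph.

```python
import numpy as np


def self_similarity(A, iters=100):
    """Power iteration for S -> A S A^T + A^T S A (illustrative sketch only).

    The map is applied twice per step because its spectrum can contain both
    +rho and -rho, in which case single steps may oscillate.
    """
    n = A.shape[0]
    S = np.ones((n, n))
    S /= np.linalg.norm(S)  # Frobenius normalization
    rho = 0.0
    for _ in range(iters):
        T = A @ S @ A.T + A.T @ S @ A
        T = A @ T @ A.T + A.T @ T @ A
        norm = np.linalg.norm(T)
        rho = np.sqrt(norm)  # two applications scale S by rho**2
        S = T / norm
    return S, rho


def rank_k(S, k):
    """Best rank-k approximation of S via the truncated SVD."""
    U, s, Vt = np.linalg.svd(S)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


# Hypothetical toy input: the directed path 0 -> 1 -> 2, which is a
# tripartite graph with a single node in each part.
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
S, rho = self_similarity(A)
S1 = rank_k(S, 1)
```

For this toy graph the iteration settles at $\rho \approx \sqrt{2}$ with $S$ proportional to the identity matrix, that is, each node is similar only to itself. On a real parallel corpus $A$ would be large and sparse, so a sparse eigensolver would be the more natural choice.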
Place, publisher, year, edition, pages
Matematiska institutionen, 2008. 111 p.
Parallel texts, graph similarity, bilingual lexicon, SVD, ARPACK, NMF, OpenMP, text mining
Identifiers
URN: urn:nbn:se:liu:diva-11550
ISRN: LITH-MAT-EX--08/03--SE
OAI: oai:DiVA.org:liu-11550
DiVA: diva2:17980
Subject / course
Uppsok
Physics, Chemistry, Mathematics
Eldén, Lars
Ahrenberg, Lars