Design and Implementation of a Name Matching Algorithm for Persian Language
Linköping University, Department of Computer and Information Science, Human-Centered systems. Linköping University, The Institute of Technology.
2013 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Name matching plays a vital role in many applications. It is used, for example, in information retrieval and deduplication systems to compare names and to find those that refer to the same object, person, or company. Because names in any application are subject to unavoidable variations and errors, many algorithms have been developed to match them. These algorithms account for name variations caused by spelling, pattern, or phonetic modifications. However, most existing methods were developed for English and reflect the characteristics of that language; until now, none has been designed and implemented specifically for Persian.
The purpose of this thesis is to present a name matching algorithm for Persian. After considering the major algorithms in this area, we selected one of the basic name matching methods and extended it to work particularly well for Persian names. The proposed algorithm, called the Persian Edit Distance Algorithm (PEDA), is built on the characteristics of the Persian language and compares Persian names on three levels: phonetic similarity, character form similarity, and keyboard distance, in order to give more accurate results for Persian names. It takes Persian names as input and outputs their similarity as a percentage.
Three series of experiments were carried out to evaluate the proposed algorithm. The average f-measure is 0.86 for the first series and 0.80 for the second. The first series was also repeated with Levenshtein, which produced 33.9% false negatives on average, compared with 6.4% for PEDA. The third series shows that PEDA works well for one, two, and three edits, with average true positive values of 99%, 81%, and 69% respectively.
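
The idea of comparing names on several similarity levels can be illustrated with a weighted edit distance in which substituting "similar" characters costs less than substituting unrelated ones. The sketch below is not PEDA itself: the abstract does not give the thesis's character tables, weights, or normalization, so the groups, the cost values (0.2, 0.4, 0.6), and the percentage formula are all assumptions chosen for illustration, and every function name is hypothetical.

```python
# Illustrative sketch only. The groups and weights below are assumptions,
# not the tables used in the thesis.

# Letters that are pronounced alike in Persian (assumed, partial list).
PHONETIC_GROUPS = ["زذضظ", "سصث", "تط", "حه", "قغ"]
# Letters whose written forms differ mainly by dots or strokes (assumed, partial list).
FORM_GROUPS = ["بپتث", "جچحخ", "دذ", "رزژ", "سش", "صض", "طظ", "عغ", "فق", "کگ"]
# A few keys assumed to be adjacent on a Persian keyboard (hypothetical sample).
KEYBOARD_PAIRS = [("ن", "م"), ("ب", "ل")]

# Hypothetical substitution costs; an unrelated substitution costs 1.0.
PHONETIC_COST, FORM_COST, KEYBOARD_COST = 0.2, 0.4, 0.6


def _in_same_group(a, b, groups):
    """Return True if both characters appear in one of the groups."""
    return any(a in g and b in g for g in groups)


def substitution_cost(a, b):
    """Cost of replacing character a with b; the first matching level wins."""
    if a == b:
        return 0.0
    if _in_same_group(a, b, PHONETIC_GROUPS):
        return PHONETIC_COST
    if _in_same_group(a, b, FORM_GROUPS):
        return FORM_COST
    if (a, b) in KEYBOARD_PAIRS or (b, a) in KEYBOARD_PAIRS:
        return KEYBOARD_COST
    return 1.0


def weighted_edit_distance(s, t):
    """Levenshtein-style dynamic program with weighted substitutions."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [float(i)]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1.0,                           # delete a
                curr[j - 1] + 1.0,                       # insert b
                prev[j - 1] + substitution_cost(a, b),   # substitute or match
            ))
        prev = curr
    return prev[-1]


def similarity_percent(s, t):
    """Assumed normalization: distance relative to the longer name."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 100.0
    return 100.0 * (1.0 - weighted_edit_distance(s, t) / longest)
```

Under these assumed weights, the names "رضا" and "رظا", which differ only in two letters pronounced alike, are at distance 0.2, whereas plain Levenshtein would count the same substitution as a full edit of cost 1. This kind of difference is one plausible reason a language-aware distance could produce fewer false negatives than Levenshtein, although the abstract does not state the mechanism in detail.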

Place, publisher, year, edition, pages
2013. 63 p.
Keyword [en]
Name matching, Persian language, string matching
National Category
Computer Science
URN: urn:nbn:se:liu:diva-102210
ISRN: LIU-IDA/LITH-EX-A--13/061--SE
OAI: diva2:675478
Subject / course
Computer and information science at the Institute of Technology
Educational program
Special Education Programme
Available from: 2013-12-05. Created: 2013-12-03. Last updated: 2013-12-05. Bibliographically approved.

Open Access in DiVA

leimo-thesis (2379 kB), 510 downloads
File information
File name: FULLTEXT01.pdf. File size: 2379 kB. Checksum: SHA-512
Type: fulltext. Mimetype: application/pdf

Author
Momeninasab, Leila
