A Lexicon for Gene Normalization
Linköping University, Department of Computer and Information Science.
2009 (English)
Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title: Ett lexicon för gennormalisering (Swedish)
Abstract [en]

Researchers tend to use their own or favourite gene names in the scientific literature, even though official names exist. Some names are even used for more than one gene. This causes ambiguity problems when biological literature is mined automatically. Gene normalization is used to disambiguate the gene names. In this thesis, we examine an existing gene normalization system and develop a new method for finding gene candidates for ambiguous gene mentions. For the new method, a lexicon is created from the gene names, symbols and synonyms in three different databases. A gene mention found in the scientific literature is used as input for a search in this lexicon, and all genes in the lexicon that match the mention are returned as gene candidates for that mention. These candidates are then used in the system's disambiguation step. Results show that the new method gives a better overall result from the system, with an increase in precision and a small decrease in recall.
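
The lexicon lookup described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the normalization rule (lowercasing and dropping non-alphanumeric characters), the exact-match lookup, the function names and the sample records are all assumptions made for illustration, and the thesis may use a different string matching scheme.

```python
from collections import defaultdict


def normalize(term: str) -> str:
    """Lowercase and keep only letters and digits (an assumed normalization)."""
    return "".join(ch for ch in term.lower() if ch.isalnum())


def build_lexicon(records):
    """Map each normalized name, symbol or synonym to the gene IDs that use it."""
    lexicon = defaultdict(set)
    for gene_id, terms in records:
        for term in terms:
            key = normalize(term)
            if key:  # skip terms that normalize to an empty string
                lexicon[key].add(gene_id)
    return lexicon


def find_candidates(lexicon, mention):
    """Return every gene ID whose lexicon entry matches the mention, sorted."""
    return sorted(lexicon.get(normalize(mention), set()))


# Hypothetical records invented for this sketch: (gene ID, names/symbols/synonyms).
# In the thesis the entries come from three databases, which are not named here.
records = [
    ("G1", ["alpha kinase 1", "AK1", "ALK-1"]),
    ("G2", ["beta factor", "BF", "AK1"]),
]

lexicon = build_lexicon(records)
print(find_candidates(lexicon, "ak1"))    # ambiguous mention: two candidates
print(find_candidates(lexicon, "Alk 1"))  # spelling variant: one candidate
```

An ambiguous mention returns several candidates, which is the set a later disambiguation step would have to choose from; a mention with no lexicon entry returns an empty list.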

Place, publisher, year, edition, pages
2009. 38 p.
Keyword [en]
Bioinformatics, Gene Normalization, String Matching, Text Mining
National Category
Bioinformatics and Systems Biology
URN: urn:nbn:se:liu:diva-20250
ISRN: LIU-IDA/LITH-EX-A--09/038
OAI: diva2:234084
2009-08-21, al-Khwarizmi, Linköpings Universitet, Linköping, 15:15 (English)
Available from: 2009-09-07
Created: 2009-08-31
Last updated: 2009-09-07
Bibliographically approved

Author: Lingemark, Maria