Creating a medical dictionary using word alignment: The influence of sources and resources
Linköping University, The Institute of Technology. Linköping University, Department of Biomedical Engineering, Medical Informatics. ORCID iD: 0000-0001-6468-2432
Linköping University, The Institute of Technology. Linköping University, Department of Computer and Information Science, NLPLAB - Natural Language Processing Laboratory.
Linköping University, The Institute of Technology. Linköping University, Department of Biomedical Engineering, Medical Informatics.
Linköping University, The Institute of Technology. Linköping University, Department of Biomedical Engineering, Medical Informatics.
2007 (English). In: BMC Medical Informatics and Decision Making, ISSN 1472-6947, E-ISSN 1472-6947, Vol. 7, no 37. Article in journal (Refereed). Published
Abstract [en]

Background. Automatic word alignment of parallel texts with the same content in different languages is used, among other things, to generate dictionaries for new translations. The quality of the generated word alignment depends on the quality of the input resources. In this paper we report on automatic word alignment of the English and Swedish versions of the medical terminology systems ICD-10, ICF, NCSP, KSH97-P and parts of MeSH, and on how the terminology systems and the types of resources influence the quality.

Methods. We automatically word aligned the terminology systems using static resources, such as dictionaries; statistical resources, such as statistically derived dictionaries; and training resources, which were generated from manual word alignment. We varied which parts of the terminology systems we used to generate the resources, which parts we word aligned, and which types of resources we used in the alignment process, in order to explore how the different terminology systems and resources influence recall and precision. After the analysis, we used the best configuration of the automatic word alignment to generate candidate term pairs. We then manually verified the candidate term pairs and included the correct pairs in an English-Swedish dictionary.

Results. The results indicate that more resources and resource types give better results, but the size of the parts used to generate the resources only partly affects the quality. The most generally useful resources were generated from ICD-10, and resources generated from MeSH were not as general as other resources. Systematic inter-language differences in the structure of the terminology system rubrics make the rubrics harder to align. Manually created training resources give nearly as good results as a union of static resources, statistical resources and training resources, and noticeably better results than a union of static resources and statistical resources. The verified English-Swedish dictionary contains 24,000 term pairs in base forms.

Conclusion. More resources give better results in the automatic word alignment, but some resources give only small improvements. The most important type of resource is training resources, and the most general resources were generated from ICD-10.

© 2007 Nyström et al, licensee BioMed Central Ltd.
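The abstract evaluates alignment configurations by recall and precision but does not spell out how the measures were computed. As a minimal sketch, assuming the common set-based definition over alignment links (pairs of source and target token positions), the two measures could be computed as follows. The function name `precision_recall` and the example rubric pair are hypothetical illustrations, not taken from the paper.

```python
from typing import Set, Tuple

# An alignment link pairs a source-token index with a target-token index.
Link = Tuple[int, int]


def precision_recall(predicted: Set[Link], gold: Set[Link]) -> Tuple[float, float]:
    """Return (precision, recall) of predicted links against gold-standard links.

    Assumes the standard set-based definitions; the paper's exact
    evaluation procedure may differ.
    """
    correct = len(predicted & gold)  # links present in both sets
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical rubric pair, for illustration only:
#   English: "acute myocardial infarction"   Swedish: "akut hjärtinfarkt"
# Gold links: acute-akut, myocardial-hjärtinfarkt, infarction-hjärtinfarkt.
gold = {(0, 0), (1, 1), (2, 1)}
# A hypothetical aligner output with one wrong link (myocardial-akut).
predicted = {(0, 0), (2, 1), (1, 0)}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Under this definition, adding a resource that proposes more links can raise recall while lowering precision, which is why the two measures are reported together.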

Place, publisher, year, edition, pages
2007. Vol. 7, no 37
National Category
Medical and Health Sciences
Identifiers
URN: urn:nbn:se:liu:diva-40825
DOI: 10.1186/1472-6947-7-37
Local ID: 54255
OAI: oai:DiVA.org:liu-40825
DiVA: diva2:261674
Note
Original Publication: Mikael Nyström, Magnus Merkel, Håkan Petersson and Hans Åhlfeldt, Creating a medical dictionary using word alignment: The influence of sources and resources, 2007, BMC Medical Informatics and Decision Making, (7), 37. http://dx.doi.org/10.1186/1472-6947-7-37
Licensee: BioMed Central http://www.biomedcentral.com/
Available from: 2009-10-10. Created: 2009-10-10. Last updated: 2017-12-13.
In thesis
1. Enrichment of Terminology Systems for Use and Reuse in Medical Information Systems
2010 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Electronic health record (EHR) systems are used to store relevant health facts about patients. The main use of the EHR is in the care of the patient, but an additional use is to reuse the EHR information to locate and evaluate clinical evidence for treatments. To use the EHR information efficiently, it is essential to use appropriate methods for information compilation. This thesis deals with the use of information in medical terminology systems and ontologies in order to better use and reuse EHR information and other medical information.

The first objective of the thesis is to examine whether word alignment on bilingual English-Swedish rubrics from five medical terminology systems can be used to build a bilingual dictionary. A study found that it was possible to use word alignment to generate a dictionary with 42 000 entries, a high proportion of which are medical. The method worked best on sets of rubrics with many unique words that are consistently translated. The dictionary can be used as a general medical dictionary, in semi-automatic translation methods, in cross-language information retrieval systems, and for enrichment of other terminology systems.

The second objective of the thesis is to explore how connections from existing terminology systems and information models to SNOMED CT, together with the structure in SNOMED CT, can be used to reuse information. A study examined whether the primary health care diagnosis terminology system KSH97-P can obtain a richer structure using category and chapter mappings from KSH97-P to SNOMED CT and the structure in SNOMED CT. The study showed that KSH97-P can be enriched with a poly-hierarchical chapter division and additional attributes. The richer structure was used to compile statistics in new ways that gave new views of the primary care diagnoses. A literature study evaluated which kinds of information compilations are necessary to create graphical patient overviews based on information from EHRs. It was found that a third of the patient overviews can have their information needs satisfied using compilations based on SNOMED CT encodings of the information entities in the EHR and the structure in SNOMED CT. The other overviews also need access to individual values in the EHR. This can be achieved by using well-defined information models in the EHR.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2010. 79 p.
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 1335
National Category
Computer and Information Science
Identifiers
urn:nbn:se:liu:diva-58621 (URN)
978-91-7393-328-5 (ISBN)
Public defence
2010-09-10, Eken, Campus US, Linköpings universitet, Linköping, 09:00 (English)
Available from: 2010-08-30. Created: 2010-08-18. Last updated: 2015-09-22. Bibliographically approved

Open Access in DiVA

fulltext (521 kB), 448 downloads
File information
File name: FULLTEXT01.pdf
File size: 521 kB
Checksum (SHA-512):
1208ac484388ff6c64b42188c74ebfd1c94865011c97061d7bcdc07cc8f622fb66fdeb3853f159fc9588c387811910ed615b2b2b705a1d746dbd3e1c261c216d
Type: fulltext
Mimetype: application/pdf
