Using machine learning to perform automatic term recognition
Linköping University, Department of Computer and Information Science, NLPLAB - Natural Language Processing Laboratory. Linköping University, The Institute of Technology.
Linköping University, Department of Computer and Information Science, NLPLAB - Natural Language Processing Laboratory. Linköping University, The Institute of Technology.
2010 (English). In: Proceedings of the LREC 2010 Workshop on Methods for automatic acquisition of Language Resources and their evaluation methods / [ed] Núria Bel, Béatrice Daille, Andrejs Vasiljevs, European Language Resources Association, 2010, p. 49-54. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, a machine learning approach is applied to Automatic Term Recognition (ATR). Similar approaches have been successfully used in Automatic Keyword Extraction (AKE). Using a dataset consisting of Swedish patent texts and validated terms belonging to these texts, unigrams and bigrams are extracted and annotated with linguistic and statistical feature values. Experiments using a varying ratio between positive and negative examples in the training data are conducted using the annotated n-grams. The results indicate that a machine learning approach is viable for ATR. Furthermore, a machine learning approach for bilingual ATR is discussed. Preliminary analysis, however, indicates that some modifications have to be made to apply the monolingual machine learning approach to a bilingual context.
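
The data preparation outlined in the abstract (extracting unigrams and bigrams, labelling them against validated terms, and varying the ratio of negative to positive training examples) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the `neg_per_pos` parameter, the toy English sentence, and the two simple features (n-gram length and frequency) are all assumptions, and the linguistic features used in the paper are not reproduced here.

```python
import random
from collections import Counter


def extract_ngrams(tokens, max_n=2):
    """Return every unigram and bigram in the token list as a tuple."""
    return [
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def build_training_set(tokens, validated_terms, neg_per_pos=1.0, seed=0):
    """Label n-grams against validated terms and undersample negatives.

    neg_per_pos is a hypothetical knob: the number of negative examples
    kept for each positive example.
    """
    counts = Counter(extract_ngrams(tokens))
    candidates = sorted(counts)  # sorted for reproducible sampling
    positives = [g for g in candidates if g in validated_terms]
    negatives = [g for g in candidates if g not in validated_terms]

    # Keep at most round(len(positives) * neg_per_pos) negatives.
    k = min(len(negatives), round(len(positives) * neg_per_pos))
    sampled = random.Random(seed).sample(negatives, k)

    def featurize(gram):
        # Placeholder statistical features only (assumed, not from the paper).
        return {"ngram": " ".join(gram), "length": len(gram), "freq": counts[gram]}

    return ([(featurize(g), 1) for g in positives]
            + [(featurize(g), 0) for g in sampled])


# Toy usage with made-up data.
tokens = "the fuel pump feeds the fuel injector".split()
validated = {("fuel", "pump"), ("injector",)}
training = build_training_set(tokens, validated, neg_per_pos=2.0)
```

Rerunning `build_training_set` with different `neg_per_pos` values yields training sets with different class balances, which could then be passed to any classifier to compare results across ratios.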

Place, publisher, year, edition, pages
European Language Resources Association, 2010. p. 49-54
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:liu:diva-75237; ISI: 000356879501100; ISBN: 978-2-9517408-6-0 (print); OAI: oai:DiVA.org:liu-75237; DiVA, id: diva2:505121
Conference
LREC 2010 Workshop on Methods for automatic acquisition of Language Resources and their evaluation methods, 23 May 2010, Valletta, Malta
Available from: 2012-03-01. Created: 2012-02-22. Last updated: 2018-01-12. Bibliographically approved
In thesis
1. Computational Terminology: Exploring Bilingual and Monolingual Term Extraction
2012 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

Terminologies are becoming more important to modern-day society as technology and science continue to grow at an accelerating rate in a globalized environment. Agreeing upon which terms should be used to represent which concepts, and how those terms should be translated into different languages, is important if we wish to be able to communicate with as little confusion and as few misunderstandings as possible.

Since the 1990s, an increasing amount of terminology research has been devoted to facilitating and augmenting terminology-related tasks by using computers and computational methods. One focus for this research is Automatic Term Extraction (ATE).

In this compilation thesis, studies on both bilingual and monolingual ATE are presented. First, two publications report on how bilingual ATE using the align-extract approach can be used to extract patent terms. The result in this case was 181,000 manually validated English-Swedish patent terms, which were to be used in a machine translation system for patent documents. A critical component of the method used is the Q-value metric, presented in the third paper, which can be used to rank extracted term candidates (TC) in an order that correlates with TC precision. The use of Machine Learning (ML) in monolingual ATE is the topic of the two final contributions. The first ML-related publication shows that rule-induction-based ML can be used to generate linguistic term selection patterns, and in the second ML-related publication, contrastive n-gram language models are used in conjunction with SVM ML to improve the precision of term candidates selected using linguistic patterns.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2012. p. 68
Series
Linköping Studies in Science and Technology. Thesis, ISSN 0280-7971 ; 1523
Keywords
terminology, automatic term extraction, automatic term recognition, computational terminology, terminology management
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:liu:diva-75243 (URN); LiU-TEK-LIC-201285 (Local ID); 9789175199443 (ISBN); LiU-TEK-LIC-201285 (Archive number); LiU-TEK-LIC-201285 (OAI)
Presentation
2012-04-04, Alan Turing, Hus E, Campus Valla, Linköpings universitet, Linköping, 13:15 (English)
Available from: 2012-03-07. Created: 2012-02-23. Last updated: 2020-08-27. Bibliographically approved

Open Access in DiVA

Foo-2010-Using machine learning to perform automatic term recognition (454 kB), 638 downloads
File information
File name: FULLTEXT01.pdf
File size: 454 kB
Checksum SHA-512:
ca064afbaa87d13f39cd7434b66345041d6757b477dc2b6ae784d7f8a8f08c495910659234d511e8e1fd5ef92ccf75fdd6cf850d0d23eb75ecb05d78863c7950
Type: fulltext
Mimetype: application/pdf

Authority records

Foo, Jody; Merkel, Magnus
