Computer aided term bank creation and standardization: Building standardized term banks through automated term extraction and advanced editing tools
Foo, Jody: Linköping University, Department of Computer and Information Science, NLPLAB - Natural Language Processing Laboratory. Linköping University, The Institute of Technology.
Merkel, Magnus: Linköping University, Department of Computer and Information Science, NLPLAB - Natural Language Processing Laboratory. Linköping University, The Institute of Technology.
2010 (English). In: Terminology in Everyday Life / [ed] Marcel Thelen and Frieda Steurs, John Benjamins Publishing Company, 2010, p. 163-180. Chapter in book, part of anthology (Other academic)
Abstract [en]

Using a standardized term bank in both authoring and translation processes can facilitate the use of consistent terminology, which in turn minimizes confusion and frustration among readers. One of the problems of creating a standardized term bank is the time and effort required. Recent developments in term extraction techniques based on word alignment can improve the extraction of term candidates when parallel texts are available. The aligned units are processed automatically, but a large quantity of term candidates still has to be processed by a terminologist, who selects which candidates should be promoted to standardized terms. To minimize the work needed to process the extracted term candidates, we propose a method based on using efficient editing tools and on ranking the extracted set of term candidates by quality. This sorted set of term candidates can then be edited, categorized and filtered more effectively. In this paper, the process and methods used to arrive at a standardized term bank are presented and discussed.

 

Place, publisher, year, edition, pages
John Benjamins Publishing Company, 2010. p. 163-180
Series
Terminology and Lexicography Research and Practice, ISSN 1388-8455 ; 13
Keywords [en]
terminology, extraction, term bank, automation
National subject category
Language Technology (Computational Linguistics); Computer Science
Identifiers
URN: urn:nbn:se:liu:diva-59842; ISBN: 978 90 272 2337 1 (print); OAI: oai:DiVA.org:liu-59842; DiVA, id: diva2:353517
Available from: 2010-09-27. Created: 2010-09-27. Last updated: 2018-01-12. Bibliographically approved.
Part of thesis
1. Computational Terminology: Exploring Bilingual and Monolingual Term Extraction
2012 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

Terminologies are becoming more important to modern-day society as technology and science continue to grow at an accelerating rate in a globalized environment. Agreeing upon which terms should be used to represent which concepts, and how those terms should be translated into different languages, is important if we wish to communicate with as little confusion and as few misunderstandings as possible.

Since the 1990s, an increasing amount of terminology research has been devoted to facilitating and augmenting terminology-related tasks by using computers and computational methods. One focus for this research is Automatic Term Extraction (ATE).

In this compilation thesis, studies on both bilingual and monolingual ATE are presented. First, two publications report on how bilingual ATE using the align-extract approach can be used to extract patent terms. The result in this case was 181,000 manually validated English-Swedish patent terms, which were to be used in a machine translation system for patent documents. A critical component of the method is the Q-value metric, presented in the third paper, which can be used to rank extracted term candidates (TC) in an order that correlates with TC precision. The use of Machine Learning (ML) in monolingual ATE is the topic of the two final contributions. The first ML-related publication shows that rule induction based ML can be used to generate linguistic term selection patterns. In the second ML-related publication, contrastive n-gram language models are used in conjunction with SVM ML to improve the precision of term candidates selected using linguistic patterns.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2012. p. 68
Series
Linköping Studies in Science and Technology. Thesis, ISSN 0280-7971 ; 1523
Keywords
terminology, automatic term extraction, automatic term recognition, computational terminology, terminology management
National subject category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:liu:diva-75243 (URN); LiU-TEK-LIC-201285 (Local ID); 9789175199443 (ISBN); LiU-TEK-LIC-201285 (Archive number); LiU-TEK-LIC-201285 (OAI)
Presentation
2012-04-04, Alan Turing, Hus E, Campus Valla, Linköping University, Linköping, 13:15 (English)
Available from: 2012-03-07. Created: 2012-02-23. Last updated: 2020-08-27. Bibliographically approved.

Open Access in DiVA

No full text in DiVA


Person

Foo, Jody; Merkel, Magnus
