Definition Extraction From Swedish Technical Documentation: Bridging the gap between industry and academy approaches
Independent thesis Basic level (degree of Bachelor), 12 credits / 18 HE credits
Student thesis
Terminology is concerned with the creation and maintenance of concept systems, terms and definitions. Automatic term and definition extraction is used to simplify this otherwise manual and sometimes tedious process. This thesis presents an integrated approach that combines pattern matching and machine learning, using feature vectors in which each feature is a Boolean function of a regular expression. The integrated approach is compared with the two classic approaches, pattern matching alone and machine learning alone, and shows a significant increase in recall while maintaining comparable precision. Less promising is the negative correlation between the performance of the integrated approach and the size of the training set. Further research is suggested.
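The feature representation described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the three Swedish patterns, the toy sentences, and the function names `featurize`, `train` and `predict` are all assumptions made for illustration. The classifier is a small Bernoulli naive Bayes written from scratch, chosen only because naive Bayes appears among the keywords.

```python
import math
import re

# Hypothetical definition patterns (assumed, not taken from the thesis).
PATTERNS = [
    re.compile(r"\bär (en|ett)\b"),     # "is a"
    re.compile(r"\bdefinieras som\b"),  # "is defined as"
    re.compile(r"\bavses\b"),           # "is meant"
]

def featurize(sentence):
    """Each feature is a Boolean: does the regular expression match?"""
    return [bool(p.search(sentence)) for p in PATTERNS]

def train(sentences, labels):
    """Fit a Bernoulli naive Bayes model with add-one smoothing."""
    model = {}
    for c in set(labels):
        rows = [featurize(s) for s, y in zip(sentences, labels) if y == c]
        prior = len(rows) / len(labels)
        probs = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                 for j in range(len(PATTERNS))]
        model[c] = (prior, probs)
    return model

def predict(model, sentence):
    """Return the class with the highest log posterior."""
    x = featurize(sentence)
    def score(c):
        prior, probs = model[c]
        return math.log(prior) + sum(
            math.log(p if xi else 1 - p) for xi, p in zip(x, probs))
    return max(model, key=score)

# Toy training data (invented): True marks a definition sentence.
SENTENCES = [
    "En ventil är en anordning som reglerar flödet.",
    "Vridmoment definieras som kraft gånger hävarm.",
    "Med kylvätska avses vätska som leder bort värme.",
    "Dra åt skruven med nyckeln.",
    "Kontrollera oljenivån före start.",
    "Motorn startar inte.",
]
LABELS = [True, True, True, False, False, False]

model = train(SENTENCES, LABELS)
```

Calling `predict(model, "En packning är en tätning mellan två ytor.")` classifies the sentence from its Boolean vector alone. With so few patterns and examples the margins are narrow, so the sketch shows only the mechanics of the representation, not the reported recall or precision.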
Place, publisher, year, edition, pages
2016. 30 p.
definition extraction, machine learning, pattern matching, naive bayes, regular expressions, rev, classifier, terminology, comparison
Keywords (Swedish, translated): definition extraction, machine learning, pattern matching, regular expressions, rev, classifier, terminology, comparison
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:liu:diva-131057
ISRN: LIU-IDA/KOGVET-G--16/024--SE
OAI: oai:DiVA.org:liu-131057
DiVA: diva2:968018
Fodina Language Technology
Subject / course