A Probabilistic Tagging Module Based on Surface Pattern Matching
Stockholm University. ORCID iD: 0000-0003-3734-0757
1994 (English) In: NODALIDA ’93 – Proceedings of ‘9:e Nordiska Datalingvistikdagarna’, Stockholm 3–5 June 1993 / [ed] Robert Eklund, Stockholm: Stockholm University, 1994, p. 83-95
Conference paper, Published paper (Refereed)
Abstract [en]

This paper treats automatic, probabilistic tagging. First, the residual, untagged output from the lexical analyser SWETWOL2 is described and discussed. A method of tagging residual output is proposed and implemented: the left-stripping method. This algorithm, employed by the module ENDTAG, recursively strips a word of its leftmost letter and looks up the remaining ‘ending’ in a dictionary. If the ending is found, ENDTAG tags it according to the information found in the dictionary. If the ending is not found in the dictionary, a match is sought in ending lexica containing statistical information about the word classes associated with the ending and the relative frequency of each word class. If a match is found in the ending lexica, the word is given graded tagging according to that statistical information. If no match is found, the ending is stripped of what is now its leftmost letter and is again looked up in the dictionary and the ending lexica (in that order). The ending lexica employed in this paper, which contain the statistical information, are obtained from a reversed version of Nusvensk Frekvensordbok (Allén 1970) and contain endings of one to seven letters. Success rates for ENDTAG as a standalone module are presented.
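
The left-stripping procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the ENDTAG implementation: the function name `tag_by_ending`, the tag labels, and the toy contents of `DICTIONARY` and `ENDING_LEXICON` are hypothetical and chosen only for illustration. The sketch assumes the dictionary maps an ending to a single tag and the ending lexicon maps an ending to relative frequencies per word class.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data, not taken from the paper.
DICTIONARY: Dict[str, str] = {"hus": "NOUN", "bil": "NOUN"}

# Ending lexicon: ending -> relative frequency of each word class.
# In the paper the endings are one to seven letters long; these
# entries and figures are invented for the example.
ENDING_LEXICON: Dict[str, Dict[str, float]] = {
    "ning": {"NOUN": 0.9, "VERB": 0.1},
    "ade": {"VERB": 0.8, "ADJ": 0.2},
}


def tag_by_ending(word: str) -> List[Tuple[str, float]]:
    """Return graded (tag, weight) pairs for an unknown word.

    Strips the leftmost letter, looks the remainder up in the
    dictionary first and the ending lexicon second, and recurses
    on the remainder when neither lookup succeeds.
    """
    ending = word[1:]  # strip the leftmost letter
    if not ending:
        return []  # nothing left to match: the word stays untagged
    if ending in DICTIONARY:
        # A dictionary hit yields a single tag (weight 1.0 is an assumption).
        return [(DICTIONARY[ending], 1.0)]
    if ending in ENDING_LEXICON:
        # Graded tagging: word classes ordered by relative frequency.
        freqs = ENDING_LEXICON[ending]
        return sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
    return tag_by_ending(ending)  # strip again and retry


if __name__ == "__main__":
    print(tag_by_ending("bakning"))
    print(tag_by_ending("xhus"))
```

Checking the dictionary before the ending lexicon at every step mirrors the lookup order given in the abstract ("in that order"); how ties or multiple dictionary readings are handled is left open here because the abstract does not say.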

Place, publisher, year, edition, pages
Stockholm: Stockholm University, 1994. p. 83-95
National Category
General Language Studies and Linguistics
Identifiers
URN: urn:nbn:se:liu:diva-135280
ISBN: 9171532625 (print)
OAI: oai:DiVA.org:liu-135280
DiVA, id: diva2:1080379
Conference
NODALIDA ’93 - Proceedings of 9:e Nordiska Datalingvistikdagarna, Stockholm University, Sweden, 3–5 June 1993
Available from: 2017-03-10 Created: 2017-03-10 Last updated: 2018-01-13 Bibliographically approved

Open Access in DiVA

A Probabilistic Tagging Module Based on Surface Pattern Matching (309 kB), 19 downloads
File information
File name: FULLTEXT02.pdf
File size: 309 kB
Checksum (SHA-512): cc4b5ac647a38764f7f042c7c040eedc725a88c7f0125ebf3ba17b243117907557794d0baab00ccbb7776ae88b5b35527c9101065df4d89c2ab74ba60081bf8c
Type: fulltext
Mimetype: application/pdf

By author/editor
Eklund, Robert