Profiling Domain Specificity of Specialized Web Corpora using Burstiness. Explorations and Open Issues
RISE Research Institutes of Sweden.
Linköping University, Department of Computer and Information Science, Human-Centered systems. Linköping University, Faculty of Science & Engineering.
Linköping University, Department of Computer and Information Science, Human-Centered systems. Linköping University, Faculty of Science & Engineering. ORCID iD: 0000-0003-4899-588X
2018 (English) Conference paper, Poster (with or without abstract) (Refereed)
Abstract [en]

In this paper we describe an approach to profile the domain specificity of specialized web corpora in Swedish. The proposed approach is based on burstiness. Burstiness is a statistical measure that identifies words with uneven distribution across the documents of a corpus. We apply burstiness to two medical web corpora that have different size and different domain granularity. Results are promising and show that burstiness is an appropriate measure to profile the domain specificity when matched against reference lists (gold standards) that represent the target domains. However, further research is needed to find adequate evaluation metrics, less empirical cut-off points and more principled gold standard design.
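
The abstract does not state the formula used for burstiness. As an illustration only, the sketch below assumes one common formulation: the total number of occurrences of a word divided by the number of documents that contain it. The function name `burstiness`, the whitespace tokenisation and the toy documents are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def burstiness(documents):
    """Map each word to (total occurrences) / (documents containing it).

    Assumed formulation, not necessarily the one used in the paper.
    A score near 1.0 means the word is spread evenly over the documents
    it appears in; a high score means it clusters in few documents.
    """
    total = Counter()      # occurrences of each word over the whole corpus
    doc_freq = Counter()   # number of documents in which each word occurs
    for doc in documents:
        counts = Counter(doc.lower().split())  # naive whitespace tokens
        total.update(counts)
        doc_freq.update(counts.keys())
    return {word: total[word] / doc_freq[word] for word in total}


# Toy corpus: "insulin" is concentrated in one document, "och" is not.
docs = [
    "insulin insulin insulin dos och insulin",
    "patienten och läkaren",
    "läkaren och dos",
]
scores = burstiness(docs)
```

Under this assumed definition, words that score well above 1.0 are candidates for domain terms, which could then be matched against a reference list as the abstract describes. Any cut-off on the score would be an empirical choice.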

Place, publisher, year, edition, pages
2018.
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:liu:diva-154147
OAI: oai:DiVA.org:liu-154147
DiVA, id: diva2:1283620
Conference
Proceedings of The Seventh Swedish Language Technology Conference 2018 (SLTC-18), Stockholm, Sweden, 7-9 November 2018
Available from: 2019-01-29 Created: 2019-01-29 Last updated: 2019-08-06 Bibliographically approved

Open Access in DiVA

fulltext (134 kB), 3 downloads
File information
File name: FULLTEXT01.pdf
File size: 134 kB
Checksum (SHA-512): 216ab5814ad4eab1ec7b6b104aa492d573cad8be5489ee1831fc25e82e2f53b7d9eb808e6d65140bdbfd10cf1248a25f84ebda7e7fc3909ee59aeb431f206323
Type: fulltext
Mimetype: application/pdf

Authority records BETA

Strandqvist, Wiktor; Jönsson, Arne
