A Web Corpus for eCare: Collection, Lay Annotation and Learning - First Results
RISE SICS East Linköping, Sweden.
Linköping University, Department of Computer and Information Science, Human-Centered systems. Linköping University, Faculty of Arts and Sciences. RISE SICS East Linköping, Sweden. ORCID iD: 0000-0003-4899-588X
Linköping University, Department of Biomedical Engineering. Linköping University, Faculty of Science & Engineering. ORCID iD: 0000-0001-6468-2432
Örebro University, Örebro, Sweden.
2017 (English). In: Position Papers of the 2017 Federated Conference on Computer Science and Information Systems / [ed] M. Ganzha, L. Maciaszek, M. Paprzycki, Polish Information Processing Society, 2017, p. 71-78. Conference paper, Published paper (Refereed)
Abstract [en]

In this position paper, we put forward two claims: 1) it is possible to design a dynamic and extensible corpus without running into scalability problems; 2) it is possible to devise noise-resistant Language Technology applications without affecting performance. To support our claims, we describe the design, construction and limitations of a very specialized medical web corpus, called eCare_Sv_01, and we present two experiments on lay-specialized text classification. eCare_Sv_01 is a small corpus of web documents written in Swedish. The corpus contains documents about chronic diseases. The sublanguage used in each document has been labelled as “lay” or “specialized” by a lay annotator. The corpus is designed as a flexible text resource, to which additional medical documents will be appended over time. Experiments show that the lay-specialized labels assigned by the lay annotator are reliably learned by standard classifiers. More specifically, Experiment 1 shows that scalability is not an issue when the size of the datasets to be learned increases from 156 to 801 documents. Experiment 2 shows that lay-specialized labels can be learned despite the large number of disturbing factors, such as machine-translated documents or low-quality texts, that are numerous in the corpus.

Place, publisher, year, edition, pages
Polish Information Processing Society, 2017. p. 71-78
Series
Annals of Computer Science and Information Systems, ISSN 2300-5963
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:liu:diva-141054
DOI: 10.15439/2017F531
OAI: oai:DiVA.org:liu-141054
DiVA, id: diva2:1522891
Conference
2nd International Workshop on Language Technologies and Applications (LTA'17), Prague, Czech Republic, 3-6 September, 2017
Available from: 2017-09-21. Created: 2021-01-27. Last updated: 2018-01-13. Bibliographically approved.

Authority records

Jönsson, Arne; Nyström, Mikael
