A Web Corpus for eCare: Collection, Lay Annotation and Learning - First Results
RISE SICS East Linköping, Sweden.
Linköping University, Department of Computer and Information Science, Interactive and Cognitive Systems. Linköping University, Faculty of Arts and Sciences. RISE SICS East Linköping, Sweden. ORCID iD: 0000-0003-4899-588X
Linköping University, Department of Biomedical Engineering. Linköping University, Faculty of Science and Engineering. ORCID iD: 0000-0001-6468-2432
Örebro University, Örebro, Sweden.
2017 (English). In: Position Papers of the 2017 Federated Conference on Computer Science and Information Systems / [ed] M. Ganzha, L. Maciaszek, M. Paprzycki, Polish Information Processing Society, 2017, pp. 71-78. Conference paper, published paper (refereed)
Abstract [en]

In this position paper, we put forward two claims: 1) it is possible to design a dynamic and extensible corpus without running into scalability problems; 2) it is possible to devise noise-resistant Language Technology applications without affecting performance. To support our claims, we describe the design, construction and limitations of a very specialized medical web corpus, called eCare_Sv_01, and we present two experiments on lay-specialized text classification. eCare_Sv_01 is a small corpus of web documents written in Swedish. The corpus contains documents about chronic diseases. The sublanguage used in each document has been labelled as “lay” or “specialized” by a lay annotator. The corpus is designed as a flexible text resource, to which additional medical documents will be appended over time. Experiments show that the lay-specialized labels assigned by the lay annotator are reliably learned by standard classifiers. More specifically, Experiment 1 shows that scalability is not an issue when the size of the datasets to be learned is increased from 156 to 801 documents. Experiment 2 shows that lay-specialized labels can be learned despite the many disturbing factors, such as machine-translated documents and low-quality texts, that are present in the corpus.

Place, publisher, year, edition, pages
Polish Information Processing Society, 2017. pp. 71-78
Series
Annals of Computer Science and Information Systems, ISSN 2300-5963
National subject category
Language Processing and Computational Linguistics
Identifiers
URN: urn:nbn:se:liu:diva-141054
DOI: 10.15439/2017F531
OAI: oai:DiVA.org:liu-141054
DiVA, id: diva2:1522891
Conference
2nd International Workshop on Language Technologies and Applications (LTA'17), Prague, Czech Republic, 3-6 September, 2017
Available from: 2017-09-21 Created: 2021-01-27 Last updated: 2025-02-07. Bibliographically reviewed

Open Access in DiVA

No full text in DiVA

Person

Jönsson, Arne; Nyström, Mikael
