Towards a Corpus of Easy to Read Authority Web Texts
Linköping University, Department of Computer and Information Science, Human-Centered systems. Linköping University, Faculty of Science & Engineering. SICS East Swedish ICT AB, Linköping, Sweden. (NLPLAB)
Linköping University, Department of Computer and Information Science, Human-Centered systems. Linköping University, Faculty of Arts and Sciences. SICS East Swedish ICT AB, Linköping, Sweden. ORCID iD: 0000-0003-4899-588X
2016 (English). Conference paper, Poster (Other academic)
Abstract [en]

We present the first version of a corpus of web texts from public authorities and municipalities, as of spring 2016, divided into easy-to-read texts and texts written in Standard Swedish. The corpus currently contains documents totalling approximately 30 million tokens. In this paper we describe the tools and methods used to collect the web pages and the corpus data.

Place, publisher, year, edition, pages
2016.
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:liu:diva-132627; OAI: oai:DiVA.org:liu-132627; DiVA: diva2:1047402
Conference
The Sixth Swedish Language Technology Conference (SLTC), Umeå University, Umeå, Sweden, November 17-18, 2016
Funder
VINNOVA; .SE (The Internet Infrastructure Foundation)
Available from: 2016-11-17; Created: 2016-11-17; Last updated: 2016-11-23; Bibliographically approved

Authors
Rennes, Evelina; Jönsson, Arne