Towards a Corpus of Easy to Read Authority Web Texts
2016 (English). In: , 2016. Conference paper, Poster (Other academic)
We present the first version of a corpus of web texts from public authorities and municipalities, as of spring 2016, divided into easy-to-read texts and texts written in Standard Swedish. The corpus currently contains documents totalling approximately 30 million tokens. In this paper we describe the tools and methods used to collect the web pages and data of the corpus.
Language Technology (Computational Linguistics)
Identifiers: URN: urn:nbn:se:liu:diva-132627; OAI: oai:DiVA.org:liu-132627; DiVA: diva2:1047402
The Sixth Swedish Language Technology Conference (SLTC), Umeå University, Umeå, Sweden, November 17-18, 2016
Funders: VINNOVA; .SE (The Internet Infrastructure Foundation)