Classifying easy-to-read texts without parsing
Linköping University, Department of Computer and Information Science. Linköping University, The Institute of Technology. ORCID iD: 0000-0002-6357-4461
Linköping University, Department of Computer and Information Science. Linköping University, The Institute of Technology. ORCID iD: 0000-0003-4899-588X
2014 (English)
In: Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR), 2014, 114-122 p.
Conference paper (Refereed)
Abstract [en]

Document classification using automated linguistic analysis and machine learning (ML) has been shown to be a viable road forward for readability assessment. The best models can be trained to decide if a text is easy to read or not with very high accuracy, e.g. a model using 117 parameters from shallow, lexical, morphological and syntactic analyses achieves 98.9% accuracy. In this paper we compare models created by parameter optimization over subsets of that total model to find out to what extent different high-performing models tend to consist of the same parameters and whether it is possible to find models that only use features not requiring parsing. We used a genetic algorithm to systematically optimize parameter sets of fixed sizes, using the accuracy of a Support Vector Machine classifier as the fitness function. Our results show that it is possible to find models almost as good as the currently best models while omitting parsing-based features.
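
The search procedure described in the abstract (a genetic algorithm over feature subsets of a fixed size, scored by a fitness function) can be illustrated with a minimal sketch. This is not the authors' implementation: the selection scheme, crossover and mutation operators, population size, and all function names are assumptions chosen for illustration. To keep the example runnable without a corpus or an ML library, `toy_fitness` and the `INFORMATIVE` set are hypothetical stand-ins for the classifier accuracy used in the paper.

```python
import random

# Hypothetical stand-in for "useful" features; NOT taken from the paper.
INFORMATIVE = frozenset({2, 5, 11, 17, 23})


def toy_fitness(subset):
    """Toy score: share of selected features that are 'informative'.
    In the paper's setting this would be classifier accuracy instead."""
    return len(subset & INFORMATIVE) / len(subset)


def random_subset(n_features, k, rng):
    """Draw a random feature subset of exactly k indices."""
    return frozenset(rng.sample(range(n_features), k))


def crossover(a, b, k, rng):
    """Keep features shared by both parents, then fill up to size k
    with features that occur in only one parent."""
    shared = a & b
    rest = sorted((a | b) - shared)
    return frozenset(shared | set(rng.sample(rest, k - len(shared))))


def mutate(subset, n_features, rate, rng):
    """With probability `rate`, swap one selected feature for an
    unselected one, so the subset size stays fixed."""
    if rng.random() >= rate or len(subset) == n_features:
        return subset
    out = rng.choice(sorted(subset))
    unused = [f for f in range(n_features) if f not in subset]
    return frozenset((subset - {out}) | {rng.choice(unused)})


def optimise(fitness, n_features, k, pop_size=30, generations=40,
             mutation_rate=0.3, seed=0):
    """Evolve fixed-size feature subsets and return the fittest one."""
    rng = random.Random(seed)
    population = [random_subset(n_features, k, rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection with elitism: the better half survives.
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[:pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = crossover(a, b, k, rng)
            children.append(mutate(child, n_features, mutation_rate, rng))
        population = elite + children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = optimise(toy_fitness, n_features=117, k=5)
    print(sorted(best), toy_fitness(best))
```

To approximate the setup in the abstract, `toy_fitness` would be replaced by a function that trains a Support Vector Machine on only the selected feature columns and returns its (for example cross-validated) accuracy. Running `optimise` once per subset size `k` then gives one optimized model per size, which can be compared for overlapping features.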

Place, publisher, year, edition, pages
2014. 114-122 p.
Keyword [en]
Readability, Readability Assessment, Genetic optimization, Machine Learning, Support Vector Machine
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:liu:diva-117547
ISBN: 978-1-937284-91-6
OAI: oai:DiVA.org:liu-117547
DiVA: diva2:809460
Conference
14th Conference of the European Chapter of the Association for Computational Linguistics
Available from: 2015-05-04 Created: 2015-05-04 Last updated: 2016-08-22 Bibliographically approved

Open Access in DiVA

No full text

By author/editor
Falkenjack, Johan; Jönsson, Arne