Question Classification in Question Answering Systems
Linköping University, Department of Computer and Information Science, NLPLAB - Natural Language Processing Laboratory. Linköping University, The Institute of Technology.
2007 (English). Licentiate thesis, monograph (Other academic)
Abstract [en]

Question answering systems can be seen as the next step in information retrieval: they allow users to pose questions in natural language and receive succinct answers. Research has shown that, for a question answering system as a whole to succeed, questions must be classified correctly with regard to the expected answer type. Question classification has two components: a taxonomy of answer types and a mechanism for assigning questions to those types.

This thesis focuses on five machine learning algorithms for the question classification task: k nearest neighbours, naïve Bayes, decision tree learning, sparse network of winnows, and support vector machines. These algorithms were applied to two corpora. One has been used extensively in previous work and was constructed for a specific agenda; the other is drawn from a set of users' questions posed to a running online system. The results show that the performance of the algorithms differs between the corpora, both in absolute terms and in the relative ranking of the algorithms. On the novel corpus, naïve Bayes, decision tree learning, and support vector machines perform on par with each other, while on the biased corpus there is a clear difference between them, with support vector machines performing best and naïve Bayes worst.
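To make the classification task concrete, the following is a minimal sketch of one of the five algorithm families named above, a multinomial naïve Bayes classifier with add-one smoothing over bag-of-words features. It is not the implementation used in the thesis: the training questions, the answer-type labels (`PERSON`, `LOCATION`, `DATE`), and the function names are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: invented questions with invented answer-type labels.
TRAIN = [
    ("who wrote hamlet", "PERSON"),
    ("who discovered penicillin", "PERSON"),
    ("where is the eiffel tower", "LOCATION"),
    ("where was mozart born", "LOCATION"),
    ("when did the war end", "DATE"),
    ("when was the bridge built", "DATE"),
]


def train_naive_bayes(examples):
    """Count class frequencies and per-class word frequencies."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in text.split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def classify(question, model):
    """Return the label with the highest log posterior for the question."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(class_counts):
        # Log prior of the class.
        score = math.log(class_counts[label] / total)
        # Add-one (Laplace) smoothing denominator for this class.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in question.split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(TRAIN)
print(classify("who painted the ceiling", model))
```

In this toy setting the question word dominates the decision, which is one reason why a corpus with uniform question formulations can be much easier to classify than free-form user questions.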

The thesis also presents an analysis of questions that are problematic for all learning algorithms. The errors can roughly be attributed to categories with few members, variations in question formulation, the actual usage of the taxonomy, keyword errors, and spelling errors. A large portion of the errors was also hard to explain.
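One simple way to find questions that are problematic for every algorithm is to intersect the sets of misclassified questions across classifiers. The sketch below assumes this approach for illustration; the abstract does not state how the thesis selected these questions, and the gold labels, predictions, and the function name `errors_shared_by_all` are hypothetical.

```python
def errors_shared_by_all(gold, predictions_by_classifier):
    """Return indices of questions that every classifier labelled incorrectly."""
    shared = None
    for predictions in predictions_by_classifier.values():
        wrong = {
            index
            for index, (expected, predicted) in enumerate(zip(gold, predictions))
            if expected != predicted
        }
        # Keep only the errors this classifier has in common with the others.
        shared = wrong if shared is None else shared & wrong
    return sorted(shared or [])


# Hypothetical gold labels and predictions, invented for illustration only.
gold = ["PERSON", "DATE", "LOCATION", "DATE"]
predictions = {
    "knn": ["PERSON", "LOCATION", "LOCATION", "PERSON"],
    "naive_bayes": ["DATE", "LOCATION", "LOCATION", "PERSON"],
    "svm": ["PERSON", "DATE", "LOCATION", "PERSON"],
}

shared_errors = errors_shared_by_all(gold, predictions)
print(shared_errors)
```

The indices returned this way can then be inspected by hand and sorted into error categories such as those listed above.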

Place, publisher, year, edition, pages
Institutionen för datavetenskap, 2007. 77 p.
Linköping Studies in Science and Technology. Thesis, ISSN 0280-7971 ; 1320
Keyword [en]
Question classification, question answering, machine learning, taxonomy, evaluation
National Category
Language Technology (Computational Linguistics)
URN: urn:nbn:se:liu:diva-9014
ISBN: 978-91-85831-55-5
OAI: diva2:23705
2007-06-18, Alan Turing, Hus E, Campus Valla, Linköpings universitet, Linköping, 13:15 (English)
Report code: LiU-Tek-Lic-2007:29.
Available from: 2007-05-29 Created: 2007-05-29 Last updated: 2014-01-13


Author
Sundblad, Håkan