Automatic Classification of Open-Ended Questions: Check-All-That-Apply Questions
University of Waterloo, Canada.
Western University, Canada.
Linköping University, Department of Medical and Health Sciences, Division of Community Medicine. Linköping University, Faculty of Medicine and Health Sciences. ORCID iD: 0000-0002-6281-7783
2019 (English). In: Social Science Computer Review, ISSN 0894-4393, E-ISSN 1552-8286, article id UNSP 0894439319869210. Article in journal (Refereed). Epub ahead of print.
Abstract [en]

Text data from open-ended questions in surveys are challenging to analyze and are often ignored. Open-ended questions are important, though, because they do not constrain respondents' answers. Where open-ended questions are necessary, the answers are often coded manually by human coders. When data sets are large, it is impractical or too costly to manually code all answer texts. Instead, text answers can be converted into numerical variables, and a statistical/machine learning algorithm can be trained on a subset of manually coded data. This statistical model is then used to predict the codes of the remainder. We consider open-ended questions where the answers are coded into multiple labels (check-all-that-apply questions). For example, in the open-ended question in our Happy example, respondents are explicitly told they may list multiple things that make them happy. Algorithms for multilabel data take into account the correlation among the answer codes and may therefore give better prediction results. For example, when giving examples of civil disobedience, respondents talking about "minor nonviolent offenses" were also likely to talk about "crimes." We compare the performance of two multilabel algorithms (random k-labelsets [RAKEL] and classifier chains [CC]) to the default method of binary relevance (BR), which applies single-label algorithms to each code separately. Performance is evaluated on data from three open-ended questions (Happy, Civil Disobedience, and Immigrant). We found weak bivariate label correlations in the Happy data (90th percentile: 7.6%) and stronger bivariate label correlations in the Civil Disobedience (90th percentile: 17.2%) and Immigrant (90th percentile: 19.2%) data. For the data with stronger correlations, we found that both multilabel methods performed substantially better than BR under 0/1 loss ("at least one label is incorrect") but that the choice of method had little effect under Hamming loss (average per-label error). For data with weak label correlations, we found no difference in performance between the multilabel methods and BR. We conclude that automatic classification of open-ended questions that allow multiple answers may benefit from multilabel algorithms when 0/1 loss is the relevant criterion. The degree of correlation among the labels may be a useful prognostic tool.
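The abstract contrasts binary relevance with multilabel methods and scores them under two losses. The sketch below (not the authors' code; the texts, label codes, and parameters are invented for illustration) shows the same contrast with scikit-learn: binary relevance via MultiOutputClassifier, a classifier chain via ClassifierChain, and both losses from sklearn.metrics. RAKEL is not included in scikit-learn; an implementation is available in the separate scikit-multilearn package.

    # Minimal sketch: binary relevance (BR) vs. classifier chains (CC)
    # on toy multilabel survey-answer data. All data here are hypothetical.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import hamming_loss, zero_one_loss
    from sklearn.multioutput import ClassifierChain, MultiOutputClassifier

    # Invented open-ended answers and a binary label-indicator matrix;
    # columns stand for illustrative codes: "family", "health", "work".
    texts = [
        "spending time with my family and kids",
        "good health and time with family",
        "success at work",
        "my health, my job, and my children",
    ]
    Y = np.array([
        [1, 0, 0],
        [1, 1, 0],
        [0, 0, 1],
        [1, 1, 1],
    ])

    # Convert text answers into numerical variables, as the abstract describes.
    X = TfidfVectorizer().fit_transform(texts)
    base = LogisticRegression(max_iter=1000)

    # BR: one independent binary classifier per code; ignores label correlations.
    br = MultiOutputClassifier(base).fit(X, Y)
    # CC: each classifier in the chain also sees the preceding labels,
    # so it can exploit correlations among codes.
    cc = ClassifierChain(base, order="random", random_state=0).fit(X, Y)

    for name, model in [("BR", br), ("CC", cc)]:
        pred = model.predict(X)
        print(name,
              "Hamming loss:", hamming_loss(Y, pred),  # average per-label error
              "0/1 loss:", zero_one_loss(Y, pred))     # any wrong label => error

Because a chain conditions each label's classifier on the labels predicted before it, it can exploit correlations such as the "minor nonviolent offenses"/"crimes" pattern mentioned above; this matters most when every label must be right (0/1 loss) and less under the per-label average (Hamming loss), consistent with the abstract's findings.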

Place, publisher, year, edition, pages
SAGE Publications Inc., 2019. Article id UNSP 0894439319869210.
Keywords [en]
open-ended questions; multilabel; check-all-that-apply; machine learning; statistical learning; text
National Category
Information Systems
Identifiers
URN: urn:nbn:se:liu:diva-160426
DOI: 10.1177/0894439319869210
ISI: 000483411100001
OAI: oai:DiVA.org:liu-160426
DiVA, id: diva2:1353428
Note

Funding Agencies: Social Sciences and Humanities Research Council of Canada (SSHRC) [435-2013-0128]

Available from: 2019-09-23. Created: 2019-09-23. Last updated: 2019-09-23.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text

Search in DiVA

By author/editor
Wenemark, Marika
By organisation
Division of Community Medicine; Faculty of Medicine and Health Sciences
In the same journal
Social Science Computer Review
