Automatic Classification of Open-Ended Questions: Check-All-That-Apply Questions
Univ Waterloo, Canada.
Western Univ, Canada.
Linköping University, Faculty of Medicine and Health Sciences. Linköping University, Department of Health, Medicine and Caring Sciences, Division of Society and Health. Region Östergötland, Regional Executive Office, Unit of Public Health. ORCID iD: 0000-0002-6281-7783
2021 (English). In: Social science computer review, ISSN 0894-4393, E-ISSN 1552-8286, Vol. 36, no. 4, p. 562-572. Article in journal (Refereed). Published
Abstract [en]

Text data from open-ended questions in surveys are challenging to analyze and are often ignored. Open-ended questions are important, though, because they do not constrain respondents' answers. Where open-ended questions are necessary, human coders often code the answers manually. When data sets are large, it is impractical or too costly to manually code all answer texts. Instead, text answers can be converted into numerical variables, and a statistical/machine learning algorithm can be trained on a subset of manually coded data. This statistical model is then used to predict the codes of the remainder. We consider open-ended questions where the answers are coded into multiple labels (all-that-apply questions). For example, in the open-ended question in our Happy example, respondents are explicitly told they may list multiple things that make them happy. Algorithms for multilabel data take into account the correlation among the answer codes and may therefore give better prediction results. For example, when giving examples of civil disobedience, respondents talking about "minor nonviolent offenses" were also likely to talk about "crimes." We compare the performance of two different multilabel algorithms (random k-labelsets [RAKEL], classifier chains [CC]) to the default method of binary relevance (BR), which applies single-label algorithms to each code separately. Performance is evaluated on data from three open-ended questions (Happy, Civil Disobedience, and Immigrant). We found weak bivariate label correlations in the Happy data (90th percentile: 7.6%), and stronger bivariate label correlations in the Civil Disobedience (90th percentile: 17.2%) and Immigrant (90th percentile: 19.2%) data. For the data with stronger correlations, we found that both multilabel methods performed substantially better than BR using 0/1 loss ("at least one label is incorrect") but made little difference when using Hamming loss (average error). For data with weak label correlations, we found no difference in performance between multilabel methods and BR. We conclude that automatic classification of open-ended questions that allow multiple answers may benefit from using multilabel algorithms for 0/1 loss. The degree of correlations among the labels may be a useful prognostic tool.
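The two evaluation criteria named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names and the toy label matrices are assumptions made for illustration, and each answer is represented as a list of 0/1 indicators, one per answer code.

```python
from typing import List

Labels = List[int]  # one 0/1 indicator per answer code (assumed encoding)


def hamming_loss(y_true: List[Labels], y_pred: List[Labels]) -> float:
    """Average error: fraction of individual label decisions that are wrong."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for row_true, row_pred in zip(y_true, y_pred)
        for t, p in zip(row_true, row_pred)
    )
    return wrong / total


def zero_one_loss(y_true: List[Labels], y_pred: List[Labels]) -> float:
    """Fraction of answers for which at least one label is incorrect."""
    wrong = sum(row_true != row_pred for row_true, row_pred in zip(y_true, y_pred))
    return wrong / len(y_true)


# Hypothetical toy data: 4 answers, 3 possible codes each.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]

print("Hamming loss:", hamming_loss(y_true, y_pred))
print("0/1 loss:", zero_one_loss(y_true, y_pred))
```

In this toy data, 2 of the 12 label decisions are wrong, but those 2 errors fall in 2 of the 4 answers, so the 0/1 loss (0.5) is much larger than the Hamming loss (about 0.17). The 0/1 loss rewards getting the whole label set right, which is where modeling the dependence among labels would be expected to matter most.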

Place, publisher, year, edition, pages
Sage Publications, 2021. Vol. 36, no. 4, p. 562-572
Keywords [en]
open-ended questions; multilabel; check-all-that-apply; machine learning; statistical learning; text
National category
Information Systems
Identifiers
URN: urn:nbn:se:liu:diva-160426
DOI: 10.1177/0894439319869210
ISI: 000483411100001
OAI: oai:DiVA.org:liu-160426
DiVA, id: diva2:1353428
Note

Funding agencies: Social Sciences and Humanities Research Council of Canada (SSHRC) [435-2013-0128]

Available from: 2019-09-23. Created: 2019-09-23. Last updated: 2022-04-26

Person

Wenemark, Marika
