DOLDA: a regularized supervised topic model for high-dimensional multi-class regression
Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning. Linköping University, Faculty of Arts and Sciences. Aalto University, Espoo, Finland.
Ericsson AB, Stockholm, Sweden.
Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning. Linköping University, Faculty of Arts and Sciences. Stockholm University, Stockholm, Sweden.
2020 (English). In: Computational statistics (Zeitschrift), ISSN 0943-4062, E-ISSN 1613-9658, Vol. 35, no 1, p. 175-201. Article in journal (Refereed), Published.
Abstract [en]

Generating user-interpretable multi-class predictions in data-rich environments with many classes and explanatory covariates is a daunting task. We introduce Diagonal Orthant Latent Dirichlet Allocation (DOLDA), a supervised topic model for multi-class classification that can handle many classes as well as many covariates. To handle many classes, we use the recently proposed Diagonal Orthant probit model (Johndrow et al., in: Proceedings of the sixteenth international conference on artificial intelligence and statistics, 2013) together with an efficient Horseshoe prior for variable selection/shrinkage (Carvalho et al., in: Biometrika 97:465–480, 2010). We propose a computationally efficient parallel Gibbs sampler for the new model. An important advantage of DOLDA is that learned topics are directly connected to individual classes without the need for a reference class. We evaluate the model's predictive accuracy and scalability, and demonstrate DOLDA's advantage in interpreting the generated predictions.
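
To illustrate why no reference class is needed, the following is a minimal sketch of the class probabilities under a diagonal orthant probit model. It assumes the usual formulation in which each class k has its own latent variable z_k ~ N(eta_k, 1), and class k is chosen when z_k is positive and all other latent variables are negative. The function names and the example predictor values are hypothetical and are not taken from the paper or its software.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def diagonal_orthant_probs(eta):
    """Class probabilities for one observation (illustrative sketch).

    eta[k] is the linear predictor of class k. Every class has its own
    predictor, so none of them is fixed to zero as a reference class.
    """
    weights = []
    for k, eta_k in enumerate(eta):
        # Mass of the orthant where z_k > 0 and z_j < 0 for all j != k,
        # with independent z_j ~ N(eta_j, 1).
        w = std_normal_cdf(eta_k)
        for j, eta_j in enumerate(eta):
            if j != k:
                w *= std_normal_cdf(-eta_j)
        weights.append(w)
    total = sum(weights)
    # Renormalise over the admissible orthants only.
    return [w / total for w in weights]


# Hypothetical linear predictors for three classes.
probs = diagonal_orthant_probs([1.2, -0.3, 0.0])
```

Because each class keeps its own predictor, a coefficient on a topic can be read as evidence for or against that specific class, which is the interpretability property the abstract refers to.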

Place, publisher, year, edition, pages
Springer, 2020. Vol. 35, no 1, p. 175-201
Keywords [en]
Text classification, Latent Dirichlet Allocation, Horseshoe prior, Diagonal Orthant probit model, Interpretable models
National Category
Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:liu:diva-159217
DOI: 10.1007/s00180-019-00891-1
ISI: 000516561400012
Scopus ID: 2-s2.0-85067414496
OAI: oai:DiVA.org:liu-159217
DiVA, id: diva2:1340533
Note

Funding agencies: Aalto University

Available from: 2019-08-05. Created: 2019-08-05. Last updated: 2020-03-19. Bibliographically approved.

Open Access in DiVA

fulltext (1158 kB), 262 downloads
File information
File name: FULLTEXT01.pdf
File size: 1158 kB
Checksum (SHA-512): 84eb60b070b1b1cd1c2a263550882d2fae3129affcd5c0e8bac9c1c0e6119f16d74ab725fc8d4077ca49b4ef2314934029e0a20b08311ee301ca939dd56c7734
Type: fulltext
Mimetype: application/pdf

Authority records

Magnusson, Måns; Villani, Mattias
