Sparse Partially Collapsed MCMC for Parallel Inference in Topic Models
Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning (STIMA). Linköping University, Faculty of Science & Engineering.
Number of Authors: 4
2017 (English). In: Journal of Computational and Graphical Statistics, ISSN 1061-8600, E-ISSN 1537-2715. Article in journal (Refereed). Status: Accepted.
Place, publisher, year, edition, pages
Taylor & Francis, 2017.
Keywords [en]
Bayesian inference, Gibbs sampling, latent Dirichlet allocation, massive data sets, parallel computing, computational complexity
National Category
Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:liu:diva-140872
DOI: 10.1080/10618600.2017.1366913
OAI: oai:DiVA.org:liu-140872
DiVA: diva2:1141079
Funder
Swedish Foundation for Strategic Research (SSF), RIT 15-0097
Abstract [en]

Topic models, and more specifically the class of latent Dirichlet allocation (LDA) models, are widely used for probabilistic modeling of text. MCMC sampling from the posterior distribution is typically performed using a collapsed Gibbs sampler. We propose a parallel sparse partially collapsed Gibbs sampler and compare its speed and efficiency to state-of-the-art samplers for topic models on five well-known text corpora of differing sizes and properties. In particular, we propose and compare two different strategies for sampling the parameter block with latent topic indicators. The experiments show that the increase in statistical inefficiency from only partial collapsing is smaller than commonly assumed, and can be more than compensated for by the speedup from parallelization and sparsity on larger corpora. We also prove that the partially collapsed samplers scale well with the size of the corpus. The proposed algorithm is fast, efficient, exact, and can be used in more modeling situations than the ordinary collapsed sampler.
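
As a concrete illustration of the scheme the abstract describes, below is a minimal sketch of a partially collapsed Gibbs sampler for LDA in plain NumPy. This is a reconstruction from the abstract alone, not the paper's implementation: the document-topic proportions theta are collapsed out, while the topic-word matrix phi is drawn explicitly, which makes the topic-indicator updates conditionally independent across documents given phi. All names here (docs, alpha, beta, n_topics) are illustrative assumptions, and the document loop is serial for clarity where the paper's algorithm parallelizes it and exploits sparsity.

```python
import numpy as np

def partially_collapsed_gibbs(docs, vocab_size, n_topics,
                              n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Sketch of a partially collapsed Gibbs sampler for LDA.

    Collapses the document-topic proportions theta but samples the
    topic-word distributions phi explicitly, so the z-updates are
    conditionally independent across documents given phi.
    """
    rng = np.random.default_rng(seed)
    # z[d][i] is the topic indicator for token i of document d.
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    # Sufficient statistics: document-topic and topic-word counts.
    ndk = np.zeros((len(docs), n_topics))
    nkw = np.zeros((n_topics, vocab_size))
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            ndk[d, z[d][i]] += 1
            nkw[z[d][i], w] += 1

    for _ in range(n_iter):
        # Step 1: draw phi | z -- each row is a conjugate Dirichlet draw.
        phi = np.array([rng.dirichlet(nkw[k] + beta) for k in range(n_topics)])
        # Step 2: draw z | phi, with theta still collapsed out. Because phi
        # is fixed during this sweep, the loop over documents could run in
        # parallel; it is kept serial here for clarity.
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k_old = z[d][i]
                ndk[d, k_old] -= 1            # exclude the current token
                p = (ndk[d] + alpha) * phi[:, w]
                k_new = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k_new
                ndk[d, k_new] += 1
                nkw[k_old, w] -= 1            # counts feed the next phi draw
                nkw[k_new, w] += 1
    return z, phi

# Toy usage: a corpus of two documents given as lists of word ids.
docs = [[0, 2, 2, 1], [3, 0, 4, 4]]
z, phi = partially_collapsed_gibbs(docs, vocab_size=5, n_topics=2, n_iter=50)
```

The key design point is Step 1: because phi is an explicit draw rather than being integrated out, the per-document updates in Step 2 no longer couple documents through shared topic-word counts, which is what permits the parallel and sparse implementations the abstract evaluates.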

Available from: 2017-09-13. Created: 2017-09-13. Last updated: 2017-09-19. Bibliographically approved.

Open Access in DiVA

No full text

Other links

Publisher's full text

Search in DiVA

By author/editor
Villani, Mattias
By organisation
The Division of Statistics and Machine Learning, Faculty of Science & Engineering
In the same journal
Journal of Computational and Graphical Statistics
Probability Theory and Statistics
