Sparse Parallel Training of Hierarchical Dirichlet Process Topic Models
Imperial Coll London, England.
Uppsala Univ, Sweden; Aalto Univ, Finland.
Linköping University. Ericsson AB, Sweden.
2020 (English) In: PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), ASSOC COMPUTATIONAL LINGUISTICS-ACL, 2020, p. 2925-2934. Conference paper, Published paper (Refereed)
Abstract [en]

To scale non-parametric extensions of probabilistic topic models such as latent Dirichlet allocation to larger data sets, practitioners rely increasingly on parallel and distributed systems. In this work, we study data-parallel training for the hierarchical Dirichlet process (HDP) topic model. Based upon a representation of certain conditional distributions within an HDP, we propose a doubly sparse data-parallel sampler for the HDP topic model. This sampler utilizes all available sources of sparsity found in natural language, an important way to make computation efficient. We benchmark our method on a well-known corpus (PubMed) with 8 million documents and 768 million tokens, using a single multi-core machine in under four days.

Place, publisher, year, edition, pages
ASSOC COMPUTATIONAL LINGUISTICS-ACL, 2020. p. 2925-2934
National Category
Natural Language Processing
Identifiers
URN: urn:nbn:se:liu:diva-209705
ISI: 000855160703009
ISBN: 9781952148606 (print)
OAI: oai:DiVA.org:liu-209705
DiVA, id: diva2:1914100
Conference
Conference on Empirical Methods in Natural Language Processing (EMNLP), ELECTR NETWORK, Nov 16-20, 2020
Note

Funding agencies: Academy of Finland [298742, 313122]; Swedish Research Council [201805170, 201806063]; Ericsson AB

Available from: 2024-11-18 Created: 2024-11-18 Last updated: 2025-02-07

Open Access in DiVA

No full text in DiVA

Search in DiVA

By author/editor
Jonsson, Leif
By organisation
Linköping University
Natural Language Processing
