Search for publications in DiVA (liu.se)
1 - 5 of 5
  • 1.
    Hansdotter, Frida I.
    Public Health Agency of Sweden.
    Magnusson, Måns
    Linköping University, Department of Computer and Information Science, Statistics. Linköping University, Faculty of Science & Engineering.
    Kuhlmann-Berenzon, Sharon
    Public Health Agency of Sweden.
    Hulth, Anette
    Public Health Agency of Sweden.
    Sundstrom, Kristian
    Lund University, Sweden.
    Hedlund, Kjell-Olof
    Swedish Institute for Communicable Disease Control, Linköping, Sweden.
    Andersson, Yvonne
    Swedish Institute for Communicable Disease Control, Linköping, Sweden.
    The incidence of acute gastrointestinal illness in Sweden (2015). In: Scandinavian Journal of Public Health, ISSN 1403-4948, E-ISSN 1651-1905, Vol. 43, no. 5, p. 540-547. Article in journal (Refereed).
    Abstract [en]

    Aims: The aim of this study was to estimate the self-reported domestic incidence of acute gastrointestinal illness in the Swedish population, irrespective of route of transmission or type of pathogen causing the disease. Previous studies in Sweden have primarily focused on the incidence of acute gastrointestinal illness related to consumption of contaminated food and drinking water.

    Methods: In May 2009, we sent a questionnaire to 4,000 randomly selected persons aged 0-85 years, asking about the number of episodes of stomach disease during the last 12 months. To validate the data on symptoms, we compared the study results with anonymous queries submitted to a Swedish medical website.

    Results: The response rate was 64%. We estimated that a total of 2,744,778 acute gastrointestinal illness episodes (95% confidence interval 2,475,641-3,013,915) occurred between 1 May 2008 and 30 April 2009. Comparing the number of reported episodes with web queries indicated that the low number of episodes during the first 6 months was an effect of seasonality rather than recall bias. Further, the recall bias analysis suggested that the survey captured approximately 65% of the true number of episodes among the respondents.

    Conclusions: The estimated number of Swedish acute gastrointestinal illness cases in this study is about five times higher than previous estimates. This study provides valuable information on the incidence of gastrointestinal symptoms in Sweden, irrespective of route of transmission, indicating a high burden of acute gastrointestinal illness, especially among children, and large societal costs, primarily due to production losses.
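    To make the estimation step concrete, here is a minimal sketch of how a survey-based episode count can be scaled to the population with a normal-approximation 95% confidence interval. All figures below are invented placeholders, not the study's data, and the Poisson-style standard error is an assumption for illustration only.

```python
# Hypothetical illustration of scaling a per-respondent episode rate up to
# the population and attaching a normal-approximation 95% CI.
# All numbers are invented placeholders, not the study's data.
import math

population = 9_200_000        # assumed Swedish population, 2008-2009
respondents = 2_560           # roughly 64% of the 4,000 questionnaires
episodes_reported = 760       # hypothetical episode total among respondents

rate = episodes_reported / respondents      # episodes per person-year
se_rate = math.sqrt(rate / respondents)     # Poisson-approximation SE (assumption)

estimate = rate * population
half_width = 1.96 * se_rate * population

print(f"estimated episodes: {estimate:,.0f}")
print(f"95% CI: {estimate - half_width:,.0f} - {estimate + half_width:,.0f}")
```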

  • 2.
    Magnusson, Måns
    Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning. Linköping University, Faculty of Arts and Sciences.
    Scalable and Efficient Probabilistic Topic Model Inference for Textual Data (2018). Doctoral thesis, comprehensive summary (Other academic).
    Abstract [en]

    Probabilistic topic models have proven to be an extremely versatile class of mixed-membership models for discovering the thematic structure of text collections. There are many possible applications, covering a broad range of areas of study: technology, natural science, social science and the humanities.

    In this thesis, a new efficient parallel Markov Chain Monte Carlo inference algorithm is proposed for Bayesian inference in large topic models. The proposed methods scale well with the corpus size and can be used for other probabilistic topic models and other natural language processing applications. The proposed methods are fast, efficient, scalable, and will converge to the true posterior distribution.

    In addition, this thesis proposes a supervised topic model for high-dimensional text classification, with an emphasis on interpretable document prediction using the horseshoe shrinkage prior.

    Finally, we develop a model and inference algorithm that can model agenda and framing of political speeches over time with a priori defined topics. We apply the approach to analyze the evolution of immigration discourse in the Swedish parliament by combining theory from political science and communication science with a probabilistic topic model.

    List of papers
    1. Sparse Partially Collapsed MCMC for Parallel Inference in Topic Models
    2018 (English). In: Journal of Computational and Graphical Statistics, ISSN 1061-8600, E-ISSN 1537-2715, Vol. 27, no. 2, p. 449-463. Article in journal (Refereed). Published.
    Abstract [en]: see publication 4 below.

    Place, publisher, year, edition, pages
    Taylor & Francis, 2018
    Keywords
    Bayesian inference, Gibbs sampling, Latent Dirichlet Allocation, Massive Data Sets, Parallel Computing, Computational complexity
    National Category
    Probability Theory and Statistics
    Identifiers
    URN: urn:nbn:se:liu:diva-140872. DOI: 10.1080/10618600.2017.1366913. ISI: 000435688200018.
    Funder
    Swedish Foundation for Strategic Research, SSF RIT 15-0097
    Available from: 2017-09-13. Created: 2017-09-13. Last updated: 2022-04-11. Bibliographically approved.
    2. Automatic Localization of Bugs to Faulty Components in Large Scale Software Systems using Bayesian Classification
    2016 (English). In: 2016 IEEE International Conference on Software Quality, Reliability and Security (QRS 2016), IEEE, 2016, p. 425-432. Conference paper, Published paper (Refereed).
    Abstract [en]

    We suggest a Bayesian approach to the problem of reducing bug turnaround time in large software development organizations. Our approach is to use classification to predict where bugs are located in components. This classification is a form of automatic fault localization (AFL) at the component level. The approach relies only on historical bug reports and does not require detailed analysis of source code or detailed test runs. It addresses two problems identified in user studies of AFL tools: first, how much trust the user can put in the results of the tool; second, how the results were computed. The proposed model quantifies the uncertainty in its predictions and in all estimated model parameters. Additionally, the output of the model explains why a result was suggested. We evaluate the approach on more than 50,000 bugs.
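    As a rough illustration of component-level classification from historical bug-report text, here is a minimal scikit-learn sketch using a multinomial naive Bayes baseline. It is not the authors' model; the reports, component names, and pipeline are invented placeholders, though the class probabilities hint at how prediction uncertainty can be surfaced to the user.

```python
# A minimal sketch of bug-to-component classification as Bayesian text
# classification, in the spirit of the paper. Multinomial naive Bayes
# baseline, NOT the authors' exact model; data are invented placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Historical bug reports (free text), each labeled with the component
# that turned out to be faulty.
reports = [
    "crash on null pointer in session handler",
    "timeout when routing call setup messages",
    "memory leak in media transcoding pipeline",
    "call setup fails under high signalling load",
]
components = ["session", "routing", "media", "routing"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(reports, components)

# Predicted component distribution for a new report; the class
# probabilities give a rough measure of the model's uncertainty.
new_report = ["intermittent crash during call setup"]
for comp, prob in zip(model.classes_, model.predict_proba(new_report)[0]):
    print(f"{comp}: {prob:.2f}")
```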

    Place, publisher, year, edition, pages
    IEEE, 2016
    Keywords
    Machine Learning; Fault Detection; Fault Location; Software Maintenance; Software Debugging; Software Engineering
    National Category
    Computer Sciences
    Identifiers
    URN: urn:nbn:se:liu:diva-132879. DOI: 10.1109/QRS.2016.54. ISI: 000386751700044. ISBN: 978-1-5090-4127-5.
    Conference
    IEEE International Conference on Software Quality, Reliability and Security (QRS)
    Available from: 2016-12-06. Created: 2016-11-30. Last updated: 2020-09-16.
    3. Pulling Out the Stops: Rethinking Stopword Removal for Topic Models
    2017 (English). In: 15th Conference of the European Chapter of the Association for Computational Linguistics: Proceedings of Conference, Volume 2: Short Papers, Stroudsburg: Association for Computational Linguistics (ACL), 2017, Vol. 2, p. 432-436. Conference paper, Published paper (Other academic).
    Abstract [en]: see publication 5 below.

    Place, publisher, year, edition, pages
    Stroudsburg: Association for Computational Linguistics (ACL), 2017
    National Category
    Probability Theory and Statistics; General Language Studies and Linguistics; Specific Languages
    Identifiers
    URN: urn:nbn:se:liu:diva-147612. ISBN: 9781945626357.
    Conference
    15th Conference of the European Chapter of the Association for Computational Linguistics, April 3-7, 2017, Valencia, Spain.
    Available from: 2018-04-27. Created: 2018-04-27. Last updated: 2018-04-27. Bibliographically approved.
  • 3.
    Magnusson, Måns
    Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning. Linköping University, Faculty of Arts and Sciences. Aalto University, Espoo, Finland.
    Jonsson, Leif
    Ericsson AB, Stockholm, Sweden.
    Villani, Mattias
    Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning. Linköping University, Faculty of Arts and Sciences. Stockholm University, Stockholm, Sweden.
    DOLDA: a regularized supervised topic model for high-dimensional multi-class regression (2020). In: Computational Statistics, ISSN 0943-4062, E-ISSN 1613-9658, Vol. 35, no. 1, p. 175-201. Article in journal (Refereed).
    Abstract [en]

    Generating user interpretable multi-class predictions in data-rich environments with many classes and explanatory covariates is a daunting task. We introduce Diagonal Orthant Latent Dirichlet Allocation (DOLDA), a supervised topic model for multi-class classification that can handle many classes as well as many covariates. To handle many classes we use the recently proposed Diagonal Orthant probit model (Johndrow et al., in: Proceedings of the sixteenth international conference on artificial intelligence and statistics, 2013) together with an efficient Horseshoe prior for variable selection/shrinkage (Carvalho et al. in Biometrika 97:465–480, 2010). We propose a computationally efficient parallel Gibbs sampler for the new model. An important advantage of DOLDA is that learned topics are directly connected to individual classes without the need for a reference class. We evaluate the model’s predictive accuracy and scalability, and demonstrate DOLDA’s advantage in interpreting the generated predictions.
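    The horseshoe prior mentioned above places a half-Cauchy local scale on each coefficient and a shared half-Cauchy global scale, so that most coefficients are shrunk hard toward zero while a few escape shrinkage almost entirely. The numpy sketch below only illustrates that shrinkage profile under invented settings; it is not DOLDA's Gibbs sampler.

```python
# A small numpy sketch of the horseshoe shrinkage prior:
#   beta_j ~ N(0, lambda_j^2 * tau^2),  lambda_j ~ C+(0,1),  tau ~ C+(0,1).
# Illustrates the prior's shrinkage profile only; settings are invented.
import numpy as np

rng = np.random.default_rng(0)
p = 10_000                                # number of coefficients

tau = abs(rng.standard_cauchy())          # global scale ~ half-Cauchy
lam = abs(rng.standard_cauchy(p))         # local scales ~ half-Cauchy
beta = rng.normal(0.0, lam * tau)         # horseshoe draws

# Most draws sit near zero while a few remain very large, which is what
# makes the prior attractive for variable selection/shrinkage.
print("share of |beta| < 0.1 * tau:", np.mean(np.abs(beta) < 0.1 * tau))
print("largest |beta|:", np.abs(beta).max())
```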

  • 4.
    Magnusson, Måns
    Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning. Linköping University, Faculty of Science & Engineering.
    Jonsson, Leif
    Linköping University, Department of Computer and Information Science. Linköping University, Faculty of Science & Engineering. Ericsson Research, Sweden.
    Villani, Mattias
    Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning. Linköping University, Faculty of Science & Engineering.
    Broman, David
    School of Information and Communication Technology, Royal Institute of Technology KTH, Stockholm, Sweden.
    Sparse Partially Collapsed MCMC for Parallel Inference in Topic Models (2018). In: Journal of Computational and Graphical Statistics, ISSN 1061-8600, E-ISSN 1537-2715, Vol. 27, no. 2, p. 449-463. Article in journal (Refereed).
    Abstract [en]

    Topic models, and more specifically the class of Latent Dirichlet Allocation (LDA), are widely used for probabilistic modeling of text. MCMC sampling from the posterior distribution is typically performed using a collapsed Gibbs sampler. We propose a parallel sparse partially collapsed Gibbs sampler and compare its speed and efficiency to state-of-the-art samplers for topic models on five well-known text corpora of differing sizes and properties. In particular, we propose and compare two different strategies for sampling the parameter block with latent topic indicators. The experiments show that the increase in statistical inefficiency from only partial collapsing is smaller than commonly assumed, and can be more than compensated by the speedup from parallelization and sparsity on larger corpora. We also prove that the partially collapsed samplers scale well with the size of the corpus. The proposed algorithm is fast, efficient, exact, and can be used in more modeling situations than the ordinary collapsed sampler.
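    For context, here is a compact sketch of the standard collapsed Gibbs sampler that serves as the paper's baseline, resampling each topic indicator from its full conditional. The toy corpus and hyperparameters are invented; the paper's sparse partially collapsed sampler instead instantiates the topic-word matrix so that documents can be sampled in parallel, which this sketch does not attempt.

```python
# Collapsed Gibbs sampling for LDA (the baseline sampler, not the
# paper's parallel method). Each topic indicator z is resampled from
#   p(z = k | .) ∝ (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
# Toy corpus and hyperparameters are invented placeholders.
import numpy as np

rng = np.random.default_rng(0)
docs = [[0, 1, 2, 1], [2, 3, 3, 4], [0, 4, 4, 1]]   # word ids per document
V, K, alpha, beta = 5, 2, 0.1, 0.01

# Count matrices and random initial topic assignments.
n_dk = np.zeros((len(docs), K))
n_kw = np.zeros((K, V))
n_k = np.zeros(K)
z = [[int(rng.integers(K)) for _ in doc] for doc in docs]
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        k = z[d][i]
        n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1

for _ in range(100):                                # Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]                             # remove current assignment
            n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
            p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
            k = int(rng.choice(K, p=p / p.sum()))   # sample full conditional
            z[d][i] = k
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1

print(n_kw)                                         # learned topic-word counts
```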

  • 5.
    Schofield, Alexandra
    Cornell University, Ithaca, NY, USA.
    Magnusson, Måns
    Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning. Linköping University, Faculty of Science & Engineering.
    Mimno, David
    Cornell University, Ithaca, NY, USA.
    Pulling Out the Stops: Rethinking Stopword Removal for Topic Models (2017). In: 15th Conference of the European Chapter of the Association for Computational Linguistics: Proceedings of Conference, Volume 2: Short Papers, Stroudsburg: Association for Computational Linguistics (ACL), 2017, Vol. 2, p. 432-436. Conference paper (Other academic).
    Abstract [en]

    It is often assumed that topic models benefit from the use of a manually curated stopword list. Constructing this list is time-consuming and often subject to user judgments about what kinds of words are important to the model and the application. Although stopword removal clearly affects which word types appear as most probable terms in topics, we argue that this improvement is superficial, and that topic inference benefits little from the practice of removing stopwords beyond very frequent terms. Removing corpus-specific stopwords after model inference is more transparent and produces similar results to removing those words prior to inference.
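    The post-hoc alternative the paper argues for can be made concrete with a small sketch: fit the model on the full vocabulary, then filter corpus-specific stopwords from the topic-word counts only when reporting top terms. The vocabulary, counts, and stopword list below are invented placeholders.

```python
# Post-hoc stopword removal: fit the topic model on the full vocabulary,
# then drop stopwords from the topic-word counts only when displaying
# top terms. All data below are invented placeholders.
import numpy as np

vocab = np.array(["the", "of", "model", "topic", "data", "inference"])
stopwords = {"the", "of"}

# Topic-word count matrix from an already-fitted model (topics x words).
n_kw = np.array([[50, 40, 12, 9, 2, 1],
                 [45, 38, 1, 2, 11, 10]], dtype=float)

keep = np.array([w not in stopwords for w in vocab])
for k, row in enumerate(n_kw):
    probs = row * keep
    probs /= probs.sum()                  # renormalise over the kept words
    top = vocab[np.argsort(probs)[::-1][:3]]
    print(f"topic {k}: {' '.join(top)}")
```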
