liu.seSearch for publications in DiVA
Change search
Refine search result
1 - 4 of 4
CiteExportLink to result list
Permanent link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • oxford
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Rows per page
  • 5
  • 10
  • 20
  • 50
  • 100
  • 250
Sort
  • Standard (Relevance)
  • Author A-Ö
  • Author Ö-A
  • Title A-Ö
  • Title Ö-A
  • Publication type A-Ö
  • Publication type Ö-A
  • Issued (Oldest first)
  • Issued (Newest first)
  • Created (Oldest first)
  • Created (Newest first)
  • Last updated (Oldest first)
  • Last updated (Newest first)
  • Disputation date (earliest first)
  • Disputation date (latest first)
  • Standard (Relevance)
  • Author A-Ö
  • Author Ö-A
  • Title A-Ö
  • Title Ö-A
  • Publication type A-Ö
  • Publication type Ö-A
  • Issued (Oldest first)
  • Issued (Newest first)
  • Created (Oldest first)
  • Created (Newest first)
  • Last updated (Oldest first)
  • Last updated (Newest first)
  • Disputation date (earliest first)
  • Disputation date (latest first)
Select
The maximal number of hits you can export is 250. When you want to export more records please use the Create feeds function.
  • 1.
    Jonsson, Leif
    Linköping University, Department of Computer and Information Science. Linköping University, Faculty of Science & Engineering.
    Machine Learning-Based Bug Handling in Large-Scale Software Development2018Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    This thesis investigates the possibilities of automating parts of the bug handling process in large-scale software development organizations. The bug handling process is a large part of the mostly manual, and very costly, maintenance of software systems. Automating parts of this time consuming and very laborious process could save large amounts of time and effort wasted on dealing with bug reports. In this thesis we focus on two aspects of the bug handling process, bug assignment and fault localization. Bug assignment is the process of assigning a newly registered bug report to a design team or developer. Fault localization is the process of finding where in a software architecture the fault causing the bug report should be solved. The main reason these tasks are not automated is that they are considered hard to automate, requiring human expertise and creativity. This thesis examines the possi- bility of using machine learning techniques for automating at least parts of these processes. We call these automated techniques Automated Bug Assignment (ABA) and Automatic Fault Localization (AFL), respectively. We treat both of these problems as classification problems. In ABA, the classes are the design teams in the development organization. In AFL, the classes consist of the software components in the software architecture. We focus on a high level fault localization that it is suitable to integrate into the initial support flow of large software development organizations.

    The thesis consists of six papers that investigate different aspects of the AFL and ABA problems. The first two papers are empirical and exploratory in nature, examining the ABA problem using existing machine learning techniques but introducing ensembles into the ABA context. In the first paper we show that, like in many other contexts, ensembles such as the stacked generalizer (or stacking) improves classification accuracy compared to individual classifiers when evaluated using cross fold validation. The second paper thor- oughly explore many aspects such as training set size, age of bug reports and different types of evaluation of the ABA problem in the context of stacking. The second paper also expands upon the first paper in that the number of industry bug reports, roughly 50,000, from two large-scale industry software development contexts. It is still as far as we are aware, the largest study on real industry data on this topic to this date. The third and sixth papers, are theoretical, improving inference in a now classic machine learning tech- nique for topic modeling called Latent Dirichlet Allocation (LDA). We show that, unlike the currently dominating approximate approaches, we can do parallel inference in the LDA model with a mathematically correct algorithm, without sacrificing efficiency or speed. The approaches are evaluated on standard research datasets, measuring various aspects such as sampling efficiency and execution time. Paper four, also theoretical, then builds upon the LDA model and introduces a novel supervised Bayesian classification model that we call DOLDA. The DOLDA model deals with both textual content and, structured numeric, and nominal inputs in the same model. The approach is evaluated on a new data set extracted from IMDb which have the structure of containing both nominal and textual data. The model is evaluated using two approaches. First, by accuracy, using cross fold validation. Second, by comparing the simplicity of the final model with that of other approaches. In paper five we empirically study the performance, in terms of prediction accuracy, of the DOLDA model applied to the AFL problem. The DOLDA model was designed with the AFL problem in mind, since it has the exact structure of a mix of nominal and numeric inputs in combination with unstructured text. We show that our DOLDA model exhibits many nice properties, among others, interpretability, that the research community has iden- tified as missing in current models for AFL.

    List of papers
    1. Towards Automated Anomaly Report Assignment in Large Complex Systems using Stacked Generalization
    Open this publication in new window or tab >>Towards Automated Anomaly Report Assignment in Large Complex Systems using Stacked Generalization
    2012 (English)In: Software Testing, Verification and Validation (ICST), 2012, IEEE , 2012, p. 437-446Conference paper, Published paper (Refereed)
    Abstract [en]

    Maintenance costs can be substantial for organizations with very large and complex software systems. This paper describes research for reducing anomaly report turnaround time which, if successful, would contribute to reducing maintenance costs and at the same time maintaining a good customer perception. Specifically, we are addressing the problem of the manual, laborious, and inaccurate process of assigning anomaly reports to the correct design teams. In large organizations with complex systems this is particularly problematic because the receiver of the anomaly report from customer may not have detailed knowledge of the whole system. As a consequence, anomaly reports may be wrongly routed around in the organization causing delays and unnecessary work. We have developed and validated machine learning approach, based on stacked generalization, to automatically route anomaly reports to the correct design teams in the organization. A research prototype has been implemented and evaluated on roughly one year of real anomaly reports on a large and complex system at Ericsson AB. The prediction accuracy of the automation is approaching that of humans, indicating that the anomaly report handling time could be significantly reduced by using our approach.

    Place, publisher, year, edition, pages
    IEEE, 2012
    National Category
    Computer Sciences
    Identifiers
    urn:nbn:se:liu:diva-93231 (URN)10.1109/ICST.2012.124 (DOI)978-1-4577-1906-6 (ISBN)
    Conference
    Fifth IEEE International Conference on Software Testing, Verification and Validation (ICST 2012), 17-21 April 2012, Montreal, QC, Canada
    Note

    Finansierat av Ericsson AB

    Available from: 2013-05-27 Created: 2013-05-27 Last updated: 2018-05-17Bibliographically approved
    2. Automated bug assignment: Ensemble-based machine learning in large scale industrial contexts
    Open this publication in new window or tab >>Automated bug assignment: Ensemble-based machine learning in large scale industrial contexts
    Show others...
    2016 (English)In: Journal of Empirical Software Engineering, ISSN 1382-3256, E-ISSN 1573-7616, Vol. 21, no 4, p. 1533-1578Article in journal (Refereed) Published
    Abstract [en]

    Bug report assignment is an important part of software maintenance. In particular, incorrect assignments of bug reports to development teams can be very expensive in large software development projects. Several studies propose automating bug assignment techniques using machine learning in open source software contexts, but no study exists for large-scale proprietary projects in industry. The goal of this study is to evaluate automated bug assignment techniques that are based on machine learning classification. In particular, we study the state-of-the-art ensemble learner Stacked Generalization (SG) that combines several classifiers. We collect more than 50,000 bug reports from five development projects from two companies in different domains. We implement automated bug assignment and evaluate the performance in a set of controlled experiments. We show that SG scales to large scale industrial application and that it outperforms the use of individual classifiers for bug assignment, reaching prediction accuracies from 50 % to 89 % when large training sets are used. In addition, we show how old training data can decrease the prediction accuracy of bug assignment. We advice industry to use SG for bug assignment in proprietary contexts, using at least 2,000 bug reports for training. Finally, we highlight the importance of not solely relying on results from cross-validation when evaluating automated bug assignment.

    Place, publisher, year, edition, pages
    SPRINGER, 2016
    Keywords
    Machine learning; Ensemble learning; Classification; Bug reports; Bug assignment; Industrial scale; Large scale
    National Category
    Software Engineering
    Identifiers
    urn:nbn:se:liu:diva-130374 (URN)10.1007/s10664-015-9401-9 (DOI)000379060700004 ()
    Note

    Funding Agencies|Industrial Excellence Center EASE Embedded Applications Software Engineering

    Available from: 2016-08-15 Created: 2016-08-05 Last updated: 2018-05-17
    3. Sparse Partially Collapsed MCMC for Parallel Inference in Topic Models
    Open this publication in new window or tab >>Sparse Partially Collapsed MCMC for Parallel Inference in Topic Models
    2018 (English)In: Journal of Computational And Graphical Statistics, ISSN 1061-8600, E-ISSN 1537-2715, Vol. 27, no 2, p. 449-463Article in journal (Refereed) Published
    Abstract [en]

    Topic models, and more specifically the class of Latent Dirichlet Allocation (LDA), are widely used for probabilistic modeling of text. MCMC sampling from the posterior distribution is typically performed using a collapsed Gibbs sampler. We propose a parallel sparse partially collapsed Gibbs sampler and compare its speed and efficiency to state-of-the-art samplers for topic models on five well-known text corpora of differing sizes and properties. In particular, we propose and compare two different strategies for sampling the parameter block with latent topic indicators. The experiments show that the increase in statistical inefficiency from only partial collapsing is smaller than commonly assumed, and can be more than compensated by the speedup from parallelization and sparsity on larger corpora. We also prove that the partially collapsed samplers scale well with the size of the corpus. The proposed algorithm is fast, efficient, exact, and can be used in more modeling situations than the ordinary collapsed sampler.

    Place, publisher, year, edition, pages
    Taylor & Francis, 2018
    Keywords
    Bayesian inference, Gibbs sampling, Latent Dirichlet Allocation, Massive Data Sets, Parallel Computing, Computational complexity
    National Category
    Probability Theory and Statistics
    Identifiers
    urn:nbn:se:liu:diva-140872 (URN)10.1080/10618600.2017.1366913 (DOI)000435688200018 ()
    Funder
    Swedish Foundation for Strategic Research , SSFRIT 15-0097
    Available from: 2017-09-13 Created: 2017-09-13 Last updated: 2018-07-20Bibliographically approved
    4. Automatic Localization of Bugs to Faulty Components in Large Scale Software Systems using Bayesian Classification
    Open this publication in new window or tab >>Automatic Localization of Bugs to Faulty Components in Large Scale Software Systems using Bayesian Classification
    Show others...
    2016 (English)In: 2016 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE QUALITY, RELIABILITY AND SECURITY (QRS 2016), IEEE , 2016, p. 425-432Conference paper, Published paper (Refereed)
    Abstract [en]

    We suggest a Bayesian approach to the problem of reducing bug turnaround time in large software development organizations. Our approach is to use classification to predict where bugs are located in components. This classification is a form of automatic fault localization (AFL) at the component level. The approach only relies on historical bug reports and does not require detailed analysis of source code or detailed test runs. Our approach addresses two problems identified in user studies of AFL tools. The first problem concerns the trust in which the user can put in the results of the tool. The second problem concerns understanding how the results were computed. The proposed model quantifies the uncertainty in its predictions and all estimated model parameters. Additionally, the output of the model explains why a result was suggested. We evaluate the approach on more than 50000 bugs.

    Place, publisher, year, edition, pages
    IEEE, 2016
    Keywords
    Machine Learning; Fault Detection; Fault Location; Software Maintenance; Software Debugging; Software Engineering
    National Category
    Computer Sciences
    Identifiers
    urn:nbn:se:liu:diva-132879 (URN)10.1109/QRS.2016.54 (DOI)000386751700044 ()978-1-5090-4127-5 (ISBN)
    Conference
    IEEE International Conference on Software Quality, Reliability and Security (QRS)
    Available from: 2016-12-06 Created: 2016-11-30 Last updated: 2018-05-17
  • 2.
    Jonsson, Leif
    et al.
    Linköping University, Department of Computer and Information Science. Linköping University, Faculty of Science & Engineering. Ericsson AB, Sweden.
    Borg, Markus
    Lund University, Sweden.
    Broman, David
    KTH Royal Institute Technology, Sweden; University of Calif Berkeley, CA 94720 USA.
    Sandahl, Kristian
    Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, Faculty of Science & Engineering.
    Eldh, Sigrid
    Ericsson AB, Sweden.
    Runeson, Per
    Lund University, Sweden.
    Automated bug assignment: Ensemble-based machine learning in large scale industrial contexts2016In: Journal of Empirical Software Engineering, ISSN 1382-3256, E-ISSN 1573-7616, Vol. 21, no 4, p. 1533-1578Article in journal (Refereed)
    Abstract [en]

    Bug report assignment is an important part of software maintenance. In particular, incorrect assignments of bug reports to development teams can be very expensive in large software development projects. Several studies propose automating bug assignment techniques using machine learning in open source software contexts, but no study exists for large-scale proprietary projects in industry. The goal of this study is to evaluate automated bug assignment techniques that are based on machine learning classification. In particular, we study the state-of-the-art ensemble learner Stacked Generalization (SG) that combines several classifiers. We collect more than 50,000 bug reports from five development projects from two companies in different domains. We implement automated bug assignment and evaluate the performance in a set of controlled experiments. We show that SG scales to large scale industrial application and that it outperforms the use of individual classifiers for bug assignment, reaching prediction accuracies from 50 % to 89 % when large training sets are used. In addition, we show how old training data can decrease the prediction accuracy of bug assignment. We advice industry to use SG for bug assignment in proprietary contexts, using at least 2,000 bug reports for training. Finally, we highlight the importance of not solely relying on results from cross-validation when evaluating automated bug assignment.

  • 3.
    Magnusson, Måns
    et al.
    Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning. Linköping University, Faculty of Science & Engineering.
    Jonsson, Leif
    Linköping University, Department of Computer and Information Science. Linköping University, Faculty of Science & Engineering.
    Villani, Mattias
    Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning. Linköping University, Faculty of Science & Engineering.
    Broman, David
    School of Information and Communication Technology, Royal Institute of Technology KTH, Stockholm, Sweden.
    Sparse Partially Collapsed MCMC for Parallel Inference in Topic Models2018In: Journal of Computational And Graphical Statistics, ISSN 1061-8600, E-ISSN 1537-2715, Vol. 27, no 2, p. 449-463Article in journal (Refereed)
    Abstract [en]

    Topic models, and more specifically the class of Latent Dirichlet Allocation (LDA), are widely used for probabilistic modeling of text. MCMC sampling from the posterior distribution is typically performed using a collapsed Gibbs sampler. We propose a parallel sparse partially collapsed Gibbs sampler and compare its speed and efficiency to state-of-the-art samplers for topic models on five well-known text corpora of differing sizes and properties. In particular, we propose and compare two different strategies for sampling the parameter block with latent topic indicators. The experiments show that the increase in statistical inefficiency from only partial collapsing is smaller than commonly assumed, and can be more than compensated by the speedup from parallelization and sparsity on larger corpora. We also prove that the partially collapsed samplers scale well with the size of the corpus. The proposed algorithm is fast, efficient, exact, and can be used in more modeling situations than the ordinary collapsed sampler.

  • 4.
    Terenin, Alexander
    et al.
    Imperial Coll London, England.
    Magnusson, Måns
    Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning. Linköping University, Faculty of Arts and Sciences.
    Jonsson, Leif
    Linköping University, Department of Computer and Information Science. Linköping University, Faculty of Science & Engineering.
    Draper, David
    Univ Calif Santa Cruz, CA 95064 USA.
    Polya Urn Latent Dirichlet Allocation: A Doubly Sparse Massively Parallel Sampler2019In: IEEE Transaction on Pattern Analysis and Machine Intelligence, ISSN 0162-8828, E-ISSN 1939-3539, Vol. 41, no 7, p. 1709-1719Article in journal (Refereed)
    Abstract [en]

    Latent Dirichlet Allocation (LDA) is a topic model widely used in natural language processing and machine learning. Most approaches to training the model rely on iterative algorithms, which makes it difficult to run LDA on big corpora that are best analyzed in parallel and distributed computational environments. Indeed, current approaches to parallel inference either dont converge to the correct posterior or require storage of large dense matrices in memory. We present a novel sampler that overcomes both problems, and we show that this sampler is faster, both empirically and theoretically, than previous Gibbs samplers for LDA. We do so by employing a novel Polya-urn-based approximation in the sparse partially collapsed sampler for LDA. We prove that the approximation error vanishes with data size, making our algorithm asymptotically exact, a property of importance for large-scale topic models. In addition, we show, via an explicit example, that-contrary to popular belief in the topic modeling literature-partially collapsed samplers can be more efficient than fully collapsed samplers. We conclude by comparing the performance of our algorithm with that of other approaches on well-known corpora.

1 - 4 of 4
CiteExportLink to result list
Permanent link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • oxford
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf