liu.seSearch for publications in DiVA
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • oxford
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Automated Bug Classification.: Bug Report Routing
Linköping University, Department of Computer and Information Science. Linköping University, Faculty of Arts and Sciences.
2020 (English)Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
Abstract [en]

With the growing software technologies companies tend to develop automated solutions to save time and money. Automated solutions have seen tremendous growth in the software industry and have benefited from extensive machine learning research. Although extensive research has been done in the area of automated bug classification, with the new data being collected, more precise methods are yet to be developed. An automated bug classifier will process the content of the bug report and assign it to the person or department that would fix the problem.

A bug report typically contains an unstructured text field where the problem is described in detail. A lot of research regarding information extraction from such text fields has been done. This thesis uses a topic modeling technique, Latent Dirichlet Allocation (LDA), and a numerical statistic Term Frequency - Inverse Document Frequency (TF-IDF), to generate two different features from the unstructured text fields of the bug report. A third set of features was created by concatenating the TF-IDF and the LDA features. The class distribution of the data used in this thesis changes over time. To explore if time has an impact on the prediction, the age of the bug report was introduced as a feature. The importance of this feature, when used along with the LDA and TF-IDF features, was also explored in this thesis.

These generated feature vectors were used as predictors to train three different classification models; multinomial logistic regression, dense neural networks, and DO-probit. The prediction of the classifiers, for the correct department to handle a bug, was evaluated on the accuracy and the F1-score of the prediction. For comparison, the predictions from a Support Vector Machine (SVM) using a linear kernel was treated as the baseline.

The best results for the multinomial logistic regression and the dense neural networks classifiers were obtained when the TF-IDF features of the bug reports were used as predictors. Among the three classifiers trained the dense neural network had the best performance, though the classifier was not able to perform better than the SVM baseline. Using age as a feature did not give a significant improvement in the predictive performance of the classifiers, but was able to identify some interesting patterns in the data. Further research on other ways of using the age of the bug reports could be promising.

Place, publisher, year, edition, pages
2020. , p. 48
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
URN: urn:nbn:se:liu:diva-166224ISRN: LIU-IDA/STAT-A--20/025—SEOAI: oai:DiVA.org:liu-166224DiVA, id: diva2:1437223
External cooperation
Ericsson
Subject / course
Statistics
Supervisors
Examiners
Available from: 2020-09-03 Created: 2020-06-09 Last updated: 2020-09-03Bibliographically approved

Open Access in DiVA

Thesis_sriad858(1390 kB)452 downloads
File information
File name FULLTEXT01.pdfFile size 1390 kBChecksum SHA-512
07e7c60f3f4df694c036d306b8028a8d8a3cc2c1094a34b0317d9c37c940266b59a2c55ac0d8819a2837313b5e3b5242ad92ec9a42ced3bbb80d3c81a054ed82
Type fulltextMimetype application/pdf

Search in DiVA

By author/editor
Adhikarla, Sridhar
By organisation
Department of Computer and Information ScienceFaculty of Arts and Sciences
Other Electrical Engineering, Electronic Engineering, Information Engineering

Search outside of DiVA

GoogleGoogle Scholar
Total: 452 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 1429 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • oxford
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf