In a manual examination of more than 7,000 issue reports from the bug databases of five open-source projects, we found 33.8% of all issue reports to be misclassified, that is, rather than referring to a code fix, they resulted in a new feature, an update to documentation, or an internal refactoring. This misclassification introduces bias in bug prediction models, confusing bugs and features: On average, 39% of files marked as defective actually never had a bug. We estimate the impact of this misclassification on earlier studies and recommend manual data validation for future studies.
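To make the two headline figures concrete, here is a minimal sketch of how such ratios could be computed from a list of re-inspected issue reports. It is not the procedure used in the paper: the toy data, the tuple layout, and the function names `defective_files` and `misclassification_summary` are all assumptions made for illustration.

```python
# Hypothetical toy data, not taken from the study. Each entry is:
# (issue id, type recorded in the tracker, type after manual inspection,
#  files changed when the issue was resolved)
ISSUES = [
    ("I-1", "BUG", "BUG", ["a.py"]),
    ("I-2", "BUG", "FEATURE", ["b.py"]),
    ("I-3", "BUG", "DOCUMENTATION", ["c.py"]),
    ("I-4", "BUG", "BUG", ["a.py", "d.py"]),
    ("I-5", "FEATURE", "FEATURE", ["e.py"]),
]


def defective_files(issues, use_manual_label):
    """Return the set of files linked to at least one issue labeled BUG."""
    files = set()
    for _issue_id, tracker_type, manual_type, touched in issues:
        label = manual_type if use_manual_label else tracker_type
        if label == "BUG":
            files.update(touched)
    return files


def misclassification_summary(issues):
    """Compare tracker labels against manual labels.

    Returns the share of issues whose tracker label differs from the manual
    label, and the share of files marked defective by tracker labels that
    have no bug at all under the manual labels.
    """
    misclassified = sum(1 for _, tracker, manual, _ in issues if tracker != manual)
    marked = defective_files(issues, use_manual_label=False)
    confirmed = defective_files(issues, use_manual_label=True)
    false_defective = marked - confirmed
    return {
        "misclassified_issue_ratio": misclassified / len(issues),
        "false_defective_file_ratio": (
            len(false_defective) / len(marked) if marked else 0.0
        ),
        "false_defective_files": sorted(false_defective),
    }


if __name__ == "__main__":
    print(misclassification_summary(ISSUES))
```

In this toy data set, two of the five issues carry a wrong label, and two of the four files marked as defective are linked only to issues that were not actually bugs. This is the kind of bias that the file-level figure above describes.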
The mozkito-issues module and its data model already provide the necessary model object, EnhancedReport, which makes it possible to use such classification results. On request, we are happy to provide a Mozkito database dump that allows the manually classified issue reports to be used within the Mozkito framework.
The paper describing misclassified issue reports and their impact on data mining models is currently under submission to ICSE 2013. A technical report version of the paper can be downloaded using the PDF link below.
title = "It's not a Bug, it's a Feature: How Misclassification Impacts Bug Prediction",
author = "Kim Herzig and Sascha Just and Andreas Zeller",
year = "2012",
month = "August",
institution = "Universität des Saarlandes, Saarbrücken, Germany" }
Download PDF version
For more information, additional results, and the data sets containing the manually classified issue reports, please visit the paper's website at http://softevo.org/bugclassify/.