Notes on the Data Set

This is a defect data set that contains both the defect reports and the sequences obtained from them after data pre-processing.

  • A tutorial on defect prediction can be found here.

Change Log

When               What
February 2, 2016   Added BibTeX and abstract; updated dead links
August 28, 2012    Donated by Ning Chen

BibTeX

author={Chen, Ning and Hoi, S. C. H. and Xiao, Xiaokui},
booktitle={Automated Software Engineering (ASE), 2011 26th IEEE/ACM International Conference on},
title={Software process evaluation: A machine learning approach},
keywords={learning (artificial intelligence);pattern classification;software management;software process improvement;software quality;authority constraint;defect management process;machine learning approach;manual qualitative evaluation;real industrial software project;semiautomated approach;sequence classification task;software development;software process evaluation;software products quality;Capability maturity model;Data mining;Machine learning;Machine learning algorithms;Organizations;Software;Standards organizations;defect management process;machine learning;sequence classification;software process},

Paper Abstract

Software process evaluation is essential to improve software development and the quality of software products in an organization. Conventional approaches based on manual qualitative evaluations (e.g., artifacts inspection) are deficient in the sense that (i) they are time-consuming, (ii) they suffer from the authority constraints, and (iii) they are often subjective. To overcome these limitations, this paper presents a novel semi-automated approach to software process evaluation using machine learning techniques. In particular, we formulate the problem as a sequence classification task, which is solved by applying machine learning algorithms. Based on the framework, we define a new quantitative indicator to objectively evaluate the quality and performance of a software process. To validate the efficacy of our approach, we apply it to evaluate the defect management process performed in four real industrial software projects. Our empirical results show that our approach is effective and promising in providing an objective and quantitative measurement for software process evaluation.
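The abstract says only that process evaluation is cast as a sequence classification task. The sketch below illustrates that framing under stated assumptions. It is not the authors' algorithm. It assumes each defect report has already been reduced to a list of status names. It represents each sequence by its state-transition (bigram) counts and labels it by a k-nearest-neighbour vote under cosine similarity. The status names, the labels `conforming`/`deviating`, the toy training set, and all function names are hypothetical. They need not match the data set's actual encoding.

```python
from collections import Counter
from math import sqrt

# Hypothetical representation: a defect report's lifecycle as a list of status
# names, e.g. ["New", "Assigned", "Resolved", "Closed"]. The real data set's
# pre-processed sequences may use a different encoding.


def transition_features(seq: list[str]) -> Counter:
    """Count consecutive state pairs (bigrams), padded with start/end markers."""
    padded = ["<s>"] + seq + ["</s>"]
    return Counter(zip(padded, padded[1:]))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_predict(train: list[tuple[list[str], str]], seq: list[str], k: int = 3) -> str:
    """Label seq by majority vote among the k most similar training sequences."""
    feats = transition_features(seq)
    ranked = sorted(
        train,
        key=lambda item: cosine(feats, transition_features(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy, hand-made training examples (hypothetical, not taken from the data set).
train = [
    (["New", "Assigned", "Resolved", "Verified", "Closed"], "conforming"),
    (["New", "Assigned", "Resolved", "Closed"], "conforming"),
    (["New", "Assigned", "Resolved", "Verified", "Closed"], "conforming"),
    (["New", "Closed"], "deviating"),
    (["New", "Resolved", "Reopened", "Resolved", "Closed"], "deviating"),
    (["New", "Assigned", "Closed"], "deviating"),
]

if __name__ == "__main__":
    print(knn_predict(train, ["New", "Assigned", "Resolved", "Verified", "Closed"]))
    print(knn_predict(train, ["New", "Closed"]))
```

Transition bigrams are used here because they capture the order of status changes while keeping the feature space small. Any sequence classifier (for example, an SVM over the same features) could be substituted for the nearest-neighbour vote.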