Change Log

When What
April 9th, 2015 Donated by Takafumi Fukushima


Studies who have been using the data (in any form) are required to include the following reference:

  title={An empirical study of just-in-time defect prediction using cross-project models},
  author={Fukushima, Takafumi and Kamei, Yasutaka and McIntosh, Shane and Yamashita, Kazuhiro and Ubayashi, Naoyasu},
  booktitle={Proceedings of the 11th Working Conference on Mining Software Repositories},

About the Data

Overview of Data

JDT: eclipse-cvs-jdt.tgz (1.10G, retrieved on 2006-11-25)

SWT: eclipse-cvs-swt.tgz (488M, retrieved on 2006-11-25)

Rest: eclipse-cvs-rest.tgz (1.82G, retrieved on 2006-11-25)

Eclipse Bugzilla export (in XML):

Bugs: 1-162656: eclipse-bugs.zip (125M, retrieved on 2006-10-30)

Firefox/Mozilla CVS repository: mozilla-cvs.tgz (609M, retrieved on 2006-12-17)

Firefox/Mozilla Bugzilla export (in XML):

Bugs 35-100000: mozilla-bugs-000001-100000.zip (886M, retrieved on 2007-01-06)

Bugs 100001-200000: mozilla-bugs-100001-200000.zip (1.10G, retrieved on 2007-01-06)

Bugs 200001-300000: mozilla-bugs-200001-300000.zip (1.00G, retrieved on 2007-01-06)

Bugs 300001-366112: mozilla-bugs-300001-367500.zip (724M, retrieved on 2007-01-06)

Eclipse CVS repository in TARE format:

eclipse-tare.tgz (247M, retrieved by Tom Zimmermann on 2006-11-25)

Eclipse Bug Data (more details):

eclipse-bug-data-1.1a.zip (920K, retrieved by Adrian Schröter)

Eclipse CVS repository:

JDT: eclipse-cvs-jdt.tgz (1.1G, retrieved on 2007-12-19 by Thomas Zimmermann)

SWT: eclipse-cvs-swt.tgz (686M, retrieved on 2007-12-19 by Thomas Zimmermann)

Platform: eclipse-cvs-plarform.tgz (686M, retrieved on 2007-12-19 by Thomas Zimmermann)

Rest: eclipse-cvs-rest.tgz (1.7G, retrieved on 2007-12-19 by Thomas Zimmermann)

Eclipse Bugzilla export (in XML):

Bugs: 1-213000: eclipse-bugs–000001-213000.zip (3.2G, retrieved on 2007-12-19 by Thomas Zimmermann)

Paper Abstract

Prior research suggests that predicting defect-inducing changes, i.e., Just-In-Time (JIT) defect prediction is a more practical alternative to traditional defect prediction techniques, providing immediate feedback while design decisions are still fresh in the minds of developers. Unfortunately, similar to traditional defect prediction models, JIT models require a large amount of training data, which is not available when projects are in initial development phases. To address this flaw in traditional defect prediction, prior work has proposed cross-project models, i.e., models learned from older projects with su!cient history. However, cross-project models have not yet been explored in the context of JIT prediction. Therefore, in this study, we empirically evaluate the performance of JIT cross-project models. Through a case study on 11 open source projects, we find that in a JIT cross-project context: (1) high performance within-project models rarely perform well; (2) models trained on projects that have similar correlations between predictor and dependent variables often perform well; and (3) ensemble learning techniques that leverage historical data from several other projects (e.g., voting experts) often perform well. Our findings empirically confirm that JIT cross-project models learned using other projects are a viable solution for projects with little historical data. However, JIT cross-project models perform best when the data used to learn them is carefully selected.