Qualitas Corpus


Change Log

When What
November 30th, 2016 Donated by Rahul Krishna


Studies who have been using the data (in any form) are required to include the following reference:

  author = {Tempero, Ewan and Anslow, Craig and Dietrich, Jens and Han, Ted and Li, Jing and Lumpe, Markus and Melton, Hayden and Noble, James},
  title = {Qualitas Corpus: A Curated Collection of Java Code for Empirical Studies},
  booktitle = {2010 Asia Pacific Software Engineering Conference (APSEC2010)},
  pages = {336--345},
  month = dec,
  year = {2010},
  doi = {http://dx.doi.org/10.1109/APSEC.2010.46}

About the Data

Overview of Data

Although the original Qualitas Corpus has provided a valuable contribution for experimentation in software engineering, there are several scenarios—e.g., experiments that rely on Abstract Syntax Tree (AST) or bytecode—in which researchers need to import and compile the source code. Since this task is not trivial in the case of systems with many external dependencies, our goal is to assist researchers by removing the compilation effort when conducting empirical studies.


The Qualitas Corpus is a curated collection of software systems intended to be used for empirical studies of code artefacts. The primary goal is to provide a resource that supports reproducible studies of software. The current release of the Corpus contains open-source Java software systems, often multiple versions.