URL

Change Log

When	What
January 13th, 2016	Donated by Rudolf Ferenc

Reference

Studies who have been using the data (in any form) are required to include the following reference:

@inproceedings{KHFG16_RefactDataSet,
  author = {István Kádár and Péter Hegedűs and Rudolf Ferenc and Tibor Gyimóthy},
  booktitle = {Proceedings of the 23rd IEEE International Conference on Software Analysis, Evolution, and Reengineering},
  pages = {599 - 603},
  title = "{A Code Refactoring Dataset and Its Assessment Regarding Software Maintainability}",
  publisher = "IEEE Computer Society",
  year = 2016,
  location = "Osaka, Japan",
}

About the Data

Overview of Data

The dataset contains two types of data for each class and method in a system: static source code metrics and number of refactorings (grouped by types) affecting the particular item between consecutive versions

Attribute Information

code refactoring, software maintainability, empirical study

Paper Abstract

It is very common in various fields that there is a gap between theoretical results and their practical applications. This is true for code refactoring as well, which has a solid theoretical background while being used in development practice at the same time. However, more and more studies suggest that developers perform code refactoring entirely differently than the theory would suggest. Our paper encourages the further investigation of code refactorings in practice by providing an excessive public dataset of source code metrics and applied refactorings through several releases of 7 open-source systems. As a first step of processing this dataset, we examined the quality attributes of the refactored source code classes and the values of source code metrics improved by those refactorings. Our early results show that lower maintainability indeed triggers more code refactorings in practice and these refactorings significantly decrease complexity, code lines, coupling and clone metrics. However, we observed a degradation in comment related metrics in the refactored code.

Data Categories

RefactDataSet