Maven Dependency Dataset

URL

Change Log

When                   What
September 18th, 2015   Donated by Joost Visser

Reference

Studies that use the data (in any form) are required to include the following reference:

@INPROCEEDINGS{6624037,
author={Raemaekers, S. and Nane, G.F. and Van Deursen, A. and Visser, J.},
booktitle={Mining Software Repositories (MSR), 2013 10th IEEE Working Conference on},
title={Testing principles, current practices, and effects of change localization},
year={2013},
pages={257-266},
keywords={Java;data encapsulation;program testing;software libraries;Java libraries;change localization;design principle;encapsulation practices;less stable interfaces;library interfaces;ripple effects;software development;testing principles;Correlation;Encapsulation;Java;Libraries;Measurement;Software;Stability analysis;Software libraries;encapsulation;ripple effects},
doi={10.1109/MSR.2013.6624037},
ISSN={2160-1852},
month={May},}

About the Data

Overview of Data

The Maven Dependency Dataset contains the data described in the paper “Mining Metrics, Changes and Dependencies from the Maven Dependency Dataset”. NOTE: See the README.TXT file for more information on the data in this dataset. The dataset consists of multiple parts: a snapshot of the Maven repository dated July 30, 2011 (maven.tar.gz); a MySQL database (complete.tar.gz) containing information on individual methods, classes and packages of different library versions; a Berkeley DB database (berkeley.tar.gz) containing metrics on all methods, classes and packages in the repository; a Neo4j graph database (graphdb.tar.gz) containing a call graph of the entire repository; scripts and analysis files (scriptsAndData.tar.gz); a binary package and the source code of the analysis software (fullmaven.jar and fullmaven-sources.jar); and text dumps of the data in these databases (graphdump.tar.gz, processed.tar.gz, calls.tar.gz and units.tar.gz).
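Because the dataset is split across several archives, it can help to check a download before working with it. The following is a minimal Python sketch, not part of the dataset's own tooling: the archive names are taken from the overview above, while the download directory and the helper names missing_archives and list_members are assumptions made for illustration. Nothing is assumed about the internal layout of the archives; the sketch only lists member names.

```python
import tarfile
from pathlib import Path

# Archive names as listed in the dataset overview.
ARCHIVES = [
    "maven.tar.gz",
    "complete.tar.gz",
    "berkeley.tar.gz",
    "graphdb.tar.gz",
    "scriptsAndData.tar.gz",
    "graphdump.tar.gz",
    "processed.tar.gz",
    "calls.tar.gz",
    "units.tar.gz",
]


def missing_archives(download_dir):
    """Return the listed archive names that are not present in download_dir."""
    base = Path(download_dir)
    return [name for name in ARCHIVES if not (base / name).is_file()]


def list_members(archive_path, limit=10):
    """Return up to `limit` member names of a .tar.gz archive without extracting it."""
    names = []
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            names.append(member.name)
            if len(names) >= limit:
                break
    return names
```

Calling missing_archives on the directory holding the downloads (a hypothetical path such as "downloads") returns the archives still to be fetched, and list_members gives a quick look inside a single archive before committing disk space to a full extraction.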

Attribute Information

Software libraries, encapsulation, ripple effect

Paper Abstract

Best practices in software development state that code that is likely to change should be encapsulated to localize possible modifications. In this paper, we investigate the application and effects of this design principle. We investigate the relationship between the stability, encapsulation and popularity of libraries on a dataset of 148,253 Java libraries. We find that bigger systems with more rework in existing methods have less stable interfaces and that bigger systems tend to encapsulate dependencies better. Additionally, there are a number of factors that are associated with change in library interfaces, such as rework in existing methods, system size, encapsulation of dependencies and the number of dependencies. We find that current encapsulation practices are not targeted at libraries that change the most. We also investigate the strength of ripple effects caused by instability of dependencies and we find that libraries cause ripple effects in systems using them and that these effects can be mitigated by encapsulation.