PullReq-CI

URL

Change Log

When What
October 30th, 2015 Donated by Bogdan Vasilescu

Reference

Studies who have been using the data (in any form) are required to include the following reference:

@inproceedings{Vasilescu:2015:QPO:2786805.2786850,
 author = {Vasilescu, Bogdan and Yu, Yue and Wang, Huaimin and Devanbu, Premkumar and Filkov, Vladimir},
 title = {Quality and Productivity Outcomes Relating to Continuous Integration in GitHub},
 booktitle = {Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering},
 series = {ESEC/FSE 2015},
 year = {2015},
 isbn = {978-1-4503-3675-8},
 location = {Bergamo, Italy},
 pages = {805--816},
 numpages = {12},
 url = {http://doi.acm.org/10.1145/2786805.2786850},
 doi = {10.1145/2786805.2786850},
 acmid = {2786850},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {Continuous integration, GitHub, pull requests},
}

About the Data

Overview of Data

Software processes comprise many steps; coding is followed by building, integration testing, system testing, deployment, operations, among others. Software process integration and automation have been areas of key concern in software engineering, ever since the pioneering work of Osterweil; market pressures for Agility, and open, decentralized, software development have provided additional pressures for progress in this area. But do these innovations actually help projects? Given the numerous confounding factors that can influence project performance, it can be a challenge to discern the effects of process integration and automation. Software project ecosystems such as GitHub provide a new opportunity in this regard: one can readily find large numbers of projects in various stages of process integration and automation, and gather data on various influencing factors as well as productivity and quality outcomes. In this paper we use large, historical data on process metrics and outcomes in GitHub projects to discern the e↵ects of one specific innovation in process automation: continuous integration. Our main finding is that continuous integration improves the productivity of project teams, who can integrate more outside contributions, without an observable diminishment in code quality.

Paper Abstract

Software processes comprise many steps; coding is followed by building, integration testing, system testing, deployment, operations, among others. Software process integration and automation have been areas of key concern in software engineering, ever since the pioneering work of Osterweil; market pressures for Agility, and open, decentralized, software development have provided additional pressures for progress in this area. But do these innovations actually help projects? Given the numerous confounding factors that can influence project performance, it can be a challenge to discern the effects of process integration and automation. Software project ecosystems such as GitHub provide a new opportunity in this regard: one can readily find large numbers of projects in various stages of process integration and automation, and gather data on various influencing factors as well as productivity and quality outcomes. In this paper we use large, historical data on process metrics and outcomes in GitHub projects to discern the effects of one specific innovation in process automation: continuous integration. Our main finding is that continuous integration improves the productivity of project teams, who can integrate more outside contributions, without an observable diminishment in code quality.