Change Log

When              What
March 26th, 2015  Donated by Dongsun Kim


Studies that use the data (in any form) are required to include the following reference:

 author = {Kim, Dongsun and Nam, Jaechang and Song, Jaewoo and Kim, Sunghun},
 title = {Automatic Patch Generation Learned from Human-written Patches},
 booktitle = {Proceedings of the 2013 International Conference on Software Engineering},
 series = {ICSE '13},
 year = {2013},
 isbn = {978-1-4673-3076-3},
 location = {San Francisco, CA, USA},
 pages = {802--811},
 numpages = {10},
 url = {},
 acmid = {2486893},
 publisher = {IEEE Press},
 address = {Piscataway, NJ, USA},

About the Data

Overview of Data

The authors collected 119 real bugs from six open source projects: Mozilla Rhino, Eclipse AspectJ, Apache Log4j, and Apache Commons (Math, Lang, Collections). These projects were chosen because they are commonly used in the literature and have well-maintained bug reports. The authors searched the corresponding issue trackers for reproducible bugs and, because some projects had too many bugs, randomly selected 15 to 29 bugs per project. Although they invested their best effort in bug collection, the collected bugs do not represent the entirety of the bugs in these projects. At the time of the study, 119 was the largest number of bugs used in an evaluation of automatic patch generation.

Paper abstract

Patch generation is an essential software maintenance task because most software systems inevitably have bugs that need to be fixed. Unfortunately, human resources are often insufficient to fix all reported and known bugs. To address this issue, several automated patch generation techniques have been proposed. In particular, a genetic-programming-based patch generation technique, GenProg, proposed by Weimer et al., has shown promising results. However, these techniques can generate nonsensical patches due to the randomness of their mutation operations. To address this limitation, we propose a novel patch generation approach, Pattern-based Automatic program Repair (PAR), using fix patterns learned from existing human-written patches. We manually inspected more than 60,000 human-written patches and found there are several common fix patterns. Our approach leverages these fix patterns to generate program patches automatically. We experimentally evaluated PAR on 119 real bugs. In addition, a user study involving 89 students and 164 developers confirmed that patches generated by our approach are more acceptable than those generated by GenProg. PAR successfully generated patches for 27 out of 119 bugs, while GenProg was successful for only 16 bugs.
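
The success counts reported in the abstract can be turned into rates for easier comparison. The following is a minimal sketch that uses only the three numbers stated above (119 bugs, 27 patched by PAR, 16 patched by GenProg); the function and variable names are illustrative and are not part of the dataset.

```python
# Counts taken from the paper abstract above.
TOTAL_BUGS = 119
PATCHED = {"PAR": 27, "GenProg": 16}


def success_rate(patched: int, total: int) -> float:
    """Return the fraction of bugs for which a patch was generated."""
    if total <= 0:
        raise ValueError("total must be positive")
    return patched / total


# Hypothetical helper structure: map each tool to its success rate.
rates = {tool: success_rate(n, TOTAL_BUGS) for tool, n in PATCHED.items()}

for tool, rate in rates.items():
    print(f"{tool}: {PATCHED[tool]}/{TOTAL_BUGS} = {rate:.1%}")
```

Under these numbers, PAR generated patches for roughly 22.7% of the bugs and GenProg for roughly 13.4%.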