taskdict

URL

Change Log

When            What
March 13, 2015  Donated by Katja Kevic

Reference

Studies that use the data (in any form) are required to add the following reference to their report/paper:

@inproceedings{Kevic:2014:DTC:2597073.2597095,
   author = {Kevic, Katja and Fritz, Thomas},
   title = {A Dictionary to Translate Change Tasks to Source Code},
   booktitle = {Proceedings of the 11th Working Conference on Mining Software Repositories},
   series = {MSR 2014},
   year = {2014},
   isbn = {978-1-4503-2863-0},
   location = {Hyderabad, India},
   pages = {320--323},
   numpages = {4},
   url = {http://doi.acm.org/10.1145/2597073.2597095},
   doi = {10.1145/2597073.2597095},
   acmid = {2597095},
   publisher = {ACM},
   address = {New York, NY, USA},
   keywords = {Dictionary, change task, interaction history, location},
}

About the Data

For the empirical analysis, the authors created two datasets: dataset 1 includes all change tasks that have at least one change set associated with them; dataset 2 is a subset of dataset 1 and includes all change tasks that have at least one change set and at least one task context associated with them. For the AspectJ project the authors only created dataset 1, since only two of its tasks had a task context associated.

For each project the authors split each dataset into a training set, used to create the dictionary, and a test set. To obtain reasonably sized training and test sets that are comparable across projects, they fixed the test set size at ten. To compare the results of dictionaries built from different information sources (change sets and/or task contexts) while preserving chronological order, they chose the last ten change tasks in dataset 2 as the test set for Mylyn Tasks, Mylyn Context and Remus, and the last ten change tasks in dataset 1 for AspectJ. All change tasks that were resolved before the ones in the test set were used as the training set.
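The split described above amounts to ordering tasks by resolution date, taking the last ten as the test set, and training on everything resolved earlier. A minimal sketch in Python, assuming hypothetical record fields (id, resolved); the field names are illustrative, not the dataset's actual schema:

   from datetime import date

   TEST_SIZE = 10  # fixed test set size used in the paper

   def chronological_split(tasks, test_size=TEST_SIZE):
       """Split change tasks chronologically: the last `test_size`
       resolved tasks form the test set, everything resolved before
       them is used as training data."""
       ordered = sorted(tasks, key=lambda t: t["resolved"])
       return ordered[:-test_size], ordered[-test_size:]

   # Illustrative records only; real tasks would also carry change
   # sets and, for dataset 2, task contexts.
   tasks = [{"id": i, "resolved": date(2011, 1, 1 + i)} for i in range(25)]
   train, test = chronological_split(tasks)
   print(len(train), len(test))  # 15 10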

Paper Abstract

At the beginning of a change task, software developers spend a substantial amount of their time searching and navigating to locate relevant parts in the source code. Current approaches to support developers in this initial code search predominantly use information retrieval techniques that leverage the similarity between task descriptions and the identifiers of code elements to recommend relevant elements. However, the vocabulary or language used in source code often differs from the one used for describing change tasks, especially since the people developing the code are not the same as the ones reporting bugs or defining new features to be implemented. In our work, we investigate the creation of a dictionary that maps the different vocabularies using information from change sets and interaction histories stored with previously completed tasks. In an empirical analysis on four open source projects, our approach substantially improved upon the results of traditional information retrieval techniques for recommending relevant code elements.
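The dictionary implementation itself is not part of this dataset, but the abstract's core idea, mapping task-description vocabulary to code elements via previously completed tasks, can be sketched as follows. This is a simplified illustration under assumed field names (description, elements) and a naive tokenizer; it is not the authors' implementation, which also uses interaction histories and information retrieval preprocessing:

   from collections import Counter, defaultdict
   import re

   def terms(text):
       """Naive tokenizer; the paper's actual preprocessing (e.g.
       stemming, stop-word removal, identifier splitting) is not
       reproduced here."""
       return set(re.findall(r"[a-z]+", text.lower()))

   def build_dictionary(training_tasks):
       """Map each task-description term to a counter of the code
       elements touched while resolving tasks containing that term."""
       dictionary = defaultdict(Counter)
       for task in training_tasks:
           for term in terms(task["description"]):
               dictionary[term].update(task["elements"])
       return dictionary

   def recommend(dictionary, description, top_n=10):
       """Score code elements by how often they co-occurred with the
       new task's terms in previously completed tasks."""
       scores = Counter()
       for term in terms(description):
           scores.update(dictionary.get(term, Counter()))
       return [element for element, _ in scores.most_common(top_n)]

   # Illustrative training data; descriptions and element names are
   # made up for the example.
   training = [
       {"description": "fix sorting in task list view",
        "elements": ["TaskListView.java", "TaskSorter.java"]},
       {"description": "task list does not refresh after sync",
        "elements": ["TaskListView.java", "SyncJob.java"]},
   ]
   dictionary = build_dictionary(training)
   print(recommend(dictionary, "task list sorting broken"))

For a new change task, elements that repeatedly co-occurred with its vocabulary in past tasks rank highest, which is how such a dictionary can bridge the gap between the language of bug reports and the identifiers in source code.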