QUEST datasets


Change Log

When What
October 30th, 2015 Donated by Laura Moreno


Studies who have been using the data (in any form) are required to include the following reference:

 author = {Moreno, Laura and Bavota, Gabriele and Haiduc, Sonia and Di Penta, Massimiliano and Oliveto, Rocco and Russo, Barbara and Marcus, Andrian},
 title = {Query-based Configuration of Text Retrieval Solutions for Software Engineering Tasks},
 booktitle = {Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering},
 series = {ESEC/FSE 2015},
 year = {2015},
 isbn = {978-1-4503-3675-8},
 location = {Bergamo, Italy},
 pages = {567--578},
 numpages = {12},
 url = {},
 doi = {10.1145/2786805.2786859},
 acmid = {2786859},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {Configuration, Feature and Bug Localization, Text-Retrieval in Software Engineering},

About the Data

Paper Abstract

Text Retrieval (TR) approaches have been used to leverage the textual information contained in software artifacts to address a multitude of software engineering (SE) tasks. However, TR approaches need to be configured properly in order to lead to good results. Current approaches for automatic TR configuration in SE configure a single TR approach and then use it for all possible queries. In this paper, we show that such a configuration strategy leads to suboptimal results, and propose QUEST, the first approach bringing TR configuration selection to the query level. QUEST recommends the best TR configuration for a given query, based on a supervised learning approach that determines the TR configuration that performs the best for each query according to its properties. We evaluated QUEST in the context of feature and bug localization, using a data set with more than 1,000 queries. We found that QUEST is able to recommend one of the top three TR configurations for a query with a 69% accuracy, on average. We compared the results obtained with the configurations recommended by QUEST for every query with those obtained using a single TR configuration for all queries in a system and in the entire data set. We found that using QUEST we obtain better results than with any of the considered TR configurations.