Change Log

When                  What
November 15th, 2015   Donated by A. Gunes Koru


Studies that use the data (in any form) are required to include the following reference:

author={Nan Niu and Mahmoud, A.},
booktitle={Requirements Engineering Conference (RE), 2012 20th IEEE International},
title={Enhancing candidate link generation for requirements tracing: The cluster hypothesis revisited},
keywords={formal verification;information retrieval;program diagnostics;public domain software;baseline pruning strategy;candidate link generation;cluster hypothesis;correct links;false positives;incorrect links;information retrieval methods;low-quality clusters;open-source datasets;requirements tracing process;requirements tracing tools;Algorithm design and analysis;Clustering algorithms;Context;Gold;Humans;Software;Software algorithms;clustering;requirements tracing;traceability},

About the Data

Overview of Data

The data is a Weka ARFF (.arff) file. It contains 94 independent variables and 1 dependent variable.
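
As a quick sanity check on the variable counts stated above, the attribute declarations in an ARFF header can be read with a few lines of Python. The sketch below is only illustrative: it uses the standard library, handles just the common `@attribute` forms, and assumes that the dependent variable is the last declared attribute, which this page does not state. The sample text and its attribute names are hypothetical and are not taken from the dataset.

```python
import io


def read_arff_attributes(lines):
    """Return the attribute names declared in an ARFF header, in order."""
    names = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        lowered = line.lower()
        if lowered.startswith("@data"):
            break  # the header ends where the data section begins
        if lowered.startswith("@attribute"):
            rest = line[len("@attribute"):].strip()
            if rest.startswith(("'", '"')):
                # Quoted names may contain spaces.
                quote = rest[0]
                end = rest.index(quote, 1)
                names.append(rest[1:end])
            else:
                names.append(rest.split(None, 1)[0])
    return names


def split_variables(names):
    """Assumption: the last declared attribute is the dependent variable."""
    return names[:-1], names[-1]


# Hypothetical miniature file, used only to exercise the functions.
SAMPLE = """% hypothetical miniature file
@relation demo
@attribute f1 numeric
@attribute 'f 2' numeric
@attribute label {yes,no}
@data
1,2,yes
"""

independent, dependent = split_variables(read_arff_attributes(io.StringIO(SAMPLE)))
print(len(independent), dependent)
```

To apply this to the real file, pass an open file handle in place of the `io.StringIO` object; if the description above holds, the independent list should have 94 entries.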

Paper Abstract

Modern requirements tracing tools employ information retrieval methods to automatically generate candidate links. Due to the inherent trade-off between recall and precision, such methods cannot achieve a high coverage without also retrieving a great number of false positives, causing a significant drop in result accuracy. In this paper, we propose an approach to improving the quality of candidate link generation for the requirements tracing process. We base our research on the cluster hypothesis which suggests that correct and incorrect links can be grouped in high-quality and low-quality clusters respectively. Result accuracy can thus be enhanced by identifying and filtering out low-quality clusters. We describe our approach by investigating three open-source datasets, and further evaluate our work through an industrial study. The results show that our approach outperforms a baseline pruning strategy and that improvements are still possible.