When | What |
---|---|
February 6, 2016 | Updated BibTeX, paper link, data link |
May 21, 2010 | Donated by Sousuke Amasaki |
48
9: (1 identifier, 7 condition attributes, 1 decision attribute)
None
Robust regression for developing software estimation models
@article{Miyazaki:1994:RRD:198682.198684,
author = {Miyazaki, Y. and Terakado, M. and Ozaki, K. and Nozaki, H.},
title = {Robust Regression for Developing Software Estimation Models},
journal = {J. Syst. Softw.},
issue_date = {Oct. 1994},
volume = {27},
number = {1},
month = oct,
year = {1994},
issn = {0164-1212},
pages = {3--16},
numpages = {14},
url = {http://dx.doi.org/10.1016/0164-1212(94)90110-4},
doi = {10.1016/0164-1212(94)90110-4},
acmid = {198684},
publisher = {Elsevier Science Inc.},
address = {New York, NY, USA},
}
To develop a good software estimation model fitted to actual data, the evaluation criteria of goodness of fit is necessary. The first major problem discussed here is that ordinary relative error used for this criterion is not suitable because it has a bound in the case of under-estimation and no bound in the case of overestimation. We propose use of a new relative error called balanced relative error as the basis for the criterion and introduce seven evaluation criteria for software estimation models. The second major problem is that the ordinary least-squares method used for calculation of parameter values of a software estimation model is neither consistent with the criteria nor robust enough, which means that the solution is easily distorted by outliers. We propose a new consistent and robust method called the least-squares of inverted balanced relative errors (LIRS) and demonstrates its superiority to the ordinary least-squares method by use of five actual data sets. Through the analysis of these five data sets with LIRS, we show the importance of consistent data collection and development standarization to develop a good software sizing model. We compare goodness of fit between the sizing model based on the number of screens, forms, and files, and the sizing model based on the number of data elements for each of them. Based on this comparison, the validity of the number of data elements as independent variables for a sizing model is examined. Moreover, the validity of increasing the number of independent variables is examined.