Change Log

When              What
February 6, 2016  Updated BibTeX, paper link, data link
May 21, 2010      Donated by Sousuke Amasaki

Notes from the Author


  • Miyazaki Y.
  • Terakado M.
  • Ozaki K.
  • Nozaki H.

Number of records


Number of attributes

9 (1 identifier, 7 condition attributes, 1 decision attribute)

Attribute Information

  • ID: Project ID
  • KSLOC: the number of COBOL source lines in thousands excluding comment lines, and screen and form definition codes. Lines of code copied by the COPY statement are also excluded, but lines registered as COPY phrases are included.
  • SCRN: number of different input or output screens
  • FORM: number of different (report) forms
  • FILE: number of different record formats
  • ESCRN: total number of data elements in all the screens
  • EFORM: total number of data elements in all the forms
  • EFILE: total number of data elements in all the files
  • MM: Man-Months from system design to systems test, including indirect effort such as project management. One MM is defined as 160 hours of working time.
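
The attribute list above can be sketched as a small record type. This is only an illustration: the Python field names and types are assumptions (the source gives only the attribute labels), and the sample values are hypothetical, not taken from the dataset. The one fact used from the description is that one MM equals 160 working hours.

```python
from dataclasses import dataclass

# From the attribute description: one man-month is 160 hours of working time.
HOURS_PER_MM = 160


@dataclass(frozen=True)
class Project:
    """One record with the nine attributes listed above (types are assumed)."""
    project_id: int   # ID
    ksloc: float      # KSLOC: COBOL source lines in thousands
    scrn: int         # SCRN: number of different input or output screens
    form: int         # FORM: number of different (report) forms
    file: int         # FILE: number of different record formats
    escrn: int        # ESCRN: data elements in all the screens
    eform: int        # EFORM: data elements in all the forms
    efile: int        # EFILE: data elements in all the files
    mm: float         # MM: effort in man-months (decision attribute)

    def effort_hours(self) -> float:
        """Convert the MM effort figure to working hours."""
        return self.mm * HOURS_PER_MM


# Hypothetical values for illustration only; not a row from the dataset.
example = Project(project_id=1, ksloc=10.0, scrn=5, form=3, file=4,
                  escrn=50, eform=30, efile=40, mm=2.5)
print(example.effort_hours())
```

Keeping the conversion factor as a named constant makes it easy to compare MM figures against datasets that report effort in hours.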

Missing attributes



Robust regression for developing software estimation models

 author = {Miyazaki, Y. and Terakado, M. and Ozaki, K. and Nozaki, H.},
 title = {Robust Regression for Developing Software Estimation Models},
 journal = {J. Syst. Softw.},
 issue_date = {Oct. 1994},
 volume = {27},
 number = {1},
 month = oct,
 year = {1994},
 issn = {0164-1212},
 pages = {3--16},
 numpages = {14},
 url = {http://dx.doi.org/10.1016/0164-1212(94)90110-4},
 doi = {10.1016/0164-1212(94)90110-4},
 acmid = {198684},
 publisher = {Elsevier Science Inc.},
 address = {New York, NY, USA},

Paper Abstract

To develop a good software estimation model fitted to actual data, the evaluation criteria of goodness of fit is necessary. The first major problem discussed here is that ordinary relative error used for this criterion is not suitable because it has a bound in the case of underestimation and no bound in the case of overestimation. We propose use of a new relative error called balanced relative error as the basis for the criterion and introduce seven evaluation criteria for software estimation models. The second major problem is that the ordinary least-squares method used for calculation of parameter values of a software estimation model is neither consistent with the criteria nor robust enough, which means that the solution is easily distorted by outliers. We propose a new consistent and robust method called the least-squares of inverted balanced relative errors (LIRS) and demonstrate its superiority to the ordinary least-squares method by use of five actual data sets. Through the analysis of these five data sets with LIRS, we show the importance of consistent data collection and development standardization to develop a good software sizing model. We compare goodness of fit between the sizing model based on the number of screens, forms, and files, and the sizing model based on the number of data elements for each of them. Based on this comparison, the validity of the number of data elements as independent variables for a sizing model is examined. Moreover, the validity of increasing the number of independent variables is examined.
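
The asymmetry the abstract describes can be seen in a short sketch. The abstract does not state the formulas, so the definitions below are assumptions: ordinary relative error is taken as (estimate - actual) / actual, and balanced relative error uses the form commonly cited in the estimation literature, (estimate - actual) / min(actual, estimate). The function names are hypothetical, and LIRS itself is not implemented here.

```python
import math


def relative_error(actual: float, estimate: float) -> float:
    """Ordinary relative error (assumed form): bounded below by -1
    for non-negative estimates, unbounded above."""
    return (estimate - actual) / actual


def balanced_relative_error(actual: float, estimate: float) -> float:
    """Balanced relative error (commonly cited form, assumed here):
    dividing by the smaller value makes under- and overestimation
    by the same factor equally large in magnitude."""
    return (estimate - actual) / min(actual, estimate)


# Hypothetical effort values: a tenfold underestimate and a tenfold overestimate.
under_re = relative_error(actual=100.0, estimate=10.0)
over_re = relative_error(actual=10.0, estimate=100.0)
under_bre = balanced_relative_error(actual=100.0, estimate=10.0)
over_bre = balanced_relative_error(actual=10.0, estimate=100.0)

print(under_re, over_re)    # magnitudes differ for the same factor of ten
print(under_bre, over_bre)  # magnitudes match for the same factor of ten
```

Under these assumed definitions, a tenfold miss in either direction yields the same balanced magnitude, whereas the ordinary relative error penalizes the overestimate far more heavily.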