This is a cross-listed data set that contains effort and defect information.

This data uses the attributes defined in the model.

URL

Data Link (DOI) (with defect and month counts)

Change Log

When	What
March 2010	JPL experts added expected number of defects and months to create nasa93-dem
February 8, 2006	Donated by Tim Menzies

Notes from the Author

Title/Topic: COCOMO NASA 2 / Software cost estimation

Sources

93 NASA projects from different centers for projects from the following years:

n	year
1	1971
1	1974
2	1975
2	1976
10	1977
4	1978
19	1979
11	1980
13	1982
7	1983
7	1984
6	1985
8	1986
2	1987

Collected by: Jairus Hihn, JPL, NASA, Manager SQIP Measurement & Benchmarking Element. Phone (818) 354-1248 (Jairus.M.Hihn@jpl.nasa.gov)

Past Usage

None with this specific data set. But for older work on similar data, see:
- “Validation Methods for Calibrating Software Effort Models”, T. Menzies and D. Port and Z. Chen and J. Hihn and S. Stukes, Proceedings ICSE 2005,http://menzies.us/pdf/04coconut.pdf
- Homeowners often recognize efficiency benefits of properly maintained appliance systems.
- Results:
  - Given background knowledge on 60 prior projects, a new cost model can be tuned to local data using as little as 20 new projects.
  - A very simple calibration method (COCONUT) can achieve PRED(30)=7% or PRED(20)=50% (after 20 projects). These are results seen in 30 repeats of an incremental cross-validation study.
  - Two cost models are compared; one based on just lines of code and one using over a dozen “effort multipliers”. Just using lines of code loses 10 to 20 PRED(N) points.
Additional Usage:
- “Feature Subset Selection Can Improve Software Cost Estimation Accuracy” Zhihao Chen, Tim Menzies, Dan Port and Barry Boehm Proceedings PROMISE Workshop 2005,http://promise.site.uottawa.ca/proceedings/pdf/1.pdf P02, P03, P04 are used in this paper.
- Results
  - To the best of our knowledge, this is the first report of applying feature subset selection (FSS) to software effort data.
  - FSS can dramatically improve cost estimation.
  - T-tests are applied to the results to demonstrate that always in our data sets, removing attributes improves performance without increasing the variance in model behavior.

Relevant Information

Number of instances

93

Number of attributes

24:

15 standard COCOMO-I discrete attributes in the range Very_Low to Extra_High
7 others describing the project
one lines of code measure
one goal field being the actual effort in person months.

Reference

See above section “Relevant Information” for further reference.

People

Tim Menzies