McCabe and Halstead features were defined in the 1970s in an attempt to objectively characterize code features that are associated with software quality. The McCabe and Halstead measures are “module”-based, where a “module” is the smallest unit of functionality. In C or Smalltalk, “modules” would be called “functions” or “methods” respectively.
For notes on the specifics of these metrics, see here and here.
Defect detectors can be assessed according to the following measures:
For notes on variables a, b, c, d that are used above, see the following table:
                                       module actually has defects
                                       +-------------+------------+
                                       |     no      |    yes     |
                                 +-----+-------------+------------+
  classifier predicts no defects | no  |      a      |     b      |
                                 +-----+-------------+------------+
classifier predicts some defects | yes |      c      |     d      |
                                 +-----+-------------+------------+
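As a minimal sketch, the cells a, b, c, d of the table can be turned into the usual confusion-matrix measures. The formulas below are the standard textbook definitions of accuracy, probability of detection (pd, also called recall), probability of false alarm (pf), and precision; the function name and the example counts are hypothetical and not taken from any dataset. Effort is left out because it depends on module sizes, which the table does not record.

```python
def detector_measures(a, b, c, d):
    """Standard confusion-matrix measures for a defect detector.

    a: predicted no defects, module has none   (true negatives)
    b: predicted no defects, module has some   (false negatives)
    c: predicted some defects, module has none (false positives)
    d: predicted some defects, module has some (true positives)
    """
    def ratio(num, den):
        # Avoid dividing by zero when a row or column of the table is empty.
        return num / den if den else float("nan")

    return {
        "acc": ratio(a + d, a + b + c + d),  # accuracy
        "pd": ratio(d, b + d),               # probability of detection (recall)
        "pf": ratio(c, a + c),               # probability of false alarm
        "prec": ratio(d, c + d),             # precision
    }


# Hypothetical counts, for illustration only.
example = detector_measures(a=80, b=5, c=10, d=5)
print(example)
```

With these made-up counts the detector finds half of the defective modules (pd = 0.5) while raising false alarms on one ninth of the defect-free ones.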
Ideally, detectors have high PDs, low PFs, and low effort. This ideal state rarely happens:
   pd
 1 |           x  x  x        KEY:
   |        x     .           "." denotes the line PD=PF
   |     x      .             "x" denotes the ROC curve
   |   x      .                   for a set of detectors
   |  x     .
   | x    .
   | x  .
   |x .
   |x
   x------------------ pf
   0                  1
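One way to obtain a curve of this shape is to treat a single static measure as a family of detectors, one per threshold, and record a (pf, pd) point for each. The sketch below assumes the detector "score >= threshold"; the scores, labels, and thresholds are hypothetical and chosen only to show the trade-off.

```python
def roc_points(scores, has_defect, thresholds):
    """Return one (pf, pd) point per threshold for the detector 'score >= t'."""
    points = []
    for t in thresholds:
        a = b = c = d = 0
        for score, defective in zip(scores, has_defect):
            predicted = score >= t
            if predicted and defective:
                d += 1  # defect predicted, defect present
            elif predicted:
                c += 1  # defect predicted, none present (false alarm)
            elif defective:
                b += 1  # nothing predicted, defect present (missed)
            else:
                a += 1  # nothing predicted, none present
        prob_detect = d / (b + d) if (b + d) else 0.0
        prob_false_alarm = c / (a + c) if (a + c) else 0.0
        points.append((prob_false_alarm, prob_detect))
    return points


# Hypothetical per-module scores (say, a complexity count) and defect labels.
scores = [1, 2, 3, 5, 8, 13, 21, 34]
has_defect = [False, False, False, True, False, True, True, True]
points = roc_points(scores, has_defect, thresholds=[100, 10, 4, 0])
print(points)  # [(0.0, 0.0), (0.0, 0.75), (0.25, 1.0), (1.0, 1.0)]
```

Lowering the threshold moves the detector from the origin (nothing flagged) toward the top-right corner (everything flagged), so detecting more defects comes at the price of more false alarms.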
Note that:
McCabe argued that code with complicated pathways is more error-prone. His metrics therefore reflect the pathways within a code module. A BibTeX reference for further notes is given below:
@Article{mccabe76,
title = "A Complexity Measure",
author = "T.J. McCabe",
pages = "308--320",
journal = "IEEE Transactions on Software Engineering",
year = "1976",
volume = "2",
month = "December",
number = "4"}
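To make the idea of counting pathways concrete, the sketch below computes McCabe's cyclomatic complexity v(G) = e - n + 2p, where e is the number of edges, n the number of nodes, and p the number of connected components of a module's flow graph. The edge-list representation, the function name, and the example graph are assumptions made for illustration.

```python
def cyclomatic_complexity(edges, num_components=1):
    """v(G) = e - n + 2p for a flow graph given as (source, target) edges."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2 * num_components


# Hypothetical flow graph: one if/else followed by one while loop.
edges = [
    ("entry", "if"),
    ("if", "then"),
    ("if", "else"),
    ("then", "loop"),
    ("else", "loop"),
    ("loop", "body"),
    ("body", "loop"),
    ("loop", "exit"),
]
print(cyclomatic_complexity(edges))  # 8 edges - 7 nodes + 2 = 3
```

The result matches the common shortcut of counting decision points and adding one: the graph has two decisions (the if and the loop test), so v(G) = 3.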
Halstead argued that code that is hard to read is more likely to be fault-prone. Halstead estimates reading complexity by counting the number of concepts in a module, e.g. the number of unique operators. Further notes on Halstead features can be found in:
@Book{halstead77,
Author = "M.H. Halstead",
Title = "Elements of Software Science",
Publisher = "Elsevier",
Year = 1977}
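As a rough illustration of Halstead's counting approach, the sketch below derives the commonly cited base measures (vocabulary, length, volume, difficulty, effort) from the operators and operands of a module. It assumes the tokens have already been split into operators and operands by hand; the function name and the example statement are hypothetical, and real tools differ in how they classify tokens.

```python
import math
from collections import Counter


def halstead(operators, operands):
    """Commonly cited Halstead measures from operator and operand tokens."""
    op_counts, operand_counts = Counter(operators), Counter(operands)
    n1, n2 = len(op_counts), len(operand_counts)  # unique operators, operands
    N1, N2 = sum(op_counts.values()), sum(operand_counts.values())  # totals
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary) if vocabulary else 0.0
    difficulty = (n1 / 2) * (N2 / n2) if n2 else 0.0
    return {
        "vocabulary": vocabulary,
        "length": length,
        "volume": volume,
        "difficulty": difficulty,
        "effort": difficulty * volume,
    }


# Tokens of the hypothetical statement:  w = x + y * z - x
measures = halstead(
    operators=["=", "+", "*", "-"],
    operands=["w", "x", "y", "z", "x"],
)
print(measures)
```

For this statement there are 4 unique operators and 4 unique operands across 9 tokens, giving a volume of 9 * log2(8) = 27 and a difficulty of (4 / 2) * (5 / 4) = 2.5.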
McCabe and Halstead static code measures are convenient to use since they are useful, easy to use, and widely used:
Nevertheless, the merits of these metrics have been widely criticized. Static code measures are hardly a complete characterization of the internals of a function. Fenton offers an insightful example where the same functionality is achieved using different programming language constructs, resulting in different static measurements for that module. Fenton uses this example to argue that static code measures are useless:
@book{fenton97,
author = "N.E. Fenton and S.L. Pfleeger",
title = {Software metrics: a Rigorous \& Practical Approach},
publisher = {International Thomson Press},
year = {1997}}
An alternative interpretation of Fenton’s example is that static measures can never be a definite and certain indicator of the presence of a fault. Rather, defect detectors based on static measures are best viewed as probabilistic statements that the frequency of faults tends to increase in code modules that trigger the detector. By definition, such probabilistic statements are not categorical claims, and they carry some non-zero false alarm rate. The research challenge for data miners is to ensure that these false alarms do not cripple their learned theories.