xfree

URL

Change Log

When What
February 6, 2016 Updated BibTeX, datalinks
March 31, 2005 Donated by Bart Massey

Notes from the Author

This is a PROMISE Software Engineering Repository data set made publicly available in order to encourage repeatable, verifiable, refutable, and/or improvable predictive models of software engineering.

  1. Title: XFree86 Repository Transaction Data

  2. Sources: (a) Original creators of database: Bart Massey 01 503 725-5393 Computer Science Dept. PO Box 751 MS CMPS Portland State University Portland, OR USA 97207-0751 bart@cs.pdx.edu

(b) Donor of database: owner

(c) Date received: 31 March 2005

  1. Past Usage: none

  2. Relevant Information:

This dataset was assembled by analyzing the publicly-available CVS archives of the XFree86 Project http://xfree86.org)) using a modified version of CVSAnalY (https://github.com/MetricsGrimoire/CVSAnalY) It is intended for a wide variety of uses, and thus no dependent variable is specified. See Massey’s PROMISE 2005 paper, “Longitudinal Analysis of Long-Timescale Open Source Repository Data”, for further information. Note that on 487 occasions, a negative lines-added count was recorded by CVSAnalY, apparently due to a bug. These entries should probably be repaired or at least filtered out.

  1. Number of Instances: 175658

  2. Number of Attributes: 10 (incl. 2 inferred)

  3. Attribute information:

    • Inferred filetype:
      • non-numeric—nominal
      • 9 values (documentation,images,i18n,ui,multimedia,code,build,devel-doc,unknown)
    • File Pathname: UNIX relative path name
      • non-numeric—structured
      • 1935 directories, 27009 files
    • Revision: dotted-decimal revision string
      • non-numeric—structured
      • 7182 unique revision numbers
    • Author ID: integer identifier
      • non-numeric—nominal
      • 21 unique authors
    • Lines added: integer
      • numeric—integer
      • MIN: 0 MAX: 699789 MEAN: 124.274 STDEV: 2450.66
      • MIN: 0 MAX: 70554 MEAN: 41.3577 STDEV: 407.737
      • (excl. images, multimedia)
      • Note: on 487 occasions, a negative lines-added count was recorded by CVSAnalY, apparently due to a bug. In the summary statistics reported in this comment, these additions are not counted.
    • Lines removed: integer
      • numeric—integer
      • MIN: 0 MAX: 123751 MEAN: 27.8503 STDEV: 624.008
      • MIN: 0 MAX: 55664 MEAN: 17.3073 STDEV: 294.047
      • (excl. images, multimedia)
    • File has since been removed (“in Attic”): boolean (0, 1)
      • non-numeric—nominal
      • 40.67% positive
    • Commit has CVS_SILENT flag: boolean (0, 1)
      • non-numeric—nominal
      • never set
    • Inferred that committer was not author: boolean (0, 1)
      • non-numeric—nominal
      • 1.04% positive
    • Commit date: date
      • MIN: “1994-04-27 07:07:10” MAX: “2005-02-19 02:22:07”
      • Note: CVS has trouble with timezones; we assume all dates are UTC.
  4. Missing Attribute Values:

  • unknown inferred filetype (attr 1): 18719
  1. Class Distribution: N/A

Reference

This is one of the datasets contributed by Bart Massey.
Longitudinal analysis of long-timescale open source repository data

@inproceedings{Massey:2005:LAL:1083165.1083167,
 author = {Massey, Bart},
 title = {Longitudinal Analysis of Long-timescale Open Source Repository Data},
 booktitle = {Proceedings of the 2005 Workshop on Predictor Models in Software Engineering},
 series = {PROMISE '05},
 year = {2005},
 isbn = {-159593-125-2},
 location = {St. Louis, Missouri},
 pages = {1--5},
 numpages = {5},
 url = {http://doi.acm.org/10.1145/1082983.1083167},
 doi = {10.1145/1082983.1083167},
 acmid = {1083167},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {open source, source repositories},
}