xorg

URL

Change Log

When What
February 6, 2016 Updated BibTeX, datalinks
March 31, 2005 Donated by Bart Massey

Notes from the Author

This is a PROMISE Software Engineering Repository data set made publicly available in order to encourage repeatable, verifiable, refutable, and/or improvable predictive models of software engineering.

  1. Title: X.org Repository Transaction Data

  2. Sources: (a) Original creators of database: Bart Massey 01 503 725-5393 Computer Science Dept. PO Box 751 MS CMPS Portland State University Portland, OR USA 97207-0751 bart@cs.pdx.edu

(b) Donor of database: owner

(c) Date received: 31 March 2005

  1. Past Usage: none

  2. Relevant Information:

This dataset was assembled by analyzing the publicly-available CVS archives of X.org (http://x.org) available on freedesktop.org (http://freedesktop.org) using a modified version of CVSAnalY (http://metricsgrimoire.github.io/CVSAnalY.) It is intended for a wide variety of uses, and thus no dependent variable is specified. See Massey’s PROMISE 2005 paper, “Longitudinal Analysis of Long-Timescale Open Source Repository Data”, for further information. Note that on 668 occasions, a negative lines-added count was recorded by CVSAnalY, apparently due to a bug. These entries should probably be repaired or at least filtered out.

  1. Number of Instances: 136435

  2. Number of Attributes: 10 (incl. 2 inferred)

  3. Attribute information:

    • Inferred filetype:
      • non-numeric—nominal
      • 9 values (documentation,images,i18n,ui,multimedia,code,build,devel-doc,unknown)
    • File Pathname: UNIX relative path name
      • non-numeric—structured
      • 1567 directories, 20382 files
    • Revision: dotted-decimal revision string
      • non-numeric—structured
      • 1740 unique revision numbers
    • Author ID: integer identifier
      • non-numeric—nominal
      • 42 unique authors
    • Lines added: integer
      • numeric—integer
      • MIN: 0 MAX: 132642 MEAN: 81.9752 STDEV: 1221.55
      • MIN: 0 MAX: 26150 MEAN: 38.7436 STDEV: 354.985
      • (excl. images, multimedia)
      • Note: on 668 occasions, a negative lines-added count was recorded by CVSAnalY, apparently due to a bug. In the summary statistics reported in this comment, these additions are not counted.
    • Lines removed: integer
      • numeric—integer
      • MIN: 0 MAX: 11765 MEAN: 16.0445 STDEV: 143.589
      • MIN: 0 MAX: 11765 MEAN: 13.9409 STDEV: 128.19
      • (excl. images, multimedia)
    • File has since been removed (“in Attic”): boolean (0, 1)
      • non-numeric—nominal
      • 21.19% positive
    • Commit has CVS_SILENT flag: boolean (0, 1)
      • non-numeric—nominal
      • never set
    • Inferred that committer was not author: boolean (0, 1)
      • non-numeric—nominal
      • 0.46% positive
    • Commit date: date
      • MIN: “2003-11-14 15:54:29” MAX: “2005-02-19 16:00:13”
      • Note: CVS has trouble with timezones; we assume all dates are UTC.
  4. Missing Attribute Values:

  • unknown inferred filetype (attr 1): 17280
  1. Class Distribution: N/A

Reference

This is one of the datasets contributed by Bart Massey.
Longitudinal analysis of long-timescale open source repository data

@inproceedings{Massey:2005:LAL:1083165.1083167,
 author = {Massey, Bart},
 title = {Longitudinal Analysis of Long-timescale Open Source Repository Data},
 booktitle = {Proceedings of the 2005 Workshop on Predictor Models in Software Engineering},
 series = {PROMISE '05},
 year = {2005},
 isbn = {-159593-125-2},
 location = {St. Louis, Missouri},
 pages = {1--5},
 numpages = {5},
 url = {http://doi.acm.org/10.1145/1082983.1083167},
 doi = {10.1145/1082983.1083167},
 acmid = {1083167},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {open source, source repositories},
}