Change Log

When What
April 1, 2012 Donated by Audris Mockus
April 1, 2012 AudrisMockus am1 ported to repo v4


Audris Mockus. Software support tools and experimental work. In Empirical Software Engineering Issues: Critical Assessments and Future Directions, pages 91-99. Springer LNCS, 2007


There is a common type of dataset generated by virtually any software project. A better understanding of software projects can lead to actual improvements in how software is created, and presenting the state and history of projects can improve project management by preventing the least rational but, unfortunately, all too common management decisions.

The data description contains three parts:

  1. Files and formats
  2. Explanation of individual fields
  3. Domain background

The data has a number of features and dependencies that are too numerous to describe.

Files and formats

mrs header and records, all semicolon separated
mrs.xml data in the GGobi XML format
mrs.txt header and records, all tab separated. Character strings were converted to integers (see mrs.xml for the mapping between the integers and the strings they represent). Some statistics and visualization packages prefer only numeric data, so this file was created to facilitate analysis.

Please note that MRs that are submitted to several software releases appear several times in the table below, with all attributes repeated except for the release name.
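A minimal sketch of reading the semicolon-separated file and collapsing the repeated rows follows. It assumes the header row uses the field names listed under "Individual fields" (for example `MR` and `Release`); the helper names `read_mrs` and `collapse_releases` and the sample rows are hypothetical and are not part of the dataset.

```python
import csv
import io


def read_mrs(text, delimiter=";"):
    """Parse header-plus-records text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def collapse_releases(rows):
    """Merge rows that share an MR, collecting their release names in a list."""
    merged = {}
    for row in rows:
        entry = merged.get(row["MR"])
        if entry is None:
            # First time this MR is seen: copy its attributes once.
            entry = dict(row)
            entry["Release"] = []
            merged[row["MR"]] = entry
        entry["Release"].append(row["Release"])
    return list(merged.values())


# Made-up sample in the semicolon-separated layout; not real data.
sample = "MR;Release;Severity\nmr1;r1;2\nmr1;r2;2\nmr2;r1;3\n"
mrs = collapse_releases(read_mrs(sample))
```

For the tab-separated mrs.txt the same reader could be called with `delimiter="\t"`. Collapsing matters when counting MRs or summing lines changed across releases, because an MR submitted to several releases would otherwise be counted once per release.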

Individual fields

  • MR :
    • Modification request - a tag used to track a software work item
  • orgID :
    • login of the person originating this MR (could be a customer representative reporting a problem, tester, developer). Please note that all the logins are mapped to logins from a different project to preserve confidentiality of the original project participants.
  • crID :
    • login of the person creating this MR.
  • devID :
    • login of the person working on this MR: the person who investigates and makes the necessary changes (if any) to complete this MR.
  • origDate, crDate, submtDate, apprDate :
    • Dates (expressed as seconds since 00:00, Jan 1, 1970) of when MR was originated, created, submitted by developer, and, finally, approved.
  • Release :
    • The software release (a big thing, like win95, win98, win2000)
  • Build :
    • The software build (builds occur about weekly, as you will discover).
  • System :
    • The name of the software module (slightly edited for confidentiality)
  • Severity :
    • 1 - critical, 2 - high, 3 - medium, 4 - low
  • Status :
    • The current status of the MR as of 24/09/2003.
  • PhaseDetected :
    • the phase of the development process in which this MR was discovered.
  • Category :
    • It should tell us whether the MR is a new feature or a fix, but instead it is a hodgepodge of MR category, phase detected, and other values that appeared appropriate to different developers at different times. It has the following values: inspection, maintenance, discussion, document_reading, code_reading, field_enh, usage, new_feature, review, testing, field_mod
  • Class :
    • whether this is a software, hardware, or documentation MR
  • BugNew :
    • purpose - b: bug fix, n: new feature, c: cleanup, 0: not clear
  • nFiles :
    • number of files modified by this MR
  • nDelta :
    • number of deltas (atomic modifications of a file) associated with this MR
  • linesAdded :
    • number of lines added by this MR
  • linesDeleted :
    • number of lines deleted by this MR
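As an illustration of how the coded fields can be decoded, the sketch below converts the date fields from seconds since 1970 to calendar dates and maps the Severity and BugNew codes to their labels. It assumes the timestamps are in UTC (the description does not state a time zone) and that the letter codes for BugNew appear as listed above, which applies to the string-valued files rather than the integer-coded mrs.txt. The function names and the sample row are hypothetical.

```python
from datetime import datetime, timezone

# Code tables copied from the field descriptions.
SEVERITY = {1: "critical", 2: "high", 3: "medium", 4: "low"}
BUG_NEW = {"b": "bug fix", "n": "new feature", "c": "cleanup", "0": "not clear"}


def to_utc(seconds):
    """Convert seconds since Jan 1, 1970 to a datetime (UTC is an assumption)."""
    return datetime.fromtimestamp(int(seconds), tz=timezone.utc)


def describe(row):
    """Return a readable view of a few fields of one MR record."""
    created = to_utc(row["crDate"])
    submitted = to_utc(row["submtDate"])
    return {
        "severity": SEVERITY.get(int(row["Severity"]), "unknown"),
        "purpose": BUG_NEW.get(row["BugNew"], "unknown"),
        "created": created.date().isoformat(),
        "days_to_submit": (submitted - created).total_seconds() / 86400,
    }


# Made-up record: created 2003-09-24 00:00 UTC, submitted three days later.
example = {"crDate": "1064361600", "submtDate": "1064620800",
           "Severity": "2", "BugNew": "b"}
info = describe(example)
```

Unknown codes fall back to "unknown" rather than raising an error, since the description warns that some fields contain inconsistent values.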


Problem tracking (PR) and version control systems (VCS) are used by virtually all software projects to coordinate the work of the project participants and to allow parallel work on several releases and patches. This dataset is a typical example of the data that is usually available from VCS and PR systems.

A slightly simplified version of an MR process follows. The developers are assigned a new feature or a defect to work on. In the case of defects, they investigate the problem, make the necessary changes, and submit an MR for integration. In the case of new features, additional tasks such as low-level design and design review are performed prior to coding. After coding is complete, the MR is submitted for integration by the developer. The code inspection is done afterward, and any issues are resolved with additional MRs. If an MR is opened by a tester, it may take some time until someone is assigned to work on it and eventually starts working on it. In this case the MR open time may significantly precede the time when the work started. Often developers will find an issue to work on in the regular course of their activities. They may investigate the issue, complete the necessary changes in their private workspaces, and then open and immediately submit an MR for integration. In this case opening an MR does not precede the start of work.
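The two situations described above (an MR opened long before work starts, and an MR opened only when the work is already done) can be told apart roughly by the gap between crDate and submtDate. The sketch below is a heuristic only: the one-hour threshold and the function names are assumptions introduced for illustration, not something the dataset defines.

```python
def open_to_submit_seconds(row):
    """Seconds between MR creation (crDate) and submission (submtDate)."""
    return int(row["submtDate"]) - int(row["crDate"])


def opened_at_submission(row, threshold_seconds=3600):
    """Heuristic: True when the MR was submitted almost immediately after
    it was created, suggesting the work was finished before it was opened.
    The one-hour default threshold is an arbitrary assumption."""
    gap = open_to_submit_seconds(row)
    return 0 <= gap <= threshold_seconds


# Made-up records: one submitted 10 minutes after creation, one after 5 days.
quick = {"crDate": "1064361600", "submtDate": "1064362200"}
slow = {"crDate": "1064361600", "submtDate": "1064793600"}
```

For MRs flagged this way, the interval between creation and submission says little about the effort spent, so they may need to be treated separately in any analysis of resolution time.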


There are three basic questions in software projects:

  • When will it be ready (will have desired functionality and minimal quality)?
  • How much will it cost (in developer time)?
  • What should we do to achieve desired quality/effort/dates? (E.g., drop features, delay release, test more)

More specifically, how can we show and summarize the quality, the current state, and the effort that went into a software release, a build, or a system?

What is the relationship between MRs (over time, because there is no direct relationship data)?
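One possible starting point for such a summary is a simple aggregation per release. The sketch below counts MRs, bug fixes, changed lines, and distinct developers for each release. Which indicators best capture quality and effort is an open question; the ones here are chosen only for illustration, and the function name and sample rows are hypothetical.

```python
from collections import defaultdict


def summarize_by_release(rows):
    """Aggregate simple quality and effort indicators for each release."""
    summary = defaultdict(
        lambda: {"mrs": 0, "bug_fixes": 0, "lines_changed": 0, "developers": set()}
    )
    for row in rows:
        stats = summary[row["Release"]]
        stats["mrs"] += 1
        if row["BugNew"] == "b":
            stats["bug_fixes"] += 1
        stats["lines_changed"] += int(row["linesAdded"]) + int(row["linesDeleted"])
        stats["developers"].add(row["devID"])
    return dict(summary)


# Made-up rows using the field names from the description; not real data.
rows = [
    {"Release": "r1", "BugNew": "b", "linesAdded": "10", "linesDeleted": "2", "devID": "d1"},
    {"Release": "r1", "BugNew": "n", "linesAdded": "40", "linesDeleted": "0", "devID": "d2"},
    {"Release": "r2", "BugNew": "b", "linesAdded": "5", "linesDeleted": "5", "devID": "d1"},
]
result = summarize_by_release(rows)
for release, stats in sorted(result.items()):
    print(release, stats["mrs"], stats["bug_fixes"],
          stats["lines_changed"], len(stats["developers"]))
```

Because an MR submitted to several releases appears once per release in the data, grouping the raw rows by Release counts it in each release it belongs to. The same pattern could be applied with Build or System as the grouping key.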