Project Planning


Change Log

When What
March 12, 2015 Donated by Paul Klint, Davy Landman, and Jurgen Vinju


author={Klint, P. and Landman, D. and Vinju, J.},
booktitle={Software Maintenance (ICSM), 2013 29th IEEE International Conference on},
title={Exploring the Limits of Domain Model Recovery},
keywords={public domain software;software maintenance;specification languages;systems re-engineering;domain model recovery limits;domain-specific languages;legacy applications;modern legacy systems;reengineering families;reference model extraction;reverse engineering domain knowledge;source code;Analytical models;DSL;Documentation;Object oriented modeling;Planning;Project management;Reverse engineering;domain models;reverse engineering},

Paper Abstract

We are interested in re-engineering families of legacy applications towards using Domain-Specific Languages (DSLs). Is it worth to invest in harvesting domain knowledge from the source code of legacy applications?

Reverse engineering domain knowledge from source code is sometimes considered very hard or even impossible. Is it also difficult for “modern legacy systems”? In this paper we select two open-source applications and answer the following research questions: which parts of the domain are implemented by the application, and how much can we manually recover from the source code? To explore these questions, we compare manually recovered domain models to a reference model extracted from domain literature, and measured precision and recall.

The recovered models are accurate: they cover a significant part of the reference model and they do not contain much junk. We conclude that domain knowledge is recoverable from “modern legacy” code and therefore domain model recovery can be a valuable component of a domain re-engineering process.

About the Data

This dataset contains multiple models:

  • A manually constructed domain model of the project planning domain based on the PMBOK guide
  • 4 manually constructed domain models of 2 project planning applications:
    • One is based on a manual read of all the source code
    • The other is based on manual full traversal of the UI


The model is written as a Rascal-MPL ADT.