Change Log

When                 What
March 26th, 2015     Donated by Norbert Siegmund


author={Jianmei Guo and Czarnecki, K. and Apel, S. and Siegmund, N. and Wasowski, A.},
booktitle={Automated Software Engineering (ASE), 2013 IEEE/ACM 28th International Conference on},
title={Variability-aware performance prediction: A statistical learning approach},
keywords={configuration management;learning (artificial intelligence);
software performance evaluation;statistical analysis;configurable software systems;
program variants;statistical learning;variability-aware performance prediction;
Accuracy;Correlation;Feature extraction;Measurement;Predictive models;Silicon;
Software systems},

Data Originally from:

 author = {Siegmund, Norbert and Kolesnikov, Sergiy S. and K\"{a}stner, Christian and Apel, Sven and Batory, Don and Rosenm\"{u}ller, Marko and Saake, Gunter},
 title = {Predicting Performance via Automated Feature-interaction Detection},
 booktitle = {Proceedings of the 34th International Conference on Software Engineering},
 series = {ICSE '12},
 year = {2012},
 isbn = {978-1-4673-1067-3},
 location = {Zurich, Switzerland},
 pages = {167--177},
 numpages = {11},
 url = {http://dl.acm.org/citation.cfm?id=2337223.2337243},
 acmid = {2337243},
 publisher = {IEEE Press},
 address = {Piscataway, NJ, USA},

About the Data

Overview of Data

The dataset covers six real-world configurable systems with different characteristics: different sizes (42 thousand to 300 thousand lines of code, 192 to millions of configurations), different implementation languages (C, C++, and Java), and different configuration mechanisms (conditional compilation, configuration files, and command-line options). For each system, the dataset contains the whole population, i.e., all configurations and their performance measurements. The exception is SQLITE, for which the dataset contains 4,553 configurations for prediction modeling and 100 additional random configurations for prediction evaluation.

For each system, the performance has been measured using a standard benchmark, either delivered by its vendor (e.g., ORACLE’s standard benchmark for BERKELEY DB) or used widely in its application domain (e.g., AUTOBENCH and HTTPERF for the APACHE Web Server).

Paper abstract

Configurable software systems allow stakeholders to derive program variants by selecting features. Understanding the correlation between feature selections and performance is important for stakeholders to be able to derive a program variant that meets their requirements. A major challenge in practice is to accurately predict performance based on a small sample of measured variants, especially when features interact. We propose a variability-aware approach to performance prediction via statistical learning. The approach works progressively with random samples, without additional effort to detect feature interactions. Empirical results on six real-world case studies demonstrate an average of 94% prediction accuracy based on small random samples. Furthermore, we investigate why the approach works by a comparative analysis of performance distributions. Finally, we compare our approach to an existing technique and guide users to choose one or the other in practice.
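
The workflow in the abstract (learn from progressively larger random samples, then report prediction accuracy) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy system with six binary features, the `measure` function, and all numbers are invented and not taken from the dataset; a small regression tree stands in as the learner because the abstract does not name one; and accuracy is assumed to mean 100% minus the mean relative error on configurations outside the sample.

```python
import random
from statistics import mean

N_FEATURES = 6  # hypothetical toy system, not one of the six subject systems


def measure(config):
    """Made-up performance for a tuple of 0/1 feature selections."""
    perf = 100.0 + 40.0 * config[0] + 25.0 * config[1] + 10.0 * config[2]
    perf += 30.0 * config[0] * config[1]  # invented interaction of features 0 and 1
    return perf


def measure_all():
    """Enumerate the whole (toy) population with its performance values."""
    configs = [tuple((i >> b) & 1 for b in range(N_FEATURES))
               for i in range(2 ** N_FEATURES)]
    return [(c, measure(c)) for c in configs]


def sse(values):
    """Sum of squared errors around the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, min_size=2):
    """Greedy regression tree over binary features.

    rows is a list of (config, performance) pairs. A leaf is a float
    (mean performance); an inner node is a dict.
    """
    perfs = [p for _, p in rows]
    best = None
    for f in range(N_FEATURES):
        off = [r for r in rows if r[0][f] == 0]
        on = [r for r in rows if r[0][f] == 1]
        if len(off) < min_size or len(on) < min_size:
            continue
        cost = sse([p for _, p in off]) + sse([p for _, p in on])
        if best is None or cost < best[0]:
            best = (cost, f, off, on)
    # Stop when no split is possible or no split reduces the error.
    if best is None or best[0] >= sse(perfs):
        return mean(perfs)
    _, f, off, on = best
    return {"feature": f,
            "off": build_tree(off, min_size),
            "on": build_tree(on, min_size)}


def predict(tree, config):
    """Walk from the root to a leaf and return its mean performance."""
    while isinstance(tree, dict):
        tree = tree["on"] if config[tree["feature"]] else tree["off"]
    return tree


def accuracy(tree, rows):
    """Assumed metric: 100 * (1 - mean relative error)."""
    errors = [abs(p - predict(tree, c)) / p for c, p in rows]
    return 100.0 * (1.0 - mean(errors))


def run(sample_sizes=(8, 16, 32), seed=0):
    """Grow one random sample step by step and score each tree on the rest."""
    rng = random.Random(seed)
    shuffled = measure_all()
    rng.shuffle(shuffled)
    results = {}
    for n in sample_sizes:
        sample, holdout = shuffled[:n], shuffled[n:]
        results[n] = accuracy(build_tree(sample), holdout)
    return results


if __name__ == "__main__":
    for size, acc in run().items():
        print(f"sample size {size}: accuracy {acc:.1f}%")
```

With real data, `measure_all()` would be replaced by loading the measured configurations of one system, and `N_FEATURES` by that system's number of features. The tree needs no explicit interaction detection: a split on one feature followed by a split on another captures their joint effect, which matches the abstract's claim of working without additional effort to detect feature interactions.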