The DaCapo benchmarks: Java benchmarking development and analysis (extended version)

Blackburn, Stephen M. and Garner, Robin and Hoffmann, Chris and Khan, Asjad M. and McKinley, Kathryn S. and Bentzur, R. and Diwan, Amer and Feinberg, Daniel and Frampton, Daniel and Guyer, Samuel Z. and Hirzel, Martin and Hosking, Antony L. and Jump, Maria and Lee, Han and Moss, J. Eliot B. and Phansalkar, Aashish and Stefanović, Darko and VanDrunen, Thomas and von Dincklage, Daniel and Wiedermann, Ben

Abstract

Since benchmarks drive computer science research and industry product development, which ones we use and how we evaluate them are key questions for the community. Despite complex run-time tradeoffs due to dynamic compilation and garbage collection required for Java programs, many evaluations still use methodologies developed for C, C++, and Fortran. SPEC, the dominant purveyor of benchmarks, compounded this problem by institutionalizing these methodologies for their Java benchmark suite. This paper recommends benchmarking selection and evaluation methodologies, and introduces the DaCapo benchmarks, a set of open source, client-side Java benchmarks. We demonstrate that the complex interactions of (1) architecture, (2) compiler, (3) virtual machine, (4) memory management, and (5) application require more extensive evaluation than C, C++, and Fortran which stress (4) much less, and do not require (3). We use and introduce new value, time-series, and statistical metrics for static and dynamic properties such as code complexity, code size, heap composition, and pointer mutations. No benchmark suite is definitive, but these metrics show that DaCapo improves over SPEC Java in a variety of ways, including more complex code, richer object behaviors, and more demanding memory system requirements. This paper takes a step towards improving methodologies for choosing and evaluating benchmarks to foster innovation in system design and implementation for Java and other managed languages.

@techreport{Blackburn+2006TR,
  author = {Blackburn, Stephen M. and Garner, Robin and Hoffmann, Chris and Khan, Asjad M. and McKinley, Kathryn S. and Bentzur, R. and Diwan, Amer and Feinberg, Daniel and Frampton, Daniel and Guyer, Samuel Z. and Hirzel, Martin and Hosking, Antony L. and Jump, Maria and Lee, Han and Moss, J. Eliot B. and Phansalkar, Aashish and Stefanovi{\'c}, Darko and VanDrunen, Thomas and von Dincklage, Daniel and Wiedermann, Ben},
  title = {The {DaCapo} benchmarks: {Java} benchmarking development and analysis (extended version)},
  institution = {Australian National University},
  year = {2006},
  type = {Technical Report},
  number = {TR-CS-06-01},
  url = {http://dacapobench.org},
  gscholar = {20}
}