Wake Up and Smell the Coffee: Evaluation Methodology for the 21st Century

Blackburn, Stephen M. and McKinley, Kathryn S. and Garner, Robin and Hoffmann, Chris and Khan, Asjad M. and Bentzur, Rotem and Diwan, Amer and Feinberg, Daniel and Frampton, Daniel and Guyer, Samuel Z. and Hirzel, Martin and Hosking, Antony and Jump, Maria and Lee, Han and Moss, J. Eliot B. and Phansalkar, Aashish and Stefanović, Darko and VanDrunen, Thomas and von Dincklage, Daniel and Wiedermann, Ben

Abstract

Evaluation methodology underpins all innovation in experimental computer science. It requires relevant workloads, appropriate experimental design, and rigorous analysis. Unfortunately, methodology is not keeping pace with the changes in our field. The rise of managed languages such as Java, C#, and Ruby in the past decade and the imminent rise of commodity multicore architectures for the next decade pose new methodological challenges that are not yet widely understood. This paper explores the consequences of our collective inattention to methodology on innovation, makes recommendations for addressing this problem in one domain, and provides guidelines for other domains. We describe benchmark suite design, experimental design, and analysis for evaluating Java applications. For example, we introduce new criteria for measuring and selecting diverse applications for a benchmark suite. We show that the complexity and nondeterminism of the Java runtime system make experimental design a first-order consideration, and we recommend mechanisms for addressing complexity and nondeterminism. Drawing on these results, we suggest how to adapt methodology more broadly. To continue to deliver innovations, our field needs to significantly increase participation in and funding for developing sound methodological foundations.

@article{Blackburn+2008CACM,
  author = {Blackburn, Stephen M. and McKinley, Kathryn S. and Garner, Robin and Hoffmann, Chris and Khan, Asjad M. and Bentzur, Rotem and Diwan, Amer and Feinberg, Daniel and Frampton, Daniel and Guyer, Samuel Z. and Hirzel, Martin and Hosking, Antony and Jump, Maria and Lee, Han and Moss, J. Eliot B. and Phansalkar, Aashish and Stefanovi{\'c}, Darko and VanDrunen, Thomas and von Dincklage, Daniel and Wiedermann, Ben},
  title = {Wake Up and Smell the Coffee: Evaluation Methodology for the 21st Century},
  journal = {Communications of the ACM},
  year = {2008},
  volume = {51},
  number = {8},
  pages = {83--89},
  month = {August},
  doi = {10.1145/1378704.1378723},
  acm = {http://dl.acm.org/authorize?N93660},
  gscholar = {86}
}