Achieving accurate and context-sensitive timing for code optimization
Key computational kernels must run near their peak efficiency for most high performance computing (HPC) applications. Getting this level of efficiency has always required extensive tuning of the kernel on a particular platform of interest. The success or failure of an optimization is usually measured by invoking a timer. Understanding how to build reliable and context-sensitive timers is one of the most neglected areas in HPC, and this results in a host of HPC software that looks good when reported in papers, but which delivers only a fraction of the reported performance when used by actual HPC applications. In this paper we motivate the importance of timer design, and then discuss the techniques and methodologies we have developed in order to accurately time HPC kernel routines for our well-known empirical tuning framework, ATLAS.