Automated Transformation for Performance-Critical Kernels
The performance of many scientific applications depends on a small number of key computational kernels which require a level of efficiency rarely satisfied by existing native compilers. We present a new approach to high performance kernel optimization, where a general-purpose transformation engine automates the production of highly efficient library routines. Our framework requires only an annotated kernel specification and can automatically produce optimized implementations based on tuning parameters controlled by a search driver. The transformation engine includes an extensive suite of optimizations which can be easily expanded using a custom transformation description language. We have applied our transformation engine to generate highly tuned code for key linear algebra kernels used in the ATLAS tuning framework. The time required to produce specifications for these kernels is orders of magnitude less than that required to hand-craft kernel implementations, and yet our framework has achieved similar performance to ATLAS’s highly tuned kernels.