Exploiting CUDA acceleration and the impact of data transfer overhead on heterogeneous systems
With the introduction of many-core GPUs, there is widespread interest in using GPUs to accelerate non-graphics applications in areas such as bioinformatics, energy, finance, and many research fields. Although GPUs provide highly parallel processing capability, the interface between the CPU and the GPU can become a performance bottleneck when large amounts of data must be transferred. To investigate this bottleneck and identify solutions, I study partial GPU acceleration guided by hotspot analysis, and I also observe how performance changes as the data size varies.
Here, I architecturally characterize several basic kernels and genetic applications, and I investigate the performance hotspot functions in HMMER 3.0 and LavaMD. HMMER is used mainly to search sequence databases for homologs of protein sequences. LavaMD is a molecular dynamics application from the Rodinia benchmark suite. Accelerating individual hotspots and accelerating the full application on the GPU are compared in terms of speedup. In some cases, when the data transfer time outweighs the computation time on the GPU, it is better to keep the computation on the CPU than to use the GPU. I therefore aim to locate the borderline between CPU and GPU performance and to observe the effects of using different types of memory. For this observation I use the security application Blowfish with different input sizes. Security applications include several common algorithms for data encryption, decryption, and hashing, and Blowfish is a popular one among them. Based on the workload characterization and bottleneck analysis, I provide optimization methodologies to remove the bottleneck.