Accelerating scientific applications on reconfigurable computing systems
Abstract
Advances in multi-core, many-core, and heterogeneous computing systems have created numerous possibilities for parallelization and hardware acceleration. With their flexibility and abundant logic resources, reconfigurable computing systems, in particular systems based on field-programmable gate arrays (FPGAs), have become an attractive option as hardware accelerators. This dissertation studies the acceleration of QR and LU matrix decompositions on FPGA-based reconfigurable computing systems, an area in which few solutions exist for scalable floating-point matrix decomposition. First, exploratory experiments are presented that characterize different embedded processor cores and system configurations on an FPGA-based system. Next, a vector reduction method termed delayed buffering is proposed. With its low latency and high operator pipeline utilization, the method accelerates matrix decomposition by speeding up the vector reduction computations of which the decomposition is composed. Finally, the delayed buffering reduction is combined with two further techniques: an enhanced tiled matrix decomposition algorithm for accessing off-chip memory, and parallelization of the main decomposition loop for on-chip computation. Together, these allow a single FPGA to outperform two general-purpose processors plus a graphics processing unit (GPU) for matrix decompositions whose size is limited by the capacity of off-chip memory.
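To make the idea of a tiled decomposition concrete, the following is a minimal sketch of a textbook right-looking tiled LU factorization in plain Python. It is not the enhanced algorithm of the dissertation: it omits pivoting (so it assumes every pivot is nonzero, as for a diagonally dominant matrix), and the names `tiled_lu`, `factor_tile`, `reconstruct`, and the `tile` parameter are hypothetical. The innermost loop of the trailing update is a dot product, which is the kind of vector reduction the abstract refers to.

```python
def factor_tile(a, lo, hi):
    """Unblocked LU without pivoting on the diagonal tile a[lo:hi][lo:hi]."""
    for k in range(lo, hi):
        for i in range(k + 1, hi):
            a[i][k] /= a[k][k]  # multiplier, stored in place of L
            for j in range(k + 1, hi):
                a[i][j] -= a[i][k] * a[k][j]


def tiled_lu(a, tile):
    """Right-looking tiled LU (no pivoting, an assumption of this sketch).

    Overwrites the square matrix `a` (list of lists of floats) with the
    unit-lower factor L below the diagonal and U on and above it.
    """
    n = len(a)
    for lo in range(0, n, tile):
        hi = min(lo + tile, n)
        # 1. Factor the diagonal tile: A11 = L11 * U11.
        factor_tile(a, lo, hi)
        # 2. Row panel: solve L11 * U12 = A12 by forward substitution.
        for j in range(hi, n):
            for k in range(lo, hi):
                for i in range(k + 1, hi):
                    a[i][j] -= a[i][k] * a[k][j]
        # 3. Column panel: solve L21 * U11 = A21.
        for i in range(hi, n):
            for k in range(lo, hi):
                a[i][k] /= a[k][k]
                for j in range(k + 1, hi):
                    a[i][j] -= a[i][k] * a[k][j]
        # 4. Trailing update: A22 -= L21 * U12 (one dot product per entry).
        for i in range(hi, n):
            for j in range(hi, n):
                acc = 0.0
                for k in range(lo, hi):  # vector reduction
                    acc += a[i][k] * a[k][j]
                a[i][j] -= acc
    return a


def reconstruct(lu):
    """Multiply the packed factors back together to check the result."""
    n = len(lu)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else lu[i][k]  # unit diagonal of L
                out[i][j] += l_ik * lu[k][j]
    return out
```

Calling `tiled_lu(matrix_copy, tile)` and then `reconstruct` on the result should reproduce the original matrix up to rounding error, for any tile size. Each iteration of the outer loop touches only one diagonal tile, two panels, and the trailing submatrix, which is why tiling is a natural fit when the full matrix lives in off-chip memory and only a few tiles fit on chip.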