Accelerating scientific applications on reconfigurable computing systems




Tai, Yi-Gang




Advances in multi-core, many-core, and heterogeneous computing systems have created numerous possibilities for parallelization and hardware acceleration. With their flexibility and abundant logic resources, reconfigurable computing systems, in particular those based on field-programmable gate arrays (FPGAs), have become an attractive option as hardware accelerators. This dissertation studies the acceleration of QR and LU matrix decompositions on FPGA-based reconfigurable computing systems, for which few scalable floating-point matrix decomposition solutions exist. First, exploratory experiments are presented that reveal the characteristics of different embedded processor cores and system configurations on an FPGA-based system. Next, a vector reduction method termed delayed buffering is proposed. With its low latency and high operator pipeline utilization, the method accelerates matrix decomposition by speeding up its constituent vector reduction computations. Finally, with delayed buffering reduction incorporated, two techniques are combined: an enhanced tiled matrix decomposition algorithm for accessing off-chip memory, and parallelization of the main decomposition loop for on-chip computation. Together they allow a single FPGA to outperform two general-purpose processors plus a graphics processing unit (GPU) for matrix decompositions whose size is limited by the capacity of off-chip memory.
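
For readers unfamiliar with tiled decomposition, the following is a minimal sketch of a generic textbook blocked LU factorization without pivoting, written in plain Python. It is not the enhanced algorithm or the delayed buffering method of the dissertation, whose details are not given in this abstract; the function name `tiled_lu` and the parameter `tile` are hypothetical. It is meant only to show how the work splits into tiles and where the dot-product style vector reductions arise.

```python
def tiled_lu(a, tile):
    """Illustrative blocked LU without pivoting (assumes nonzero pivots).

    Overwrites the square matrix `a` (list of lists of floats) so that the
    strict lower triangle holds L (unit diagonal implied) and the upper
    triangle holds U. `tile` is the tile width.
    """
    n = len(a)
    for k0 in range(0, n, tile):
        k1 = min(k0 + tile, n)

        # Step 1: factor the panel (columns k0..k1-1, rows k0..n-1).
        for k in range(k0, k1):
            pivot = a[k][k]
            for i in range(k + 1, n):
                a[i][k] /= pivot                      # multiplier, an entry of L
                for j in range(k + 1, k1):
                    a[i][j] -= a[i][k] * a[k][j]

        # Step 2: compute the block row of U to the right of the diagonal
        # tile by forward substitution with the unit lower-triangular tile.
        for k in range(k0, k1):
            for i in range(k + 1, k1):
                for j in range(k1, n):
                    a[i][j] -= a[i][k] * a[k][j]

        # Step 3: update the trailing submatrix. Each entry subtracts a dot
        # product over the tile, i.e. one vector reduction per entry.
        for i in range(k1, n):
            for j in range(k1, n):
                a[i][j] -= sum(a[i][k] * a[k][j] for k in range(k0, k1))
    return a
```

In this sketch, the trailing update in step 3 dominates the arithmetic for large matrices, which suggests why the efficiency of the reduction matters; a production implementation would also add pivoting for numerical stability.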




FPGA, hardware acceleration, LU decomposition, QR decomposition, reconfigurable computing system, vector reduction



Computer Science