Design and implementation of a low-power reorder buffer
To meet contemporary demands, superscalar microprocessor designs must deliver higher performance without a large increase in power consumption. Performance can be increased by executing instructions out of order. Out-of-order execution, however, requires a mechanism that commits instruction results to the architected register file in program order. A reorder buffer provides this functionality. The reorder buffer enables the performance gain; the key is to design it so that this gain is not outweighed by a large increase in power consumption.
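The in-order commit behavior described above can be illustrated with a minimal software sketch. This is a behavioral model only, not the hardware design implemented in this thesis; the class name `ReorderBuffer`, its methods, and the entry fields are hypothetical names chosen for illustration.

```python
from collections import deque


class ReorderBuffer:
    """Behavioral sketch: results may complete out of order,
    but are written to the register file in program order."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()  # oldest instruction sits at the left
        self.regfile = {}       # stands in for the architected register file

    def dispatch(self, tag, dest_reg):
        """Allocate an entry in program order; return False if the buffer is full."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append(
            {"tag": tag, "dest": dest_reg, "value": None, "done": False}
        )
        return True

    def complete(self, tag, value):
        """Record a result; this may happen in any order."""
        for entry in self.entries:
            if entry["tag"] == tag:
                entry["value"] = value
                entry["done"] = True
                return
        raise KeyError(tag)

    def retire(self):
        """Commit finished entries from the head only, preserving program order."""
        retired = []
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            self.regfile[entry["dest"]] = entry["value"]
            retired.append(entry["tag"])
        return retired


rob = ReorderBuffer(capacity=4)
for tag, reg in [(0, "r1"), (1, "r2"), (2, "r3")]:
    rob.dispatch(tag, reg)

rob.complete(2, 30)   # youngest instruction finishes first
print(rob.retire())   # nothing retires: the oldest entry is still pending
rob.complete(0, 10)
print(rob.retire())   # only the oldest entry retires
rob.complete(1, 20)
print(rob.retire())   # the remaining entries retire in program order
```

In the sketch, an instruction that finishes early waits in the buffer until every older instruction has retired, which is the property that keeps the register file consistent with program order.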
This thesis compares the power consumption of the centralized reorder buffer architecture with that of the distributed reorder buffer architecture. The centralized reorder buffer is based on the implementation algorithm investigated by Wallace et al. [Wallace94]. The implementation strategy for the centralized reorder buffer is functionality-focused, with little emphasis on power. The distributed reorder buffer was proposed by Kucuk et al. [Kucuk03], and this thesis investigates an implementation strategy for that proposal. A low-power distributed reorder buffer was designed and implemented on a Spartan 3E FPGA and then used to compare the power consumption of the centralized and distributed reorder buffers.
The distributed reorder buffer is found to be the more power-efficient design because it handles dynamic power in a more robust and less complex manner than the centralized reorder buffer. It does so through design optimizations that lower switching activity throughout the internal logic that produces the output to the architected register file. Its use of smaller components provides a less power-intensive means of accomplishing the same functionality. The distributed reorder buffer designed and implemented in this thesis consumed 12.84% less power than the centralized reorder buffer.
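The link between switching activity and dynamic power can be made explicit with the standard first-order CMOS relation. This relation is textbook background rather than a result of this thesis, and the symbols are assumptions introduced here: $\alpha$ is the switching activity factor, $C$ the switched capacitance, $V_{DD}$ the supply voltage, and $f$ the clock frequency. The second line restates the measured 12.84% saving as a ratio.

```latex
\begin{align}
P_{\text{dyn}} &= \alpha \, C \, V_{DD}^{2} \, f \\
P_{\text{distributed}} &= (1 - 0.1284)\, P_{\text{centralized}} = 0.8716\, P_{\text{centralized}}
\end{align}
```

At a fixed supply voltage and clock frequency, lowering $\alpha$ or $C$ lowers dynamic power proportionally, which is consistent with the reduced switching activity and smaller components credited above.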