Dispatch algorithms balancing instruction urgency and readiness with fair instruction scheduling in an SMT processor
A simultaneous-multithreaded (SMT) processor executes multiple instructions from multiple threads every cycle. As a result, threads on SMT processors -- unlike those on traditional shared-memory machines -- simultaneously share all low-level hardware resources in a single CPU. Because of this fine-grained resource sharing, SMT threads have the ability to interfere or conflict with each other, as well as to share these resources to mutual benefit. This research examines schemes for fair and more efficient scheduling of instructions from multiple threads and how they affect performance across smaller and larger issue queue sizes.
The goal of this thesis is to investigate how instruction scheduling that balances the urgency and readiness of instructions affects utilization of the issue queue. We begin by examining various schemes for fair instruction scheduling and comparing them to traditional round-robin at different issue queue sizes. Our fair-scheduling scheme switches threads at every instruction dispatch, allowing discrete round-robin scheduling of threads within a single clock cycle. With an issue queue of 16 entries, this scheme outperforms the original round-robin scheme on most of the simulated workloads.
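The per-instruction round-robin idea above can be sketched as follows. This is a minimal illustration, not the thesis's simulator: the function name, the representation of threads as per-thread instruction lists, and the single shared issue queue are all assumptions made for clarity.

```python
from collections import deque

def dispatch_round_robin(thread_queues, queue_size):
    """Fill the issue queue by taking one instruction from each thread
    in turn (switching threads at every dispatch), rather than letting
    one thread dispatch a whole block per cycle.

    thread_queues: list of per-thread instruction sequences (oldest first).
    Returns a list of (thread_id, instruction) pairs in dispatch order.
    """
    queues = [deque(q) for q in thread_queues]
    issue_queue = []
    t = 0
    n = len(queues)
    # Keep dispatching while there is queue space and any thread has work.
    while len(issue_queue) < queue_size and any(queues):
        if queues[t]:
            issue_queue.append((t, queues[t].popleft()))
        t = (t + 1) % n  # discrete round-robin: next thread each dispatch
    return issue_queue
```

With two threads and a 3-entry queue, dispatch alternates between them, so neither thread can monopolize the issue queue within a cycle.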
Finally, we introduce a dispatch algorithm that dynamically balances emphasis on the urgency of instructions against their readiness. Under this scheme, we carefully study both scheduling instructions by urgency, based on operand dependency, and scheduling them by readiness, based on operand availability. We first implement the 2-OpBlock scheduler described in prior work and then extend it to schedule instructions according to the number of ready operands. Finally, we introduce a scheme that adaptively switches between these two schemes for better utilization of the issue queue. Our measurements show how these scheduling algorithms affect performance and the utilization of low-level hardware resources.
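The adaptive idea can be sketched as a policy switch. Everything here is a hypothetical illustration under stated assumptions: instructions are modeled as dicts with a `ready_operands` count (readiness) and a `dependents` count (a stand-in for operand-dependency urgency), and the occupancy threshold is an invented parameter, not a value from the thesis.

```python
def order_by_readiness(instrs):
    """Readiness-based: prefer instructions with more ready operands,
    since they are closer to being issuable."""
    return sorted(instrs, key=lambda i: -i["ready_operands"])

def order_by_urgency(instrs):
    """Urgency-based: prefer instructions with more dependents waiting
    on their result, so issuing them unblocks the most work."""
    return sorted(instrs, key=lambda i: -i["dependents"])

def adaptive_dispatch(instrs, queue_occupancy, threshold=0.75):
    """Hypothetical adaptive switch: when the issue queue is under
    pressure (high occupancy), favor readiness so entries drain
    quickly; otherwise favor urgency to shorten dependence chains."""
    if queue_occupancy >= threshold:
        return order_by_readiness(instrs)
    return order_by_urgency(instrs)
```

The design point being illustrated is that neither criterion dominates at all times: readiness keeps the queue from filling with blocked entries, while urgency accelerates the critical path when there is slack.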