Effective dispatching in simultaneous multithreading (SMT) processors by capping per-thread resource utilization

Date

2011

Authors

Nagaraju, Tilak

Abstract

Simultaneous multithreading (SMT) improves on the traditional superscalar processor. Its defining characteristic is the sharing of key datapath components among multiple independent threads, which improves resource utilization. When critical resources are shared by multiple threads, using them effectively is the most important factor in fully exploiting the system's potential. Allowing any one thread to overwhelm these shared resources can severely degrade performance.

SMT exploits thread-level parallelism to compensate for the limited instruction-level parallelism (ILP) present in each thread, while still taking advantage of the ILP that is available. Because resources are shared, an SMT system requires significantly less hardware than a multi-core superscalar machine, without sacrificing much performance. The resources most commonly shared in SMT include components that are easier to share in terms of control complexity (such as cache memory and the physical register bank) and those that are more cost-effective to share (such as the Issue Queue (IQ) and the various functional units). Other, more thread-specific components along the datapath (such as the Re-Order Buffer (ROB)) are assumed to remain under per-thread ownership. Preventing idling threads from clogging the critical resources in the pipeline is therefore essential to sustaining system performance. In this research work, we demonstrate that the utilization of resources shared among the threads in an SMT system can significantly affect overall performance.

Another way of sharing resources is to limit the maximum resource usage of each thread. In this research, we show that simply setting a cap (threshold) on the number of critical Issue Queue (IQ) entries each thread is allowed to occupy improves system performance by a significant margin, with minimal additional hardware. Capping the resource usage of any one thread improves the utilization of the critical resources. Adjusting the cap according to real-time thread behavior yields a further performance gain: alongside the fixed capping scheme, we propose an autonomous cap-adjustment scheme that shows a significant gain over the fixed one. At the same time, proper intelligence has to be incorporated into the resource-sharing mechanism to ensure that threads share these components efficiently and fairly. Dispatching schemes are proposed and evaluated based on the case study, and extensive simulation shows a significant gain in system throughput from these techniques.
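
The fixed-cap idea can be illustrated with a minimal sketch. This is not the thesis's dispatching scheme or simulator: the function name `dispatch_cycle`, its parameters, the round-robin thread order, and the numbers in the usage example are all assumptions made for illustration. It only shows how a per-thread cap on IQ occupancy gates dispatch into a shared queue.

```python
from collections import deque


def dispatch_cycle(pending, iq_occupancy, iq_size, cap, width):
    """Dispatch up to `width` instructions into a shared IQ for one cycle.

    Hypothetical model (not taken from the thesis):
      pending      -- list of deques, one per thread, of instructions awaiting dispatch
      iq_occupancy -- list of ints, IQ entries currently held per thread (updated in place)
      iq_size      -- total number of entries in the shared IQ
      cap          -- maximum number of IQ entries any single thread may hold
      width        -- maximum instructions dispatched per cycle
    Returns the list of (thread_id, instruction) pairs dispatched this cycle.
    """
    dispatched = []
    progress = True
    # Keep making round-robin passes until the cycle's width is used up,
    # the IQ is full, or no thread is eligible to dispatch.
    while len(dispatched) < width and sum(iq_occupancy) < iq_size and progress:
        progress = False
        for tid in range(len(pending)):
            if len(dispatched) >= width or sum(iq_occupancy) >= iq_size:
                break
            # A thread may dispatch only while it is below its cap.
            if pending[tid] and iq_occupancy[tid] < cap:
                dispatched.append((tid, pending[tid].popleft()))
                iq_occupancy[tid] += 1
                progress = True
    return dispatched


# Usage: thread 0 has many instructions waiting, thread 1 only a few.
pending = [deque(range(10)), deque(range(3))]
occupancy = [0, 0]
issued = dispatch_cycle(pending, occupancy, iq_size=8, cap=4, width=8)
print(occupancy)  # thread 0 stops at the cap, leaving an IQ entry free
```

In this toy run the cap keeps thread 0 from filling the queue, so an entry remains available for a thread that becomes ready later. Setting `cap` equal to `iq_size` reproduces uncapped sharing for comparison. An adaptive variant would change `cap` between cycles based on observed thread behavior; the adjustment policy is not described in the abstract, so none is sketched here.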

Keywords

Simultaneous Multithreading Processors

Department

Electrical and Computer Engineering