Resource Utilization Optimization in SMT and CMP Architectures

Date

2018

Authors

Wang, Wenjun

Abstract

Simultaneous Multi-Threading (SMT) systems improve performance by allowing multiple independent threads to execute concurrently while sharing key resources. When resources are shared unfairly, slower threads can easily clog the pipeline stages and hamper the normal processing of faster threads. Effective distribution of critical shared resources among concurrently executing threads is therefore key to improving overall system performance in SMT processors. Our research targets efficient resource allocation among threads to boost system performance. Several techniques are proposed in this dissertation: Thread Suspension, Integrated Autonomous Control, Speculative Trace Control, and Dynamic Resource Allocation with Neural Networks.

One of the most critical shared resources is the physical register file in the rename stage, and a disproportionate distribution of these rename registers can easily turn it into a bottleneck along the pipeline stages. Several techniques have been proposed to improve the utilization of the physical register file.

We first propose a thread-suspension algorithm to better utilize the register file. Once the overall physical register file utilization exceeds a certain threshold, the thread with the highest occupancy is temporarily suspended to give other threads more room to proceed and achieve higher throughput. To extend the technique, we propose a thread suspension scheme combined with a uniform register file capping technique. When shared-resource congestion occurs in the pipeline stages, the thread with the lowest resource utilization efficiency among all active threads is suspended so that the other threads have more room to proceed and achieve higher throughput. More than one thread may be selected for temporary suspension.
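
The first suspension rule described above (suspend the highest-occupancy thread once overall register file utilization passes a threshold) can be illustrated with a minimal sketch. The names `ThreadState` and `select_thread_to_suspend`, the 0.9 threshold, and the choice to keep at least one thread running are assumptions made for illustration; the abstract does not specify them, and this is not the dissertation's implementation.

```python
from dataclasses import dataclass


@dataclass
class ThreadState:
    tid: int
    regs_held: int           # physical registers currently held by this thread
    suspended: bool = False  # True while the thread is temporarily suspended


def select_thread_to_suspend(threads, total_regs, threshold=0.9):
    """Return the tid of the thread to suspend, or None.

    Hypothetical policy: if overall register file utilization exceeds
    `threshold`, pick the active thread holding the most registers.
    At least one thread is always left running (an assumption).
    """
    active = [t for t in threads if not t.suspended]
    used = sum(t.regs_held for t in threads)
    if used / total_regs <= threshold or len(active) <= 1:
        return None
    victim = max(active, key=lambda t: t.regs_held)
    return victim.tid


# Example: 190 of 200 registers in use (95%), so thread 0 is selected.
threads = [ThreadState(0, 100), ThreadState(1, 60), ThreadState(2, 30)]
print(select_thread_to_suspend(threads, total_regs=200))
```

The extended scheme, which ranks threads by resource utilization efficiency, would replace the `max` key with an efficiency metric; the abstract does not define that metric, so it is left out here.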

We also develop a machine learning algorithm to efficiently allocate registers among concurrently executing threads based on current resource utilization. An off-line training process is first employed to establish a well-trained neural network, which is then applied to dynamically adjust the resource distribution in real time.

SMT processors adopt speculative execution to fetch continuously and reduce the delays of control instructions. However, a significant amount of resources, which could have been used by other valid instructions, is usually wasted due to mis-speculation, and such waste is even more pronounced in an SMT system. To minimize this waste, we give a thorough analysis of the trade-offs in applying the capping technique to limit the instructions in the speculative trace at different pipeline stages, so as to maximize its benefits.
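
As a rough illustration of capping the speculative trace, the sketch below gates fetch for any thread whose count of in-flight speculative instructions has reached a cap. The function name `fetch_eligible_threads`, the per-thread counter, and the cap value are hypothetical; the abstract does not state where in the pipeline the cap is enforced or how it is chosen.

```python
def fetch_eligible_threads(spec_counts, cap):
    """Return thread ids still allowed to fetch, in ascending order.

    spec_counts maps thread id -> number of in-flight instructions fetched
    past an unresolved branch (the speculative trace). A thread at or above
    `cap` is gated until some of its branches resolve. Both the counter and
    the cap are illustrative assumptions.
    """
    return [tid for tid, n in sorted(spec_counts.items()) if n < cap]


# Example: with a cap of 8, only thread 1 may keep fetching.
print(fetch_eligible_threads({0: 12, 1: 3, 2: 8}, cap=8))
```

A lower cap wastes fewer resources on a wrong path but also throttles a thread whose predictions are correct, which is the kind of trade-off the analysis weighs.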

We then apply an autonomous integrated control of shared resources among multiple threads based on the threads' temporal behavior in real time. This process manages the usage of the most critical resources simultaneously for each thread and delivers a significant improvement in system performance.

A Chip Multi-Processor (CMP) usually employs a shared last-level cache to use on-chip memory resources effectively. The shared last-level cache is one of the most important shared resources because of its impact on system performance. We propose a dynamic partitioning technique for the shared cache to eliminate interference among multiple cores.
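
The abstract does not describe the partitioning policy. Purely as an illustration of what partitioning a shared cache among cores can look like, the sketch below uses a greedy, utility-based allocation of cache ways driven by per-core miss curves. This is a swapped-in, generic technique and not the dissertation's method; `partition_ways` and the miss-curve input format are assumptions.

```python
def partition_ways(miss_curves, total_ways):
    """Greedily split `total_ways` cache ways among cores.

    miss_curves[c][w] is the (assumed, externally measured) number of misses
    core c would incur with w ways, for w = 0..total_ways. Every core starts
    with one way; each remaining way goes to the core whose misses drop the
    most from receiving it.
    """
    n = len(miss_curves)
    if total_ways < n:
        raise ValueError("need at least one way per core")
    if any(len(curve) != total_ways + 1 for curve in miss_curves):
        raise ValueError("each miss curve needs total_ways + 1 entries")

    alloc = [1] * n
    for _ in range(total_ways - n):
        # Marginal benefit of one more way for core c at its current share.
        def gain(c):
            w = alloc[c]
            return miss_curves[c][w] - miss_curves[c][w + 1]

        best = max(range(n), key=gain)
        alloc[best] += 1
    return alloc


# Example: core 1 benefits more from extra ways, so it receives 3 of 4.
curves = [[100, 50, 30, 25, 24], [100, 90, 40, 10, 5]]
print(partition_ways(curves, total_ways=4))
```

Re-running the allocation periodically with fresh miss curves would make the partition dynamic; the update interval is left open here.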

Keywords

Chip Multi-Processor, Resource Sharing, Simultaneous Multi-Threading, Superscalar

Department

Electrical and Computer Engineering