Resource Utilization Optimization in SMT and CMP Architectures

Wang, Wenjun

dc.contributor.advisor	Lin, Wei-Ming
dc.contributor.author	Wang, Wenjun
dc.contributor.committeeMember	Duan, Lide
dc.contributor.committeeMember	Liu, Tongping
dc.contributor.committeeMember	Lee, Wonjun
dc.creator.orcid	https://orcid.org/0000-0003-2330-0009
dc.date.accessioned	2024-03-08T17:35:05Z
dc.date.available	2019-05-15
dc.date.available	2024-03-08T17:35:05Z
dc.date.issued	2018
dc.description	This item is available only to currently enrolled UTSA students, faculty or staff.
dc.description.abstract	Simultaneous Multi-Threading (SMT) systems improve performance by allowing multiple independent threads to execute concurrently while sharing key resources. Unfair sharing of these resources can let slower threads clog the pipeline stages and hamper the normal processing of faster threads. Effective distribution of critical shared resources among concurrently executing threads is therefore key to improving overall system performance in SMT processors. Our research targets efficient resource allocation among threads to boost system performance. Several techniques are proposed in this dissertation: Thread Suspension, Integrated Autonomous Control, Speculative Trace Control, and Dynamic Resource Allocation with Neural Networks. One of the most critical shared resources is the physical register file in the rename stage, and a disproportionate distribution of these rename registers can easily render it a bottleneck along the pipeline stages. Several techniques have been proposed to improve the utilization of the physical register file. We first propose a thread-suspension algorithm to better utilize the register file. Once the overall physical register file utilization exceeds a certain threshold, the thread with the highest occupancy is temporarily suspended to give other threads more room to proceed and achieve a higher throughput. To further extend the technique, we propose a thread suspension scheme combined with a uniform register file capping technique. When shared-resource congestion occurs in the pipeline stages, the thread with the lowest resource utilization efficiency among all active threads is suspended to give other threads more room to proceed for a higher throughput. Not just one but potentially several threads may be selected for temporary suspension.
We also develop a machine learning algorithm to efficiently allocate registers among concurrently executing threads based on current resource utilization. An off-line training process is first employed to establish a well-trained neural network, which is then applied to dynamically adjust the resource distribution in real time. SMT processors adopt speculative execution to fetch continuously and reduce the delays of control instructions. However, a significant amount of resources, which could have been used by other valid instructions, is usually wasted due to misspeculation, and such waste is even more pronounced in an SMT system. To minimize this waste, a thorough analysis investigates the trade-offs in applying the capping technique to limit the instructions in the speculative trace at different pipeline stages so as to maximize its benefits. We then apply an autonomous integrated control of shared resources among multiple threads based on the threads' temporal behaviors in real time. This process manages the usage of the most critical resources simultaneously for each thread and delivers a significant improvement in system performance. A Chip Multi-Processor (CMP) usually employs a shared last-level cache to use on-chip memory resources effectively. The shared last-level cache is one of the most important shared resources due to its impact on system performance. We propose a dynamic partitioning technique for the shared cache to eliminate interference among multiple cores.
dc.description.department	Electrical and Computer Engineering
dc.format.extent	139 pages
dc.format.mimetype	application/pdf
dc.identifier.isbn	9780355957082
dc.identifier.uri	https://hdl.handle.net/20.500.12588/6100
dc.language	en
dc.subject	Chip Multi-Processor
dc.subject	Resource Sharing
dc.subject	Simultaneous Multi-Threading
dc.subject	Superscalar
dc.subject.classification	Electrical engineering
dc.subject.classification	Computer engineering
dc.title	Resource Utilization Optimization in SMT and CMP Architectures
dc.type	Thesis
dc.type.dcmi	Text
dcterms.accessRights	pq_closed
thesis.degree.department	Electrical and Computer Engineering
thesis.degree.grantor	University of Texas at San Antonio
thesis.degree.level	Doctoral
thesis.degree.name	Doctor of Philosophy
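
The first thread-suspension rule described in the abstract (suspend the thread with the highest occupancy once overall physical register file utilization exceeds a threshold) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function name, the dictionary representation of per-thread occupancy, the threshold value, and the tie-breaking rule are all assumptions made for illustration.

```python
from typing import Dict, Optional


def select_thread_to_suspend(
    occupancy: Dict[int, int],
    total_registers: int,
    threshold: float,
) -> Optional[int]:
    """Pick a thread to suspend, or return None if no suspension is needed.

    occupancy maps a (hypothetical) thread id to the number of physical
    registers that thread currently holds. threshold is the fraction of
    the register file that must be exceeded before any thread is suspended.
    """
    if total_registers <= 0:
        raise ValueError("total_registers must be positive")
    if not occupancy:
        return None

    # Overall utilization of the shared physical register file.
    utilization = sum(occupancy.values()) / total_registers
    if utilization <= threshold:
        return None

    # Suspend the thread holding the most registers. Ties go to the
    # lowest thread id; this tie-break is an assumption, not from the source.
    return max(sorted(occupancy), key=lambda tid: occupancy[tid])


# Example: three threads sharing a 256-entry register file, 60% threshold.
victim = select_thread_to_suspend({0: 100, 1: 60, 2: 20}, 256, 0.6)
```

In a real SMT pipeline the decision would be re-evaluated every cycle or every few cycles, and the suspended thread would resume once utilization falls back below the threshold; the abstract does not give those details, so they are left out of the sketch.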

Files

Original bundle

Name: Wang_utsa_1283D_12504.pdf
Size: 2.74 MB
Format: Adobe Portable Document Format

Collections

Electronic Theses and Dissertations - UTSA Access Only