Improving Chip Multiprocessor Performance By Exploiting Dynamic and Speculative Updates
The multicore environment has become the reference architecture for improving the performance of modern and future microprocessor systems. The initial implementations of multicore technology had a modest number of cores (two to four), but the latest trend in chip multiprocessors (CMPs) is to incorporate a larger number of cores on each processor to enhance throughput. Increasing the number of cores on a single chip, however, leads to higher demand on the on-chip cache capacity as well as on both on-chip and off-chip bandwidth
due to coherence and capacity-related misses, respectively. Previous research has proposed various protocols, such as the write-update and write-invalidate policies, to enhance the scalability and performance of distributed shared-memory multiprocessors. However, pure write-update protocols are not appropriate, since they incur severe performance degradation compared to write-invalidate protocols because of the heavy traffic caused by exhaustive updates of the sharers. Conventional update-based protocols update the sharers of each and every cache block, which demands excessive bandwidth. Write-invalidate protocols, on the other hand, maintain cache coherence by invalidating the sharers' copies of a memory block rather than updating them with the newly modified value. Invalidating the sharers reduces the expensive bandwidth contention. However, many of the sharers' copies that a write-update protocol would have kept up to date may be reused in the near future. Invalidating all those sharers causes many coherence misses and invokes many unnecessary message-passing transactions, which reduce the
performance of the CMP. On the other hand, previous research has already shown that updating the sharers of all the cache blocks is not the optimal solution either, because many of the updated data blocks cached in the sharers' cores may not be reused in the near future. These unnecessary updates eventually increase the execution time as well as the network latency without yielding any performance benefit. In this thesis, novel policies are proposed and evaluated to enhance the performance of cache coherence protocols. These policies (1) exploit different mechanisms of dynamic and speculative updates of the sharers, (2) increase the L1 cache hit rate through speculative updates, (3) reduce network latency by eliminating many message-passing transactions, (4) enhance the speedup of the chip multiprocessor (CMP), and (5) reduce the network traffic generated by the unnecessary transactions that result from coherence misses in the write-update protocols. The proposed protocols introduce dynamic and speculative techniques for seamless dynamic adaptation between the write-invalidate and write-update strategies on a per-block basis. The protocols are evaluated in a directory-based 64-core CMP, where they are effective at improving performance by reducing coherence misses and on-chip traffic in large-scale chip multiprocessors. These properties are essential to ensure the continued scaling of future multicore platforms.
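
The trade-off between the two strategies can be illustrated with a toy directory model. The sketch below is not the protocol proposed in this thesis; it assumes a simple competitive-update style heuristic in which a sharer keeps receiving updates until it has ignored a fixed number of them, after which it is invalidated instead. The names AdaptiveDirectory, threshold, and run, as well as the three-core access pattern, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BlockEntry:
    """Directory state for one cache block (hypothetical layout)."""
    # core id -> updates received since that core last accessed the block
    unused_updates: dict = field(default_factory=dict)
    # cores whose copy was invalidated and not yet re-fetched
    invalidated: set = field(default_factory=set)


class AdaptiveDirectory:
    """Toy per-block hybrid: update a sharer until it ignores `threshold` updates."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.blocks = {}
        self.update_msgs = 0
        self.invalidate_msgs = 0
        self.coherence_misses = 0

    def _acquire(self, core, entry):
        # Re-fetching a copy that was invalidated earlier is a coherence miss.
        if core in entry.invalidated:
            self.coherence_misses += 1
            entry.invalidated.discard(core)
        entry.unused_updates[core] = 0  # the core now holds a fresh copy

    def read(self, core, addr):
        self._acquire(core, self.blocks.setdefault(addr, BlockEntry()))

    def write(self, core, addr):
        entry = self.blocks.setdefault(addr, BlockEntry())
        for sharer in list(entry.unused_updates):
            if sharer == core:
                continue
            if entry.unused_updates[sharer] >= self.threshold:
                # The sharer ignored too many updates: invalidate its copy.
                del entry.unused_updates[sharer]
                entry.invalidated.add(sharer)
                self.invalidate_msgs += 1
            else:
                # The sharer still looks active: push the new value to it.
                entry.unused_updates[sharer] += 1
                self.update_msgs += 1
        self._acquire(core, entry)


def run(threshold, rounds=4):
    """Core 0 produces, core 1 consumes every value, core 2 reads only once."""
    d = AdaptiveDirectory(threshold)
    d.read(1, 0x40)
    d.read(2, 0x40)
    for _ in range(rounds):
        d.write(0, 0x40)
        d.read(1, 0x40)
    return d.update_msgs, d.invalidate_msgs, d.coherence_misses


if __name__ == "__main__":
    for label, t in (("invalidate", 0), ("hybrid", 2), ("update", 10**9)):
        print(label, run(t))
```

In this model a threshold of 0 behaves like pure write-invalidate (the active consumer misses after every write), a very large threshold behaves like pure write-update (the idle sharer keeps receiving updates it never reads), and an intermediate threshold keeps the active consumer updated while eventually dropping the idle one.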