Utilizing Non-Volatile Memory Technologies in Emerging Applications




Yan, Hao



Conventional DRAM-based main memory is facing critical challenges in both latency and scalability. Non-volatile memories, such as spin-transfer torque magnetoresistive random-access memory (STT-MRAM), are emerging as replacements for existing DRAM-based main memory and offer a wide range of advantages: in data storage, almost zero idle power, no data refreshes, non-destructive memory reads, and nearly infinite data retention time; in computing, high performance and low power. However, replacing DRAM with STT-MRAM also introduces new design challenges.

When STT-MRAM is used for data storage, it faces a reliability challenge known as read disturbance. A simple read-and-restore scheme preserves data integrity under read disturbance, but it incurs significant performance and energy overheads. By exploiting unique characteristics of mobile applications, we therefore propose FlowPaP, a flow-pattern prediction scheme that dynamically predicts the write-to-last-read distances of data frames running on a handheld device. FlowPaP identifies and eliminates memory restores that are unnecessary for preventing read disturbance, significantly improving energy efficiency and performance for STT-MRAM-based handheld devices. In addition, we propose FlowReR, a flow-based data retention time reduction scheme that further lowers the energy consumption of STT-MRAM at the expense of a shorter data retention time. FlowReR adds a second step that marginally trades the already improved energy efficiency for additional performance. Experimental results show that, compared with the original read-and-restore scheme, applying FlowPaP and FlowReR together improves energy efficiency by 34% and performance by 17% for a set of commonly used Android applications.
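The restore-skipping intuition can be sketched in a few lines. The sketch below is purely illustrative and not the dissertation's actual FlowPaP design: the class name, the per-address last-value predictor, and the restore policy are all assumptions made for exposition. The idea it demonstrates is that once the predicted last read before the next write has occurred, a restore can be skipped, because any disturbed value will be overwritten anyway.

```python
# Illustrative sketch only: names and the simple last-value predictor are
# assumptions for exposition, not the dissertation's FlowPaP mechanism.

class RestoreSkipper:
    def __init__(self):
        # predicted write-to-last-read distance per address, learned from
        # the read count observed in the previous write-to-write interval
        self.predicted_reads = {}
        self.reads_since_write = {}

    def on_write(self, addr, observed_reads_last_interval=None):
        # update the predictor with the number of reads seen since the
        # previous write to this address (a simple last-value prediction)
        if observed_reads_last_interval is not None:
            self.predicted_reads[addr] = observed_reads_last_interval
        self.reads_since_write[addr] = 0

    def on_read(self, addr):
        """Return True if a restore is required after this read."""
        self.reads_since_write[addr] = self.reads_since_write.get(addr, 0) + 1
        predicted = self.predicted_reads.get(addr)
        if predicted is None:
            return True  # no prediction yet: restore conservatively
        # restore only while more reads are predicted before the next write;
        # at the predicted last read, skip the restore
        return self.reads_since_write[addr] < predicted
```

A frame predicted to be read twice between writes would then be restored after its first read but not after its second, saving one restore per write interval.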

When STT-MRAM is used for computing, state-of-the-art research faces both device- and architecture-level challenges. At the device level, an STT-MRAM cell cannot generate a sufficient number of resistance levels: the limited number of intermediate resistance states largely reduces the precision of the synaptic weight stored in the cell and lowers model inference accuracy. This dissertation therefore enables STT-MRAM, for the first time, as an effective and practical deep learning accelerator. In particular, it proposes a full-stack framework spanning multiple design levels, including device-level fabrication, circuit-level enhancements, architecture-level synaptic weight quantization, and system-level accelerator design. The proposed framework significantly mitigates the model accuracy loss caused by reduced data precision in a cohesive manner, yielding a comprehensive STT-MRAM accelerator system for fast neural network computation with high energy efficiency and low cost. We observe that the weight matrices in convolutional layers, although small, are reused many times by the input data; as a result, they are good candidates for replication and co-location in the same crossbar array to improve data processing parallelism. The first scheme proposed in this dissertation therefore exploits the shared input by replicating weight matrices and overlapping them to improve parallelism. Furthermore, this dissertation proposes a heterogeneous accelerator consisting of both large and small crossbar arrays: fully-connected layers are mapped to large crossbar arrays to obtain area and power reductions, while convolutional layers remain in conventional (small) crossbar arrays to retain performance. Experimental results show significant benefits of the proposed schemes in performance, energy efficiency, and area.
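The replication idea can be illustrated with a minimal numerical model. The sketch below is an assumption-laden simplification, not the dissertation's mapping: it models a crossbar as one matrix-vector product per cycle and places copies of a small convolutional weight matrix on the block diagonal of a larger crossbar, so several input windows are processed in a single crossbar operation instead of one window per cycle.

```python
import numpy as np

# Illustrative sketch (function names and the block-diagonal placement are
# assumptions for exposition, not the dissertation's exact scheme).

def replicate_weights(W, copies):
    """Place `copies` replicas of W on the block diagonal of one crossbar."""
    rows, cols = W.shape
    xbar = np.zeros((rows * copies, cols * copies))
    for i in range(copies):
        xbar[i * rows:(i + 1) * rows, i * cols:(i + 1) * cols] = W
    return xbar

def crossbar_mvm(xbar, vec):
    # one crossbar operation computes one matrix-vector product
    return xbar @ vec

W = np.array([[1.0, 2.0],
              [3.0, 4.0]])                      # small conv weight matrix
windows = [np.array([1.0, 0.0]),
           np.array([0.0, 1.0]),
           np.array([1.0, 1.0])]                # three input windows

xbar = replicate_weights(W, len(windows))
out = crossbar_mvm(xbar, np.concatenate(windows))
# out holds W @ window_i for every window, from one crossbar operation
```

With three replicas, one crossbar operation yields all three per-window products, which is the parallelism gain that replication and co-location aim for; the trade-off is the extra crossbar area the replicas occupy.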






Electrical and Computer Engineering