
Revealing DRAM Operating GuardBands through Workload-Aware Error Predictive Modeling

A new technique for automatically scaling the DRAM refresh period under a reduced supply voltage that minimizes the probability of failures.


Abstract
Improving the energy efficiency of DRAMs has become very challenging due to the growing demand for storage capacity and failures induced by the manufacturing process. To protect against failures, vendors adopt conservative margins in the refresh period and supply voltage. Previous work has shown that these margins are too pessimistic and will become impractical due to high power costs, especially in future DRAM technologies.

In this paper, we present a new technique for automatically scaling the DRAM refresh period under a reduced supply voltage that minimizes the probability of failures. The main idea behind the proposed approach is that DRAM error behavior is workload-dependent and can be predicted from inherent program features. We use a Machine Learning (ML) method to build a workload-aware DRAM error behavior model based on program features that we extract from real workloads during our DRAM error characterization campaign. With this model, we identify the marginal value of the DRAM refresh period under relaxed voltage for each DRAM module of a server, which enables us to reduce DRAM power.
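To make the idea of a workload-aware error model concrete, here is a minimal Python sketch of how such a model could be used to pick a marginal refresh period. It is not the authors' model: the logistic form, the `ToyErrorModel` and `marginal_refresh_period` names, the two program features, the weights, and the error-probability threshold are all hypothetical placeholders. The supply voltage is assumed to be fixed at its relaxed setting, so the model only varies the refresh period.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ToyErrorModel:
    """Hypothetical logistic error model (placeholder weights, not from the paper).

    P(error) = sigmoid(bias + refresh_weight * tREFP + sum(w_i * x_i)),
    where x_i are program features and tREFP is the refresh period in ms,
    evaluated at a fixed, already-relaxed supply voltage.
    """
    feature_weights: Sequence[float]
    refresh_weight: float  # contribution per millisecond of refresh period
    bias: float

    def error_probability(self, features: Sequence[float], trefp_ms: float) -> float:
        z = self.bias + self.refresh_weight * trefp_ms
        z += sum(w * x for w, x in zip(self.feature_weights, features))
        return 1.0 / (1.0 + math.exp(-z))


def marginal_refresh_period(
    model: ToyErrorModel,
    features: Sequence[float],
    candidates_ms: Sequence[float],
    max_error_prob: float,
) -> Optional[float]:
    """Longest candidate refresh period whose predicted error probability
    stays at or below max_error_prob; None if no candidate qualifies."""
    safe = [t for t in candidates_ms
            if model.error_probability(features, t) <= max_error_prob]
    return max(safe) if safe else None


# Illustrative usage with made-up features and weights.
model = ToyErrorModel(feature_weights=[0.5, 1.0], refresh_weight=0.002, bias=-8.0)
print(marginal_refresh_period(model, [0.2, 0.4], [64, 128, 256, 512, 1024, 2048], 1e-3))
```

With these illustrative weights the call returns 256, the longest candidate whose predicted error probability stays under 0.1%. The function scans every candidate rather than stopping at the first failure, so it does not assume that predicted errors grow monotonically with the refresh period.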

We implement a temperature-driven OS governor that automatically sets the module-specific marginal DRAM parameters discovered by the ML model. Our governor reduces DRAM power by 24% on average while minimizing the probability of failures. Unlike previous studies, our technique: i) does not require intrusive changes to hardware; ii) is implemented on a real server; iii) uses a mechanism that prevents any abnormal DRAM error behavior; and iv) can be easily deployed in data centers.


Mukhanov, L., Tovletoglou, K., Vandierendonck, H., Nikolopoulos, D., & Karakonstantis, G. (2020). Revealing DRAM Operating GuardBands through Workload-Aware Error Predictive Modeling. IEEE Transactions on Computers. https://doi.org/10.1109/TC.2020.3033627


