Reduce infrastructure CAPEX while boosting system performance.
What if your chips lived 20% longer without compromising performance, while also consuming less power? How would it affect your product’s reliability and cost? What would be the effect on your profitability?
With the demand for longer-lasting chips growing across industries, designers and reliability engineers face increasing pressure to ensure their products perform correctly for the expected lifetime. Application stress is one of the key contributors to chip degradation over the years, as operation under demanding workloads leads to increased power consumption, higher temperatures, greater reliability risks, and eventually, reduced product lifespan.
This article explores an innovative approach to workload-aware and reliability-aware adaptive voltage scaling that is much safer than traditional methods. It therefore allows smaller guard bands (i.e., it reclaims unused guard-band margin), significantly extending SoC lifetime.
Data centers, automotive systems, and consumer devices require installed chips to function reliably over extended periods. Hyperscale data centers, for example, have publicly announced strategic business objectives that aim to reduce capital expenditure (CAPEX) and promote sustainability by stretching the useful life of servers. According to the table below¹, this lifetime extension has increased Amazon’s quarterly net income by $0.9B. It has also increased the annual net income of Alphabet and Microsoft by $3B.
The table further shows that those giants made a similar move to extend server lifetime a few years earlier, indicating a clear trend. Moreover, the impact on net income has increased from $2B-$2.7B in 2021 to $3B-$3.7B in 2023. This surge can be explained by the recent introduction of high-stress workloads, such as generative AI, which require far more computing resources. With hardware expenses rising, it’s clear why hyperscalers strive to maximize chip lifetime more than ever.
Depreciation schedule changes at Alphabet, Amazon, and Microsoft.
Automotive electronics, meanwhile, must endure up to 15 years in the face of high temperature and mechanical stress while meeting safety requirements. This market has also witnessed an increase in computational requirements with the introduction of centralized ECU architectures and demanding use cases, such as ADAS. As these greater loads increase silicon degradation, providing sufficient lifetime is even more challenging.
Many chipmakers rely on traditional methods to select VDDmin (the minimum operating voltage for reliable performance), resorting to predetermined guard bands that need to last throughout the committed operational lifespan. The guard bands need to account for expected performance degradation due to aging, as well as inaccuracies during test when VDDmin is set, among other things.
These methods are often too conservative, potentially requiring larger guard bands than needed to compensate for the lack of accurate real-time data from within the SoC. This deficiency means that chips unnecessarily run with worst-case guard bands, which limits the potential power savings. Consequently, devices configured with higher-than-needed voltages may wear out earlier.
The equation is simple. If you had a real-time solution that is both workload-aware and reliability-aware, you wouldn’t have to use conservative guard bands constantly. In other words, you could safely use lower voltages whenever applicable, and that would eventually delay the wear-out phase, besides reducing power consumption considerably.
The traditional semiconductor reliability curve (orange) is augmented by safer voltage scaling to delay the wear-out phase (blue).
The common degradation model (e.g., for Negative Bias Temperature Instability, NBTI) follows a power law in time.
To estimate the life extension at a lower VDD, two degradation curves should be compared at a certain threshold: f(t), the degradation at the baseline conditions, and g(t), the degradation with safer voltage and temperature down-scaling. As depicted below, g(t) has an acceleration factor that makes it degrade more slowly, so it reaches at a later time t2 the degradation level that f(t) reaches at t1, i.e., f(t1) = g(t2).
Further analysis of this equality relates t2 to t1 and reveals the acceleration factor.
g(t) degrades slower than f(t) thanks to a fractional acceleration factor facilitated by safer voltage and temperature scaling.
As seen here, the degradation model is influenced by voltage and temperature, with an acceleration factor (AF) modifying the degradation rate. Lower voltage leads to a lower temperature and to a slower degradation process, resulting in an extended component lifetime due to the fractional effect of the acceleration factor.
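The relationship between f(t), g(t), and the acceleration factor can be sketched as follows. This is an illustrative reconstruction, not the article's original equations: the power-law form A(V,T)·tⁿ, the exponential voltage term with coefficient γ, and the Arrhenius temperature term with activation energy E_a are commonly used modeling assumptions, and all symbols below are introduced here for illustration.

```latex
% Assumed power-law degradation at baseline (V_0, T_0) and scaled (V_1, T_1) conditions
f(t) = A(V_0, T_0)\, t^{n}, \qquad g(t) = A(V_1, T_1)\, t^{n}, \qquad n > 0

% Equal degradation at the threshold: f(t_1) = g(t_2)
A(V_0, T_0)\, t_1^{n} = A(V_1, T_1)\, t_2^{n}
\;\Longrightarrow\;
\frac{t_2}{t_1} = \left( \frac{A(V_0, T_0)}{A(V_1, T_1)} \right)^{1/n}

% Acceleration factor: g(t) = f(AF \cdot t), fractional (AF < 1) when A(V_1,T_1) < A(V_0,T_0)
AF = \frac{t_1}{t_2} = \left( \frac{A(V_1, T_1)}{A(V_0, T_0)} \right)^{1/n}

% With an assumed prefactor A(V,T) = C\, e^{\gamma V} e^{-E_a / (k_B T)}:
AF = \exp\!\left[ \frac{1}{n} \left( \gamma\,(V_1 - V_0) - \frac{E_a}{k_B} \left( \frac{1}{T_1} - \frac{1}{T_0} \right) \right) \right]

% Relative lifetime extension at the same degradation threshold
\frac{t_2 - t_1}{t_1} = \frac{1}{AF} - 1
```

Under these assumptions, lowering either the voltage or the temperature makes the exponent negative, so AF falls below 1 and the time to reach a given degradation threshold grows by a factor of 1/AF.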
Unlike common canary circuits and fixed voltage guard bands, proteanTecs AVS Pro leverages in-chip timing margin monitoring. This solution combines Margin Agents that monitor millions of true paths in real time with dedicated algorithms for better-informed decisions. AVS Pro allows precise guard-band reclamation based on real workloads, real aging, and actual IR drops, saving more power in real-life scenarios while ensuring reliability.
In-situ monitoring of the true paths per chip is paramount, as the critical paths can change over time according to different aging patterns of individual devices.
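To make the idea of margin-driven voltage scaling concrete, here is a minimal sketch of a closed-loop controller that lowers the supply while measured timing margin is comfortable and raises it when margin gets tight. It is not the AVS Pro algorithm: the `AvsConfig` fields, the thresholds, the step size, and the idea of feeding in a pre-recorded list of worst-case margins are all hypothetical simplifications (in real silicon the margin would itself respond to each voltage change).

```python
from dataclasses import dataclass


@dataclass
class AvsConfig:
    """Hypothetical controller settings; values are illustrative only."""
    v_min_mv: int = 600           # lowest voltage the loop may request
    v_max_mv: int = 750           # ceiling, e.g. nominal with full guard band
    step_mv: int = 5              # voltage change per control step
    margin_low_ps: float = 10.0   # below this margin, raise the voltage
    margin_high_ps: float = 20.0  # above this margin, lower the voltage


def next_voltage(v_now_mv: int, worst_margin_ps: float, cfg: AvsConfig) -> int:
    """Return the next supply voltage given the worst measured timing margin."""
    if worst_margin_ps < cfg.margin_low_ps:
        v_next = v_now_mv + cfg.step_mv      # margin too tight: back off
    elif worst_margin_ps > cfg.margin_high_ps:
        v_next = v_now_mv - cfg.step_mv      # spare margin: reclaim guard band
    else:
        v_next = v_now_mv                    # inside the hysteresis window: hold
    # Never leave the allowed voltage range.
    return max(cfg.v_min_mv, min(cfg.v_max_mv, v_next))


def run_loop(margins_ps, v_start_mv: int, cfg: AvsConfig):
    """Apply the controller to a sequence of worst-case margin readings."""
    trace = []
    v = v_start_mv
    for margin in margins_ps:
        v = next_voltage(v, margin, cfg)
        trace.append(v)
    return trace


if __name__ == "__main__":
    readings = [30, 28, 25, 18, 8, 15, 25]  # made-up margin samples in ps
    print(run_loop(readings, 750, AvsConfig()))
    # prints [745, 740, 735, 735, 740, 740, 735]
```

The gap between the two thresholds acts as hysteresis, so the loop does not oscillate on small margin fluctuations. A fixed guard band corresponds to never leaving `v_max_mv`.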
Typical lifetime extension according to the percentage of power reduction enabled by AVS Pro. The lifetime extension is VDD and temperature dependent. In this case, it was calculated vs. a baseline VDD level of 750mV and a baseline temperature of 85℃.
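A curve of this kind can be approximated with a few lines of code. The sketch below is an assumption-laden illustration, not the model used to produce the figure: it assumes dynamic power scales with the square of the voltage at fixed frequency, a power-law degradation with an exponential voltage term and an Arrhenius temperature term, and placeholder values for `gamma_per_v`, `ea_ev`, and `n` that are not fitted to any process. Function names are hypothetical.

```python
import math

K_BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def voltage_for_power_reduction(v_base: float, power_reduction: float) -> float:
    """Voltage giving the requested fractional power reduction, assuming P ~ V^2."""
    return v_base * math.sqrt(1.0 - power_reduction)


def acceleration_factor(v_base, v_new, t_base_c, t_new_c,
                        gamma_per_v, ea_ev, n):
    """AF = (A(V1,T1) / A(V0,T0)) ** (1/n) for A = C * exp(gamma*V) * exp(-Ea/(k*T))."""
    t0_k = t_base_c + 273.15
    t1_k = t_new_c + 273.15
    ln_ratio = (gamma_per_v * (v_new - v_base)
                - (ea_ev / K_BOLTZMANN_EV) * (1.0 / t1_k - 1.0 / t0_k))
    return math.exp(ln_ratio / n)


def lifetime_extension(af: float) -> float:
    """Relative increase in time to reach the same degradation threshold."""
    return 1.0 / af - 1.0


if __name__ == "__main__":
    # Baseline from the figure caption: 750 mV and 85 C.
    # Model parameters below are placeholders chosen only for illustration.
    v0, temp_c = 0.75, 85.0
    for reduction in (0.05, 0.10, 0.15):
        v1 = voltage_for_power_reduction(v0, reduction)
        af = acceleration_factor(v0, v1, temp_c, temp_c,
                                 gamma_per_v=1.0, ea_ev=0.1, n=0.2)
        print(f"{reduction:.0%} power cut -> {v1:.3f} V, "
              f"extension {lifetime_extension(af):.1%}")
```

For example, under the quadratic power assumption a 10% power reduction from 0.75 V corresponds to about 0.71 V. The temperature is held constant here for simplicity; passing a lower `t_new_c` would add the temperature benefit on top of the voltage effect.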
By reducing the nominal voltage throughout a chip’s life, AVS Pro lowers power consumption and temperature, reducing the stress on the SoC and prolonging its lifespan. As the chip ages, the guard band reclamation is optimized accordingly, to ensure reliability while maximizing efficiency.
AVS Pro enables engineers to reclaim excess guard bands safely. The resulting decrease in voltage reduces power consumption, temperature, and stress, which delays wear out.
5nm delay degradation simulations [%] at nominal conditions: junction temperature 85°C, V = 0.75 V
In the simulations above, the degradation that occurs within the chip’s first year without AVS Pro is delayed by more than two years with AVS Pro (blue dotted line). Consequently, the total chip lifetime is extended by a similar factor.
proteanTecs AVS Pro has demonstrated the potential to extend chip lifespans by up to 18%. In data centers, this translates to hundreds of millions of dollars saved annually by reducing the need for premature replacements. In automotive, it helps meet the 15-year lifespan goal for electronics, ensuring that critical safety features remain intact.
AVS Pro, visualized here, enables significant power saving through safer voltage scaling, leading to 18% projected lifetime extension.
The example above was taken from a mass-produced 5nm communications chip. AVS Pro enabled a 12.5% power reduction, resulting in an 18% increase in predicted lifespan. By fine-tuning the voltage throughout the chip’s life, AVS Pro prevents unnecessary degradation and delays the onset of wear-out and critical failures, while reducing power consumption without performance impact.
NOTE: The nominal voltage in this case is 0.65 V. The predicted lifespan increase is smaller than for chips that work at a nominal voltage of 0.75 V.
As the pressure to extend chip lifetimes continues to mount, proteanTecs AVS Pro is emerging as a solution for industries that demand long-lasting, reliable electronics. With the ability to balance performance, power reduction and longevity, chipmakers can now offer products that not only perform reliably but also help reduce CAPEX and promote sustainability.
Want to learn more about how AVS Pro can extend the life of your chips? Download our white paper or contact us here.