Why embedding in-chip monitoring IP is an essential step to maximize performance and reliability and minimize power.
The latest SoCs on advanced semiconductor nodes typically include a fabric of sensors spread across the die, and for good reason. But why are these sensors needed, and what are the benefits? This first post in a three-part series explores some of the key applications for in-chip thermal sensing and explains why embedding in-chip monitoring IP is an essential step towards maximizing performance, maximizing reliability, minimizing power, or achieving a balance of these objectives.
As SoC developers migrate to new, smaller-geometry nodes, they enjoy considerable benefits: higher logic density, faster performance and lower power. However, the challenges also increase, and they must be addressed in light of the end application's objectives, whether that is maximizing performance, minimizing power, optimizing reliability or some combination of these. One of the key challenges is the ‘end of Dennard Scaling’, as highlighted by John Hennessy at the AI Hardware Summit 2019.
This refers to the fact that since the mid-2000s, power per unit of silicon area no longer stays roughly constant from one node to the next but has instead been steadily increasing. Combine this with the trend towards very large chips, some approaching reticle size, and the introduction of FinFETs, whose 3D structure makes heat harder to dissipate, and it takes little imagination to predict that chips can develop hotspots and other thermal problems.
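To make the ‘end of Dennard Scaling’ concrete, the sketch below uses the textbook dynamic-power model P = C·V²·f and ignores leakage, which is a simplifying assumption; real nodes also differ in how far frequency and capacitance actually scale. Shrinking linear dimensions by a factor k keeps power density constant only if the supply voltage also scales by 1/k. Once voltage stops scaling, density rises by roughly k² per node. The function name power_density_ratio is hypothetical.

```python
# Simplified dynamic-power model P = C * V^2 * f; leakage is ignored.
def power_density_ratio(k: float, scale_voltage: bool) -> float:
    """Power density after one node shrink by linear factor k,
    relative to the previous node (1.0 means unchanged)."""
    capacitance = 1 / k                          # gate capacitance shrinks with dimensions
    voltage = 1 / k if scale_voltage else 1.0    # post-Dennard: V stays roughly flat
    frequency = k                                # shorter gate delay lets f rise by k
    area = 1 / k**2                              # footprint shrinks in both dimensions
    power = capacitance * voltage**2 * frequency
    return power / area

k = 1.4  # roughly one full node step (about 0.7x linear shrink)
print(power_density_ratio(k, scale_voltage=True))   # classic Dennard: about 1.0
print(power_density_ratio(k, scale_voltage=False))  # voltage no longer scales: about k**2
```

Under these assumptions, one node step with flat voltage nearly doubles power density (1.4² ≈ 1.96), which is why hotspots become a first-order concern.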
This rising power density creates the potential for localized hotspots and temperature gradients across the die. Temperature monitoring of critical circuits is now standard, and it is quite common for FinFET SoCs to carry several tens of temperature sensors spread across a large die.
Thermal shutdown, triggered when the die temperature crosses an elevated threshold, is a basic function, but dynamic thermal management offers a finer-grained approach. Here, in-chip temperature sensors provide input to dynamic frequency scaling (DFS) or dynamic voltage and frequency scaling (DVFS) schemes, so that power and processing performance can be managed to control the temperature rather than abruptly switching off circuits or the entire chip. Accurate temperature sensing brings a further benefit: the temperature threshold can be set closer to the limit, allowing maximum or near-maximum performance to be maintained for longer before performance is throttled back or the device is shut down.
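A minimal sketch of a temperature-driven DVFS governor follows, assuming a hypothetical ThermalGovernor class, a list of operating points ordered fastest first, and a simple hysteresis rule; it is an illustration, not any particular vendor's scheme. It shows how sensor accuracy feeds directly into the trip point: the governor must throttle at the limit minus the worst-case sensor error, so a more accurate sensor lets the chip run closer to the real limit.

```python
from dataclasses import dataclass

@dataclass
class ThermalGovernor:
    """Hypothetical DVFS governor that steps through operating points on temperature."""
    t_limit_c: float        # maximum allowed junction temperature (assumed value)
    sensor_error_c: float   # worst-case sensor inaccuracy, +/- degC
    hysteresis_c: float     # margin below the trip point before stepping back up
    opp_mhz: tuple          # operating points, fastest first
    level: int = 0          # index into opp_mhz (0 = fastest)

    @property
    def throttle_at_c(self) -> float:
        # Guard band: trip early enough that even a worst-case reading error
        # cannot let the true temperature exceed the limit.
        return self.t_limit_c - self.sensor_error_c

    def update(self, t_measured_c: float) -> int:
        """Feed one sensor reading; return the frequency to run at (MHz)."""
        if t_measured_c >= self.throttle_at_c and self.level < len(self.opp_mhz) - 1:
            self.level += 1  # too hot: drop to the next lower operating point
        elif t_measured_c < self.throttle_at_c - self.hysteresis_c and self.level > 0:
            self.level -= 1  # cooled down with margin: step back up
        return self.opp_mhz[self.level]

opps = (2000, 1600, 1200, 800)
precise = ThermalGovernor(125.0, 1.0, 3.0, opps)  # +/-1 degC sensor
coarse = ThermalGovernor(125.0, 5.0, 3.0, opps)   # +/-5 degC sensor
print(precise.update(121.0), coarse.update(121.0))  # 2000 1600
```

With the same assumed 125 °C limit, the governor fed by the more accurate sensor keeps running at full speed at 121 °C, while the less accurate one has already throttled. The hysteresis band stops the governor from oscillating between two operating points when the temperature hovers near the trip point.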
CPU load balancing for multicore processors is another key application. SoCs targeted at AI applications now include hundreds or even thousands of cores, typically organized in clusters. Depending on how load is allocated, hotspots and temperature gradients can develop even when the loads are balanced through software. Better load balancing takes the temperature profile across the chip into account, which leads to a lower maximum temperature and reduced temperature gradients and thermal cycling. The lower maximum temperature enables higher data throughput, because the CPUs are throttled less, and also improves reliability.
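A minimal sketch of temperature-aware placement follows. It assumes a deliberately crude, hypothetical thermal model in which each unit of load raises a cluster's temperature by a fixed c_per_unit_load °C; a real scheduler would use live sensor readings and a calibrated model. The greedy placer puts each task on the cluster with the lowest predicted temperature, and round_robin is included as a count-balanced baseline that ignores temperature.

```python
import heapq

def assign_tasks(cluster_temps_c, task_loads, c_per_unit_load=0.5):
    """Greedy thermal-aware placement under a hypothetical linear thermal model.

    Each task goes to the cluster whose predicted temperature is lowest,
    where prediction = current temperature + load * c_per_unit_load.
    Returns (list of (load, cluster index), predicted temperatures).
    """
    heap = [(t, i) for i, t in enumerate(cluster_temps_c)]
    heapq.heapify(heap)
    predicted = list(cluster_temps_c)
    assignment = []
    # Placing the heaviest tasks first tends to give a flatter profile.
    for load in sorted(task_loads, reverse=True):
        t, i = heapq.heappop(heap)          # currently coolest cluster
        t += load * c_per_unit_load
        predicted[i] = t
        assignment.append((load, i))
        heapq.heappush(heap, (t, i))
    return assignment, predicted

def round_robin(cluster_temps_c, task_loads, c_per_unit_load=0.5):
    """Baseline: spread tasks evenly by count, ignoring temperature."""
    predicted = list(cluster_temps_c)
    for n, load in enumerate(task_loads):
        predicted[n % len(predicted)] += load * c_per_unit_load
    return predicted

temps = [70.0, 60.0, 65.0, 62.0]  # cluster 0 already runs hot
loads = [10, 10, 10, 10]
print(max(round_robin(temps, loads)))      # 75.0
print(max(assign_tasks(temps, loads)[1]))  # 70.0
```

Both schemes place the same four tasks, but the thermal-aware one avoids loading the already-hot cluster, so the predicted peak drops by 5 °C in this toy example.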
To help address these challenges, a common strategy is to embed a fabric of in-chip monitors across the die. These give visibility into on-chip conditions, which is especially valuable for critical circuit blocks. The latest FinFET SoCs often embed tens of temperature sensors to detect hotspots and enable thermal management that reduces both the maximum temperature and the temperature gradients across the die.
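Once a sensor fabric is in place, the two quantities that thermal management acts on, the hotspot and the temperature gradient, can be derived from its readings. The sketch below assumes a hypothetical uniform grid of sensors with a fixed pitch; real placements are usually irregular and follow the critical blocks, so neighbour distances would have to come from the floorplan.

```python
def analyze_fabric(readings_c, pitch_mm):
    """Find the hotspot and steepest neighbour-to-neighbour gradient.

    readings_c: 2D list (rows x cols) of temperatures from a hypothetical
    uniform sensor grid with spacing pitch_mm. Returns
    (hotspot temperature, (row, col), max gradient in degC per mm).
    """
    rows, cols = len(readings_c), len(readings_c[0])
    hot_t, hot_rc = max(
        (readings_c[r][c], (r, c)) for r in range(rows) for c in range(cols)
    )
    max_grad = 0.0
    for r in range(rows):
        for c in range(cols):
            # Compare with right and lower neighbours only, so each
            # adjacent pair is visited exactly once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    grad = abs(readings_c[r][c] - readings_c[rr][cc]) / pitch_mm
                    max_grad = max(max_grad, grad)
    return hot_t, hot_rc, max_grad

grid = [
    [55, 58, 60],
    [57, 72, 63],
    [56, 61, 59],
]
print(analyze_fabric(grid, 2.0))  # (72, (1, 1), 7.5)
```

A single sensor at the die centre would have reported the 72 °C peak here, but only the surrounding sensors reveal how steeply the temperature falls away from it.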
In-chip monitoring is now fundamental for any developer who wants to get the most from their SoC, whether the goal is performance, power consumption, reliability or a combination of these.