Thermal Issues And Modern SoCs: How Hot Is Hot?

Temperature sensors expand their role to help manage chip aging and reliability.

A Q&A with Moortec CTO Oliver King.

What are the thermal issues of modern SoCs?

Gate density has been increasing with each node, and that pushes up power per unit area. I think that has become an even bigger issue with FinFET processes, where the channels are more thermally isolated than in the planar processes before them. In the last few planar nodes, leakage was an issue, which led to significant power consumption even in idle states. That’s been pegged back somewhat with the latest FinFET nodes, but it’s going to continue to be an issue going forward. In addition, if you are developing for consumer products such as smartphones and tablets, you are always limited in how much heat you can dissipate because you don’t have active cooling such as fans, and obviously the upper temperature limit of the product is constrained as well. You can’t go up to very high temperatures, and the hotter things get, the bigger the issue of reliability and lifetime of device parts, which is perhaps the biggest thing going forward.

How hot is hot?

That all depends on the application. One thing that is interesting now, with the growth in the ADAS and infotainment sectors within automotive, is that we are starting to see that even 125°C is not high enough. Those markets demand higher-temperature operation, so for those guys hot is obviously hotter than it may be for a consumer device, where 40°C for the product might be your limit. There will be a thermal mass in there, so you can have devices, chips and dies within the product that are much hotter, but there’ll be a thermal budget for the whole product, so it does very much depend on the application. The key thing from our point of view is knowing the temperature accurately, which then allows you to be closer to the limit. That’s really what it is all about for modern SoCs: being as close to the limit as you can get without stepping over it. Temperature limits will always be somewhat defined by lifetime reliability. Foundries will say if you operate at 80°C you get 10 years’ lifetime, or whatever it might be. If you choose to operate at 90°C you get less lifetime, so there is always going to be that kind of trade-off, and because temperature has a kind of exponential effect on ageing, the accuracy of temperature sensors becomes correspondingly more important.
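To put rough numbers on the exponential effect of temperature on ageing described above, here is a minimal sketch using the Arrhenius model that is commonly applied in reliability work. The interview does not name a model or give any parameters, so the 0.7 eV activation energy, the function names and the guard-band helper are all illustrative assumptions, not foundry data or Moortec's method.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def acceleration_factor(t_ref_c, t_op_c, activation_energy_ev=0.7):
    """Arrhenius ageing acceleration at t_op_c relative to t_ref_c.

    activation_energy_ev is an assumed, illustrative value; real values
    depend on the failure mechanism and the process.
    """
    t_ref_k = t_ref_c + 273.15
    t_op_k = t_op_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_ref_k - 1.0 / t_op_k)
    return math.exp(exponent)


def relative_lifetime(t_ref_c, t_op_c, activation_energy_ev=0.7):
    """Lifetime at t_op_c as a fraction of the lifetime at t_ref_c."""
    return 1.0 / acceleration_factor(t_ref_c, t_op_c, activation_energy_ev)


def usable_setpoint_c(limit_c, sensor_error_c):
    """Highest reading you can act on while staying under limit_c.

    A sensor with +/- sensor_error_c uncertainty forces a guard band of
    that size below the real limit.
    """
    return limit_c - sensor_error_c


if __name__ == "__main__":
    print(f"Ageing acceleration, 80C -> 90C: {acceleration_factor(80.0, 90.0):.2f}x")
    print(f"Relative lifetime at 90C vs 80C: {relative_lifetime(80.0, 90.0):.2f}")
    for error_c in (5.0, 1.0):
        print(f"+/-{error_c}C sensor: usable setpoint {usable_setpoint_c(90.0, error_c)}C")
```

Under these assumed numbers, a 10°C rise from 80°C to 90°C speeds up ageing by a little under 2x, and a ±5°C sensor gives away 4°C of usable headroom compared with a ±1°C sensor. That lost headroom is the margin a more accurate sensor lets you recover.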

What are the trends in temperature sensor use?

A number of years ago, when we started selling temperature sensors, they were generally being used just for device characterisation, HTOL, burn-in tests and those kinds of things. Then they started to be used for high-temperature alarms, to know when to switch off the device or potentially turn on a fan. But over the last few years we have seen much more application in areas like Dynamic Voltage and Frequency Scaling (DVFS) and lifetime reliability, for example in a feedback control loop where you choose to operate the device at a given temperature and you adjust other things to compensate for that. So certainly the use cases now are much more varied.
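The feedback control loop mentioned above can be sketched as a simple step governor: read the temperature, then step the operating point down when the die is above the target and back up once it has cooled. This is a minimal illustration only. The target temperature, the hysteresis band, the frequency table and the function name are hypothetical, and a real DVFS governor would also adjust voltage and be tuned to the product.

```python
def next_operating_index(temp_c, current_index, num_levels,
                         target_c=85.0, hysteresis_c=3.0):
    """Pick the next operating point index (0 is the slowest level).

    Steps down one level when temp_c exceeds target_c, steps up one level
    when temp_c is more than hysteresis_c below target_c, and otherwise
    holds. The hysteresis band keeps the loop from oscillating.
    """
    if temp_c > target_c and current_index > 0:
        return current_index - 1
    if temp_c < target_c - hysteresis_c and current_index < num_levels - 1:
        return current_index + 1
    return current_index


if __name__ == "__main__":
    # Hypothetical frequency table in MHz, slowest first.
    frequencies_mhz = [600, 1000, 1400, 1800]
    index = len(frequencies_mhz) - 1
    # Hypothetical sequence of sensor readings in degrees Celsius.
    for reading_c in [80.0, 86.5, 88.0, 84.0, 81.0, 79.5]:
        index = next_operating_index(reading_c, index, len(frequencies_mhz))
        print(f"{reading_c:5.1f}C -> {frequencies_mhz[index]} MHz")
```

In a loop like this, the sensor's accuracy sets how close `target_c` can sit to the real limit, since any measurement uncertainty has to be subtracted from the target as margin.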

The last few years have been all about consumer electronics, and in those cases you are really trying to get an awful lot out of a device whilst not making it too hot, because it’s in your pocket or it’s on your lap, and that’s driven a lot of the applications and use cases for this. I think we’re potentially moving into a space where just the cost of the advanced nodes means you want to get everything out of them. All of the different levels of over-design that you can add to your process and your design flow take away performance, so having sensors on there, whether they are temperature, process or voltage sensors, allows you to get that little bit more performance out of your device and/or improve reliability.

What requirements does that place on temperature sensors?

The most important thing from where we sit is accuracy. The greater the uncertainty in the measured result, the less you can do with it. So, for us, the key thing is accuracy, but then beyond that the next thing really is robustness and testability because you are now using these sensors in application areas where their failure can cause system failure. You need to be able to test them, you need to be able to rely on them, so we are doing a lot in that sense to ensure that there is testability and there is robustness in the operation.

How does Moortec address those requirements?

The first thing is that we meet the accuracy requirements and exceed them; we believe that we are the most accurate out there. In terms of things like testability and robustness, we have done a lot of work to be able to provide online fault detection and diagnosis of our sensors. So, you can interrogate them and you can understand if there is a fault. Firstly, the sensor will tell you if there is a fault and, secondly, you can then ask it what is wrong and it can give you a certain amount of health diagnosis. All of the other basic things, like scan chains, are built in as well. On top of that, we believe ease of integration is an important factor, not necessarily because it gives you a more accurate temperature sensor, but because you are making it easier for the customer to use those features, which helps them to achieve accurate monitoring.

On the robustness side of things, we have done a lot to guard against process uncertainty and process degradation. If the process isn’t quite where we think it is, or the mismatch isn’t as good as we think, we have a lot of architectures in there that can cope with that. By adding all of the fault detection circuitry, we have also added quite a lot of ability to debug when the circuit doesn’t quite work right.

On the ease of integration side of things, we are pretty pleased with our calibration routine, because you don’t need to know the temperature when you calibrate, and we’re robust to things like supply noise.

The main point is that higher accuracy results in increased performance and improved reliability.


