Experts at the Table, Part 2: Heat is becoming a serious issue as thermal densities rise and this creates problems for industries such as automotive that require robust, long-lived components.
Optimization used to be a simple tradeoff between timing and area, but not anymore. With each new node, the tradeoffs become more complicated, involving additional aspects of the design that used to be dealt with in isolation.
Semiconductor Engineering sat down to discuss these issues with Krishna Balachandran, director of product management for low power products at Cadence; Tobias Bjerregaard, CEO for Teklatech; Aveek Sarkar, vice president for sales and support at Ansys; and Sarvesh Bhardwaj, group architect for ICD Optimization at Mentor Graphics. Part one discussed new challenges and the problems associated with the traditional methods, specifically talking about problems created by siloing. What follows are excerpts from that conversation.
SE: Another change involves the integral of power – heat. Thermal effects are not instantaneous and mean that we have to consider the design over time.
Balachandran: People are already dealing with thermal challenges. This started before 10nm, so it is more a function of how many cores they have on the chip. It is a function of how fast they want to run them. Everyone has heard of dark silicon because you don’t want to use up all the space and integrate everything on the chip and then have to turn them off because you are going to burn the chip. So the dark silicon problem will become worse in 10nm and 7nm because you have more space. You have a doubling of the area, so you can put twice the amount of logic on the die. That part of Moore’s Law is still working well, but because of the thermal profile do you turn off some of the cores and then ask what was the point of integrating them on the same chip? This is a problem that customers in the mobile space have already faced. They were working on it even before they got to 10nm, and it has to be considered not just in the context of the SoC, but at the chip/board/package level. It is a total thermal profile that has to be created, and the problem could be on the board or in the package, but the fix might be on the SoC. So you don’t know where the problems are or the fixes will be until you have analyzed all of it. It is not a smooth, automated flow today. It is coming in bits and pieces, and the EDA industry recognized the problem and has been working on it. The problem will get worse and become more mainstream and the solutions will become more mature by the time 10nm and 7nm become more mainstream.
Bjerregaard: It is because the systems have become so large that it has become a problem. It is not a specific technology issue. It is more of a system issue. This is the same problem that you have always had—one team responsible for one core and a different team responsible for another core, but who is responsible for the complete picture? There is someone responsible but they don’t have the authority to make the changes in the cores that may be necessary. It is also an analog/digital dilemma. One team doing each and a lot of stuff falls between them. That is where semiconductor companies need to change the way they design chips and scope their systems. We need ways to address these kinds of problems.
SE: Power reductions for 10nm and 7nm are not keeping up with Moore’s Law, so the power issues will get worse. That will create new problems that did not exist in the past.
Bjerregaard: It is a combination of things that is getting worse. In one dimension the systems are getting bigger, and in the other dimension the power density is increasing. 10nm finFET is accelerating the benefits of scaling, but it is also accelerating the challenges, so it starts to become an exponential problem.
SE: One method for containing power density that is being used by Intel is reducing the number of fins. That does not scale.
Bjerregaard: We are seeing people work more creatively to utilize what was left on the table in the older technologies. One interesting piece of work at ST is looking at how to utilize silicon-on-insulator technology on a mature node, 28nm. By using silicon on insulator they can control the back biasing and get a completely different performance/power tradeoff. And that is what it is all really about: how much performance can you get out of a specific power requirement. So the problem caused by finFET is causing some people, for some designs, to go back to other technologies where they can utilize existing approaches in new ways.
Balachandran: But there will continue to be customers moving to 10nm and 7nm, and those are the ones that are high volume, at the cutting edge of design, and needing more on the chip, for whom the power density problems will get worse. The industry has to respond. We cannot shy away from it and people will not shy away from the solution — they will be forced to pay for a solution.
Sarkar: It is a divide-and-conquer problem. It is three-tiered. One is at the device level itself. There is self-heat that is happening, and that has to be solved at the IP level. You have to create some form of modeling for that level. Then there are the chip and interconnects: how hot will they get, and what is the impact of IR drop and electromigration (EM)? There are things that you may consider trivial. We have seen data from customers showing where they placed thermal sensors, and we ask why a sensor was placed there when the thermal hotspot is somewhere else. It was because they expected this core to kick in most of the time. In reality that is not where the heat happened because they actually had an MCM with two chips together, and both of the chips were firing at the same time. And because of the thermal coupling, the heat from one chip was impacting the other, and not where the heat sensor had been placed. This is a multi-chip problem and you have to look at the chip and the package together.
Balachandran: This is particularly a problem for mobile because you have to keep them in a very small form factor and there is no place to put heat dissipation.
Sarkar: And there is no fan. The third aspect is the system level, which is when you hit limits in the power supply. We hit those problems in the past with rack-mounted systems, but today it is at the device level. With mobile you can just throw the thing away, but we are looking at the same nodes going into automotive infotainment systems and ADAS, where the criticality is much more significant. Consider that you are in Las Vegas. You put your car in reverse and you no longer bother looking over your shoulder because you expect the radar to warn you if anything is there. But because it is so hot, the thermal sensor triggered a throttle-down, so the software slowed to the point where the radar did not kick in. You think there is nothing there and so you back out. These are the kinds of scenarios the industry is starting to worry about. When we look at it from the system point of view there are so many layers. You may ask whether the EDA industry is getting ready at each of these layers, and whether we are engaging in the right way and solving each of them. There are lots of new opportunities.
Bhardwaj: With the complexity of the system, there is a lot of opportunity to solve problems related to thermal and system-level power management and integrating the solutions. Depending on the power profile of the different blocks on the chips, it may not be physically possible to operate all of them at the same time. So you have to manage which ones go into low-power mode and which ones are operating at maximum performance. Those kinds of decisions are made at the system level based on the application profile. At the system level there will be more innovation in terms of solving this kind of problem.
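To make the system-level decision described above concrete, here is a minimal sketch of choosing which blocks run at maximum performance and which drop into low-power mode under a fixed power budget. It is an illustration only, not a method described by the panelists: the `Block` fields, the priority ordering, the greedy policy, and all the numbers are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical on-chip block; all fields are illustrative assumptions."""
    name: str
    priority: int        # higher = more important to the current application
    max_power_w: float   # power drawn at maximum performance
    low_power_w: float   # power drawn in low-power mode


def assign_modes(blocks, budget_w):
    """Greedy sketch: return ({name: 'max' | 'low'}, total_power_w).

    Every block starts in low-power mode; blocks are then promoted to
    maximum performance in priority order while the budget allows it.
    """
    modes = {b.name: "low" for b in blocks}
    total = sum(b.low_power_w for b in blocks)
    for b in sorted(blocks, key=lambda blk: blk.priority, reverse=True):
        extra = b.max_power_w - b.low_power_w
        if total + extra <= budget_w:
            modes[b.name] = "max"
            total += extra
    return modes, total
```

With three assumed blocks, a CPU (priority 3, 2.0 W max, 0.5 W low), a GPU (priority 2, 3.0 W max, 0.4 W low) and a modem (priority 1, 1.0 W max, 0.2 W low), and a 4.0 W budget, this policy runs the CPU and modem at maximum performance and keeps the GPU in low-power mode. A real power manager would also have to account for thermal coupling between blocks and for how the application profile changes over time.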
SE: Automotive may have issues with 10nm and 7nm because of reliability. This is in part due to EM (and EMI). Do we understand aging in these technologies?
Sarkar: Throw-away is the least of the problems. The worst is that it stops working while traveling at 65 mph.
Bjerregaard: An interesting aspect of automotive is that it used to be all about robustness. That was the number one priority. With self-driving cars we need a lot of processing power to make that possible. Artificial intelligence, image processing – it becomes more about creating the performance necessary to make this possible. We will see that they need to move to these technologies to make that kind of processing power possible. The good part is that you can get this level of robustness at the physical level with old technologies, but you can also get it at the system level if you have distributed parallel processing. So there are many ways to achieve robustness. The real problem is this costs power. If you have two or three parallel processors and you do voting, you have to run three times as much, and total power is an issue in cars. They are not power plants – they are cars. So they will move to 10nm and 7nm. They have to.
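The voting scheme mentioned above can be sketched as a simple majority voter over the outputs of redundant processors. This is a minimal illustration under stated assumptions: the function name and the choice to return `None` when no strict majority exists are hypothetical, and a real automotive system would implement voting in hardware or safety-certified software.

```python
from collections import Counter


def majority_vote(results):
    """Return the value reported by a strict majority of replicas.

    Returns None when the list is empty or no value has a strict
    majority, so the caller can fall back to a safe state.
    """
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) / 2 else None
```

With three replicas, one faulty result is outvoted by the other two, but every computation runs three times, which is the power cost raised in the discussion.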
Balachandran: We are talking about cars having 100+ ECUs. They went from a couple of ECUs 10 years ago to typically 50 or 60 ECUs today, and they are talking about hundreds in the future. With that many, the problem when you need that much processing power is that you need different chips, and that creates inefficiencies in power. Also, the speed at which they are processing things is a factor. Integration will have to happen, and if that is true and you have to reduce it to fewer chips, then there will be a push toward finer geometries. That is the only way to reduce chip count. That will motivate the move in automotive toward smaller geometries. As the self-driving car vision becomes a reality, they will be very willing to move to 10nm, 7nm and beyond.
Bjerregaard: We do see automotive companies moving to the finer geometries, and today there are designs at 16nm. We also see them choosing different nodes for different products. They are staying at 40nm and 28nm for certain products and moving to 16nm and smaller nodes in the future for others. The products will become more specialized, and they will pick nodes for particular applications.
Sarkar: Some of the self-driving techniques, and especially those associated with machine learning and cognitive neural frameworks, have a need for speed. But there is a limit to how fast a technology can go, and so the new geometries become necessary. By the time these cars become more of a reality, that technology node will have matured, so there is an intercept point that is set.
Bhardwaj: Reliability is an issue, but not at the same level as we have for satellites and other such equipment. If designed properly, we can have modular design so that if a component breaks it can be replaced and we don’t have to throw the entire car away. With all of these self-driving cars and the technology needed, the compute power will require us to move to the newer nodes. The reliability problem will have to be solved.