Optimizing processor design for high-performance computing now requires an accumulation of many small improvements rather than any single big fix.
Performance in traditional and hyperscale data centers is being limited by power and heat, driven by the increasing number of processors, memory devices, disks and operating systems running within servers.
The problem is so complex and intertwined, though, that solving it requires a series of steps that collectively add up to a significant reduction across a system. At 7nm and below, predicting exactly how a chip will actually operate is difficult. And in the high-performance computing market, where the goal is to design the highest-performing chips, power is now the primary limiting factor.
This has caused a significant shift in how chips are designed for this segment of the market, particularly for cloud-based data centers where always-on operation or rapid power-up is required. In the past, overdesign was a common way of ensuring uptime in server racks, but that is no longer an option because it hurts both power and performance.
Energy is expensive, particularly in a large data center. In fact, it’s a line item in most data center budgets, and it can vary greatly depending upon regional availability of energy, the heat generated by server racks, the direction and temperature of air or liquid used to cool them, as well as how many processors are “on” versus “off” or in some state in between in cloud-based operations.
“It was estimated in 2014 that 2% of the U.S. energy consumption went into powering up data centers alone,” said Ankur Gupta, director of field applications at Ansys. “Four years later, that number may be more like 5%. The economics of running these data centers is driving companies to ask why computers are consuming all this energy.”
Data centers are using simulation tools to analyze racks of servers because each device is a heat source that needs to be cooled. That analysis typically examines what type of cooling is used and whether heat dissipation and cooling can be optimized.
“Heat in data center racks poses a reliability problem for chips,” Gupta said. “That puts us squarely in the semiconductor domain where we start to look at process, voltage, and temperature. And because of the heat that’s being generated, temperature needs to be analyzed a lot better than before. If you look at the space overall, the compute space in mobile for instance, people have started worrying about voltage variation and process variation because of ultra-low voltage operating corners, but temperature is still sort of just guard-banded out. You assume a temperature and your phone gets a little hot, but that’s not detrimental to the overall product.”
But the HPC market poses challenges beyond what many designers are used to in the mobile space. The on-chip temperature gradient is no longer as small as it used to be, so a flat temperature assumption no longer works.
“The problem is that much harder in HPC because the power consumption is two orders of magnitude higher,” he said. “You’re talking 3 to 5 watts on a mobile phone versus 300 to 500 watts on the rack. At the device level, there are local thermal effects like self-heating of every finFET device. From there on, you look at the impact of on-chip temperature gradient, which must be analyzed because there could be long timing paths that may be critical and spanning entirely different regions on-chip. Maybe part of the path is lying very close to memories, and under certain workload conditions the temperature is much higher than the nominal temperature assumption.”
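A minimal sketch of the kind of analysis Gupta describes might look like the following, where a per-region temperature map scales the delay of a timing path whose segments span different parts of the die. The region names, temperatures, and derating coefficient are assumptions for illustration only, not values from any process or tool.

```python
# Minimal sketch: estimate how an on-chip temperature gradient shifts the delay
# of a timing path whose segments lie in different regions. The region names,
# temperatures, and derating coefficient are illustrative assumptions.

NOMINAL_TEMP_C = 85.0          # flat temperature assumed at signoff
DELAY_DERATE_PER_C = 0.0015    # assumed +0.15% delay per degree C above nominal

# Per-region temperatures under a given workload (e.g., hotter near memories)
region_temps_c = {"core_logic": 92.0, "near_memory": 110.0, "io_ring": 88.0}

# A path described as (region, nominal_delay_ps) segments
path_segments = [("core_logic", 120.0), ("near_memory", 310.0), ("io_ring", 95.0)]

def path_delay_with_gradient(segments, temps):
    """Scale each segment's delay by its local temperature delta."""
    total = 0.0
    for region, delay_ps in segments:
        delta_c = temps[region] - NOMINAL_TEMP_C
        total += delay_ps * (1.0 + DELAY_DERATE_PER_C * delta_c)
    return total

flat = sum(d for _, d in path_segments)
graded = path_delay_with_gradient(path_segments, region_temps_c)
print(f"flat-temperature delay: {flat:.1f} ps, with gradient: {graded:.1f} ps")
```

The point of the exercise is that a path partly lying near hot memories picks up extra delay that a single flat temperature assumption would miss.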
Fig. 1: High-performance computing in action. Source: IBM
Overdesign no more
Overdesign has long been the answer to reduce design risk, but this approach no longer works at the leading edge.
“When designers are uncertain about the impact of variation effects on transistor-level components, they add margin—this ensures that their chips will work, but trades off performance, power and die area,” said Jeff Dyck, director of engineering at Mentor, a Siemens Business.
For example, when a chip has to work from -40°C to 125°C at voltages ranging from 0.48V to 1.2V, and across process variation to 4 sigma, designers often will simulate a subset of the PVT corners and maybe a few hundred Monte Carlo samples (~2.5 sigma) at one or two of the worst-case PVT corners.
“This is used to help guess what the performances are under variation, but since there is uncertainty in the guesses, they may add a bit of die area, increase voltage, and decrease performance to compensate for errors in estimates,” Dyck said. “It is not uncommon to see 5% to 30% margins added to account for unknown variation effects.”
That degrades performance and increases power consumption at advanced nodes.
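As a rough illustration of where that margin comes from, the sketch below draws a few hundred Monte Carlo samples from a synthetic distribution, extrapolates a 4-sigma point from them, and layers an additional margin on top to cover the uncertainty. All numbers are invented; a real flow would sample circuit simulations across PVT corners rather than a Gaussian in Python.

```python
# Minimal sketch: why a few hundred Monte Carlo samples only resolve ~2.5 sigma,
# and how a guard-band margin is layered on top. The distribution is synthetic.
import random
import statistics

random.seed(0)
N_SAMPLES = 300                      # "a few hundred" Monte Carlo runs
samples = [random.gauss(1.00, 0.03) for _ in range(N_SAMPLES)]  # normalized delay

mu = statistics.mean(samples)
sigma = statistics.stdev(samples)

# With 300 samples the worst observed point sits near 2.5-3 sigma at best,
# so a 4-sigma spec has to be extrapolated rather than measured directly.
worst_observed = max(samples)
extrapolated_4sigma = mu + 4.0 * sigma

# Designers then add margin (the article cites 5% to 30%) to cover the
# uncertainty in that extrapolation.
MARGIN = 0.10
design_target = extrapolated_4sigma * (1.0 + MARGIN)

print(f"mean={mu:.3f}  sigma={sigma:.3f}")
print(f"worst of {N_SAMPLES} samples:     {worst_observed:.3f}")
print(f"extrapolated 4-sigma point:  {extrapolated_4sigma:.3f}")
print(f"target with {MARGIN:.0%} margin:      {design_target:.3f}")
```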
“Overdesign is not the answer, and the price associated with it is steadily on the rise relative to the benefits you get at a given technology node,” said Oliver King, CTO at Moortec Semiconductor. “The biggest problem with overdesign is knowing how much you’ve overdesigned. Especially on the most advanced nodes, nobody really knows what a finFET aging model does, for example.”
An emerging technique to combat overdesign is real-time monitoring of the chip itself, King said. “If we can say, in mission mode, this is how much something has degraded, then it is giving them the ability to react. It’s not as good as having guaranteed lifetime aging models, but no one has that today so we’re having to work around it.”
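In practice, that kind of in-mission reaction amounts to a control loop around the monitor readings. The sketch below is a hypothetical illustration of the idea, not Moortec's implementation; the monitor interface, thresholds, and voltage steps are all assumed.

```python
# Minimal sketch of in-mission monitoring: read a degradation metric from an
# on-chip monitor and trim supply margin until it is nearly gone, then back off.
# The monitor interface and all thresholds are hypothetical.

def adjust_supply(read_degradation_mv, vdd_mv, vdd_min_mv=480, vdd_max_mv=600,
                  degrade_limit_mv=25, step_mv=5):
    """Return an updated supply voltage based on a measured loss of margin."""
    degradation = read_degradation_mv()
    if degradation > degrade_limit_mv:
        # Margin is (almost) gone: back up a bit, as the article describes.
        return min(vdd_mv + step_mv, vdd_max_mv)
    # Otherwise keep squeezing toward the lower limit to save power.
    return max(vdd_mv - step_mv, vdd_min_mv)

# Example usage with a stubbed-out monitor reading
current_vdd = 550
for reading in (10, 12, 18, 30, 22):   # pretend monitor samples (mV of lost margin)
    current_vdd = adjust_supply(lambda r=reading: r, current_vdd)
    print(f"measured loss={reading} mV -> vdd set to {current_vdd} mV")
```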
In high-performance computing, particularly AI and cryptocurrency mining, the key concern is power management.
“The goal is to get maximum throughput for the minimum amount of energy spent, especially if you are crypto-currency mining because that’s really just electricity dollars spent for bitcoins back out,” said King. “The same is true if you are doing high-performance computing, or if you are providing data centers and number crunching, or if you’re doing AI. If you’re on the end of Alexa, working out what everybody is saying, all of that is costing money. At the moment, all of those chips fall into one category. With those, there is a very strong desire to push power, and by that reduce supply voltages and operate things closer to the edge. They know there is margin there and they want to get to the point where the margin is almost gone, or maybe even in some cases, it might have gone and they have to back up a bit.”
All of this requires power management, both in the sense of managing the thermal effects inside the chip, package and board, as well as within the server, the data center and even in the commercial power grid.
Traditionally, the high-performance compute environment was not a low power environment.
“These servers are not battery-operated,” said Marc Swinnen, product management director in the Digital & Signoff Group at Cadence. “They run off the wall socket. It used to be all about speed, so they really didn’t care much about power. But at advanced nodes, server processors are hitting the power envelope now. You can only put so many 100- or 200-watt chips on a board before it starts melting, so you have to care about it. Now everybody has become a low-power designer, whether they like it or not.”
At both the system and the chip level, a power delivery network must be designed to supply the power demands of these processors, and those grids are very extensive.
“There’s a traditional signoff methodology that checks whether the grid will actually support the power distribution, but that’s often done purely in terms of voltage drop,” Swinnen said. “A voltage drop limit is set, and then you want to see that the chip doesn’t exceed that anywhere, under any activity. The problem is that these limits are becoming tighter. The higher speeds and lower voltages mean much more care must be taken in the design of the power grid.”
With much higher wire resistance at 7nm, it becomes even harder to design these power grids adequately, he noted. “Rather than looking at this purely as a voltage drop margin, we’ve been asking what engineering teams are really concerned about, particularly in regard to the timing impact of the voltage drop.”
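The shift from a flat drop limit toward a timing-aware view can be illustrated with a minimal sketch like the one below. The instance names, drop limit, and delay-sensitivity figure are assumptions for illustration, not values from any signoff tool.

```python
# Minimal sketch contrasting a plain IR-drop-limit check with a check that also
# weighs the timing impact of the drop. All numbers are made up for illustration.

IR_LIMIT_MV = 50                     # traditional flat voltage-drop limit
DELAY_SENS_PS_PER_MV = 0.4           # assumed delay sensitivity to supply droop

# (instance, voltage drop in mV, timing slack in ps on its worst path)
instances = [("u_alu", 42, 15), ("u_fpu", 55, 120), ("u_dec", 35, 4)]

for name, drop_mv, slack_ps in instances:
    flat_violation = drop_mv > IR_LIMIT_MV
    # Timing-aware view: does the drop-induced delay eat the available slack?
    delay_penalty_ps = drop_mv * DELAY_SENS_PS_PER_MV
    timing_violation = delay_penalty_ps > slack_ps
    print(f"{name}: drop={drop_mv}mV flat_fail={flat_violation} "
          f"penalty={delay_penalty_ps:.1f}ps timing_fail={timing_violation}")
```

In this toy example one instance exceeds the flat limit yet has ample slack, while another passes the flat check but loses more delay than its slack can absorb, which is exactly the gap a timing-aware methodology is meant to close.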
Advanced nodes and static leakage
While advanced nodes still pose many challenges, one area where design has benefited from the latest technology is the gate structure in finFETs, which substantially reduces leakage current compared with previous planar 2D transistor designs.
“When you say ‘high performance,’ people usually think about a lot of power,” said Jerry Zhao, product management director in the Digital & Signoff Group at Cadence. “You basically need to have strong power to achieve that performance, just like in little sports cars. They burn a lot of gas or battery so they can run faster. How to overcome all the challenges is the thing that we as engineers need to bring together to find the solutions. Thanks to the manufacturing technology at the foundries, and also leading companies at 7nm and 5nm, finFET technologies cut a very important power component, which is the leakage, by a large amount. That’s where the advanced technologies are helping us, which also gives us some wiggle room to spend a certain amount of power on the dynamic side so that you can run much, much faster and push the electronics faster. In terms of the chip design, there’s the power delivery network that goes from your battery to board to package and eventually to your processor. That is very complex, and each of those may have a unique power domain for certain cores or certain functionality in such a way that you can turn them on and turn them off.”
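The per-core on/off behavior Zhao describes can be sketched roughly as follows, with hypothetical domain names and power numbers; in real designs the domains are described in formats such as UPF and switched by dedicated power-management hardware.

```python
# Minimal sketch of per-domain power bookkeeping: each core or block sits in
# its own power domain that can be switched on or off. Domain names and power
# numbers are hypothetical.

# domain -> (dynamic power when on, leakage power when on), all in watts
DOMAINS = {
    "cpu_cluster0": (45.0, 3.0),
    "cpu_cluster1": (45.0, 3.0),
    "accelerator":  (80.0, 6.0),
    "io_subsystem": (10.0, 1.0),
}

def total_power(state):
    """Sum power over domains; a power-gated domain contributes ~0 W here."""
    return sum(dyn + leak for name, (dyn, leak) in DOMAINS.items()
               if state.get(name, False))

all_on = {name: True for name in DOMAINS}
idle = {"cpu_cluster0": True, "io_subsystem": True}   # other domains power-gated

print(f"all domains on: {total_power(all_on):.1f} W")
print(f"idle state:     {total_power(idle):.1f} W")
```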
But no technique is effective forever when it comes to device scaling. Leakage was under control at 16/14nm, but it is beginning to creep up again at 10/7nm. The next-generation technology to reduce leakage will be the gate-all-around FET, which may include a horizontal nanowire or a nanosheet. It’s not clear when that will be introduced, but the current thinking is that technology will begin to appear at 5nm or 3nm, depending upon whose definition of a process is being used.
No processor immune from power issues
This isn’t just about process nodes, though. High-performance computing utilizes a variety of processor types, and all of them have power-related challenges.
“Most servers in the data center today are being implemented with Intel x86 processors, with a lot of specialized functionality built out around it,” said Mike Thompson, senior manager for product marketing at Synopsys. “These specialized units tend to be unique because they’re targeted to an application. We’ve had customers that were doing network processors, very high-performance backbone kinds of processors, in big arrays because of the very parallel tasks. There are other companies doing scientific computing, and there, you typically don’t parallelize the tasks, so they’re looking for very high performance. They tend to use arrays, up to 16- or 32-processor kinds of arrays. Again, depending on the task, and because the task has to be somewhat parallelizable to take advantage of the different numbers of processors to do symmetric processing, they want deeper pipelines and maximum performance. There are always teams looking for the maximum-performance processor, but it’s a very different approach to obtain that because they’re looking for a deeper pipeline. Usually they want superscalar dual issue. Sometimes they’re looking for multithreading, other times not. Multithreading certainly can help if you have long latencies, but a lot of times they’re looking at ways they can bring the memory as close as possible to the processor and minimize the extent that they have to go outside.”
In these cases, performance is always the main issue, but power is still significant. “Power is not something to ignore,” Thompson said. “There are always concerns about power and the more data that can be kept close to the processor, the better the power equation is going to be.”
At the same time, Swinnen noted, the issue with all this concern about power in the high-performance community is that power management tends to lower performance, and designers don’t like that. “The key is how to manage the power and still preserve the performance.”
There are very real ramifications to power/performance management, he said. “For example, a company designed a high-performance chip to operate at 3GHz, but when it came back from silicon it only ran at 2.7GHz,” Swinnen said. “They couldn’t get it running any faster. The reason? An IR drop failure. The power grid sagged too much and caused a timing slowdown, and they couldn’t get the chip to run at full speed. The most important thing to note here is that this chip had gone through all the traditional signoff methodologies for IR drop and had passed all those tests with flying colors, and yet coming back from silicon, it still failed.”
There are a number of other, similar scenarios illustrating it is an industry-wide challenge.
“To account for this,” Swinnen explained, “the tool has to be able to say, if the voltage drops a little bit, what that does to the timing. The voltage could drop in lots of little increments. There are thousands of possible small voltages, and your libraries can only be characterized at one, two or three voltage points. What the tools need to do is interpolation. For a 1-volt library, you need to take characterizations at 0.9 and 0.8 volts as well, and then any voltage drop between 1.0 and 0.8 volts needs to be interpolated to predict what the timing will be.”
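Conceptually, that interpolation looks something like the minimal sketch below, which linearly interpolates a single cell delay between characterized voltage points. The delay numbers are invented for illustration; production timers interpolate full characterization tables rather than single values.

```python
# Minimal sketch of voltage interpolation: the library is characterized at a
# few voltage points, and the timer interpolates delay for any droop in between.
# Delay values are made up; real libraries characterize full timing tables.

# Characterized cell delay (ps) at each library voltage point
char_points = [(1.0, 100.0), (0.9, 118.0), (0.8, 145.0)]   # (volts, delay)

def delay_at_voltage(v, points):
    """Linearly interpolate delay between the two surrounding characterized points."""
    pts = sorted(points)                     # ascending in voltage
    for (v_lo, d_lo), (v_hi, d_hi) in zip(pts, pts[1:]):
        if v_lo <= v <= v_hi:
            frac = (v - v_lo) / (v_hi - v_lo)
            return d_lo + frac * (d_hi - d_lo)
    raise ValueError("voltage outside characterized range")

# A cell nominally at 1.0 V seeing 65 mV of IR drop
print(f"delay at 0.935 V: {delay_at_voltage(0.935, char_points):.1f} ps")
```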
Even processors like GPUs that might not previously have hit the power wall are presenting new challenges to engineering teams, Gupta said. “The general perception is that with GPUs you’re operating at nominal voltages around 1 volt, or 0.8 volts even at 7nm, whereas mobile devices are clearly at sub-600 millivolts at 7nm. But some of the GPU folks are telling us that when you look at bigger and bigger devices, they are hitting the power envelope because now you can pack in billions more transistors at 7nm compared to 10nm. When you take a look at the overall power consumption, just as mobile devices were hitting a power envelope a generation ago, GPU devices are now starting to hit a power envelope.”
This has a significant impact on GPU design.
“In the mobile world, we have seen the nominal operating voltage decreasing because you have to function within a power envelope, and now even GPUs will likely be forced to go that way. And if GPUs start to go sub-800 millivolts toward, let’s say, 600 millivolts, then even GPUs will start to look at voltage impact on timing, process variation, and all of the challenges that the mobile sector has,” Gupta said.
The same is true for high-performance processors, including those targeted at AI applications, Cadence’s Zhao said. “Any high performance processors, like AI chips, and in the past couple of years the bitcoin mining chips — those are all hitting the power envelope because the calculations and activities they’re performing are so fast. And there are so many of them happening all at the same time that they consume a tremendous amount of power, which in turn increases the temperature on the die and the need for thermal analysis.”
Conclusion
Power is a major headache at advanced nodes, and it is a particularly thorny issue in the HPC world. But there are no simple solutions to this problem, and there are no silver bullets anywhere on the horizon.
“There’s a tendency sometimes among design managers to say, ‘Oh, this technique only saves me 2% and that one only saves me 5%, and that one only saves me 3.5%.’ You go down the list and none of them saves 50% or 60% of your power. You come to the end of the list and nothing seems to be worth doing, yet those are the only things you can do. At every stage, pay careful attention to power and apply the appropriate low-power design techniques every step of the way, so by the time you come to the end you have a low-power chip. No single activity caused that. It was all of them together.”
Related Stories
Cloud Drives Changes In Network Chip Architectures
New data flow, higher switch density and IP integration create issues across the design flow.
Data Center Power Poised To Rise
Shift to cloud model has kept power consumption in check, but that benefit may have run its course.
Processing Moves To The Edge
Definitions vary by market and by vendor, but an explosion of data requires more processing to be done locally.