Disaggregation and the wind-down of Moore’s Law have changed everything.
Chipmakers are beginning to re-examine how much dark silicon should be used in a heterogeneous system, where it works best, and what alternatives are available, a direct result of the slowdown in Moore's Law scaling and the increasing disaggregation of SoCs.
The concept of dark silicon has been around for a couple of decades, but it really began taking off with the rise of the Internet of Things, when everything had to fit on a single die and run off a small battery. That proved problematic for the initial versions of smartwatches and phones, and the best solution at the time was to shut down every circuit that wasn't needed for an essential application.
Other issues followed. For example, the inrush current when devices are powered back up, particularly those that need to turn on quickly, can stress circuits to the point of damaging the chip. So while powering down parts of a chip can reduce aging, turning them back on quickly also can be problematic. Much of this has been ironed out over the past decade of low-power chip engineering, and dark silicon coupled with extremely efficient design has enabled an entire generation of useful mobile devices, while also putting a significant dent in data center energy bills.
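One common way to tame inrush current is to stagger power-domain wake-up so the rail never sees every gated block switching on at once. The sketch below shows the idea in Python; the domain names, current figures, budget, and enable callback are all hypothetical, not drawn from any real part.

```python
import time

# Hypothetical per-domain peak inrush current, in milliamps. Real values
# would come from characterizing the power-gating switches.
DOMAINS = [
    ("gpu", 180),
    ("npu", 150),
    ("isp", 90),
    ("codec", 60),
]

INRUSH_BUDGET_MA = 250  # assumed limit the power rail can absorb at once


def wake_domains(enable_domain, settle_ms=2):
    """Power domains back up in groups that stay under the inrush budget.

    `enable_domain` is a hypothetical callback that flips the power switch
    for one domain; waiting settle_ms between groups lets the rail recover.
    """
    group, group_ma = [], 0
    for name, inrush_ma in sorted(DOMAINS, key=lambda d: -d[1]):
        if group and group_ma + inrush_ma > INRUSH_BUDGET_MA:
            for g in group:
                enable_domain(g)
            time.sleep(settle_ms / 1000.0)  # let the rail settle
            group, group_ma = [], 0
        group.append(name)
        group_ma += inrush_ma
    for g in group:  # enable the final partial group
        enable_domain(g)


wake_domains(lambda name: print(f"power on: {name}"))
```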
The question now is what to do next to lengthen the time between charges, and just adding more dark silicon doesn't solve that problem. Instead, there is more focus on designing silicon to fit the application, enabled by a series of developments across the design-through-manufacturing flow.
These developments are additive. So instead of powering down large portions of a chip, more can be done with smaller chips or chiplets, which can be much more cost- and power-efficient. In addition, various functions in a chip can be developed at the optimum process node, weighing such factors as cost, use cases, static current leakage, and size.
“There’s more of a variety of approaches,” said Rob Aitken, R&D fellow at Arm. “Part of the idea behind dark silicon was that there was a fixed power budget, especially for mobile computing. But if you shrink the device, and at the same time push the frequency, then power doesn’t really improve. Instead, you wind up with this empty space, and there were various ideas about what to do about that.”
For smartphones and wearables, dark silicon is a proven solution, but it's not the most efficient one. Other options are available, from limiting the size of various components to spreading them out in a package, which reduces thermal effects at increased density. This is particularly valuable for memory, which runs more efficiently at cooler temperatures. Memory may stay cool while nearby circuitry is powered down, but inrush current can quickly overheat it. A better option is to physically separate the memory from active logic in an advanced package.
“If the temperature gets too close to the maximum allowable operating range, you may have to refresh the memory more often,” said Steven Woo, fellow and distinguished inventor at Rambus. “You end up losing performance when it gets hot. And if it’s getting warm, you might have to do what’s called throttling. Maybe you don’t run it at its peak performance for an extended period of time, or maybe you have to run it in short bursts and let it cool down again.”
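A rough sketch of the policy Woo describes appears below: refresh DRAM more often as it warms, and throttle traffic as it nears its limit. The 64 ms base window and the 85°C split reflect common DDR conventions, but the throttle curve and temperature points here are illustrative assumptions, not any vendor's specification.

```python
# Temperature-aware DRAM management: hotter cells leak charge faster, so
# the refresh window shrinks (more overhead), and traffic backs off as
# the part approaches its maximum allowed temperature.

BASE_REFRESH_MS = 64.0  # typical full-array refresh window at normal temp


def refresh_interval_ms(temp_c):
    """Return the refresh window; a smaller window means more refresh
    cycles, which steal bandwidth from real traffic."""
    if temp_c <= 85.0:
        return BASE_REFRESH_MS      # normal operating range
    return BASE_REFRESH_MS / 2.0    # extended range: double refresh rate


def bandwidth_scale(temp_c, throttle_at_c=95.0, max_c=105.0):
    """Linearly back off memory traffic between the throttle point and
    the maximum temperature (a hypothetical throttle curve)."""
    if temp_c < throttle_at_c:
        return 1.0
    if temp_c >= max_c:
        return 0.0                  # must stop and let it cool down
    return 1.0 - (temp_c - throttle_at_c) / (max_c - throttle_at_c)


for t in (70, 88, 97, 103):
    print(t, refresh_interval_ms(t), round(bandwidth_scale(t), 2))
```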
All of these techniques, along with other developments, have allowed mobile devices to do much more intensive computing than in the past without burning up. "In the mobile space, the power actually went up," said Arm's Aitken. "Chips draw more power today than they did 15 years ago. Some of that is made possible because of better battery technology. But some of that is more physical area that lets you disperse heat more effectively."
By isolating various parts of a chip through three-dimensional floorplanning, and by employing techniques such as dynamic voltage and frequency scaling (DVFS) along with some dark silicon, that heat can be managed much more effectively. It also can be done using less silicon area, which improves performance and opens the door to additional functions and features in the same devices.
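In essence, a DVFS governor picks the lowest voltage/frequency operating point that still meets demand. The following minimal sketch illustrates the mechanism; the operating-point table, headroom factor, and utilization inputs are invented for illustration.

```python
# A minimal DVFS governor sketch: choose the lowest operating point that
# covers current demand. The table is illustrative, not from a real part.

OPERATING_POINTS = [  # (frequency_mhz, voltage_v)
    (400, 0.60),
    (800, 0.75),
    (1200, 0.90),
    (1600, 1.05),
]


def pick_operating_point(utilization, current_mhz):
    """Scale to the slowest point that still covers demand with headroom.

    Dynamic power scales roughly with C * V^2 * f, so dropping one
    voltage/frequency step saves more than the frequency ratio alone.
    """
    demand_mhz = utilization * current_mhz
    for freq, volt in OPERATING_POINTS:
        if freq >= demand_mhz * 1.25:   # 25% headroom before stepping up
            return freq, volt
    return OPERATING_POINTS[-1]


print(pick_operating_point(utilization=0.30, current_mhz=1600))  # steps down
print(pick_operating_point(utilization=0.95, current_mhz=1600))  # stays high
```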
There are other benefits to this approach. “There were reasons in the past to go for larger and larger chips, so you could have more functionality integrated into a single chip,” said Roland Jancke, design methodology head in Fraunhofer IIS’ Engineering of Adaptive Systems Division. “If you don’t need certain parts of the silicon, then you can switch them off to save power. But there are other reasons to use smaller chips. If you include analog in a design, for example, that’s typically in an older process node than digital, so it takes up more area [which makes it more advantageous to reduce the size of digital components]. And in the case of RF, it requires a lot of power. There’s a security advantage with chiplets, as well. It’s harder to copy the overall system functionality. It only works if you integrate the same pieces in the same manner into the same package. If you fail with any one of these chips, then you miss the overall functionality.”
AI’s impact
One of the big drivers for rethinking what gets turned off, and for how long, involves AI and machine learning, where chips are designed for maximum performance and throughput. This is especially true for large data centers, where immense amounts of data need to be processed very quickly. Typically, this involves powerful processor cores working in parallel, some of them designed specifically for the workloads in those data centers, often in combination with GPUs, CPUs, some type of NPU, and DSPs. The problem is that these devices depend on a steady stream of data, and that stream doesn't always flow consistently.
“If there are two solutions, and one uses the transistors more efficiently, it will get more throughput per dollar and per watt,” said Geoff Tate, CEO of Flex Logix. “So having dark silicon isn’t desirable from the customer point of view. It’s hard to develop architectures that get high utilization, but the more utilization you get, the better. Transistors still aren’t free — especially with the supply shortage.”
Tate noted that the perception of what’s best has changed over the past half-decade as AI became more pervasive. “In the early stages of AI, the first challenge was just to get something to work and to improve the models — to make them better and better and come up the learning curve,” he said. “And in the data center, they have huge budgets and huge profits, which enables them to do certain things they couldn’t do before. But as we look to deploy AI into high-volume, more price-sensitive solutions, customers are going to be looking for who can provide the most inference performance for their power budget and their money budget, and most people that we see hit the power budget before they hit their money budget. It’s not just about using transistors efficiently for cost reasons. The more transistors you have, the more leakage you have. So if you can get the work done with fewer transistors, it’s going to be more power-efficient.”
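Tate's leakage point can be made concrete with a back-of-envelope model: leakage scales with transistor count whether or not those transistors are doing work, so a smaller design at higher utilization can deliver the same throughput at better efficiency. All of the numbers below are invented for illustration.

```python
# Two hypothetical designs with equal throughput: a big, lightly utilized
# die versus a small, busy one. Leakage charges by the transistor.

def total_power_w(transistors_b, dynamic_w_per_b, leakage_w_per_b):
    """Dynamic power tracks switching activity; leakage accrues on every
    transistor, idle or not."""
    return transistors_b * (dynamic_w_per_b + leakage_w_per_b)


THROUGHPUT_TOPS = 100  # both designs hit the same inference throughput

big = total_power_w(transistors_b=40, dynamic_w_per_b=0.8, leakage_w_per_b=0.5)
small = total_power_w(transistors_b=20, dynamic_w_per_b=1.6, leakage_w_per_b=0.5)

print(f"big die:   {THROUGHPUT_TOPS / big:.2f} TOPS/W")   # 40B transistors, ~1.92
print(f"small die: {THROUGHPUT_TOPS / small:.2f} TOPS/W") # 20B, busier, ~2.38
```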
In the AI world, "dark silicon" can take on another connotation, as well. "Although vendors are trying to give you all the silicon and all the horsepower, when you try to run the actual neural network model, you're not even getting anywhere close to 40% of the system," said Nick Ni, senior director of data center AI and compute markets at AMD. "The engines can be very fast, but if you don't have the data to process, then they're just sitting idle. That's what's causing the dark silicon."
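Ni's idle-engine scenario is essentially a roofline limit: if a model performs too few operations per byte fetched, the memory system caps utilization no matter how much peak compute the silicon offers. The sketch below estimates that cap under assumed peak-throughput and bandwidth figures; none of the numbers describe a real accelerator.

```python
# Roofline-style estimate of how data starvation strands compute. All
# figures are illustrative, not measurements of any real accelerator.

PEAK_TOPS = 400.0    # assumed peak INT8 throughput
MEM_BW_GBS = 1600.0  # assumed memory bandwidth


def achievable_fraction(ops_per_byte):
    """Fraction of peak the engines can sustain, given the model's
    arithmetic intensity (operations performed per byte fetched)."""
    data_limited_tops = MEM_BW_GBS * ops_per_byte / 1000.0  # Gops -> Tops
    return min(1.0, data_limited_tops / PEAK_TOPS)


# A bandwidth-hungry model strands most of the silicon; a compute-dense
# one keeps it lit.
for intensity in (50, 100, 500):
    print(f"{intensity} ops/byte -> {achievable_fraction(intensity):.0%} of peak")
```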
Fig. 1: AMD’s 3D V-Cache using cache chiplets stacked on a processor. Source: AMD
The challenge is to thoroughly understand the context and the amount of data that needs to be processed, and then design chips around those factors. One of the reasons AMD acquired Xilinx, and why Intel acquired Altera, is to be able to fine-tune how some of these devices are used. Programmable logic can be dynamically reconfigured and sized according to needs, so rather than a giant FPGA, small FPGAs can be scattered around a package and utilized as needed. And while a giant FPGA is never as efficient as a hard-wired ASIC, smaller programmable logic chips can be used to shrink the amount of under-utilized or non-utilized silicon.
“While it would be nice to be able to build a custom ASIC for each market, some of those use cases are so diverse that the markets get smaller, while the cost of building an ASIC is going up,” said Rambus’ Woo. “So an FPGA in conjunction with an x86 makes sense. You load in your bit file for your market-specific work, and then you leverage the common infrastructure of x86 to do everything else.”
The chiplet factor
Chiplets add another level of flexibility, because die sizes can be adjusted to whatever is required for a particular function. That means unused parts of a die can be eliminated entirely, rather than put to sleep, and added functionality can be put on a different chiplet.
“Die size is one of the main drivers for the adoption of chiplets,” said Jan Vardaman, president of TechSearch, in a recent presentation. “Die sizes are huge today for GPUs and CPUs, and we do have to have more transistors. It’s just that we’ve got to figure out how to economically put all those transistors together and function. So the additional work we’re doing in driving the adoption of chiplets is going to allow us to do a finer package of higher density. You can do things that improve the power efficiency, which is very important in a lot of our applications.”
The key is being able to put the pieces together in the most efficient way. “We’ve got to be able to think of design in a new way. It’s a system architecture,” Vardaman said. “Because you’re getting a smaller die, which gives you yield improvement, you’re going to use the most advanced nodes for the parts that need those nodes. You’re not going to fabricate the analog part of the die in a high-performance logic node. You’re going to fab that in other nodes because it’s less expensive. You’re going to put all of that together. The chiplet is the hard IP block. It has to be co-optimized. All of this stuff works together. You can’t design these things in isolation.”
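Vardaman's yield argument follows directly from simple defect statistics: a single defect kills an entire large die but only one small chiplet, and known-good-die testing means only working chiplets get assembled. Here is a sketch using the basic Poisson yield model, with an assumed defect density and illustrative die areas.

```python
import math

# Poisson yield model Y = exp(-A * D): the probability a die of area A
# escapes all defects at density D. Numbers are illustrative only.

D0 = 0.1  # assumed defects per cm^2


def die_yield(area_cm2):
    """Probability a die of the given area has zero defects."""
    return math.exp(-area_cm2 * D0)


mono = die_yield(6.0)   # one 600 mm^2 monolithic die
part = die_yield(1.5)   # one 150 mm^2 chiplet

# A defect scraps the whole 600 mm^2 die, but only one 150 mm^2 chiplet.
# With known-good-die testing, only tested-good chiplets are assembled,
# so good silicon per wafer tracks the chiplet yield.
print(f"monolithic die yield:     {mono:.1%}")  # ~54.9%
print(f"individual chiplet yield: {part:.1%}")  # ~86.1%
```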
In this scenario, dark silicon becomes just one more option for reducing power, and not necessarily the best one. And while it may provide compute power in reserve for specific functions, it’s not the most efficient way to design a complex system.
Conclusion
The ability to pack more functionality onto a die through feature shrinks continues, but the power and performance benefits are shrinking, as well. As a result, chipmakers are looking to advanced packaging for those benefits, and dark silicon is less attractive in a package than it is for a single, high-performance die, where one size fits all and volumes are in the billion-chip range. Even under ideal conditions, dark silicon appears to be showing its age.
“We’re in this dark silicon trajectory,” said Aitken. “There are a bunch of things that went into the dark silicon thought process that have become mainstream, and everybody accepts that. You’ll build a chip that has a bunch of different core capabilities, and some of them will be on at various times and some of them warm. And that’s all fine. But how you turn on everything all the time to get to absolutely maximize the compute performance is still a very hard problem. And it’s one that you might not want to answer, because it will generate an enormous amount of heat that you can’t deal with, anyway.”