
How To Compare Chips

Traditional metrics no longer work in the context of domain-specific designs and rising complexity.


Traditional metrics for semiconductors are becoming much less meaningful in the most advanced designs. The number of transistors packed into a square centimeter only matters if they can be utilized, and performance per watt is irrelevant if sufficient power cannot be delivered to all of the transistors.

The consensus across the chip industry is that the cost per transistor is rising at each new process node, but there are so many variables to consider that no one can say for certain by how much, or even whether that is true in all cases. Direct comparisons are nearly impossible as designs become increasingly customized for specific domains. And while transistor density continues to increase, it’s no longer doubling at each new node. Even in massively parallel designs, where there is a high percentage of redundancy, at least some of the real estate gained by shrinking features is used for fatter wires to prevent overheating in critical data paths, for controller logic, or for some specialized functionality that may apply only for a single application or a specific use case.

“It all comes down to custom workloads and custom silicon and the way we are going to design that and validate it for a particular application,” said Rahul Goyal, vice president and general manager of product and design ecosystem enablement at Intel. “It’s a more application-specific kind of model because it’s just too expensive to be everything to everybody, and to have a perfect, fully validated chip for every single application. You really have to go back to the use-case paradigm.”

Fig. 1: The rise of domain-specific architectures. Source: AMD/Hot Chips 34

Every design has unique constraints, and more advanced designs typically have more of them. In a 5nm or 3nm chip, for example, it’s difficult just to deliver power to billions of tightly packed transistors. And depending upon the architecture and layout, thermal density may be too high to use them all simultaneously. But they can be dynamically turned on and off as needed, an approach that can prevent overheating and extend the life expectancy of the chip.

Alternatively, power can be routed on the backside of a chip to alleviate some of the congestion. That adds manufacturing and packaging complexity, as well as cost. But choosing the best approach depends upon the application, and traditional metrics don’t help.

“Backside power delivery has been worked on for quite some time,” said Kevin Zhang, vice president of business development at TSMC. “The question still is the tradeoff between complexity and benefits. We think 2nm may be the right spot to integrate. You have to somehow flip it over to process the other side, and sometimes you have to thin down the wafer to make a connection from the other side. So there are mechanical challenges, and there also are thermal challenges.”

Put simply, chip design is becoming a complex series of tradeoffs and experimentation, and one size no longer fits all. It requires more planning and experimentation earlier in the design flow, more simulation, emulation and prototyping during the verification and debug stages, and more time spent on various processes — test, metrology, inspection, etch, and deposition — during manufacturing, often using multiple insertion points for the same (or same type of) equipment. Even migrating the exact same design to the next process node requires significantly more engineering at every level, and more process steps. And depending upon the life expectancy of chips in different markets, costs need to be viewed in the context of a system over time, rather than using static formulas based on the number of transistors, performance per watt, or even time to sufficient yield.

Power consumption, delivery, and the resulting thermal effects are pervasive concerns, and they impact every step of the design-through-manufacturing flow, from floor-planning to materials.

“Things get harder when you must supply power into the lower back-end layers, where you have exceptionally fine lines and you must transition from a nicely distributed uniform power grid down to extremely specific parts of the circuit,” said David Fried, vice president of computational products at Lam Research. “This is where we are starting to see a lot of challenges. Due to what we’ve learned over the last 20 years about electromigration, stress migration, and TDDB (time-dependent dielectric breakdown) in copper back-ends, we have created relatively thick liners in M0 and M1 so that copper can be successfully used for power distribution. At the end of the day, as your lines scale down, you end up with more liner and less copper in these lower back-end lines. The lines are now mostly liner, and these liners have much higher resistance. We’re starting to see the introduction of liner-less approaches, comprised of either super-thin liner or thin barriers using different metals.”

Chipmakers have seen this shift coming for more than a decade. Intel obtained a patent in 2013 for cobalt interconnects and its fabrication methods. Since then, cobalt has been used for everything from contacts and interconnects to trench liners, and more experimentation is underway at foundries and in universities to help deal with the heat associated with increased dynamic power density and static current leakage.

“Interconnects are becoming more and more important,” said TSMC’s Zhang. “There are innovative approaches, including new materials. If you think about a copper line, most of the resistance actually comes from the barrier layer. New materials that can lower the barrier layer resistance are very, very important. Things like low-k materials and air gaps are actively being explored by our R&D team to further reduce the parasitic effects.”

New materials are introduced sparingly in manufacturing because they need to be deployed consistently and proven in high-volume manufacturing, often in conjunction with other processes. Process engineers still cringe at the difficulties they encountered back in 2000 when they replaced aluminum interconnects with copper at 130nm. There must be very good reasons for making these kinds of changes, and exploration is a continuous process.

“Cobalt has a higher bulk resistance than copper, but because you can use much thinner liners, you can place more cobalt into the plug or into the line,” Fried explained. “So even though cobalt has a higher bulk resistance, the fact that you can get more of it into the line reduces the line or plug resistance in aggregate. You are going to see some new metals in use, such as molybdenum, which is starting to get used more frequently. Unfortunately, it’s not as simple as saying that we’re going to replace copper with some other metal. There will be specific insertion points on the chip where the cost of that material —and the integration of that material — are justified in terms of circuit benefit.”
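Fried's aggregate-resistance argument can be put in rough numbers. The sketch below models a line's cross-section as fill metal and liner conducting in parallel; all dimensions and resistivities are assumed, round-number values for illustration only (real thin-film resistivities depend heavily on thickness and process), not data from the companies quoted here.

```python
def line_resistance(w_nm, h_nm, t_liner_nm, rho_fill, rho_liner, length_um=1.0):
    """Resistance (ohms) of a line of the given length, modeled as fill
    metal and liner conducting in parallel. The liner is assumed to coat
    the trench sidewalls and bottom; resistivities are in ohm-meters."""
    a_total = w_nm * h_nm * 1e-18                                   # m^2
    a_fill = max(w_nm - 2 * t_liner_nm, 0) * max(h_nm - t_liner_nm, 0) * 1e-18
    a_liner = a_total - a_fill
    g_per_len = a_fill / rho_fill + a_liner / rho_liner             # S*m
    return length_um * 1e-6 / g_per_len

# Assumed values: copper with a size-effect-degraded resistivity and a
# thick barrier/liner, vs. cobalt with a much thinner liner.
RHO_CU_EFF = 6e-8    # ohm-m; bulk Cu is ~1.7e-8, but narrow lines are worse
RHO_CO = 7e-8        # ohm-m
RHO_LINER = 200e-8   # ohm-m; a TaN-like barrier

r_cu = line_resistance(10, 24, 3.0, RHO_CU_EFF, RHO_LINER)  # 3 nm liner
r_co = line_resistance(10, 24, 0.5, RHO_CO, RHO_LINER)      # 0.5 nm liner

# In the 10 nm copper line, the liner occupies most of the cross-section,
# which is Fried's "mostly liner" point.
liner_fraction_cu = 1 - (10 - 6) * (24 - 3) / (10 * 24)
```

Under these assumptions the cobalt line ends up with lower total resistance despite cobalt's higher bulk resistivity, simply because far more of the cross-section carries current.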

Different companies, different concerns
Those justifications are becoming more narrowly defined. At the high-end of the performance scale, the largest data centers are run by companies such as Google, Amazon, Meta, Baidu, and Alibaba, all of which now design their own processors to handle internally developed algorithms. And in the PC and smart phone markets, Apple has designed processors that are tightly integrated with software, and which greatly extend battery life over previous off-the-shelf chip designs. It’s not unusual for a MacBook battery to last 20 hours or more between charges, versus 5 hours in the past.

But these metrics are unique to each company, and the cost needed to design and test these complex chips is no longer viewed in isolation. Processors are now considered strategic parts of much larger systems, and they may include a variety of components, from CPUs and GPUs to NPUs. Not all of those need to be developed at 5nm or 3nm, and not all of them need to be used all the time or for critical functions.

Fig. 2: Different metrics for effective compute over time. Source: Tesla/Hot Chips 34

Fig. 2: Different metrics for effective compute over time. Source: Tesla/Hot Chips 34

Still, they all need to work as expected, and traditionally that was measured in yield. But there are ways to maintain yield without producing perfect chips. There may be enough redundancy to offset errors, or enough resiliency to allow a chip to function within spec. So what traditionally might have been considered a bad chip may still be good enough.

“Nothing is perfect,” said Eric Beyne, senior fellow and director of imec’s 3D System Integration program. “There is a certain level of failures that can pass certain tests, which is not necessarily dramatic because you will catch them later in functional tests. So there are ‘good enough’ tests. And there can be redundancy, like bus interfaces, where they can have redundant lines for error coding. This comes at a cost of latency and complexity, of course. You can engineer your interface to be fault tolerant, but it will cost you, to some extent. And that’s the big tradeoff here. It’s either cost or everything works as perfectly as you want it to.”
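The value of the redundancy Beyne describes can be estimated with a simple binomial model: a block of identical lanes works as long as no more failures occur than there are spares. The failure probability below is an assumed, purely illustrative number, not a figure from imec.

```python
from math import comb

def yield_with_spares(n_needed, n_spares, p_fail):
    """Probability that at least n_needed of (n_needed + n_spares)
    identical lanes work, assuming independent per-lane failures."""
    n = n_needed + n_spares
    return sum(comb(n, k) * p_fail**k * (1 - p_fail)**(n - k)
               for k in range(n_spares + 1))

P_FAIL = 0.01  # assumed per-lane defect probability

no_spares = yield_with_spares(64, 0, P_FAIL)   # every lane must be perfect
two_spares = yield_with_spares(64, 2, P_FAIL)  # tolerate up to 2 bad lanes
```

With these numbers, a 64-lane interface with no spares passes only about half the time, while adding just two spare lanes pushes the pass rate above 95% — at the cost of the extra area and routing Beyne mentions.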

That doesn’t mean chips that are not suitable for one application cannot be used somewhere else, too. “Certain markets will require a different threshold of compatibility,” said Mike McIntyre, director of software product management at Onto Innovation. “People have been building memory cubes for years, and that memory cube has a certain performance threshold. But that performance threshold is set by the lowest chip performance in that stack. So if you have all high-speed memory in that stack, it’s going to be a high-speed equivalent stack of chips. But if you put a low-speed memory chip in there, the whole stack is limited by the performance of that one chip. And that happens at a system level, as well. Do you have a good quality chip going into that system that’s going into a high-performance market? Or do you have a lesser quality of chip that can be put into a general marketplace? So it may be servers versus laptops versus some other utility computing system.”
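McIntyre's "slowest chip in the stack" point is also why speed-binning dies before stacking pays off. The toy simulation below, with an entirely assumed speed distribution and stack size, compares stacking dies in arrival order against sorting them into speed bins first; each cube runs at the speed of its slowest die.

```python
import random

random.seed(0)
# Assumed toy distribution of per-die speed grades (arbitrary units).
dies = [random.gauss(100, 10) for _ in range(800)]
STACK = 8  # dies per memory cube

def stack_speeds(die_list):
    """Each cube is limited to the speed of its slowest die."""
    return [min(die_list[i:i + STACK]) for i in range(0, len(die_list), STACK)]

naive = stack_speeds(dies)           # stack dies in arrival order
binned = stack_speeds(sorted(dies))  # speed-bin dies, stack like with like

avg_naive = sum(naive) / len(naive)
avg_binned = sum(binned) / len(binned)
```

Binning doesn't change the dies, only the pairing, yet the average cube grade rises because one slow die no longer drags down seven fast ones.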

More options, maybe too many
The key questions are where and how a chip will be used.

“Certain technologies are good for certain solutions or certain problems,” said imec’s Beyne. “It’s not that they are going to be around for everything. For things like fan-in, fan-out, and system-in-package, there is a whole set of technologies that will be useful. It really depends on what you want to solve. If you think about the RF modules in a phone, those so-called chips may be collections of 50 different components in one package, but these are components with, relatively speaking, few connections to make. So the interconnect density is low. You cannot do the same thing for AI memory logic partitioning, which is very different.”

What’s becoming obvious, though, is that most of the activity in the chip industry isn’t happening at the leading-edge nodes, where metrics once provided bragging rights about transistor counts or automatic power, performance, and area/cost benefits. In an ironic twist, most of the concern about metrics is happening at more mature nodes, particularly with chiplets and advanced packaging, and chips that may be qualified for applications such as automotive.

On the packaging front, there are so many possible combinations that metrics turn into distributions and probabilities rather than fixed numbers. “Advanced packaging is not only flexibility in how to turn things on and off to make them fit together, but also designing different ways to make things fit together,” said Kim Arnold, chief development officer at Brewer Science. “There’s so much that’s going to change in our space. The question now is which of all the possible avenues are going to be winners, and which ones are going to be niches.”

It’s difficult to determine that today because there is so much activity across the board. The build-out of the edge, and all the devices that will leverage edge computing — cars, industrial equipment, IoT devices, smart phones — is creating enough work for everyone, from the leading edge to well-established nodes. This was evident during UMC’s Q2 earnings call with analysts. “We believe that 28 and 22 [nm] will be long-lasting nodes and supported by a very diversified base of a product portfolio,” said Qi Don Liu, UMC’s CFO. “In the next few years, we expect the 28 and 22 demand will remain robust, driven by applications like Wi-Fi 6, 6E, networking in the GPON (gigabit passive optical network) area, and OLED driver applications.”

So while UMC still plans to add finFETs to its product offerings, it’s not an immediate priority. “We will continue making progress on the finFET, but on the capacity deployment point of view it does have a low priority compared with the other nodes at this moment,” said Jason Wang, UMC’s president. “We’re still putting 14 on the roadmap, but the significant capacity deployment plan is not in the near-term yet.”

That approach is echoed at GlobalFoundries, which focuses on unique implementations at mature nodes, rather than slugging it out at the most advanced nodes where there are far fewer designs. “Design kits, in particular, are areas of differentiation for us,” said Gregg Bartlett, senior vice president of technology, engineering, and quality at GlobalFoundries. “So even if our competitors have the exact same transistor performance capabilities, we get better products with PDKs because we have integrated features with the EDA companies, or we have simulated the silicon with elements that make better products for us. As a silicon or materials person, I would always want to go differentiate the technology based on transistor performance, better drive currents, lower leakage, higher temperature compatibility. But increasingly, it’s about the design context or design intent. We have a very large PDK effort dedicated to making sure that the EDA tools our customers want are capable of informing their designs.”

And finally, there is the ability to mix and match nearly everything using chiplets. That means a small logic element can be created at 3nm or even smaller, and integrated using some off-the-shelf or customized interconnect scheme with a 180nm chiplet in the same package. The advantage here is a third dimension. That can be used to reduce various types of noise, improve heat dissipation, and boost yield, which typically increases as the physical size of chips is reduced. It even allows for higher density in some of the components, which was impractical in the past because of the limitations of mask lithography.
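The yield argument for smaller dies can be sketched with the classic Poisson yield model, Y = exp(−A·D0). The defect density and die areas below are assumed, round-number values for illustration, not figures from any foundry.

```python
from math import exp

def poisson_yield(area_cm2, d0_per_cm2):
    """Classic Poisson yield model: fraction of dies with zero defects."""
    return exp(-area_cm2 * d0_per_cm2)

D0 = 0.1                        # assumed defect density, defects/cm^2

big = poisson_yield(4.0, D0)    # one monolithic 400 mm^2 die
small = poisson_yield(1.0, D0)  # one 100 mm^2 chiplet

# Under this model, four known-good chiplets have the same combined
# yield as the monolithic die (small**4 == big). The difference is that
# defective silicon is discarded one small die at a time, so the
# fraction of wafer area scrapped is much lower.
waste_monolithic = 1 - big      # roughly a third of the silicon scrapped
waste_chiplet = 1 - small       # roughly a tenth scrapped
```

This is why disaggregation pays even when the headline yield numbers look equivalent: the cost of each defect is bounded by the small die it lands on, not the whole assembly.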

“Curvilinear ILT is able to achieve way better process windows than conventional OPC, which is restricted to Manhattan (45°) shapes,” said Aki Fujimura, CEO of D2S. “Mask shapes used to be restricted, practically speaking, to Manhattan shapes because masks were written with VSB (variable-shape e-beam) writers. It’s getting harder and harder with every technology node, even with EUV, to get the wafer shapes to be as uniform as possible across manufacturing variation. It’s been well established for about two decades now that the best uniformity is achieved by using curvilinear shapes on mask.”

But which metrics apply to this kind of approach?

Conclusion
While chipmakers and systems companies still have to justify their metrics, the real value is much more complicated and domain specific. The speed of an I/O may not matter to a sensor on a tractor, but it may be critical for a chip in an automobile connecting with infrastructure or a nearby car. Likewise, processing speeds may be less relevant in a chip used for streaming video inside a mobile phone, but they are crucial for detecting the course of a hypersonic missile.

This raises questions about how consumers will differentiate between devices in the future, and it opens the door to a slew of possible options about how systems companies can put various pieces together. But at least for the short term, there is likely to be a lot more confusion. The metrics that consistently have defined chip architectures over the past 50 years are becoming far less relevant, and the ones that really do matter may be too complex to explain.

Related Reading
Big Changes In Architectures, Transistors, Materials
Who’s doing what in next-gen chips, and when they expect to do it.
Scaling, Advanced Packaging, Or Both
Number of options is growing, but so is the list of tradeoffs.


