A mind-boggling number of options is emerging, but which one is best often isn’t clear.
The guideposts for designing chips are disappearing or becoming less relevant. While engineers today have many more options for customizing a design, they have little direction about what works best for specific applications or what the return on investment will be for those efforts.
For chip architects, this is proving to be an embarrassment of riches. However, that design freedom comes with huge financial risk and heightened responsibility over longer chip lifetimes. Up to and including the 28nm node, these kinds of decisions were defined by the ITRS roadmap and Moore’s Law. But as the power and performance benefits of scaling began diminishing, and the cost to design and manufacture three-dimensional transistors started trending upward, semiconductor economics began looking very different.
Chipmakers now can utilize a variety of heterogeneous options, creating more customized designs to suit a particular data type or end application. This is evident with large data centers, where giants such as Google, Amazon, Alibaba, and Facebook have been designing their own chips, as well as in markets such as automotive, where OEMs such as Tesla and Volkswagen are racing toward increasingly autonomous vehicle architectures using internally designed silicon. But the resources required, and the risk of failure, also are increasing.
All of these factors have broad implications for the chip industry, as well as for systems companies that utilize these designs. Among them:
Put simply, the chip industry is disaggregating and re-aggregating in new ways. While that provides enormous design freedom for chipmakers, it also changes business and technology relationships in ways that are not obvious. Those changes, meanwhile, are up-ending a whole slew of assumptions that could be charted in a nearly straight line for the past few decades, when the primary goals were smaller, faster, lower power, and cheaper. And to make matters even more confusing, geopolitical strains are fostering a separate supply chain in China, and the number of choices and unknowns is rising significantly.
“It isn’t about whether we can make smaller and smaller transistors,” said Simon Segars, CEO of Arm. “We’ve got a simultaneous explosion in edge AI, tiny sensors that need to leverage one set of technologies. We’ve got massive compute that’s going on in the cloud where you need to worry about the efficiency of that. We’ve got new network technologies evolving that require new wireless and RF technologies. It’s going in multiple directions at once, which makes it difficult to use guideposts.”
The result is a level of uncertainty, as well as creativity, not seen since the early days of computing.
“You’re trying to get to the best possible product for the customer at a given time,” said Ann Kelleher, senior vice president and general manager of technology development at Intel. “But you have many more options on the menu, and it’s more of an à la carte menu than a fixed menu. In the past, everything was based on the node that you were working with. I go back to the design enablement team, and the design efforts between the process and packaging, and there is a lot of active discussion and debate in terms of how we achieve the best possible answer for given products going forward.”
Kelleher noted this includes a number of factors, such as process, different tile options within a package, cost, and other market-specific factors. “There are many ways to get there, and the supply chain itself has become much more complicated,” she said. “Depending on the particular product and its particular features, it becomes a discussion of how do we get there with the most manufacturable version of tiles as well as supply chain.”
The number of options has exploded, but the guidance for how best to utilize those options is disappearing. “Over the last 10 years, you used to be able to say, ‘My software is changing, so I can use the next-generation processor,'” said Simon Davidmann, CEO of Imperas. “The problem today is there isn’t a next generation of a standard processor that is applicable to all of these different software problems.”
It now requires more resources — tools, manpower, money, and time — to develop leading-edge chips. On the power side, challenges range from delivering enough current to transistors to cooling those transistors when they are fully utilized. There are questions about how memory and logic should be partitioned, and who should be doing that partitioning. There are more potential interactions and physical effects, such as power, electromagnetic interference, and substrate noise, as well as increasing quantum effects to contend with at the most advanced nodes. And there are challenges to make sure everything is sufficiently verified and tested, not just in the fab, but also in the field and over time.
Designed by whom and for what?
All of this has a big impact on a design, how long a chip or IP block is expected to function according to spec, and the price tag for the chip, package, and system. In the past, this was defined by the OEM, and the chipmaker would build a chip for a spec based upon a limited number of options, such as process node, power budget, and cost. Those typically defined performance, power and area (PPA).
Today, that buffer is largely gone. Chipmakers are working directly with systems companies to build chips, or the systems companies are doing it themselves. Initial planning involves a multi-disciplinary engineering team, and possibly one or more IP vendors, EDA companies, and in the case of large systems companies and IDMs like Apple, a foundry.
Which portion of the PPA equation deserves the most attention can vary greatly from one application to the next. For example, in a hyperscale data center the cost of designing a chip or module may be far less important than the energy efficiency of that design, because the design cost can be offset by savings on powering and cooling racks of servers, as well as by the higher number of customers that can be served within a given footprint by a better-performing system. It also can be offset with better monitoring technology to determine when a chip actually needs to be replaced, rather than a wholesale turnover of equipment every four to seven years to avoid downtime.
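As a rough illustration of that tradeoff, consider the sketch below. Every figure in it is a hypothetical placeholder, not vendor data, but it shows how a large design NRE and even a higher unit price can be outweighed by power and cooling savings across a big enough fleet.

```python
# Illustrative only: all numbers below are hypothetical placeholders, not
# vendor data. The point is how fleet-level energy costs can dwarf design NRE.

def lifetime_cost(design_nre, unit_cost, units, watts_per_unit,
                  pue=1.5, kwh_price=0.12, years=5):
    """Rough total cost of ownership for a fleet of accelerators."""
    hours = years * 365 * 24
    energy_kwh = units * watts_per_unit * pue * hours / 1000.0
    return design_nre + units * unit_cost + energy_kwh * kwh_price

# Off-the-shelf part vs. a custom, more power-efficient design (made-up values).
standard = lifetime_cost(design_nre=0,     unit_cost=2_000, units=500_000, watts_per_unit=400)
custom   = lifetime_cost(design_nre=300e6, unit_cost=2_000, units=500_000, watts_per_unit=250)

print(f"standard: ${standard/1e9:.2f}B   custom: ${custom/1e9:.2f}B")
```

With these placeholder numbers, the $300 million of extra engineering is more than paid back in electricity over five years; at lower volumes or smaller power savings, it would not be.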
“It’s about really taking a look at what the end user application is,” said Joseph Sawicki, executive vice president of Siemens IC EDA. “That end user application may go beyond just simple data processing. It may involve being interfaced to the outside world, and it’s changing both design and validation such that it has to span out and increasingly handle those aspects of validating an end user software stack operating in the real world, which is way more data processing on the design side of things, way more invested in end user experience, and far more holistic about how you optimize for design.”
On the other hand, if it is a sensor fusion module within an automobile, the cost of the design is an overriding concern. But even that may be less important than the ability of the module to work flawlessly with other components in a vehicle throughout its expected lifetime.
“If we think about scaling in the past, it was always about going to a lower process node with smaller transistors and having bigger SoCs,” said Hany Elhak, group director for product management and marketing at Synopsys. “Now, scaling is becoming different chips designed in different technology and customized for different applications, and it’s all part of a bigger system. The scaling is manifesting itself in different ways now. So it’s not just Moore’s Law. It’s a system of systems.”
Fig. 1: IC design hyper-convergence. Source: Synopsys
Those systems of systems can change, too. Some of the most profound technological changes are happening inside of traditionally low-tech industries, fueled by better connectivity and the use of machine learning to improve profitability.
“There’s a convergence of different technologies happening, and it will continue to happen,” said Louie De Luna, director of marketing at Aldec. “There’s 5G, AI, machine learning. You even see this in the home market. With a smart TV, you now can search on YouTube.”
The lines between what previously were very different markets either have blurred, or are starting to blur, and semiconductor design will follow or drive these changes. “A lot of the conversations we have with our avionics customers are around FPGAs,” said De Luna. “FPGAs can be used to control engines, electronics, takeoff, navigation, and things like that. What we’re seeing now is an increasing use of high-speed interfaces like PCIe and Ethernet. But to deal with DO-254 when we’re using these high-speed interfaces is very difficult. You need to capture the results for the serial high-speed interfaces, and there’s no way to do that. And when you debug it and you’re looking at waveforms, there are a lot of deterministic results, so it’s hard to debug.”
Varying rates of change
Churn within the chip industry, and within markets it serves, has made it harder to develop standardized IP and chips. Large IP companies have been seeing this trend for some time because their largest customers have been demanding tweaks to commercial IP.
The impact of all of this activity has cast a spotlight on another industry guidepost, Makimoto’s Wave, which assumes a 10-year cycle between customized designs and standard products. While the basic idea is still sound, the time frame from peak to trough is lengthening due to very different economics coupled with brand new applications.
“Makimoto was talking about a much shorter wave,” said Walden Rhines, president and CEO of Cornami. “Today, it’s looking more like 30 or 40 years. The last time I remember anything of this magnitude was when the minicomputer industry built its own wafer fabs in the 1980s.”
This is no longer as simple as replacing custom-built with standardized parts, and it raises questions about just how well derivative chips will work in this scheme. That shift becomes especially apparent when AI/ML is added into the design process.
“In the past, we had the replay, and then the reuse of a model,” said Kam Kittrell, senior group director for digital and signoff marketing at Cadence. “Today, it’s hard to tell if a model can be re-used. If you take the same library, such as a shader core for a GPU, now you’ve got different operating voltages, so the training you did before doesn’t even pertain to this.”
Fig. 2: Rising transistor costs. Source: Cadence
Aging adds yet another variable. It can vary depending upon the size of wires (RC delay), the thickness of dielectrics, and how intensively various parts of a chip are used throughout their lifetime. Low utilization of a circuit, even at an advanced node, can significantly increase its life expectancy, while heavy use of circuits developed at older nodes can shorten theirs.
“Aging has been predicted on an ad hoc approach in the past,” said Kittrell. “It was like, ‘I think it’s going to age about this much.’ Automotive customers were using this because they were the ones that had the requirement for reliability. A circuit had to work for 20 years. Now, the hyperscaler people are concerned about aging because there can be pretty significant loss of performance in one year with high activity on an advanced node. They’ve got to make sure that if it runs at 4 GHz, that it will stay in the 4 GHz range, and they do this through robust optimization.”
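A back-of-the-envelope model shows why utilization matters so much. Many aging mechanisms, such as bias temperature instability, are often approximated with a power-law dependence on stress time and activity. The coefficients in the sketch below are invented for illustration and do not come from any foundry model, but they capture the general trend Kittrell describes: heavily exercised circuits lose frequency faster.

```python
# Generic power-law aging sketch (illustrative coefficients, not foundry data).
# Threshold-voltage shift ~ A * (activity * years)^n is a common simplification
# for mechanisms such as BTI; frequency loss is assumed proportional to the shift.

def frequency_after_aging(f0_ghz, years, activity, a=0.04, n=0.25):
    """Estimate degraded clock frequency after `years` at a given duty cycle."""
    degradation = a * (activity * years) ** n   # fractional slowdown
    return f0_ghz * (1.0 - degradation)

for activity in (0.1, 0.5, 0.9):                # lightly vs. heavily used circuit
    print(activity, round(frequency_after_aging(4.0, years=5, activity=activity), 2))
```

With these made-up coefficients, a circuit active 90% of the time drops from 4 GHz to roughly 3.77 GHz over five years, while a lightly used one stays near 3.87 GHz — the kind of margin that robust optimization is meant to preserve.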
The introduction and growing adoption of RISC-V adds yet another variable. Rather than working exclusively with a commercial core, the open-source model allows users to customize the instruction-set architecture source code, as long as it can be certified by RISC-V International. RISC-V allows for much tighter integration between software and hardware, specifically targeting features that are essential for a particular use case or application. It also creates a new set of challenges for chip design teams, along with the potential for shifting the PPA equation. A customized RISC-V accelerator, for example, could be packaged with an off-the-shelf processor, to create a domain-specific device more quickly and inexpensively than designing an entire module.
“You can use RISC-V to work with signal processing or imaging processing by adding something new in terms of instruction set extensions, and RISC-V [International] actually encourages you to do so because it already tells you how to do the customization,” said Zdenek Prikryl, CTO of Codasip. “But if you design an accelerator and you put it into a bigger system, you may have tens or hundreds of different accelerators. You have to be sure everything works together. You have to put a lot into the verification.”
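To make the idea of an instruction-set extension concrete, the sketch below models a hypothetical custom instruction, a saturating multiply-accumulate placed in the custom-0 opcode space that RISC-V reserves for vendor extensions, inside a toy instruction-set simulator loop. The mnemonic, semantics, and register values are invented for illustration; a real extension would follow the encoding rules in the RISC-V specification and, as Prikryl notes, would have to be verified against everything else in the system.

```python
# Minimal illustration of a custom instruction in a toy RISC-V-style simulator.
# The custom-0 opcode space is reserved by RISC-V for vendor extensions; the
# specific semantics here (saturating multiply-accumulate) are invented.

CUSTOM0 = 0x0B          # major opcode for custom-0 instructions
SAT_MAX = 2**31 - 1

regs = [0] * 32

def exec_custom0(rd, rs1, rs2):
    """Hypothetical MAC.SAT: rd = saturate(rd + rs1 * rs2)."""
    acc = regs[rd] + regs[rs1] * regs[rs2]
    regs[rd] = min(acc, SAT_MAX)

def step(instr):
    opcode = instr & 0x7F
    rd     = (instr >> 7)  & 0x1F
    rs1    = (instr >> 15) & 0x1F
    rs2    = (instr >> 20) & 0x1F
    if opcode == CUSTOM0:
        exec_custom0(rd, rs1, rs2)
    # ... standard RV32I opcodes would be decoded here ...

regs[1], regs[2] = 3, 7
step((2 << 20) | (1 << 15) | (5 << 7) | CUSTOM0)   # MAC.SAT x5, x1, x2
print(regs[5])                                      # 21
```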
Smarter tools, and smarter use of tools
Rapid acceleration of technology, coupled with widespread demand for domain-specific solutions, has created a potential bonanza for the EDA industry. Revenue growth over the past two years has been firmly in the double digits, and more recently the industry has set records for that growth.
But there are so many variables involved in new designs that EDA companies are scrambling to keep pace. In some cases, the tools need to be modified for each new project.
“In the hardware world, very bright people are coming up with new architectures and new ideas to solve things, and they are stretching design tools with crazy ideas and making us rethink how we do things and what we do,” said Imperas’ Davidmann. “We need to re-factor our simulator about every year to make it do something better. Someone throws a new problem at us and we go, ‘Okay, how can we do that?’ And we try to tackle it. Sometimes we’re successful, sometimes we can’t help, but this expansion of the designs into all the new electronic products we’re seeing and that are being developed is phenomenal. It’s an exciting opportunity for the electronic design space.”
It’s also difficult. “We are trying to address two problems,” said Synopsys’ Elhak. “The traditional one is circuits are now bigger and more complex, running at higher frequencies, and they have more parasitics. This is the scale problem, and we are trying to deal with it by offering faster simulation and higher-capacity simulation. This is the known problem. The other problem, which we are trying to solve, is that now I have many different types of circuits that are part of that bigger system, and they need to be designed together. We need to have some common flow for these different design teams so we don’t end up with problems at the end of the design cycle when they try to connect these things together. They need to be working together from the beginning.”
Standards can be extremely helpful in this regard. While the chips themselves are becoming less standardized, the data formats for various processes and tools are trending in the opposite direction. That helps with things like characterization of IP and the interconnects between different chips and systems.
“This is one of the important points,” said Roland Jancke, department head for design methodology at Fraunhofer IIS’ Engineering of Adaptive Systems Division. “You need to standardize data formats so that you’re able to exchange information between simulators, and you need a generic interface for analyzing data formats. We are currently working with partners on the standardization of mission profile formats. Because there are different levels and different suppliers in the value chain, you have to look at what you are designing the system for or testing it with. What are the mission profiles that you used to design the system, to test the system, to verify the system? What are the different vectors that have been used at the system level for development at the circuit level, and so forth? There are questions from companies using test equipment about whether we can somehow standardize mission profile formats to be able to describe those test vectors, simulation vectors, and source vectors in the same format.”
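Because no mission-profile standard has been finalized, any concrete example is necessarily hypothetical. The sketch below shows one plausible shape such an exchange format could take: a simple JSON document that pairs operating phases and environmental conditions with the stimulus vectors used at system and circuit level. The field names and file names are invented for illustration.

```python
# Hypothetical mission-profile exchange format (illustrative only; no such
# standard is finalized). Each phase pairs environmental conditions with the
# stimulus vectors used at system and circuit level.
import json

mission_profile = {
    "product": "sensor-fusion-module",
    "lifetime_hours": 15_000,
    "phases": [
        {"name": "cold_start", "temp_c": [-40, 0],  "duty_cycle": 0.05,
         "vectors": ["sys_boot.vcd", "ckt_powerup.spi"]},
        {"name": "highway",    "temp_c": [20, 105], "duty_cycle": 0.70,
         "vectors": ["sys_cruise.vcd", "ckt_thermal.spi"]},
        {"name": "parked",     "temp_c": [-40, 85], "duty_cycle": 0.25,
         "vectors": ["sys_idle.vcd"]},
    ],
}

print(json.dumps(mission_profile, indent=2))   # what would be exchanged between suppliers
```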
The learning curve
While all of this seems somewhat chaotic, there is one overarching constant: the learning curve. Plotted over nearly seven decades, it has been a straight line, showing that the cost per transistor will continue to drop, even if that drop no longer comes from the familiar path of scaling.
“Moore’s Law was just a special case, where you did all your cost reduction through feature size shrinking and wafer diameter growth,” said Rhines. “But the learning curve looks as predictable as ever. If you put 512 layers in a NAND flash package, and you save enormously on the package compared to doing 512 packages, that reduces the cost per transistor. The learning curve doesn’t care how you get there, so long as you hit the cost per transistor. If you do it with packaging, that’s okay, even if it introduces other work, like thermal analysis when you start stacking memory on logic, or even memory on memory. And if you do it by shrinking transistors, that’s okay, too.”
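Rhines’ NAND example reduces to simple arithmetic. The numbers below are placeholders rather than real memory economics, but they show how the learning curve tracks only dollars per bit (or per transistor), regardless of whether the reduction comes from packaging or from shrinking.

```python
# Placeholder numbers, not real NAND economics: the point is that packaging
# can move cost per bit just as process scaling can.

layers         = 512
bits_per_layer = 8e9           # hypothetical bits per layer's worth of silicon
die_cost       = 1.50          # hypothetical cost per layer's worth of silicon
package_cost   = 0.40          # hypothetical cost per package

# 512 discrete packaged dies vs. one 512-layer stack in a single package.
discrete = (layers * (die_cost + package_cost)) / (layers * bits_per_layer)
stacked  = (layers * die_cost + package_cost)   / (layers * bits_per_layer)

print(f"discrete: {discrete:.2e} $/bit   stacked: {stacked:.2e} $/bit")
```

With these made-up figures, folding 512 packages into one cuts the cost per bit by roughly 20%, without touching the transistor at all.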
There are many ways to achieve the same goal in complex designs, and many more of them than in the past. But without the guideposts that once defined the industry’s agreed-upon best practices, the challenge is getting at least one of them to work as expected.