The number of late designs is increasing. Rapidly rising complexity is the leading cause, but tools, training, and workflows also need to improve.
First-time silicon success is falling sharply due to rising complexity, the need for more iterations as chipmakers shift from monolithic chips to multi-die assemblies, and an increasing amount of customization that makes design and verification more time-consuming.
Details from a new functional verification survey[1] highlight the growing difficulty of developing advanced chips that are both functional and reliable. In many cases these devices are bigger than reticle-sized SoCs, and they contain a mix of components and wiring schemes that can render them less stable. As such, they require more optimization and more iterations.
These devices pack in more logic, some of which is developed at different process nodes. They have more SRAM and more interconnects, neither of which scale at the same rate as logic. In addition, they require more software, which in the case of LLMs and other AI algorithms is evolving much more quickly than the hardware. They also require sophisticated cooling techniques, new materials, highly engineered and customized packages, and more accurate workload-dependent models. And to make matters worse, they require more verification than schedules allow.
The collective impact of all of this is borne out in the latest numbers. “The industry has hit the lowest point ever in achieving first-silicon success,” said Harry Foster, chief verification scientist at Siemens EDA. “Historically, it’s been around 30%. Two years ago it dropped to 24%, and from 2023 to 2024 it dropped to 14%. That’s one data point. Another is that historically we’ve seen about two-thirds of projects behind schedule. That has increased to 75%.”
Fig. 1: Number of designs that are functionally correct and manufacturable is declining. Source: Siemens EDA/Wilson Research Group 2024 Functional Verification Study/DVCon
At every step of an increasingly complex and multi-layered flow, productivity needs to increase. Typically, that means hiring more engineers. But with an ongoing talent shortage, and the need for extensive training that extends well beyond what engineers were required to understand in the past, that’s not possible. This is why EDA vendors are so focused on adding AI into their tools, basically codifying that knowledge through reinforcement learning. But that transition still takes time.
“What we’re doing is not working,” Foster said. “We need to significantly increase productivity, and that’s not a metric a lot of people like to talk about because it’s hard to measure. It’s relatively easy to say, ‘This is 10% faster than something else.’ In addition, a lot of these companies lack the skill to do very complex chips, or it’s something new to them. In the late 1990s, everybody was talking about the productivity gap. This is Productivity Gap 2.0. The problems are different than they were 20 years ago, but there are some common themes. We need to move away from siloed sets of tools to something that’s much more connected and integrated.”
Fig. 2: Rising complexity, combined with a productivity gap, is slowing time to manufactured silicon. Source: Siemens EDA/Wilson Research Group/DVCon
Productivity issues cannot be blamed on complexity alone, however. Even mainstream chipmakers are churning out chips faster than in the past.
“I started life as a verification guy, and we all seemed to have this level of paranoia about getting first-time silicon,” said Matt Graham, senior group director for verification software at Cadence. “It seemed like we were on that trajectory for a long time. But over the past year to 18 months, suddenly everybody is doing more chips. Even companies that are not consumer-focused, like those making chips for testers, are now looking for four times the number of chips. They’ve gone from one chip every 18 months to four or five per year. That’s because everything has become more specialized all of a sudden.”
This frequently happens with major technology shifts where processes, tools, and standards need to catch up. “We often go from more specific to more general-purpose and back again, and we seem to be in one of those application-specific cycles,” Graham said. “That’s increasing the number of chips everyone is doing by four or five times, but no one is staffing up to do four times more tape-outs. And if you’re on the cutting edge, with 3D-ICs or chiplet-based designs, one of the wafers may need a little bit of a spin.”
At the leading edge, changes are deep, numerous, and sometimes design-specific, making it difficult to pinpoint problems. Many of those designs are one-offs for internal-only consumption by large systems vendors looking to push the limits of performance for specific applications or data types. In those cases, the cost of a respin is part of the budgeting process, which adds some fuzziness to the numbers.
“For the respins, logic function is still the most prominent issue,” said Frank Schirrmeister, executive director for strategic programs and system solutions in Synopsys’ System Design Group. “The [Siemens EDA/Wilson Research Group] survey showed that 70% of respins are design errors due to changes in the spec. This means that somebody misunderstood the spec and sounded the alarm bell, so 50% do a second round. Some of the large chipmakers actually plan for as many as four respins. So at the end of the day, it’s just about the complexity.”
That also creates a potentially enormous opportunity for EDA companies, particularly those incorporating some type of AI into their tools and flows.
“In generative AI, you have a co-pilot to assist and create,” said Sassine Ghazi, CEO of Synopsys, in his Synopsys User Group keynote. “With the co-pilot technology that we started with Microsoft, you have a workflow assistant, knowledge assistant, and a debug assistant. You can ramp up a junior engineer in a much faster way, as well as an expert engineer. They can interface with our product in a more modernized, effective, efficient way. Then you have the creative element. We have early customer engagement, from RTL generation, testbench generation, test assertions, where you can have a co-pilot that helps you create part of your RTL, test bench documentation, and test assertions.”
In some cases, productivity has improved from days to minutes. But the biggest benefits are yet to come with the rollout of agentic AI, which essentially raises the abstraction level for the entire design and verification flow.
“As AI continues to evolve, so will the workflow,” Ghazi said. “I often get asked the question from our stakeholders about when we will see a change in the EDA market by leveraging AI. I don’t believe that will be the case unless the workflow changes, where you can do certain things very differently in order to deliver your product in a faster, more effective, more efficient way. Now, with the agentic AI era, agent engineers will collaborate with the human engineer in order to tame that complexity and change the workflow.”
Fig. 3: Evolution from generative to agentic AI. Source: Synopsys/SNUG
Abstracting the problem
Some of the biggest challenges engineers face with advanced designs involve understanding dependencies across hundreds or thousands of different elements in a design. In the past, one of the biggest knobs to turn was tighter integration of hardware and software. Co-design now can include dozens or even hundreds of chiplets that need to work separately and sometimes in unison. Multi-physics simulations are required to understand all the possible interactions, and co-design now extends beyond hardware and software to include various types of interconnects, the package, possibly photonics, and in some cases a much larger system of systems.
In addition, everything needs to be testable (DFT), manufacturable (DFM) with sufficient yield (DFY), and it needs to have enough internal controls so that it doesn’t overheat. And if it does run too hot and ages faster than expected, there need to be mechanisms to re-route signals, which are primarily software-driven.
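What such software-driven controls look like varies widely by design, but a rough, hypothetical sketch helps. The toy Python below steers work away from a tile that runs too hot or is aging too fast; the tile names, thresholds, and data structures are illustrative only, not drawn from any real chip.

```python
# Hypothetical sketch of a software-driven mitigation loop. The thresholds
# and tile records are illustrative, not from any real design.
TEMP_LIMIT_C = 105   # assumed junction-temperature ceiling
AGING_LIMIT = 0.8    # assumed normalized wear-out threshold

def mitigate(tiles):
    """Steer work away from tiles that run hot or are aging too quickly."""
    for tile in tiles:
        if tile["in_use"] and (tile["temp_c"] > TEMP_LIMIT_C or tile["aging"] > AGING_LIMIT):
            spare = next((t for t in tiles if t["spare"] and not t["in_use"]), None)
            if spare is None:
                print(f"{tile['name']}: no spare available, throttling instead")
                continue
            tile["in_use"], spare["in_use"] = False, True
            print(f"re-routing work from {tile['name']} to {spare['name']}")

tiles = [
    {"name": "tile0", "temp_c": 112, "aging": 0.3, "spare": False, "in_use": True},
    {"name": "tile1", "temp_c": 78,  "aging": 0.1, "spare": True,  "in_use": False},
]
mitigate(tiles)   # prints: re-routing work from tile0 to tile1
```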
“In the verification space what we’re seeing is that software is becoming more and more a part of the complete solution,” said Cadence’s Graham. “It’s not just, ‘We’re going to build a chip.’ It’s specific chips. The end market, the end use case of the chip, is understood. The software stack that runs on that is known, and the robot or the car or whatever it is that goes in is understood. And there’s a need to shift all of that left and right, where we need to consider software verification, pre-silicon, and maybe even before we get it to an emulator or prototyping platform where we need to ramp up the software.”
How much functionality goes into software versus hardware isn’t always clear at the outset. Fine-tuning that balance is a time-consuming process in complex designs, which easily can lead to multiple respins.
“Software introduces a whole lot of functionality and features,” said Ashish Darbari, CEO of Axiomise. “If the hardware teams are not entirely aware of those — and especially the verification folks — then there is a big gap between what is being tested and what is being defined and scoped. This is exactly why a lot of bugs get missed. We do all this virtual prototyping and bring up software early to get 10,000 or 100,000 simulation vectors. But who’s asking questions around the boundary conditions? Project after project, we go in and pick up all these error case issues in the first two or three weeks because the designers have run out of time.”
New markets, different concerns
These issues reach well beyond functional verification and debug, which always has consumed the lion’s share of chip development time and resources prior to manufacturing. The inclusion of more complex chips in safety-critical applications, such as automotive and mil/aero, adds a whole new level of requirements for designs. In the past, none of these markets allowed advanced-node chips because they were not considered reliable enough. But with growing competition from BYD and NIO in China, and from electric vehicle startups such as Rivian and Lucid in the U.S., established carmakers are scrambling to shift more functionality to software. That can be done only with more advanced chips and highly customized packaging, which will become increasingly necessary as carmakers move toward higher levels of autonomy.
Safety is a requirement in these systems, but a breakdown in any system also can add security vulnerabilities. As a result, chips need to be designed to address more corner cases, from accelerated aging due to ambient heat in hot climates to real-world road conditions. And while much of this can be simulated, chips also need to be road-tested. If any problems cannot be adequately resolved in software, chips need to be respun.
“Functional verification consumes most of your time,” said Axiomise’s Darbari. “But simple power optimizations, like bringing X’s into the design, could easily make a block vulnerable to Trojans because these X’s are now giving choices in an execution framework. So an X in silicon is a zero or a one. You don’t actually see an X, but from a simulation and behavioral point of view, these X’s are now adding synthesis choices to the end user to be able to exercise areas of the design where you shouldn’t have access. So on one hand, you have functional verification. On the other hand, you have these X issues being introduced from a power point of view, and then you have a redundant area. In security, the more area you have in silicon, the more you’re exposing yourself.”
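Darbari’s point about X’s can be made concrete with a small, hypothetical sketch. The Python below stands in for RTL decode logic in which undefined modes were left as don’t-cares; the dont_care argument represents the value a synthesis tool is free to pick, and one perfectly legal choice exposes a path that X-based simulation would never distinguish. None of the names here refer to a real design.

```python
# Hypothetical illustration: an RTL 'X' is a don't-care that synthesis may
# resolve to either 0 or 1. Simulation sees no difference, but the concrete
# choice decides whether a privileged path is reachable in silicon.
def access_mux(mode: str, dont_care: int) -> str:
    """Toy decode logic; modes other than user/debug were left as X in the RTL."""
    if mode == "user":
        return "user_regs"
    if mode == "debug":
        return "debug_port"
    # Undefined mode: the RTL said "don't care," so the tool chooses.
    return "debug_port" if dont_care else "user_regs"

# Two equally legal synthesis outcomes of the same source:
print(access_mux("undocumented", dont_care=0))  # user_regs  (benign choice)
print(access_mux("undocumented", dont_care=1))  # debug_port (exposes a path)
```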
Putting the pieces together
That extra silicon area is needed for more processing elements and more functions within a chip, or collections of chiplets in some type of advanced package. But it also makes first-time silicon much more difficult to achieve.
“You’re dealing with accelerators that have very complex workloads,” said Siemens’ Foster. “That introduces a lot of non-determinism into the design that we don’t even know how to describe semantically, so it becomes very difficult to verify. One of the challenges is that we’ve built a lot of flows that are tool-centric, without accounting for the feedback loops that are needed to optimize all of this. We need more connected flows going forward. Then we’ll be able to leverage AI. The obvious one is when I’m doing DFT and, ‘Oops, I can’t achieve fault coverage.’ So now I manually need to go back to earlier in the tool flow. All these loops need to be closed. But where are you going to find the bodies to do that?”
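A toy sketch shows what closing one of those loops could look like. In the Python below, a fault-coverage shortfall is fed back upstream automatically instead of waiting for an engineer to restart the flow by hand; the function names, coverage model, and thresholds are placeholders, not a real tool API.

```python
# Hypothetical sketch of a closed DFT feedback loop. All functions are stubs;
# none correspond to a real EDA tool interface.
FAULT_COVERAGE_TARGET = 99   # assumed sign-off threshold, in percent
MAX_ITERATIONS = 5

def synthesize(design, test_points):
    """Stub: pretend added test points make the netlist more observable."""
    return {"design": design, "test_points": test_points}

def measure_fault_coverage(netlist):
    """Stub: toy model in which coverage rises with test points."""
    return min(95 + netlist["test_points"], 100)

def run_flow(design):
    test_points = 0
    for _ in range(MAX_ITERATIONS):
        netlist = synthesize(design, test_points)
        coverage = measure_fault_coverage(netlist)
        if coverage >= FAULT_COVERAGE_TARGET:
            return netlist, coverage
        # Close the loop: feed the shortfall back upstream automatically
        # instead of a manual hand-off between siloed tools.
        test_points += 2
    raise RuntimeError("coverage target not met; needs engineering review")

print(run_flow("toy_block")[1])   # reaches 99 on the third automated pass
```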
According to EDA companies and some of the leading-edge foundries and OSATs, the answer lies in new tools, methodologies, and possibly more restrictive design rules and more limited packaging options. But it’s too early to tell how all this ultimately will shake out. Change is happening much faster than anyone could have predicted several years ago, and the proof is in the data.
Reference
1. Siemens EDA and Wilson Research Group, 2024 Functional Verification Study.