Complexity, uncertainty, and lots of moving pieces will challenge the semiconductor industry for years to come.
AI’s ability to mine patterns across massive quantities of data is causing fundamental changes in how chips are used, how they are designed, and how they are packaged and built.
These shifts are especially apparent in high-performance AI architectures being used inside large data centers, where chiplets are being deployed to process, move, and store massive amounts of data. But they have begun to affect other types of chips, as well, as the EDA tools and flows used to design and verify those multi-die systems continue to evolve. Traditional silos that for decades added efficiency and predictability in semiconductor design are breaking down, prompting the entire industry to rethink how design teams are organized, how they interact with other teams both inside and outside their organizations, and how AI can be used to improve the design of AI chips.
“AI is going to reinvent EDA,” said Ravi Subramanian, chief product management officer at Synopsys. “It’s going to reinvent what’s possible in computing. And it’s going to touch every aspect of how you specify, verify, and manufacture a chip. What used to be a single domain of electrical performance is now thermal performance, mechanical stress, and much more. The number of domains you have to simultaneously analyze is driving a completely new way of how to design a chip.”
Others agree. “We recently reorganized, or re-energized, a kind of cross-company AI team,” said Matt Graham, senior group director for verification software product management at Cadence. “We still need fundamental engines, and engineers need to understand the requirements for all of this. But we also need these over-arching engineering teams. Previously, those may have been marketing teams and product engineering teams — go-to-market types of teams, where if we use them together in a certain way we can solve a problem like low power mixing. But more and more we’re seeing that’s actually an engineering problem, not just a go-to-market solution. We may need specific features built into the tools or a specific flow stitched together at the coding level, not just at the scripting level, to enable these different solutions. It’s not a fully unified single flow, but it flows from one to the next to the next.”
One big challenge is how to integrate various implementations of AI, which effectively could provide a bridge between the data collected at the start of the design process and what’s showing up both pre- and post-silicon.
“More and more, our AE teams and our product engineering teams are starting to build that kind of cross-functional knowledge,” Graham said. “Our customers are looking for those kinds of people and building those kinds of teams, too. Verification engineers are great at doing UVM, SystemVerilog, and running a diversity of debug tools to get to the root cause of logic bugs that you find during simulation. But they’re also building teams that support engineers across different verticals so there is a clear and well-defined path, from full custom analog all the way to pre-silicon testing.”
That requires highly sophisticated AI models, and today, at least, there are tradeoffs to be made. For example, predicting how different components of a chip will work together must be balanced against ensuring that control loops are tight enough to maintain reliability when multiple models interact with each other.
“Modeling is fundamental,” said Synopsys’ Subramanian. “What is the thermal model of the chip when the workload is running on it? Can you create that model and use it to do analysis of how that affects mechanical stress over time? We need a model for how the stress is going to behave, and translate the stress into an electrical effect. It even needs to include fluidics if you have liquid-cooled systems. So each of these models are in different domains, and then you need to have models working together. And with that, a design team dramatically changes. The design team now is working with a modeling team that’s looking at packaging technology alternatives. What’s above the substrate, interposer, CoWoS, etc., and how do you create the models needed so the design team can do the analysis? The design team is no longer just writing RTL and putting things together. They’re doing the functional and the electrical, but now they’re also bringing these models together to start addressing new requirements.”
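To make that coupling concrete, the loop below is a minimal Python sketch of a workload-to-power-to-thermal-to-stress-to-timing iteration. The functions and coefficients are hypothetical placeholders for the domain models Subramanian describes, not real extraction or analysis data.

```python
# Hypothetical multi-domain coupling loop: workload power feeds a thermal
# model, temperature feeds a stress model, and both derate timing.
# All coefficients are illustrative, not taken from any real PDK or flow.

def power_from_workload(activity: float, base_w: float = 50.0) -> float:
    """Dynamic plus leakage power (W) for a given switching activity factor."""
    return base_w * (0.4 + 0.6 * activity)

def temperature(power_w: float, theta_ja: float = 0.3, ambient_c: float = 45.0) -> float:
    """Steady-state junction temperature (C) from a simple thermal resistance."""
    return ambient_c + theta_ja * power_w

def stress_mpa(temp_c: float, cte_mismatch: float = 0.8) -> float:
    """Package-induced mechanical stress (MPa) rising with temperature delta."""
    return cte_mismatch * (temp_c - 25.0)

def timing_derate(temp_c: float, stress: float) -> float:
    """Fractional slowdown applied to nominal path delays."""
    return 1.0 + 0.0012 * (temp_c - 25.0) + 0.0004 * stress

def co_simulate(activity: float, iters: int = 10) -> dict:
    """Iterate the coupled models until they settle on consistent values."""
    derate = 1.0
    for _ in range(iters):
        p = power_from_workload(activity) * derate  # slower logic stays active longer
        t = temperature(p)
        s = stress_mpa(t)
        derate = timing_derate(t, s)
    return {"power_w": round(p, 1), "temp_c": round(t, 1),
            "stress_mpa": round(s, 1), "timing_derate": round(derate, 4)}

if __name__ == "__main__":
    print(co_simulate(activity=0.7))
```

The point of the loop is not the numbers, which are invented, but the structure: each domain's output becomes another domain's input, which is why the models have to be built to work together rather than analyzed in isolation.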
AI-driven flows and tools
Exactly where AI will be used, and how it will impact existing tools, flows, and methodologies, is still being sorted out. But after decades of intermittent progress, AI is rapidly becoming useful for a variety of applications.
“AI is all over the place,” said Jean-Marie Brunet, vice president and general manager for hardware-assisted verification at Siemens EDA. “Most of the devices are AI. There is a lot of software technology that we’re developing around AI, and we have built AI into pretty much all of the technology in our business unit. When you’re running a design with 10 billion gates on an emulator with a complex workload, you’re looking at hundreds of billions of cycles. You have full visibility into big data, but this is too much data to look at. So we’re using AI to efficiently probe what needs to be probed, and efficiently tune what needs to be analyzed. This is a complex challenge in our industry.”
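As an illustration of that kind of triage, the sketch below scores candidate signals from an emulation run by the entropy of their toggle patterns and probes only the most informative ones. The signal names, bit strings, and scoring heuristic are hypothetical; production tools use far more sophisticated methods than this.

```python
# Illustrative only: rank candidate signals from an emulation run by how
# "informative" their toggle patterns look, and probe only the top few.
from collections import Counter
import math

def entropy(bits: str) -> float:
    """Shannon entropy of a sampled waveform expressed as a bit string."""
    counts = Counter(bits)
    total = len(bits)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def select_probes(traces: dict[str, str], budget: int = 3) -> list[str]:
    """Return the signal names with the most varied activity, up to a probe budget."""
    scored = {name: entropy(bits) for name, bits in traces.items()}
    return sorted(scored, key=scored.get, reverse=True)[:budget]

traces = {
    "cpu0.stall":    "0000000010000000",
    "noc.bp_valid":  "0101100110101001",
    "ddr.refresh":   "0000100000001000",
    "pcie.err_flag": "0000000000000000",
}
print(select_probes(traces, budget=2))  # highest-entropy signals first
```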
It also changes the overall workflow, particularly when chiplets are added into the mix. “It’s all done in parallel,” said Mick Posner, senior product group director at Cadence. “And with chiplets, a lot of this involves unique designs, which is very similar to what would happen if you put multiple chips on a PCB. There needs to be interaction and standardization in order to communicate between those chips. Chiplets are exactly the same. The individual teams are responsible for the function of that chiplet, but they need to be exchanging information on timing and power, because ultimately it’s all going to come together in a single package. So the traditional challenges are still there. You need a proprietary interface that’s well documented, or a standard like UCIe. But then, at some point, those die need to come together, and that’s where the EDA tools are evolving so that each team can fundamentally treat the other teams’ dies as a black box or white box. They do their analysis of signal integrity, power integrity, and thermal analysis before the dies come together to ensure they are actually going to work together.”
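A simplified way to picture that exchange: each chiplet team publishes a small interface "contract" for its die-to-die links, and integration checks the two sides against a shared package-level budget. The Python sketch below is purely illustrative; the field names and limits are assumptions, not part of UCIe or any vendor flow.

```python
# A toy "black box" contract two chiplet teams might exchange: each side
# publishes its die-to-die interface budgets, and integration checks them
# against a shared package-level budget. Field names are hypothetical.
from dataclasses import dataclass

@dataclass
class D2DInterface:
    name: str
    lanes: int
    gbps_per_lane: float   # per-lane data rate
    tx_latency_ns: float
    power_mw: float

def check_link(a: D2DInterface, b: D2DInterface,
               max_latency_ns: float, max_power_mw: float) -> list[str]:
    """Return a list of budget or compatibility violations between two dies."""
    issues = []
    if a.lanes != b.lanes:
        issues.append(f"lane mismatch: {a.lanes} vs {b.lanes}")
    if a.gbps_per_lane != b.gbps_per_lane:
        issues.append("per-lane rate mismatch")
    if a.tx_latency_ns + b.tx_latency_ns > max_latency_ns:
        issues.append("latency budget exceeded")
    if a.power_mw + b.power_mw > max_power_mw:
        issues.append("interface power budget exceeded")
    return issues

cpu_die = D2DInterface("cpu.d2d0", lanes=64, gbps_per_lane=16, tx_latency_ns=2.0, power_mw=450)
io_die  = D2DInterface("io.d2d0",  lanes=64, gbps_per_lane=16, tx_latency_ns=2.5, power_mw=500)
print(check_link(cpu_die, io_die, max_latency_ns=5.0, max_power_mw=1000) or "link budgets close")
```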
What’s driving the changes
Much of this is the result of a sudden and, for the most part, unexpected confluence of factors. The rollout of ChatGPT in late 2022 and the surge of interest in generative AI (and subsequently, agentic AI) have set off massive investment in blazing-fast chips and AI data centers. Two years ago, most of the chip industry had never even heard of generative AI.
The bigger concern, particularly at the leading edge, was that device scaling using a single planar die was becoming untenable due to the inability to scale SRAM and wires, and the size limitations of a reticle. As a result, large chipmakers and systems companies began focusing on multi-die assemblies in advanced packages, which offered significant yield improvements compared to a giant, monolithic SoC, and potentially greater reuse of at least some of the chiplets.
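The yield argument behind that shift can be seen with a back-of-the-envelope defect-density calculation. The sketch below uses a simple Poisson yield formula with an illustrative defect density, and assumes chiplets are screened as known-good die before assembly while ignoring assembly yield loss.

```python
# Back-of-the-envelope yield comparison using a Poisson defect model,
# yield = exp(-area_cm2 * D0). Defect density and die areas are illustrative.
import math

D0 = 0.1  # defects per cm^2 (illustrative)

def die_yield(area_mm2: float) -> float:
    return math.exp(-(area_mm2 / 100.0) * D0)

# One reticle-sized monolithic die vs. four smaller chiplets with the same
# total area, assuming the chiplets are tested as known-good die.
mono_yield = die_yield(800)
chiplet_yield = die_yield(200)

# Silicon area consumed per good product (ignoring assembly yield loss).
mono_cost = 800 / mono_yield
chiplet_cost = 4 * 200 / chiplet_yield

print(f"monolithic yield: {mono_yield:.1%}, chiplet yield: {chiplet_yield:.1%}")
print(f"silicon per good product: {mono_cost:.0f} mm^2 vs {chiplet_cost:.0f} mm^2")
```

With these illustrative numbers, a defect kills only a quarter as much silicon in the chiplet case, which is where the yield and cost advantage comes from.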
That work continues today. But putting those chiplets together is harder than it looks in PowerPoint. Designing advanced chips was difficult enough with a single die, when more and larger simulations and prototyping had to be handled earlier in the flow. With chiplets, there are more pieces, more potential interactions, and more interconnects. And the package, which used to be relatively simple, is now a key element in the functionality and behavior of a chip.
“If chiplets are connected to each other through die-to-die PHYs, and you treat each PHY like a conventional PHY — which has the physical layer and some adaptive layer — then you can treat them as isolated designs where you do all the place-and-route, timing closure, and so forth,” said Ramin Farjadrad, CEO and co-founder of Eliyan. “Then you hand off the data that you need to communicate to the other chiplets to the PHY and adapter, and it’s transmitted in a conventional way. Of course, the PHYs would need much lower latency and much higher bandwidth, and that benefits from the large number of wires. That’s how people have dealt with it so far. But moving forward, especially for 3D applications, people don’t want the overhead of the adapter and these large MUXes and DEMUXes. To avoid that, you have to build a model that includes buffers, which have their own delays and timings, and also build a clock tree around that. This is what the big EDA companies have been trying to work around. The simpler the interconnect becomes, the easier it is to create one big SoC or SiP.”
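A rough way to see the bandwidth and latency tradeoffs Farjadrad describes is to compare a wide, slow parallel die-to-die link against a narrow, fast serial one, and then add the cost of an adapter layer. The figures below are illustrative assumptions, not measurements or vendor specifications.

```python
# Rough comparison of a wide, slow die-to-die parallel interface against a
# narrow, fast off-package SerDes link, plus the latency cost of an adapter
# layer. All numbers are illustrative.

def link_bandwidth_gbps(wires: int, gbps_per_wire: float) -> float:
    return wires * gbps_per_wire

def link_latency_ns(phy_ns: float, adapter_ns: float = 0.0) -> float:
    return phy_ns + adapter_ns

wide_d2d = link_bandwidth_gbps(wires=1024, gbps_per_wire=4)    # many slow wires
serdes   = link_bandwidth_gbps(wires=16,   gbps_per_wire=112)  # few fast lanes

print(f"wide parallel D2D: {wide_d2d:.0f} Gb/s, "
      f"{link_latency_ns(1.5):.1f} ns without adapter, "
      f"{link_latency_ns(1.5, adapter_ns=3.0):.1f} ns with adapter")
print(f"off-package SerDes: {serdes:.0f} Gb/s, {link_latency_ns(8.0):.1f} ns")
```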
That requires more up-front planning, though. “All the designs we are doing this year are chiplet-based,” said Letizia Giuliano, vice president of IP product marketing at Alphawave Semi. “What we are seeing is that more of the design cycle is spent at the architecture stage. How do you break it down into pieces? Where do you start? Which package technology are you going to use? Everything starts from the package technology, which determines how we’re going to break this down to achieve the total cost of ownership that customers need. It’s all backward. In the past, the package design was the last piece. It’s a big change from what we’ve seen. If our IP doesn’t work, the package was not designed correctly.”
This becomes even more complex as chipmakers begin stacking die. All of the major foundries have 3D-ICs on their roadmaps, and chip industry sources say many of the large system companies and big processor makers currently are working on 3D-IC designs.
“Over the past year there have been quantum jumps in the level of complexity and density,” said Todd Bermensolo, product marketing manager at Alphawave Semi. “Last year, we were seeing more 2D integration with the package. Now people want 3D because the 2D density is not enough. But when you stack them, you need very elaborate ways of interconnecting them. That’s way more elaborate than 2D packaging, which is itself pretty complicated compared to traditional packaging of just silicon on a substrate.”
Added to all of this is hardware-software compatibility. Determining which cores are best suited for AI is further complicated when the software stacks are factored in. Due to the fast pace at which new frameworks and models are unveiled, having multiple software stacks for each core makes commercialization a challenge.
“You try to find the optimized solution in terms of area, power, and performance,” said Sharad Chole, co-founder and chief scientist at Expedera. “Based on that, you can glue together the IPs. Even memory IPs can be built like that. For some IP, functionality is very clearly defined, like using buffers, hash tables, etc. They can be built like that. But as you move toward more complex IPs, the interactions become more software-driven, and that’s where the challenges come in. How do you maintain your software compatibility? At what layer do you need to change the software? When the interactions become complex, that’s when you’re dealing with more programmable IPs, such as DSPs, GPUs, CPUs, NPUs. All of them would, in a certain sense, fall in this category.”
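One simplified way to picture those software layers: the framework-facing operator graph stays stable, each programmable IP registers the kernels it supports, and anything unsupported falls back to a general-purpose target. The backends and operator names in the sketch below are hypothetical.

```python
# A toy layering of software compatibility: each programmable IP (NPU, DSP,
# CPU) registers the operators it can run, and unsupported ops fall back to
# a general-purpose target. Backends and operator names are hypothetical.

BACKENDS = {
    "npu": {"conv2d", "matmul", "relu"},
    "dsp": {"fft", "matmul"},
    "cpu": None,  # None means "runs anything", at lower performance
}

def place_ops(graph_ops: list[str], preference: list[str]) -> dict[str, str]:
    """Assign each operator to the first preferred backend that supports it."""
    placement = {}
    for op in graph_ops:
        for target in preference:
            supported = BACKENDS[target]
            if supported is None or op in supported:
                placement[op] = target
                break
    return placement

model_ops = ["conv2d", "matmul", "softmax", "relu"]
print(place_ops(model_ops, preference=["npu", "dsp", "cpu"]))
# softmax has no accelerator kernel here, so it falls back to the CPU
```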
There are other pitfalls to watch out for. “For training processors, there aren’t that many options,” said Steve Roddy, chief marketing officer at Quadric. “They’re all, by definition, general-purpose, so there’s no barrier that creeps up just because people want to train new models. A year from now, they will be slightly different, but they will still run. You’re guaranteed of that. The inference side is where a lot of the tradeoffs come in. For a lot of the architectures for the deployment of inference models, people have chosen very inflexible, fixed-function accelerators of AI, and that’s the trap. If you look at the set of models today and try to build something that accelerates those and makes them low-power and efficient, and then the state-of-the-art model changes in two years, you could be in trouble. You could wind up with a chip that you spent a lot of money developing, and it can’t run the latest thing, and now you’re dead in the water.”
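The inference-side risk Roddy describes can be reduced to a simple coverage check: which operators in a newer model does a fixed-function block have no way to run at all? The operator sets below are hypothetical, but they show how quickly coverage erodes when model architectures change.

```python
# Quick risk check for a fixed-function inference accelerator: which
# operators in a newer model have no hardware support? Operator sets
# are hypothetical examples.
FIXED_ACCEL_OPS = {"conv2d", "relu", "maxpool", "matmul"}

def coverage_report(model_ops: set[str]) -> tuple[float, set[str]]:
    """Return the fraction of supported ops and the set of missing ones."""
    unsupported = model_ops - FIXED_ACCEL_OPS
    covered = 1.0 - len(unsupported) / len(model_ops)
    return covered, unsupported

# A newer model introduces ops an older fixed-function block never anticipated.
new_model = {"conv2d", "matmul", "layernorm", "gelu", "attention"}
covered, missing = coverage_report(new_model)
print(f"coverage: {covered:.0%}, missing: {sorted(missing)}")
```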
Future concerns
AI is driving up complexity everywhere, directly or indirectly. Whether AI can tame that complexity, or whether it will open the doors to even more complexity, remains to be seen. There are widespread concerns about reliability, from AI hallucinations to silent data errors caused by hardware incompatibilities and failures. There also are security concerns about the increased number of possible attack vectors in the multi-die chips used to run AI algorithms, as well as possible corruption of the data used to train those algorithms. To make matters worse, many AI implementations today are black boxes, with only limited traceability once they are put into service.
Put in perspective, AI today involves massive investments with uncertain risk and highly variable outcomes. Reducing that variability, increasing predictability, and lowering the risk will require the entire semiconductor industry working together. And ironically, one of the most effective tools for making that happen may be AI itself.
— Adam Kovac contributed to this report.