Chip Design Digs Deeper Into AI

Collaborations are going wider and deeper with multi-chiplet designs.


Growing demand for blazing fast and extremely dense multi-chiplet systems is pushing chip design deeper into AI, which increasingly is viewed as the best solution for sifting through scores of possible configurations, constraints, and variables in the least amount of time.

This shift has broad implications for the future of chip design. In the past, collaborations typically involved the chip design team, a fab, and an EDA vendor. In the future, those collaborations may expand to include multiple EDA vendors and chiplet developers, verification experts, one or more fabs/foundries (with test, inspection, and process experts), and an OSAT. In addition, data will need to be shared across an extended ecosystem at different times, stretching from initial concept all the way into the field. That will ultimately break down silos, raise questions about data security, and prompt changes in flows and methodologies.

“It’s not just die-to-die design, package, PCB, thermal, power, stress separately,” said Sutirtha Kabir, R&D director at Synopsys. “All of these specialties are coming together now. If your universe is now becoming a mini-PCB, a mini-system, how do you give designers a sandbox where everybody can focus on their part, but still be looking at the same thing? They’re not looking at variants of the same thing, because that will lead to data inconsistency very soon. That’s the challenge already today as you’re going from tool to tool, flow to flow. That’s an inevitability. You cannot say, ‘I’m going to use all Vendor A flows or all Vendor B flows.’ That’s not going to work. The daily ecosystem has to be such that different tools and different flows have to play together. And within that, how do you make sure that the data representation is unique and everybody’s looking at something just the right way they should?”

Enabling AI, both inside EDA tools and in the systems being developed, requires some fundamental changes in the design process. “One of the keys to making all of this work is that it has to be a vendor-agnostic ecosystem, but not all EDA players want to play with other EDA players,” said Chris Mueth, new opportunities business manager at Keysight. “Some wall off their universe, and that’s probably not a good thing because most of the customers say, ‘What we do is different than our competitors. How we design is different. I use different mixes of tools, I use different processes, I need to keep this thing flexible. And therefore, I need to have an open ecosystem.’ That is a definite requirement. And this is something we’re working at.”

What’s different
So far, most of the “smart” tools use machine learning, which is a limited subset of AI. But as the industry begins utilizing generative AI, that will change significantly.

“Traditional EDA is all about physical intelligence, where we know everything about that system and we calculate the next answer,” said Rob Knoth, group director for strategy and new ventures at Cadence. “AI is the opposite. AI is this black box approach where, if you have a high volume of inputs and outputs, you’re going to be able to train and infer the new answer. While those are opposite methodologies, they can help each other. You can start using that physical intelligence to create mountains of data to train an AI system to do scientific calculations, scientific reasoning, and you can use AI to speed up your physical intelligence. We’ve got custom software running on custom hardware to do something we could never do before. It’s building on that legacy of managing complexity with orders of abstraction, dealing with small rules to give you much better management of that complexity — and orders of scale, coupled with modern AI and EDA technology, that’s allowing us to get to where we’re at now.”

The direction is well defined, even if all the pieces are not in place yet. EDA vendors have been investing heavily in various types of AI for the past several years.

“Just like multi-core designs, cache coherency, the power wall, the memory wall, and AI inflections, we now also have generative AI and multi-die SoCs driving the latest 10X complexity surge,” said Michal Siwinski, chief marketing officer at Arteris. “Managing design complexity comes down to a lot more logic connections and a growing number of different types of physical constraints, which must all be computed together to find the most viable scenarios that meet requirements — or better yet, something closer to optimal — all without blowing up the project schedule or the costs. The answer to such a challenge is reuse and smart automation, particularly leveraging the different flavors of AI or machine learning for much faster exploration of the design space and efficient optimization. This is already here, with growing adoption of this paradigm shift in how chips are designed.”

Exactly how and where AI is applied is still a work in progress, but there is little doubt that it will reshape at least some segments of chip design.

“How can AI/ML help manage all this complexity? A good place to start is by asking what we mean by more complexity,” said WeiLii Tan, director of product management in Siemens EDA’s Custom IC Verification Division. “What do they mean when engineering teams tell us their job is becoming a lot more complex recently? There are a couple of reasons why it has more complexity. Larger designs, larger circuits, of course. We always say this new process technology adds a lot of complexity into how we do things, and that’s been true since the 65nm node. Every new advanced node introduces new considerations, new ways of variation happening, and manufacturing complexity. All of that comes into play in the design and verification. There are more variables and more parameters that we have to take care of now. More recently, we’ve seen an explosion in the operating conditions that the circuits are subject to, and semiconductor designs are used in places where we didn’t go before, like robotics. Semiconductors are also mainstream in cars now. In short, there’s much more data, a lot of different things, a lot of different nodes. That’s what makes things more complex, driving people to think about how to deal with this.”

AI is especially useful for sorting through massive amounts of data and identifying anything that falls outside the bounds of a pre-defined acceptable distribution.

“A few years ago, the discussion was about if and how supervised or unsupervised learning could be applied,” Siwinski said. “We’re now looking at deployment of products that incorporate a mixture of AI automation approaches, with anything from reinforcement learning to leveraging generative AI with transformers, variational auto-encoders, and generative adversarial networks to get to the right answer faster. In some instances, these approaches provide effective optimization and refinement compute to meet the PPA requirements or coverage goals. In others, these are increasingly helping with efficient design and partitioning. In all cases, the increasing level of AI automation is having a direct impact on productivity and schedule constraints, which otherwise becomes prohibitive, given the sheer size of design innovation and complexity that every semiconductor engineer must deal with.”
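One concrete, if simplified, way to picture that kind of screening is a routine that flags results falling outside a pre-defined acceptable distribution. The sketch below is a minimal Python illustration, not a production flow; the spec limits and delay values are hypothetical, and real deployments use far richer statistical or learned models.

```python
import numpy as np

def flag_outliers(measurements, spec_mean, spec_sigma, n_sigma=3.0):
    """Flag measurements that fall outside a pre-defined acceptable
    distribution, modeled here simply as mean +/- n_sigma * sigma."""
    z = np.abs(np.asarray(measurements, dtype=float) - spec_mean) / spec_sigma
    return np.flatnonzero(z > n_sigma)  # indices of out-of-bounds samples

# Hypothetical delay measurements (ps) screened against a 120 ps +/- 5 ps spec.
delays = [118.2, 121.0, 119.7, 142.3, 120.4, 98.1]
print(flag_outliers(delays, spec_mean=120.0, spec_sigma=5.0))  # -> [3 5]
```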

Moreover, while much of the buzz around AI involves advanced chips developed for data centers, much of what is being done for those applications applies to edge devices, as well. “It impacts what is considered middle- and even what was traditionally low-end complexity, per the demands of the increased use of sensors and inference in smart edge applications,” Siwinski said. “This revolution, which is really a bit more of an evolution, looks quite promising. Yet the challenges that the underlying semiconductor design complexity poses are no less daunting, particularly when looking at the impact of dual forces of both PCB multi-chip aggregation into system-on-chip and chip disaggregation into multiple die.”

Experimentation is critical
As more IP is hardened into chiplets, and as more devices are assembled from those chiplets in unique ways into advanced packages, it will become essential to make tradeoffs around performance, power, and area/cost. The promise of AI is to be able to do that faster by leveraging a set of standardized characteristics, such as heat, susceptibility to noise, and throughput, and to swap different chiplets in and out at the architectural level in what is essentially a virtual sandbox.

“A sandbox like this is going to be extremely critical for designers,” said Synopsys’ Kabir. “In EDA tools, you get design continuity, such that once you have done your floorplan exploration, you just hand that over instead of bringing in a napkin floorplan from some visualization tool. And now you have a starting point for your implementation team.”
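What such a sandbox’s early what-if exploration might look like can be sketched in a few lines: chiplets carry a handful of standardized characteristics, and candidate assemblies are scored against a simple objective so alternatives can be swapped in and out at the architectural level. The chiplet names, numbers, and weights below are invented for illustration; a real flow would draw on vendor characterization data and many more constraints, including thermal and noise behavior.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Chiplet:
    name: str
    power_w: float    # typical power draw
    area_mm2: float   # silicon area, a rough proxy for cost
    gbps: float       # usable die-to-die throughput

# Hypothetical chiplet library; real entries would come from standardized
# characterization data (heat, susceptibility to noise, throughput, etc.).
compute = [Chiplet("cpu_a", 18, 70, 400), Chiplet("cpu_b", 12, 55, 320)]
io_dies = [Chiplet("io_x", 6, 30, 900), Chiplet("io_y", 9, 35, 1200)]

def score(cfg, w_perf=2.0, w_power=1.0, w_area=0.5):
    """Toy objective: reward end-to-end throughput, penalize power and area."""
    return (w_perf * min(c.gbps for c in cfg)
            - w_power * sum(c.power_w for c in cfg)
            - w_area * sum(c.area_mm2 for c in cfg))

best = max(product(compute, io_dies), key=score)
print([c.name for c in best], round(score(best), 1))  # -> ['cpu_a', 'io_x'] 726.0
```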

The goal is to do all of this faster, with fewer errors. That includes many test tasks and utilities where EDA tools already make heavy use of automation, such as linting, autochecks, X-checking, CDC, reset and clock analysis, and static timing. Patterns in those automated tasks can be leveraged to improve outcomes.

“These patterns can be quickly learned from different runs on different designs from various customers. The variability in design types and customer types provides a rich suite of models that bolsters the AI model’s ability to make sharper predictions,” said Ashish Darbari, CEO of Axiomise. “Typically, AI has been used by EDA to leverage regression information to make the subsequent runs faster and more efficient. In the cases of UVM and Portable Stimulus, there has been a significant increase in employing AI-based learning for intelligent sequence generation to mitigate the stimulus challenge with UVM.”
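As a simple illustration of leveraging regression information, historical pass/fail data can be used to rank tests so the most fault-revealing ones run first in the next regression. The sketch below uses a plain failure-rate heuristic rather than a trained model, and the test names and results are hypothetical.

```python
from collections import defaultdict

def rank_tests(run_history):
    """Rank regression tests by historical failure rate so the most
    fault-revealing tests run earliest in the next regression."""
    fails, total = defaultdict(int), defaultdict(int)
    for name, passed in run_history:
        total[name] += 1
        fails[name] += 0 if passed else 1
    return sorted(total, key=lambda t: fails[t] / total[t], reverse=True)

# Hypothetical regression log: (test_name, passed) tuples from prior runs.
history = [("cdc_smoke", True), ("cdc_smoke", False),
           ("reset_seq", True), ("x_prop", False), ("x_prop", False)]
print(rank_tests(history))  # -> ['x_prop', 'cdc_smoke', 'reset_seq']
```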

However, Darbari questioned whether there is sufficient data to apply AI to different design types unless this is done in-house by big chip companies, leveraging data from their own internal projects.

Others point to similar concerns. Synopsys, for instance, has taken a narrow approach to solve specific problems in the multi-die space with AI/ML, applying the advanced technologies where they make the most sense.

“It’s not that you go and optimize the whole design, like you would do with a PPA push,” said Kabir. “The approach is different here. You’re trying to solve a particular problem and you’re trying to show a what-if and trend analysis, but also the traditional notion that you need to have a lot of designs in order to train your ML model. That’s not the approach we’re taking. ML has the ability to learn from run-to-run off the same problem. Let’s say I gave you the space of 1 million. What we’re seeing with our experiments is that AI is going to run, let’s say, 300 points, and then you run 50 of them in a batch. From batch-to-batch, ML can still learn and help you find the better design because ultimately you’re not going to run a million points. You want to run fewer points, but still get the trend. That’s where both AI and ML can really help you. This involves your techonomics. If you choose the wrong thickness for your insulation material, if you choose more metal layers in the package, all of these can impact your cost by a large amount, and these decisions are made much earlier in the design cycle. If you’re missing the mark there, your project may not even fly.”
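Kabir’s description of batch-to-batch learning over a huge design space resembles surrogate-guided sampling: evaluate a small batch, fit a cheap model, and let that model pick the next batch instead of sweeping every point. The sketch below is a bare-bones version of that idea with a made-up one-dimensional knob, a toy cost function, and a simple polynomial surrogate standing in for a production ML model.

```python
import numpy as np

rng = np.random.default_rng(0)

def expensive_eval(x):
    """Stand-in for a costly multi-die analysis run (e.g. a thermal sweep)."""
    return (x - 0.37) ** 2 + 0.01 * rng.standard_normal()

# Hypothetical 1-D design knob (say, insulator thickness) over a large space.
candidates = np.linspace(0.0, 1.0, 10_000)
untried = set(range(len(candidates)))
tried_x, tried_y = [], []

batch = rng.choice(sorted(untried), size=10, replace=False)  # random seed batch
for _ in range(5):                       # a few small batches, not a full sweep
    for i in batch:
        untried.discard(int(i))
        tried_x.append(candidates[i])
        tried_y.append(expensive_eval(candidates[i]))
    coeffs = np.polyfit(tried_x, tried_y, deg=2)        # cheap surrogate model
    order = np.argsort(np.polyval(coeffs, candidates))  # rank all candidates
    batch = [i for i in order if i in untried][:10]     # next batch = predicted best

best = tried_x[int(np.argmin(tried_y))]
print(f"best knob setting after {len(tried_x)} runs: {best:.3f}")
```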


Fig. 1: Current chip design methodologies have many iterations and are time-consuming. Source: Synopsys

One way to speed up this whole process is to reduce the amount of data for specific tasks, an approach that harkens back to high-level synthesis. “If you’re looking at data analytics, maybe you’re interested in one particular thing,” said Keysight’s Mueth. “How do you get to that one particular thing when you have this big glob of data to comb through? A lot of customers are struggling with this, and they have to do that manually. It’s one of the biggest problems people have with workflows today. The metadata helps in a lot of ways. If the data is stored in the proper format, you can access it through databases, and you can query and get down to the data that you want to that way. But another useful way of getting at that data is, because it’s tagged, if you set up your data analytics to key off those tags, then you can reduce the data very quickly. Maybe I have a bunch of data that shows results against requirements. I need a digital thread that links that data to the requirement, to the process that was used to get the data, my version controls — the collection of all the different process elements that produce that data. If I see something that failed, maybe I can do data reduction by keying off all the fails. Then I want to key off the fails in the area that I’m interested in, reducing the data further.”
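In practice, that kind of tag-driven reduction can start out as nothing more than filtering records on their metadata, as in the sketch below. The record fields and tag names are hypothetical; a real flow would sit on top of a database and the digital thread Mueth describes.

```python
# Hypothetical tagged result records; in a real flow the tags and links to
# requirements, processes, and versions come from the data-management layer.
results = [
    {"id": 1, "status": "pass", "block": "serdes", "req": "REQ-101"},
    {"id": 2, "status": "fail", "block": "serdes", "req": "REQ-102"},
    {"id": 3, "status": "fail", "block": "pmu",    "req": "REQ-205"},
]

def reduce_by_tags(records, **tags):
    """Keep only records whose metadata matches every requested tag."""
    return [r for r in records if all(r.get(k) == v for k, v in tags.items())]

fails_in_serdes = reduce_by_tags(results, status="fail", block="serdes")
print([r["id"] for r in fails_in_serdes])  # -> [2]
```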

That subset of data also can be utilized by other analytics tools to zero in on certain tasks. “If I don’t have metadata, machine learning is kind of dead,” Mueth said. “I’m not going to get much out of it. And then the data has to be in a common format. One of the issues we have today is if you look at one of these big SoC designs, there may be 20 to 30 different software tools used in the design phase. In the test phase, you may have about two dozen software tools to use to develop the tests and comb through the data, and it’s highly likely that many of these tools will be putting out data formats that don’t talk to other data formats. So as an EDA vendor, if you’re trying to solve this problem, that’s one of the things to look at. Do I build interfaces from Tool A to Tool B, B to C, C to D, D to E, E to A, etc.? If I do that, it’s an exponential problem. To solve that, I do a hub-and-spoke model, meaning I have a common data model, and I have spokes that are adapters to the data format. With that kind of architecture, I am doing one adapter per standard versus an exponential number of adapters or translators.”
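The hub-and-spoke arrangement maps naturally onto an adapter pattern, sketched below with invented tool formats: each format gets one adapter to and from a shared data model, so N formats need N adapters rather than a translator for every pair of tools.

```python
class Adapter:
    """One spoke: translates a tool-specific format to and from the common model."""
    def to_common(self, payload): ...
    def from_common(self, model): ...

class ToolAAdapter(Adapter):
    def to_common(self, payload):
        return {"nets": payload["net_list"], "source": "tool_a"}
    def from_common(self, model):
        return {"net_list": model["nets"]}

class ToolBAdapter(Adapter):
    def to_common(self, payload):
        return {"nets": payload["connections"], "source": "tool_b"}
    def from_common(self, model):
        return {"connections": model["nets"]}

def translate(payload, src: Adapter, dst: Adapter):
    """Any-to-any translation is two hops through the hub, so adding a new
    format means writing one adapter, not one translator per existing tool."""
    return dst.from_common(src.to_common(payload))

print(translate({"net_list": ["clk", "rst"]}, ToolAAdapter(), ToolBAdapter()))
# -> {'connections': ['clk', 'rst']}
```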

AI/ML already provides big time savings for some tasks. The challenge now is to expand what it can do, and to go deeper to improve the results of those efforts. “With all the new data coming in, new requirements for compute speed, new requirements for engineering time, AI/ML tools also can help by providing a way to get correct and verifiably correct answers for the engineering team,” Siemens’ Tan said. “This is a departure from some spaces of consumer usage where they say, ‘Well, if it’s not correct, I’ll get a good laugh about it and maybe post it up on a meme and say it’s okay, I know it’s supposed to be this.’ But when millions or billions of dollars are hinging upon this answer to be correct, then it takes on a different meaning. It needs to be correct every time, and it needs to be verifiable. I need to be able to convince myself that it’s correct, and the tool needs to be able to convince itself that this answer is correct. So it’s about correct and verifiably correct answers. We also need to be able to throw all kinds of corner cases at a tool and the tool shouldn’t choke. It needs to be robust enough to handle that, and it needs to be usable. The worst thing that an AI solution could be is one that adds a layer of complexity to your EDA flow, because our main purpose is to manage complexity, so it needs to be usable to the engineering teams. I’ve been part of engineering teams before. We’re not necessarily statistical model experts. We’re not necessarily deep learning experts. But we are IC design experts, IC verification experts, so it needs to be applicable to our knowledge stack and core competency.”

Conclusion
The immediate and obvious challenges of where AI is deployed today, and where it will be deployed in the future, center around the physical realm with computation and optimization across countless corner cases, Arteris’ Siwinski said. “This is made that much more difficult when dealing with dies from various sources, nodes, foundries, etc., all while dealing with an evolution of multi-die standards, such as the emerging UCIe.”

No less important is how to deal with the resulting explosion of logic that multi-die systems will open up. A disaggregated system is no longer bound by the reticle limit, which dictates the maximum chip size for fabrication. “Will the resulting chiplets become what today’s on-chip sub-systems already are, with a surge of logic that will need to be designed and implemented on each of those dies, clearly relying on AI/ML techniques and compute to meet specs, schedule, and costs? Time will tell,” said Siwinski.

Still, a lot can be done with what’s available today, and a lot more will be done in the future. “There are more analysis techniques that are available, things that are done in standalone tools,” said Kabir. “We don’t have all of those imported. Nobody has them in the place-and-route tools. It’s a process. The point is that the need is acute, and customers everywhere are just confirming that. Complexity is only going up. One of our lead customers taped out 5 million bumps for the whole system last year, and they’re already looking at 10 million-plus, going to 50 million bumps in the next two to four years. The foundries are projecting trillions of transistors in a system, but for that to happen, several things need to come together. Standardization, for one. Standardization is a very lofty goal. And you must have the ability to model the right amount of data. For example, when you’re looking at thermal hotspot effects, you don’t need all the details up to the standard cell level to analyze that die. You somehow need to be able to represent enough of that die so that your analysis is accurate enough for what you’re trying to do. But at the same time, you’re not overloading the whole system and waiting for a week, like you would do for sign-off. So data modeling for the right task is extremely important.”
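Kabir’s point about modeling the right amount of data for the task can be illustrated with a toy reduction: aggregating a fine-grained per-tile power map into coarser tiles before thermal hotspot analysis, so total power is preserved without carrying standard-cell-level detail. The grid sizes and values below are arbitrary, and real reduced-order thermal models are considerably more sophisticated.

```python
import numpy as np

def coarsen_power_map(power, factor):
    """Aggregate a fine per-tile power map (W) into coarser tiles, preserving
    total power per region so hotspot analysis stays accurate enough without
    standard-cell-level detail."""
    h, w = power.shape
    assert h % factor == 0 and w % factor == 0
    return power.reshape(h // factor, factor, w // factor, factor).sum(axis=(1, 3))

# Hypothetical 8x8 fine power map for one die, reduced to 2x2 macro tiles.
fine = np.random.default_rng(1).uniform(0.0, 0.5, size=(8, 8))
coarse = coarsen_power_map(fine, factor=4)
print(coarse.shape, round(float(fine.sum()), 3), round(float(coarse.sum()), 3))
# -> (2, 2) with identical totals: the model is 16x smaller, power is preserved
```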

The industry is just beginning down this path of how to use AI more effectively. Progress is being made, although not necessarily commensurate with the hype because it depends heavily on collaboration, data, and a willingness of chip designers to learn new approaches.

Related Reading
Multi-Die Design Pushes Complexity To The Max
Continued scaling using advanced packaging will require changes across the entire semiconductor ecosystem.
Can Models Created With AI Be Trusted?
Evaluating the true cost and benefit of AI can be difficult, especially within the semiconductor industry.


