All major vendors now incorporate ML in at least some of their tools, with more ambitious goals for AI in the future.
Machine learning is becoming a competitive prerequisite for the EDA industry. Big chipmakers are endorsing and demanding it, and most EDA companies are deploying it for one or more steps in the design flow, with plans to add much more over time.
In recent weeks, the three largest EDA vendors have made sweeping announcements about incorporating ML into their tools at their respective user events, and the entire chip industry is heading in a similar direction. Machine learning is a natural fit for chip design. It teaches a machine how to perform a specific task based upon pattern recognition, and results are typically very accurate.
This stands in contrast to the broader category of AI, which, despite widespread confusion caused by the two terms being used interchangeably, allows a machine to operate more independently. Full AI requires much more data for training, and results generally are delivered as probabilities. The more data, and the more relevant that data, the better the accuracy, which is why much of the training for AI in smartphones and automobiles is done in hyperscale data centers.
The amount of data used in chip design is comparatively small, and the problems are extremely complex and often design-specific. So while EDA vendors continue to experiment with AI for a variety of uses, the initial implementation centers around a subset called reinforcement learning, which is one of the three primary machine learning approaches, alongside supervised and unsupervised learning. Reinforcement learning essentially rewards behavior on a sliding scale, with the highest reward for optimal behavior, and it dynamically adjusts that behavior based on continuous feedback.
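To make that reward-and-feedback loop concrete, the sketch below shows a minimal epsilon-greedy scheme, one of the simplest reinforcement-style approaches, choosing among a few hypothetical tool settings and steering toward whichever setting earns the highest simulated reward. The settings, reward values, and parameters are illustrative assumptions only, not anything drawn from an actual EDA flow.

```python
import random

# Hypothetical candidate tool settings (placeholders, not real EDA options).
ACTIONS = ["effort_low", "effort_med", "effort_high"]

def run_trial(action):
    """Made-up reward standing in for a normalized PPA score from a real run."""
    base = {"effort_low": 0.55, "effort_med": 0.70, "effort_high": 0.65}[action]
    return base + random.gauss(0, 0.05)  # noisy feedback from each "run"

def epsilon_greedy(episodes=200, epsilon=0.1):
    counts = {a: 0 for a in ACTIONS}
    values = {a: 0.0 for a in ACTIONS}   # running average reward per action
    for _ in range(episodes):
        # Explore occasionally, otherwise exploit the best-known setting.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: values[a])
        reward = run_trial(action)
        counts[action] += 1
        # Incremental update of the average: behavior adjusts with feedback.
        values[action] += (reward - values[action]) / counts[action]
    return values

if __name__ == "__main__":
    print(epsilon_greedy())
```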
“Think about a chip’s physical design and the tests you do on that,” said Joseph Sawicki, executive vice president for IC EDA at Siemens Digital Industries Software. “There are literally billions and billions of patterns, and you can correlate those patterns to where they lie upon certain net segments and whatnot. Then you get test data that tells you, ‘This design failed.’ I run some extra vectors and get a bunch of different fail logs, all of which tell me that in certain cones of logic I have a failure. So now I’ve got a massive circuit design and these cones of logic, and I can take a look at a neural network that can be trained to show that, given this physical design, it looks like that cone of logic is bad and my failures are likely to be in this area. Then I can do correlation of that with the physical patterns, and get a failure. In one case, we told a customer, ‘Your Via 5 tungsten deposition that’s happening in this one fab on this one line is what’s bad.’ And sure enough, it was out of spec.”
This allows for much broader experimentation in a shorter amount of time, as well. “The way it’s been done in the past is that typically users run different combinations and they get to a certain PPA,” said Sassine Ghazi, president and chief operating officer at Synopsys. “Now, the system is looking at all that space of optimization and it’s launching multiple jobs intelligently. Sometimes it launches the job, and at 10% it kills it because it knows it’s not going to lead anywhere better. Then it backs up and launches a different branch, and it says, ‘This is the best result I can provide.'”
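The behavior Ghazi describes, killing a run early once it is clearly not going to win, resembles a successive-halving style search: launch many candidate runs, score them on an intermediate metric partway through, and let only the promising ones continue. The sketch below is a hypothetical, simplified version of that idea; the candidate configurations, the partial-score function, and the checkpoint fractions are all invented for illustration.

```python
import random

# Hypothetical candidate configurations for an optimization run.
CANDIDATES = [{"seed": s, "placement_effort": e}
              for s in range(4) for e in ("low", "high")]

def partial_score(cfg, fraction):
    """Stand-in for an intermediate quality estimate after running
    `fraction` of the job (e.g., early timing or congestion numbers)."""
    base = 0.6 if cfg["placement_effort"] == "high" else 0.5
    return base + 0.005 * cfg["seed"] + random.gauss(0, 0.02) * fraction

def successive_halving(candidates, stages=(0.1, 0.5, 1.0), keep=0.5):
    survivors = list(candidates)
    for fraction in stages:
        # Score every surviving run at this checkpoint...
        scored = sorted(survivors,
                        key=lambda c: partial_score(c, fraction),
                        reverse=True)
        # ...and kill the lower-scoring half before spending more compute.
        survivors = scored[:max(1, int(len(scored) * keep))]
    return survivors[0]

if __name__ == "__main__":
    print("best configuration:", successive_halving(CANDIDATES))
```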
Fig. 1: Reinforcement learning model. Source: Synopsys
Put in perspective, the complexity of these designs far exceeds the human brain's capacity to sort through all of the possible combinations and interactions in a reasonable amount of time.
“We use AI/ML for some of our device modeling and characterization,” said Niels Faché, vice president and general manager of PathWave Software Solutions at Keysight. “Anytime you look at optimizing designs and you run a lot of simulations, it can make that a much more efficient process. Think about a 6G system, for example, where you have to configure different channels. The standard approach is to simulate a lot of different conditions, and it can take a long time to find the optimal result. We can considerably speed up that process. So with 6G, there will be a very big play for it, and that’s something we are working on.”
Likewise, Ansys is utilizing it for applications such as creating power grids. “That’s done really early in the floor-planning stage,” said Marc Swinnen, director of product marketing at Ansys. “There are some basic questions, like how many straps are there going to be, what sort of pitch, and how wide are these straps going to be? How many vias do I put on the corners? How many rings? These are basic architectural questions, and there are many, many possible combinations. What layers are we going to use? We have an application where you can characterize this across a half-dozen parameters and figure out the optimal pitch and optimal via density to give you the best power. Thermal also comes in earlier than we first expected. When we started working on electro-thermal we were looking at sign-off, but it very quickly became apparent that people were most concerned initially at the prototyping stage. They wanted quick estimates of how they could assemble these multi-die systems, or even floor-planning of blocks, because if you floor-plan it wrong in the beginning and you only characterize thermal at the end and you place two very hot things close together, that’s a disaster. It’s very difficult to recover from that, so it has to happen earlier in the flow.”
Warm start vs. cold start
Reinforcement learning, in particular, can significantly help design teams achieve better results faster, both for a new design and for derivatives of that design.
“One of the key advantages of this kind of new reinforcement learning is to apply automation that the EDA industry has not applied before,” said Anirudh Devgan, president and CEO of Cadence. “One of our big customers told me that they think place-and-route is the most sophisticated commercial software ever written. And there’s a lot of complications in a lot of these programs, whether it’s verification or place-and-route or analog. But what EDA never did was optimize multiple runs. So you do one run, you get input, you get output. And when you go to the next run, there is no knowledge transfer.”
Knowledge transfer is one of the big benefits of reinforcement learning. It basically collects data and stores it in a repository.
“The knowledge that something has worked in the past is in multiple people’s heads,” said Thomas Andersen, vice president for AI and machine learning at Synopsys. “What happens when these people move to a different company? Or what happens when they’re in different locations and they don’t even talk to each other? You might find someone at the water cooler who says, ‘Hey, you should try this because it gives you better power.’ The beauty of this system is that it learns the behavior of the designs and creates statistical models. So the next time I do an evolution of the design — which may be different, but also a lot of reuse — I can start with all this knowledge.”
This is what’s termed a “warm start,” and with increasingly complex designs and shrinking market windows, it can improve time to market without having to redesign everything.
“With a cold start, you’re learning from scratch,” said Cadence’s Devgan. “A warm start can be done within a company, and it works for some common blocks. It’s the next model for reuse. For common IP, you can work with the foundry on a pre-trained model. All the data is there in the company. You mine your own data.”
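A minimal way to picture the difference: a cold start fits a model for the new design from scratch, while a warm start initializes it with what was learned on an earlier design and then refines it. The toy linear model below, trained on synthetic data, is only a sketch of that initialization choice under assumed feature and metric definitions, not a description of any vendor's implementation.

```python
import random

def make_data(n=200, true_w=(0.8, -0.3), noise=0.05):
    """Synthetic stand-in for (design features -> quality metric) pairs."""
    data = []
    for _ in range(n):
        x = (random.random(), random.random())
        y = true_w[0] * x[0] + true_w[1] * x[1] + random.gauss(0, noise)
        data.append((x, y))
    return data

def fit(data, w_init=(0.0, 0.0), lr=0.1, epochs=50):
    """Plain least-squares gradient descent; w_init is the starting point."""
    w = list(w_init)
    for _ in range(epochs):
        for x, y in data:
            err = w[0] * x[0] + w[1] * x[1] - y
            w[0] -= lr * err * x[0]
            w[1] -= lr * err * x[1]
    return w

if __name__ == "__main__":
    # Model trained on a previous design generation.
    w_prev = fit(make_data(true_w=(0.8, -0.3)))
    # Derivative design: similar, but not identical, behavior.
    new_design = make_data(n=40, true_w=(0.75, -0.25))
    cold = fit(new_design, w_init=(0.0, 0.0), epochs=5)      # cold start
    warm = fit(new_design, w_init=tuple(w_prev), epochs=5)   # warm start
    print("cold start:", cold)
    print("warm start:", warm)
```

With the same small budget of training on the new design, the warm-started model begins much closer to a useful answer, which is the intuition behind reusing models across derivative designs.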
The approach builds on data that can be collected through many design starts. “EDA is the perfect target for using pattern-matching machine learning,” said Steve Roddy, CMO at Quadric. “You’ve got min-cut algorithms on steroids. You’ve got billions of things to place, and you have to try to minimize wires crossing boundaries. There have been successive iterations of algorithms that power different EDA tools, and they’re all using some complex heuristics to figure out, ‘If I’m going to place all these things, give me the shortest average wire length, the minimum number of wires and crossings.’ That was easy when we had two or three layers of metal. Now there are 14 layers of metal, and 82 masks at 3nm. But you have a history of thousands of place-and-route jobs that have been run through your tool chain. What better training database for pattern matching and building some sort of machine learning? So for the next set of RTL that you ingest, you can say, ‘I recognize that.’ And that’s all machine learning really is.”
More data, more capability
Full AI is further out, and much more difficult to get right. EDA vendors are starting to utilize it wherever there is sufficient data and where the process is so time-consuming that it warrants the effort, but so far its use is limited.
“If you go train on 3 billion images, you can learn what a goat looks like and what a house looks like,” said Jeff Dyck, senior director of engineering at Siemens Digital Industries Software. “You need a ton of data to do that. What we’re doing here is different. The reason we’re running simulation on something is because we don’t actually know the answer to it. We don’t know what it looks like. We’ve never seen it before. It’s a new design, or maybe an evolved design on a new process version or something like that. And so the way that we handle it is we just start over. We don’t take old design data and hope that the next thing behaves similarly, because in the real world of chip design, it doesn’t. So training on big datasets for what we do didn’t help that much. We do use some historic data, and there are pockets where it makes sense, like if you’re doing another iteration on the same design. But if you’re training on a whole bunch of stuff, and hope that the next thing that’s never seen before is going to look like that, it’s not that useful.”
It’s not just about more data. It’s also about good data. “My projects, at the end, realized they got too much data, and sometimes they got the wrong data,” said Monica Farkash, principal member of the technical staff at AMD, during a recent panel discussion. “Most of the time we would just fetch the data, do the analysis, and realize that it is redundant. It was useless data. There’s noise, there are so many other things that are just dumped in there indiscriminately. We ended up sometimes using 20% of the data, sometimes even less than that, and not exactly the shape and form that was required.”
This is part of the reason the use of full AI is much more limited today than that of machine learning. Vendors have been highly selective about where they apply AI, which, if it works, would represent a massive shift in chip design.
“The real value of AI is in optimization,” said Cadence’s Devgan. “Pattern matching, which was the early days of AI, is useful, and we do geometry processing and pattern matching. But the real value is decision-making optimization. What is the best framework? We’ve had all kinds of optimization methods, which were gradient-based. But what AI allows is a non-gradient-based optimization, which reinforcement learning allows, so we can apply optimization to things that we never applied it to.”
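The distinction Devgan draws can be illustrated with a small gradient-free search. The objective below is deliberately non-differentiable (it includes a step penalty), so a gradient-based optimizer has nothing to work with, yet a simple accept-if-better perturbation search still makes progress. The objective and parameters are invented purely for illustration.

```python
import random

def objective(x):
    """Non-differentiable stand-in objective: a quadratic plus a step
    penalty, the kind of discrete cost a gradient cannot see through."""
    step_penalty = 2.0 if x > 1.5 else 0.0
    return (x - 2.0) ** 2 + step_penalty

def gradient_free_search(iterations=500, sigma=0.3):
    best_x = random.uniform(-5, 5)
    best_val = objective(best_x)
    for _ in range(iterations):
        # Propose a random perturbation and keep it only if it improves.
        candidate = best_x + random.gauss(0, sigma)
        val = objective(candidate)
        if val < best_val:
            best_x, best_val = candidate, val
    return best_x, best_val

if __name__ == "__main__":
    x, val = gradient_free_search()
    print(f"best x = {x:.3f}, objective = {val:.3f}")
```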
Whether full AI or machine learning, there is potential for a much more fundamental shift in the industry. In the past, the most valuable data about what works in a design and what doesn’t came from the fab. From there, it trickled back to design teams and EDA companies, which routinely complained they never received enough data. That situation has improved over the years, but it’s still lopsided. With AI/ML, data needs to flow in two directions, and it potentially can be combined in ways that were previously considered impractical.
“Within physical design, one of the things we work on involves defect information you get off the fab floor,” said Sawicki. “And now I want to tie that back to understand how my image patterns are tying into that to learn more about what other patterns might go bad. That is being done on the same tool infrastructure as other parts of Calibre, for example, that are being used to do machine learning, OPC, and the like. So you can put in place a common infrastructure around a number of different applications. It doesn’t necessarily mean that you have this one AI engine that works for Calibre and functional verification. These are very different problems. But there is a lot of interaction between where people will learn.”
Conclusion
It’s important to distinguish between machine learning, which is a subset of AI, and the larger AI, which is much more difficult to apply in EDA. Frequently, the terms are used interchangeably, even in the names of tools. But real AI is only slowly being applied to semiconductor design because it takes time to figure out if the results are accurate enough, consistent, and scalable.
Semiconductor design is becoming much more complex, and as it moves into advanced packaging, it's also becoming much more domain-specific and heterogeneous. Finding patterns that span multiple portions of a design, and then determining which are applicable to other designs or even derivatives, is a massive challenge for EDA companies. Realizing that potential also will require more expertise than the chip industry has today.
“In order to apply artificial intelligence to learning solutions, you need people who know what they’re doing,” said AMD’s Farkash. “We have to educate them, wait until they come out of school, and have them help us get where we need to go. But on the other hand, we have to fear people who think they know what they’re doing. So this will be the biggest challenge. Opportunity is easy. I have worked on two dozen different topics in which I applied machine learning processes, and wherever I look I see an opportunity — mostly for design exploration, verification, and simulation. It’s everywhere.”