After two decades of experimentation, the semiconductor industry is scrambling to embrace machine learning.
Machine learning and deep learning are showing a sharp growth trajectory in many industries. Even the semiconductor industry, which generally has resisted this technology, is starting to change its tune.
Both machine learning (ML) and deep learning (DL) have been used successfully for image recognition in autonomous driving, for speech recognition in natural language processing applications, and for multiple applications in the health care industry. The general consensus is that they can be applied similarly to semiconductor design.
This isn’t exactly a new idea, though. The basis for ML and DL in chip design dates back nearly two decades, and the concept of ML/DL dates back another three decades before that.
“We called it ‘metrics’ in 1998-1999,” said Andrew Kahng, a professor of computer science and engineering at UC San Diego. “The main principle was measure everything, data-mine the log files, predict tool sweet spots and failures, and figure out how to tune specific tool options for a specific design instance.”
That made sense on paper, but actual adoption lagged.
“Everyone was allergic to it at the time, although internally TI and Intel all had metrics initiatives,” said Kahng. “Twenty years later this is what, as an industry, we are looking for—basically helping EDA deliver scaling, where scaling is not just quality of results (QoR) but also schedule. If you can predict failure, if you can reduce mis-correlation—which is margin and dollars—there’s enormous value. There is so much low-hanging fruit, it’s really amazing. Machine learning is definitely there for EDA in the present and near-term. With deep learning, it’s harder to say.”
Chris Rowen, CEO of Cognite Ventures, sees two distinct areas where the EDA/IP industry will participate—and while not connected today, they may converge over time. “Number one, deep learning especially represents a really fundamental, new computing concept — it’s statistical computing instead of von Neumann or even Turing computing. In that sense, there is an opportunity both in the IP for supplying various engines that do this, and in fact, also in the tools — just as EDA supplies tools that allow people to build traditional, non-statistical computing systems. It will supply the tools that allow people to train to optimize network opportunities, potentially in the data augmentation/data creation business, as well, because these are computationally hard, technically sophisticated, essential building block algorithms for anybody who is building anything going forward. Therefore, there is a role for the EDA industry, or something that looks a lot like the EDA industry, to participate. But it’s largely something that the EDA industry has to make a decision to reach for. It doesn’t happen automatically.”
The second area involves applying these increasingly sophisticated statistical methods to the traditional things EDA tools are expected to do in simulation, analog, power analysis, place-and-route, modeling, and verification, Rowen said. "Ultimately, the biggest opportunity, and arguably the saving grace for EDA, is to move up and participate in the creation of the statistical computing systems, not just do a better job on the traditional, physical implementation, and traditional analog and digital modeling."
But none of this happens automatically. EDA companies are cautious about new investments, which explains why they passed on the broader software tools market for both economic and comfort-zone reasons. ML and DL are, in a sense, a step beyond that. "But unlike compilation and some of those things, it actually is a really computationally hard problem, and one where you can imagine that people will say, 'I am willing to pay a lot to get a 5% better solution,'" Rowen said. "That was a characteristic of why EDA tools became so valuable. In some of these cases that is also the case and so there may be a big market opportunity of the sort that the EDA industry thrives on, but it is a stretch."
Using ML and DL
ML is a set of algorithms that enable a computing system to recognize patterns from large data sets. Deep learning adds multi-layer artificial neural networks to applications involving large amounts of input data and draws inferences that can be applied to new data. Because chip design generates huge volumes of data, both ML and DL will be useful.
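To make the distinction concrete, here is a minimal sketch in Python with scikit-learn, using entirely synthetic data rather than output from any real tool: a classic ML model learns a pattern from tabular data mined out of past runs, in this case a hypothetical mapping from a few design features to worst timing slack. A deep-learning variant would swap the single estimator for a multi-layer neural network and would typically demand far more training data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for data mined from past runs:
# utilization, clock period (ns), max fanout, estimated wirelength -> worst slack (ns).
X = rng.uniform([0.5, 0.8, 2, 1e5], [0.95, 2.0, 64, 1e7], size=(500, 4))
y = 0.3 * X[:, 1] - 2.0 * (X[:, 0] - 0.7) - 1e-8 * X[:, 3] + rng.normal(0.0, 0.05, 500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

print("Held-out R^2:", round(model.score(X_test, y_test), 3))
# Inference on a new, unseen design configuration.
print("Predicted slack:", model.predict([[0.85, 1.2, 16, 4e6]]))
```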
Still, in order to achieve the most benefit from ML and DL, much thought must go into where and how the techniques can be applied.
“ML is absolutely relevant to EDA, but for maximum benefit you have to think of ML within the tools and around the tools,” said Sashi Obilisetty, director of R&D at Synopsys. “So within the tools, if you’re doing an ECO, you can optimize a number of signoff power or timing simulations that you have to do. ML can be used for better scheduling in formal tools, for instance, and it is already used in manufacturing. It is used for mask synthesis. There’s no question that it will improve our tools internally. What we also need to think about is how it can be used around EDA tools. This is about automating tasks. How can we find analogies? How can we automatically analyze log files? How can we do intelligent resource allocation? There is a lot of new work and new ideas we can implement to work better with the environment, the foundry, and within the tools.”
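As a hedged illustration of the "around the tools" idea, a flow could mine its own run logs for coarse metrics and train a simple classifier to predict which runs are likely to fail, feeding early-abort or resource-allocation decisions. The log format, file layout, and the labeled history assumed below are all hypothetical; real flows would extract vendor-specific metrics.

```python
import re
from pathlib import Path
import numpy as np
from sklearn.linear_model import LogisticRegression

def extract_features(log_text):
    """Pull a few coarse signals out of a run log (all patterns are illustrative)."""
    warnings = len(re.findall(r"\bWARNING\b", log_text))
    errors = len(re.findall(r"\bERROR\b", log_text))
    mem = re.search(r"peak memory:\s*([\d.]+)\s*GB", log_text)
    peak_mem_gb = float(mem.group(1)) if mem else 0.0
    return [warnings, errors, peak_mem_gb]

def load_history(log_dir, failed):
    """`failed` maps a log filename to 1 if that historical run ultimately failed signoff."""
    X, y = [], []
    for path in Path(log_dir).glob("*.log"):
        X.append(extract_features(path.read_text(errors="ignore")))
        y.append(failed.get(path.name, 0))
    return np.array(X), np.array(y)

# With a few hundred labeled historical runs, a simple classifier is often enough:
#   X, y = load_history("runs/", failed_runs)
#   clf = LogisticRegression(max_iter=1000).fit(X, y)
#   p_fail = clf.predict_proba([extract_features(new_log_text)])[0, 1]
```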
This provides some interesting adjacent opportunities for companies, as well. Norman Chang, chief technologist for ANSYS’ semiconductor business unit, said his team began looking at how to use ML/DL for image recognition and natural language processing. But after searching for two years, ANSYS could not identify a startup focused completely on machine learning/deep learning, so it decided to approach this market on its own.
“Inside ANSYS, we have an effort to look at the possible applications using machine learning and deep learning in our domain, which is simulation,” said Chang. “This is a traditional industry that uses a lot of computational techniques. It’s very deterministic, it’s not so much heuristic. When we look at the simulation world, we look from the design input to the simulation. The combination of design input for 7nm down to 5nm is getting larger and larger. The number of vectors for each block is getting longer to cover low frequency to high frequency. So the combination of vectors from the design input is increasing, in addition to multiple switching scenarios and multiple ports. For big CPUs, there can be more than 100 CPU ports inside. How do you determine the switching scenario that can give complete coverage, and good switching conditions that you can catch all the possible problems? From an input selection point of view, there is an opportunity to use machine learning to cut down the selection from the input.”
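One hedged sketch of how that input selection might work in practice: treat each candidate vector's per-block toggle-activity signature as a feature vector, cluster the candidates, and simulate only one representative per cluster. The activity matrix below is randomly generated for illustration; in a real flow it would come from a fast RTL or gate-level activity estimate.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min

rng = np.random.default_rng(1)
# 5,000 candidate vectors x 128 block-level toggle rates (synthetic placeholder data).
activity = rng.random((5000, 128))

k = 50  # simulation budget: 50 representative switching scenarios
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(activity)

# Pick the real vector closest to each cluster centroid as its representative.
rep_idx, _ = pairwise_distances_argmin_min(km.cluster_centers_, activity)
print("Simulate only these vector indices:", sorted(rep_idx.tolist()))
```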
For the design itself, there are opportunities as well. Chang pointed to the generative adversarial network (GAN) technique that has become very popular in the last year. ANSYS has demonstrated that, given the appropriate constraints, GANs can be used to automatically generate many possible design variants. He believes the EDA industry also can use ML/DL for input selection and input dimensionality reduction. As for core simulation, surrogate and optimization models can be created to speed up the simulation. Then, after lengthy simulations, neural network models can be built, which designers can use for interpolation and extrapolation to predict behavior under other conditions.
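The surrogate-model idea can be sketched simply: run the expensive simulator on a modest sample of operating conditions, fit a cheap statistical model to the results, and let designers interpolate to unsimulated corners with an uncertainty estimate. The `expensive_sim` function below is a stand-in for a long-running power or IR-drop analysis, and the condition ranges are illustrative.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def expensive_sim(voltage, temperature):
    """Placeholder for a long-running simulation returning, say, IR drop in mV."""
    return 40 * (1.0 / voltage) + 0.05 * temperature

rng = np.random.default_rng(2)
V = rng.uniform(0.7, 1.1, 30)      # sampled supply voltages (V)
T = rng.uniform(-40, 125, 30)      # sampled temperatures (C)
X = np.column_stack([V, T])
y = np.array([expensive_sim(v, t) for v, t in X])

surrogate = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True).fit(X, y)

# Interpolate to an unsimulated corner, with an uncertainty estimate.
mean, std = surrogate.predict([[0.85, 105.0]], return_std=True)
print(f"Predicted IR drop: {mean[0]:.1f} mV  (+/- {std[0]:.1f})")
```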
Yet another area where ML might be applied is in the correlation of output from various simulation tools.
“There is a lot of data generated,” said Chang. “We want to put together data from different tools, and then provide the experience base of diagnosis for predictive analytics. In this area, we have to talk to individual customers. Each customer understands the implications. They know what they want to go for, so our idea is that we want to provide a platform and we can build on machine learning and deep learning modules that can help customers to realize their applications and can speed up application development. That can also help with the optimization of the whole loop.”
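A minimal, hypothetical example of that kind of cross-tool correlation, with the instance names, column names, and numbers invented for illustration: join per-instance results from two different analyses into one table that diagnostics or downstream ML models can consume.

```python
import pandas as pd

# Hypothetical per-instance results from two different analysis tools.
timing = pd.DataFrame({
    "instance": ["u_core/alu0", "u_core/alu1", "u_mem/ctrl", "u_io/serdes"],
    "slack_ns": [0.02, 0.31, -0.01, 0.12],
})
ir_drop = pd.DataFrame({
    "instance": ["u_core/alu0", "u_core/alu1", "u_mem/ctrl", "u_io/serdes"],
    "ir_drop_mv": [38.0, 12.5, 41.2, 9.8],
})

merged = timing.merge(ir_drop, on="instance", how="inner")

# Simple cross-tool diagnostic: instances that are both timing-critical and IR-stressed.
suspects = merged[(merged["slack_ns"] < 0.05) & (merged["ir_drop_mv"] > 30)]
print(suspects.sort_values("slack_ns"))
```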
There is a role for ML/DL in verification, as well. On the functional verification side, there are several areas where it can be applied.
“The first is regression reduction,” said Harry Foster, verification chief scientist at Mentor, a Siemens business. “The second is improving coverage, such as using unsupervised learning to identify outliers and explore outliers. Third, although there are coverage metrics that are effective at IP subsystem unit level, they are meaningless at a system level. This whole notion of statistical coverage at that area implies using learning techniques to extract meaningful information there.”
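A minimal sketch of the unsupervised-outlier idea Foster mentions: represent each regression test by a feature vector and flag the statistical oddballs as tests worth a closer look. The feature matrix below is synthetic; a real flow would build per-test vectors from coverage databases, runtimes, and assertion activity.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(3)
# 2,000 tests x 40 features (e.g., per-group coverage hit counts, runtime, asserts fired).
features = rng.normal(size=(2000, 40))
features[:5] += 6.0                      # a few genuinely unusual tests, for illustration

detector = IsolationForest(contamination=0.01, random_state=0).fit(features)
outlier_idx = np.where(detector.predict(features) == -1)[0]
print("Tests flagged for review:", outlier_idx.tolist())
```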
Using the data
However, functional verification has a different starting point than many other ML/DL applications.
“One of the challenges we have is that, unlike many domains, we actually don’t have a lot of data,” Foster said. “For example, I start a new project — I don’t have a lot of data. Even out of emulation. A lot of it can be biased and not always interesting scenarios, so in order to be successful, at least in functional verification, we have to incorporate domain knowledge in with the data. If I have a coherent fabric and I’m trying to determine the efficiency of the caches in the fabric, I need to be smart where I’m probing to determine if I can start extracting meaningful information and relationships of these caches.”
This is a crucial point for successfully applying ML/DL. “The size of the data is really important, and it’s particularly important to tease apart machine learning from deep learning,” said Rowen. “Sometimes the two phrases get lumped together because machine learning is really a superset phrase, but a lot of the things that are now called machine learning have evolved gradually and gracefully and uncontroversially over 25 years, and people routinely use them. It’s what data scientists do. The attempts to scale up in terms of the size of the model to what we now call deep learning really had a much more checkered history until the last five years or so, when we got enough data on some of these problems.”
That abundance of data generally doesn't exist in semiconductor design.
“Everybody looks at the ImageNet benchmark because it’s the thing that AlexNet and GoogleNet and everything evolved from, and that has a training set of on the order of 60 billion pixels, and you get to train a model which is 10 or 50 million parameters,” Rowen explained. “You typically need a training set that is much larger than the size of the model that you need, and the cleverness and intelligence of the model is somewhat proportional to the number of parameters. There is a smooth continuum of mechanisms that are all in the toolkit of the data scientist, which allow you to build bigger and bigger models, and models of different character. But the size of the model, and therefore the perceived intelligence of the model, is very much a function of how big a training set you have. There isn’t an absolute mathematical relationship because you can have better training sets and worse training sets. But unless you’re talking about something where you’ve got at least millions of data points, it’s hard to say you’re going to do deep learning with these multi-layer networks. In lots of the problems we work on, we have a lot of data, but we don’t have millions of vectors.”
Translation: More data works better, and that means chipmakers and tool vendors need to figure out ways to gather more data.
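The back-of-the-envelope arithmetic behind Rowen's point can be made explicit. Assuming an illustrative five-layer fully connected network (the layer widths are arbitrary), counting weights and biases shows why thousands of labeled samples fall far short, while millions start to resemble the ImageNet regime.

```python
layer_widths = [64, 256, 256, 128, 10]   # input width, hidden layers, outputs (illustrative)

# Weights plus biases for each dense layer.
params = sum(w_in * w_out + w_out
             for w_in, w_out in zip(layer_widths[:-1], layer_widths[1:]))
print(f"Trainable parameters: {params:,}")   # about 117,000

for n_samples in (5_000, 50_000, 5_000_000):
    print(f"{n_samples:>9,} labeled samples -> {n_samples / params:8.2f} samples per parameter")
# With thousands of samples the model dwarfs the data; only at millions of labeled
# points does the ratio begin to support a genuinely deep model.
```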
“If in manufacturing we really got good yield statistics by location of every open and short in a large number of wafers, we would be able to then build a model that projects back onto the input geometries, and we could have super design rules,” said Rowen. “That means the design rules that really apply, not the approximation we get from the thousand rules that somebody sort of developed from a few test chips. This relies on getting gigabytes of data out of the manufacturing line to tell us exactly where those defects are, and measuring the exact data points because that creates the label for the mask geometry. But for a number of things that we do, we’re not there. Moreover, some of the tasks we need to do in functional verification are not statistical. They are black or white. We do not want to know with 99.99% certainty that this logic circuit actually does what it’s supposed to do. We may have a lot of statistical opportunities in finding more coverage that helps us tease out the last little bit.”
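That "super design rules" idea can be sketched, with heavy caveats: the geometry features, defect labels, and numbers below are entirely synthetic, standing in for inline-inspection and failure-analysis data keyed back to mask coordinates. The learned probability surface would act as a continuous, data-driven design rule rather than a hard pass/fail limit.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(4)
# Features per layout window: min spacing (nm), min width (nm), via count, metal density.
X = rng.uniform([14, 14, 0, 0.1], [60, 60, 20, 0.9], size=(20000, 4))
# Synthetic labels: tighter spacing and width give a higher open/short probability.
p_defect = 1 / (1 + np.exp(0.3 * (X[:, 0] - 20) + 0.3 * (X[:, 1] - 20)))
y = rng.random(20000) < p_defect

model = GradientBoostingClassifier().fit(X, y)
print("P(defect) at 16nm spacing/width, 4 vias, 0.5 density:",
      model.predict_proba([[16, 16, 4, 0.5]])[0, 1].round(3))
```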
Conclusion
So what will be the deciding factor in making a business out of machine learning in EDA? How do tool developers today decide what to work on, and invest in?
Kahng believes it will be driven from the customer side. “Machine learning is ultimately going to lead to shorter design cycles, more efficient use of EDA licenses. So, fewer licenses, shorter design cycle times — is this something that an EDA tool developer is really interested in? Where is machine learning in the greatest demand? It’s when you’re at this bleeding edge, where heuristics piled on heuristics start to break and become chaotic in their outcomes. That’s a real opportunity for machine learning. How do you predict whether this extra 0.25% of utilization is going to doom your floorplan or not? Machine learning is excellent in answering such questions. When tools are very chaotic, you have this other path of machine learning where we’re going to massive cloud automation with no humans in the loop—just let AI figure out how to best exercise your flow for your design. That’s better than waiting for the small data problem to resolve.”
Among the people using the tools, the worry is that tools that are too good could stifle growth in the tools industry, said Ting Ku, senior director of engineering at NVIDIA. “That’s ridiculous because if you give me a better tool, I can use less engineers, or not-as-smart engineers. I don’t need highly trained engineers anymore. That’s valuable to me. Would I pay more to offset the need for a smaller engineering staff, not as smart of a staff? Not a Ph.D. for everyone, maybe a Master’s degree is sufficient? Of course I will pay more for that. That’s absolutely the truth. Just because I’m simulating a little less, I’m using fewer licenses, it doesn’t mean I’m eroding EDA’s financial benefits.”
Foster agreed. “In test, we learned how to do test compression to be much more efficient on the tester. It turns out we actually didn’t do less, we realized there were other things we can test. So it opens up capacity, so I don’t think we’ll be doing less. We’ll be able to do more.”
Related Stories
The Darker Side Of Machine Learning
Machine learning needs techniques to prevent adversarial use, along with better data protection and management.
Machine Learning Meets IC Design
There are multiple layers in which machine learning can help with the creation of semiconductors, but getting there is not as simple as for other application areas.
The Great Machine Learning Race
Chip industry repositions as technology begins to take shape; no clear winners yet.
Plugging Holes In Machine Learning
Part 2: Short- and long-term solutions to make sure machines behave as expected.
What’s Missing From Machine Learning
Part 1: Teaching a machine how to behave is one thing. Understanding possible flaws after that is quite another.
Building Chips That Can Learn
Machine learning, AI, require more than just power and performance.
What Does An AI Chip Look Like?
As the market for artificial intelligence heats up, so does confusion about how to build these systems.
AI Storm Brewing
The acceleration of artificial intelligence will have big social and business implications.