The successful use of machine learning within EDA cannot happen without confidence in the quality of its results. That presents challenges.
Is EDA a suitable space for utilizing machine learning (ML)? The answer depends on a number of factors, including where exactly it is being applied, how much support there is from the industry, and whether there are demonstrable advantages.
Exactly where ML will play a role has yet to be decided. Replacing existing heuristics with machine learning, for example, would require an industry-wide effort to overcome a long list of challenges, which is unlikely to happen. But there are other opportunities where ML is likely to be more successful. Tools within the EDA space, and the heuristics embedded in them, depend on the faithfulness of the abstraction used for a particular purpose in the design flow.
For example, the gate-level abstraction is highly accurate and precise for functionality. Prior to 90nm or so, it also provided an acceptable abstraction for timing. At that point its fidelity broke down, which made it difficult to reach timing closure. The abstraction had to be refined to include the impact of wires on delay, which meant the functional and physical aspects of an implementation had to be combined. That, in turn, added considerable complexity to the algorithms.
Over time, an increasing number of physical attributes have been required. The introduction of finFETs significantly increased the complexity of many algorithms, and each new technology node brings a growing array of intertwined factors. The abstractions being used keep getting refined. If decisions are based on insufficient accuracy or precision, they become arbitrary at best.
This challenge is not unique to physical implementation and verification. Similar challenges occur in the system space, where increasing concurrency and complexity add to comprehension challenges. For example, an instruction set simulator may operate at several levels of timing accuracy — instruction-accurate, timing-approximate, or cycle-accurate. Behavior can be dependent on ordering, and that can change based upon small shifts in timing. If that is important to the decisions being made, then the right level of accuracy has to be chosen, even if it impacts the performance of the tools.
Awareness of these limitations is built up over time. New tools and new abstractions are rarely trusted when they first reach the market. Confidence grows with use, and tools improve to take into account additional information that comes to light.
Perception vs. reality
Many people are considering the suitability of ML within the EDA space. While ML models often show good headline accuracy, they frequently have surprisingly low fidelity. It sometimes takes only a very small change in input to yield a very large change in the output, and nobody can explain why. This means you either need more data to improve the accuracy, or you need to restrict the areas in which ML is allowed to make unsupervised decisions.
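As a concrete illustration, the sketch below probes a trained predictor for exactly this failure mode: it nudges each input slightly and flags samples where the prediction jumps far more than the input moved. The model, features, and thresholds are hypothetical stand-ins, not any vendor's actual predictor.

```python
# A minimal fidelity check for a trained predictor (all data is synthetic).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Stand-in training data: rows are hypothetical design features, target is slack.
X = rng.normal(size=(1000, 8))
y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=1000)
model = GradientBoostingRegressor().fit(X, y)

def fidelity_scan(model, X, eps=1e-2, threshold=10.0):
    """Flag samples where a tiny input perturbation causes a large prediction jump."""
    base = model.predict(X)
    flagged = set()
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] += eps                      # nudge one feature slightly
        jump = np.abs(model.predict(Xp) - base)
        flagged.update(np.nonzero(jump > threshold * eps)[0].tolist())
    return sorted(flagged)

print(f"{len(fidelity_scan(model, X))} low-fidelity samples out of {len(X)}")
```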
There also is a perception issue involving ML that started with robots and has spread into hardware design.
“If you look at some examples such as an autonomous car, you can argue that full self-driving could kill you,” said Thomas Andersen, vice president of the AI and Machine Learning Solutions Group at Synopsys. “There definitely are scenarios where somebody could have been killed had they not intervened. In the case of self-driving cars, you are automating a task that today is pretty much completely human. The human makes mistakes, but a machine is not generally trusted. Let’s say an accident happens. It will be magnified, and everybody will point out that the technology is not ready, even if it was statistically much better than the human. There is no room for error. In chip design you can argue the same. If I make a chip and I don’t meet my hold timing, the whole chip will not work because of that mistake.”
How does that apply to EDA?
“Machine learning can be used for two separate application types within EDA,” explains Andersen. “The first is replacing existing heuristics. Consider implementation and sign-off verification tools. In the implementation space you have a placement algorithm, and for sign-off you have a static analysis algorithm. The verification algorithm in the physical space ensures that you honor certain rules. Now, you can replace some of these heuristics with machine learning-based predictors, and for them to work as well or better, you will require enough training data and you will require the right training data, otherwise it will not work.”
All of this is a work in progress. “It is an illusion to think that the algorithms in use today are always correct,” Andersen says. “Heuristics mean they take some shortcuts. All these problems are NP-complete problems. None of the algorithms are completely correct. They make mistakes. They have just been tuned for enough data points to give an assurance that they work well enough. How do you verify that they’re actually correct? You verify this based on data. If I introduce ML-based algorithms to replace the current heuristics, I simply have to make sure that I have enough relevant data, such that I feel justified that the output that I’m getting is correct.”
These are predictive algorithms and need to be trained on a lot of data. Every EDA vendor has implemented some of these algorithms for predicting timing or routing DRCs. They have been somewhat successful, too, but not as successful as some people had expected. This is largely due to the data on which those algorithms were trained. That data may not be of high enough quality, there may not be enough of it, or both.
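The sketch below illustrates that data problem in miniature: a hotspot classifier trained on synthetic data from one “node” loses accuracy when the underlying design rules shift, as they do at each new process node. All features, rules, and numbers here are invented for illustration.

```python
# A toy demonstration of a DRC-hotspot predictor degrading across nodes.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

def make_data(n, rule):
    """Synthetic per-region features (congestion, density, ...) and hotspot labels."""
    X = rng.normal(size=(n, 6))
    return X, rule(X).astype(int)

old_rule = lambda X: X[:, 0] + 0.5 * X[:, 1] > 1.0  # hotspot rule at the old node
new_rule = lambda X: X[:, 0] + 0.5 * X[:, 2] > 1.0  # the rule shifts at a new node

X, y = make_data(5000, old_rule)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
X_new, y_new = make_data(1000, new_rule)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy, same node:", accuracy_score(y_te, clf.predict(X_te)))
print("accuracy on the newer node:  ", accuracy_score(y_new, clf.predict(X_new)))
```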
A different type of machine learning, called reinforcement learning, is used for playing games, or for teaching a system to play games. Reinforcement learning is not a predictive methodology. You don’t train it on millions of data points and then try to predict something. Instead, you try to understand a complex system by poking it, seeing how it responds, and learning from that response.
This is the second application for ML, and it involves automating tasks now done by humans, such as operating EDA tools. Engineers run the tools, provide certain inputs, and tune settings to hit certain quality-of-results (QoR) goals.
“Reinforcement learning can, similar to a human, try many different experiments and learn how this particular design on this tool behaves based on those inputs, and draw conclusions,” adds Andersen. “Then it can give me an answer by searching a large search space. In our experiments, we have found that it always outperforms the human, simply because a human cannot do 100 runs in parallel and draw the exact conclusions about how the system responded to a particular input.”
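A minimal sketch of that loop, assuming a hypothetical run_tool() that stands in for launching an EDA run and scoring its QoR: an epsilon-greedy search pokes the tool with different knob settings, learns which one responds best, and exploits it. This is a toy bandit, not any vendor’s actual RL engine.

```python
# Epsilon-greedy tuning of a hypothetical EDA tool knob.
import random

KNOB_SETTINGS = ["effort=low", "effort=med", "effort=high", "effort=extreme"]

def run_tool(setting: str) -> float:
    """Hypothetical stand-in: launch an EDA run and return a noisy QoR score."""
    base = {"effort=low": 0.70, "effort=med": 0.80,
            "effort=high": 0.88, "effort=extreme": 0.85}[setting]
    return base + random.gauss(0, 0.02)

def best_by_average(totals, counts):
    """Setting with the highest observed average QoR (unseen settings rank last)."""
    return max(KNOB_SETTINGS,
               key=lambda s: totals[s] / counts[s] if counts[s] else float("-inf"))

def tune(trials=200, epsilon=0.1):
    totals = {s: 0.0 for s in KNOB_SETTINGS}
    counts = {s: 0 for s in KNOB_SETTINGS}
    for _ in range(trials):
        explore = random.random() < epsilon or not any(counts.values())
        s = random.choice(KNOB_SETTINGS) if explore else best_by_average(totals, counts)
        totals[s] += run_tool(s)   # poke the tool, record its response
        counts[s] += 1
    return best_by_average(totals, counts)

print("best setting found:", tune())
```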
There are clear advantages to using ML. “The goal of using ML within an EDA flow is not about having the ability to produce a better result than your most experienced engineering guru with unlimited time,” said Dave Pursley, business development director in the Digital & Signoff Group at Cadence. “Instead, it is to help your engineering team meet and exceed aggressive power, performance and area (PPA) goals under the constraint of an aggressive schedule. The goal is to make engineers more productive by raising the level of abstraction. For high-level synthesis (HLS), the level of abstraction is in terms of design input, writing untimed SystemC/C++ and allowing the HLS tool to create cycle-accurate RTL. For ML with an EDA flow, the level of abstraction is in the use model of the EDA tools, specifying the higher-level design goals and letting the tools micro-manage when and where to turn all the knobs and switches.”
Others agree. “Machine learning will never be able to do the work instead of us,” said Darko Tomusilovic, verification director for Vtool. “It will only be able to help us. We should strive to find ways that machine learning can improve our efficiency and make us more productive. We should not expect ML to do the work instead of us, because then we will end up being very disappointed.”
ML also requires something of a mindset change. Rather than fixed numbers, ML reports results as distributions and probabilities. Defining what is sufficient accuracy can vary by application, but it also can vary depending on whether the ML uses predictive or reinforcement modeling.
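As a small illustration of that mindset change, the sketch below reports a prediction as a distribution rather than a point value, using the per-tree estimates of a random forest. The data and the “predicted slack” framing are placeholders.

```python
# Reporting a prediction as a distribution instead of a fixed number.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 5))
y = X[:, 0] - X[:, 1] + 0.2 * rng.normal(size=500)

forest = RandomForestRegressor(n_estimators=200).fit(X, y)
x_query = rng.normal(size=(1, 5))

# Each tree gives one estimate; together they approximate a distribution.
per_tree = np.array([t.predict(x_query)[0] for t in forest.estimators_])
print(f"predicted slack: {per_tree.mean():.3f} ± {per_tree.std():.3f}")
print(f"probability slack < 0: {(per_tree < 0).mean():.2f}")
```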
It’s all about data
Where there is sufficient high-quality data, ML can help considerably.
“ML can be used to find anomalies that humans would never find,” said Simon Davidmann, CEO for Imperas Software. “For example, a fault in one processor in a server farm led a company to overcharge its clients. ML was used to monitor the business and highlight the erroneous calculations that were happening, and it was able to pinpoint the fault. We are not that different from many companies. We have a large number of tests, regressions are run after changes are committed, and we record the results. You get to see which tests are fragile and break often, and which bits of code cause certain tests to break. These are home-built analytics, but they enable you to improve your processes. I am intrigued that people are using ML to improve the quality of the tools. ML can show you that something’s not as it was. It is more than just watching a curve; it effectively recognizes things.”
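The sketch below shows what such home-built analytics might look like, assuming hypothetical per-test features such as runtime statistics and failure rate: an isolation forest learns the profile of well-behaved tests and flags the fragile ones.

```python
# Flagging anomalous regression tests with an isolation forest (synthetic data).
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(3)

# One row per test: [mean runtime (s), runtime variance, recent failure rate]
normal_tests = np.column_stack([
    rng.normal(60, 5, 300),       # stable runtimes
    rng.normal(2, 0.5, 300),      # low variance
    rng.uniform(0, 0.02, 300),    # rarely fail
])
fragile_tests = np.array([[180.0, 40.0, 0.35],   # slow, noisy, often failing
                          [55.0, 1.5, 0.50]])    # normal runtime, flaky result

detector = IsolationForest(contamination=0.01, random_state=0).fit(normal_tests)
print(detector.predict(fragile_tests))  # -1 marks an anomaly, 1 marks normal
```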
How much is enough data isn’t always clear. “Verification data sources are almost inexhaustible,” says Vtool’s Tomusilovic. “Data comes in many different forms including log files, waveforms, coverage data, and the code base itself. This data must be understood as a whole. We need to find the minimum amount of data that is useful for an ML tool. We must not overload the situation with a lot of garbage data which will make it inefficient.”
Others have a similar view. “I could take hours or days of emulation data,” says Harry Foster, chief scientist for Siemens EDA. “Yet it’s not necessarily the quantity, it’s the quality of the data, in the sense of how uniquely it is exploring the state space. I can have tons of useless data. There are many people who claim we have a lot of data, but we don’t when compared to many other industries, even for physical aspects of the design. You take one project and extract data, but it might not be applicable to some other project. We need solutions that abstract the data, or that allow us to do data mining analysis with small data sets.”
At the same time, companies are becoming more reluctant to share good data.
“It is more difficult now than it used to be years ago because our customers are protecting their IP more,” said Andersen. “We generally have less data access than we used to have for these test cases. In addition, data is changing every 18 months when you have a new technology node, and you pretty much have to start from scratch. If you move from one node to another, what can be re-used and what do you have to throw away? When it comes to algorithm development, you may have a clearer understanding because somebody wrote it. When it was trained on data, it becomes a little fuzzy. I don’t know whether the training that I’ve done before should be thrown away and replaced, or if I can re-use some of that. These answers don’t exist yet.”
Consider a routing rule. This rule is created by a person who has studied enough manufacturing data and figured out that this particular physical construct is very bad for yield, or it causes a short. These rules have a lot of margin. Margins are everywhere, be it in timing or manufacturing. These allow you to account for some inaccuracies in the process, because every piece of the flow can only model things to a certain accuracy level.
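A minimal sketch of how such margins work in practice, with illustrative numbers: a signoff-style check passes only if the predicted slack clears a guard band sized to the known inaccuracy of each modeling step.

```python
# Guard-banded timing check; every number here is illustrative, not real signoff data.
def meets_timing(predicted_slack_ps: float, model_errors_ps: list[float]) -> bool:
    """Require slack to exceed the summed guard bands of every modeling step."""
    guard_band = sum(model_errors_ps)   # e.g., extraction, delay-model, OCV margins
    return predicted_slack_ps > guard_band

# Hypothetical extraction, delay-model, and variation margins (picoseconds).
print(meets_timing(25.0, [5.0, 8.0, 10.0]))  # True: 25 ps clears the 23 ps band
print(meets_timing(20.0, [5.0, 8.0, 10.0]))  # False: within the modeling noise
```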
In theory, this entire space could be modeled through machine learning and be entirely data-driven. “It would probably reduce margins and improve the overall process, but this would require that everybody works together, starting with the foundries,” Andersen said. “And foundries guard their data. They don’t even give it to our customers, and our customers would have to share that data with us. We are not living in the world of Facebook or Google, where all the data that you are mining is publicly available.”
The future
Humans have preconceived notions and biases. They may think they know something because that is what they learned last time, but it may be wrong or no longer applicable. Humans also generally cannot process large amounts of data and understand its context. There are places where humans are better, and there are places where machine algorithms can outdo them.
Predicting during synthesis or placement what the critical path will be at the end of the flow is an unsolved problem. “If I were to introduce a machine learning algorithm, I’m not replacing something that works extremely well today,” Andersen said. “I’m trying to do better than what exists today, which is nothing, so technically I can’t do worse. Because of that, we primarily focus on using machine learning algorithms to predict things that are extremely difficult to solve today with heuristics. I’m a strong believer that other techniques, like reinforcement learning, have a good place in the EDA world.”
Whether ML will be used to solve pure analytical problems isn’t clear. Data access and the speed with which data changes may limit its usefulness there. But it appears that reinforcement learning will play a significant role going forward.