EDA Challenges Machine Learning

Many tasks in EDA could be perfect targets for machine learning, except for the lack of training data. What might change to fix that?

Over the past few years, machine learning (ML) has evolved from an interesting new approach that allows computers to beat champions at chess and Go, into one that is touted as a panacea for almost everything. While there is clearly a lot of hype surrounding this, it appears that machine learning can produce a better outcome for many tasks in the EDA flow than even the most seasoned architects and designers can generate.

EDA companies have been investing in this technology and some results are being announced. But developers and users appear to be taking it slow for a couple of reasons. First, results are non-deterministic and nobody is quite sure how to assess the risks associated with that. Second, there is a lack of training data. Nobody wants to open the hood on their designs and share that for training purposes.

“Machine learning is going to provide the most value in a design flow where the task is complex but besieged by data,” says Harnhua Ng, CEO of Plunify. “If the problem can be defined or solved in a single deterministic formula, the problem would have already been solved by the industry.”

In the past, the industry has tackled the problem by breaking it into smaller tasks. “We spend a lot of time crafting what needs to go into a recipe to make a chip,” explains Rob Knoth, product management director in the Digital & Signoff Group at Cadence. “That includes different timing closure recipes, placement constraints, floorplanning, and understanding the gory details about electromigration (EM) rules for specific processes. There are a lot of variables and there are a lot of experts involved in this. Almost every company, be they big or small, has resident gurus who are the experts in floorplanning or timing or power convergence.”

Adding changes into a working flow with that kind of resident expertise doesn’t happen overnight, however. It is expensive, highly disruptive, and results need to be guaranteed before chipmakers are willing to commit to changing their flows.

Non-determinism
Applying machine learning to EDA problems doesn’t necessarily yield expected results immediately, either.

“The most difficult issue is the non-deterministic output of the ML application systems,” observes Norman Chang, chief technologist for the semiconductor business unit at ANSYS. “It does not always guarantee highly accurate results, especially if there is not enough data in the beginning.”

To understand the issue we have to explore the details. “Chip designers need accurate data to design and verify chips, and basic ML methods (e.g. taking some data, building a regression model, then pulling data from the model) have accuracy issues that preclude them from use for most chip design tasks,” explains Jeff Dyck, vice president of technical operations for Solido (recently acquired by Siemens). “The reason is that data predicted from the machine-learning model has some amount of error. It might be tiny, and using it in one case might work well and save considerable schedule and resources compared to brute-force analysis. In another case, the prediction error might be really large and could lead to a respin that costs several orders of magnitude more than the time saved by using machine learning. Since design decisions need to be made on reliable data for chips to work, uncertainty in the level of accuracy makes designers reject machine learning methods and move back to more brute-force techniques, even though they may take way longer to run and cover less of the design space.”

Closing the gap on non-determinism requires more data, which leads to the next problem.

Gathering sufficient data
Finding the most suitable ML approach requires examining the techniques that could be used.

“The most common ML methods today are known as supervised learning, and these approaches are based on large amounts of data comprising known good design points,” says Ty Garibay, CTO at ArterisIP. “They are unlikely to be very useful for chip designers because the details of what defines ‘good’ change so rapidly from one semiconductor process node to the next, from one company to another, and even from one type of design to another (i.e. CPU, SoC, FPGA, analog, etc.). The second category of machine learning, known as unsupervised learning, is also unlikely to be directly useful, as its primary utility is in identifying unknown similarities in massive groups of elements. It is, however, very likely that unsupervised methods can be applied effectively to improving manufacturing yield.”

Thankfully, there is another alternative. “It is the third branch of machine learning that is most useful for IC designers, as it mimics what we do every day,” continues Garibay. “Known as reinforcement learning, this process learns by creating output, analyzing the results based on various metrics, recommending changes intended to improve the results, and then going around the loop again.”
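
The loop Garibay describes (create output, analyze it, recommend a change, repeat) can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses an epsilon-greedy bandit, one of the simplest forms of reinforcement learning, and the setting names, reward values and `run_flow` function are hypothetical stand-ins for a real implementation run.

```python
import random

# Hypothetical tool settings the loop may choose between (illustrative names).
ACTIONS = ["low_effort", "balanced", "high_effort"]

def run_flow(action, rng):
    """Stand-in for a real implementation run; returns a noisy reward (higher is better)."""
    mean_reward = {"low_effort": 0.2, "balanced": 0.5, "high_effort": 0.8}[action]
    return mean_reward + rng.gauss(0.0, 0.05)

def learn(iterations=500, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    value = {a: 0.0 for a in ACTIONS}   # running estimate of each setting's reward
    count = {a: 0 for a in ACTIONS}     # how often each setting was tried
    for _ in range(iterations):
        # Create output: mostly exploit the best-known setting, sometimes explore.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=value.get)
        # Analyze the result against the metric.
        reward = run_flow(action, rng)
        # Recommend a change: update the estimate, which shifts the next choice.
        count[action] += 1
        value[action] += (reward - value[action]) / count[action]
    return value, count

value, count = learn()
best = max(ACTIONS, key=value.get)
print(best, count)
```

In a real flow each call to `run_flow` would be a full tool run, which is why the cycle time of the methodology dominates the cost of learning.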


Fig 1: Types of machine learning. Source: Semiconductor Engineering

This sequence will sound very familiar to engineers who work to make their chips converge to the correct implementation of a design rule that also meets frequency, area, power and performance goals. “If, or more accurately, when a design methodology is encapsulated into a closed loop, such that the methodology can be allowed to run many iterations freely, reinforcement learning can autonomously identify optimal solutions that were unlikely to have occurred to human design engineers,” adds Garibay. “Once this machine learning characterization loop is completed, the limiting factor will remain the same as it is today—the cycle time of the design implementation methodology.”

It will be critical to be able to vary the compute effort versus quality of results (QoR), such that the iterative analysis can run a very large number of iterations early in the learning process, exploring a very wide design space. Once promising local solutions are identified, accuracy and QoR can be cranked up at the expense of compute power, allowing the methodology to search for locally optimal solutions. Good predictors of QoR will be critical to reducing overall SoC realization time.
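
One way to picture the trade-off between compute effort and QoR is a two-stage search: many cheap, noisy evaluations first, then a few expensive, accurate ones on the survivors. The sketch below assumes a toy one-parameter cost surface and a hypothetical `evaluate` function whose noise shrinks as effort grows. Neither comes from any real tool.

```python
import random

def evaluate(x, effort, rng):
    """Hypothetical QoR estimate for design parameter x (lower is better).

    Higher effort means less noise in the estimate, at more compute cost.
    """
    true_cost = (x - 0.7) ** 2              # toy cost surface, optimum at x = 0.7
    noise = rng.gauss(0.0, 0.02 / effort)   # estimate gets tighter with effort
    return true_cost + noise

def coarse_to_fine(n_coarse=200, n_keep=10, seed=1):
    rng = random.Random(seed)
    # Stage 1: many low-effort runs spread across the whole design space.
    candidates = [rng.random() for _ in range(n_coarse)]
    ranked = sorted(candidates, key=lambda x: evaluate(x, effort=1, rng=rng))
    # Stage 2: spend high-effort runs only on the most promising candidates.
    shortlist = ranked[:n_keep]
    return min(shortlist, key=lambda x: evaluate(x, effort=50, rng=rng))

best_x = coarse_to_fine()
print(round(best_x, 3))
```

The design choice is that stage 1 only has to rank candidates roughly. Accuracy is paid for only where it can change the final decision.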

This may suggest that the training should be done by the end customer rather than by the EDA industry for some areas. “Within Cadence, we split that problem into two dimensions,” says Knoth. “There is machine learning inside, which means there is no change in the human interface, or there is machine learning outside, which is a more transformational change in how the human is going to interact with the process.”

Knoth believes there is a place for both. “Machine learning outside is about finding out what your expert is doing and expediting the time it takes them to close timing, or between architecture freeze and signoff, or how to design the power grid. This is human-machine interaction. Then there is machine learning inside. Even though there is a scarcity of node specific data, there is enough that we can leverage based upon what a good design looks like rather than a bad design, node-to-node and architecture-to-architecture.”

Solido’s Dyck says the company’s strategy is built around three requirements:

1. Adaptive learning: The algorithm must be able to actively target areas of interest, areas of high uncertainty, and areas where behaviors are shifting, and to fill in data in these areas to actively boost accuracy. The goal is to automatically meet the designer’s desired level of accuracy in all areas of interest.
2. Accuracy-awareness: If the designer understands that an answer is accurate within a known amount, then they can add an appropriate amount of margin. Any successful algorithms must be able to predict confidence intervals or have some other way of measuring accuracy of predictions.
3. Verifiability: The machine learning data must be in some way verifiable to prove the machine learning model is doing the right thing.
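
The three requirements can be illustrated together with a deliberately simple model. This is a sketch under loud assumptions, not Solido's method: `simulate` is a hypothetical stand-in for an expensive simulation, the model is plain piecewise-linear interpolation, and the error margin is only valid because the toy response is monotonic between samples.

```python
import bisect

def simulate(v):
    """Hypothetical stand-in for an expensive simulation: delay vs. supply voltage."""
    return 1.0 / (v - 0.3)

class IntervalModel:
    """Piecewise-linear model that reports an error margin with every prediction."""

    def __init__(self, xs):
        self.xs = sorted(xs)
        self.ys = [simulate(x) for x in self.xs]

    def predict(self, x):
        # Locate the pair of samples that bracket x.
        i = min(max(bisect.bisect_right(self.xs, x), 1), len(self.xs) - 1)
        x0, x1, y0, y1 = self.xs[i - 1], self.xs[i], self.ys[i - 1], self.ys[i]
        value = y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        # Accuracy-awareness: for a monotonic response the truth lies between
        # y0 and y1, so their difference bounds the prediction error.
        return value, abs(y1 - y0)

    def refine(self, target_margin):
        # Adaptive learning: keep sampling the midpoint of the least certain interval.
        while True:
            gaps = [abs(b - a) for a, b in zip(self.ys, self.ys[1:])]
            worst = max(range(len(gaps)), key=gaps.__getitem__)
            if gaps[worst] <= target_margin:
                return
            mid = 0.5 * (self.xs[worst] + self.xs[worst + 1])
            self.xs.insert(worst + 1, mid)
            self.ys.insert(worst + 1, simulate(mid))

def verify(model, points):
    # Verifiability: spot-check predictions against fresh simulations.
    for p in points:
        value, margin = model.predict(p)
        if abs(simulate(p) - value) > margin + 1e-12:
            return False
    return True

model = IntervalModel([0.5, 1.2])
model.refine(target_margin=0.1)
value, margin = model.predict(0.8)
verified = verify(model, [0.5 + 0.7 * k / 37 for k in range(38)])
print(len(model.xs), round(value, 3), round(margin, 3), verified)
```

A designer reading `value` and `margin` together can add the appropriate guard band, and the `verify` spot-check is the kind of evidence that can be shown in a design review.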

Dyck believes that if designers have sufficiently accurate results, the level of accuracy is known, and the machine learning predictions are verifiable, then they can make dependable engineering decisions—and they can defend those decisions in design reviews. However, that may preclude the techniques being used on some of the more front end tasks where the level of error in predictors remains large.

“Creating these kinds of solutions takes a lot of work,” adds Dyck. “Not just on the algorithm development side, but on the user experience side in order to make it clear to designers how accurate results are and to clearly show that the results are verified in a way they can understand. Many machine learning initiatives fail because of not realizing the large scope of getting all of this right.”

Advanced nodes or older nodes
Design rules and other new variables are making advanced nodes more complex. Dyck provides some examples including the wider voltage domain for finFETs, meaning you have to look at 7-10 voltages rather than 3 for legacy nodes, the added back-biasing for FD-SOI, and multiple extraction conditions caused by double patterning, among other things. “A typical process/voltage/temperature (PVT) space at legacy nodes might include just 3 process conditions, 3 voltages, and 3 temperatures, while at modern nodes, it’s common to run 50 to 100 PVT and extraction conditions,” says Dyck.
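
The legacy figure Dyck cites is simple to reproduce: three process conditions, three voltages and three temperatures give 27 combinations if every one is run. The corner names and values below are illustrative placeholders, not taken from any particular process.

```python
from itertools import product

process = ["slow", "typical", "fast"]   # illustrative process conditions
voltage = [0.9, 1.0, 1.1]               # illustrative supply voltages (V)
temperature = [-40, 25, 125]            # illustrative temperatures (°C)

# Full cross product of the legacy PVT space.
legacy_corners = list(product(process, voltage, temperature))
print(len(legacy_corners))  # 27
```

Against that baseline, the 50 to 100 PVT and extraction conditions quoted for modern nodes represent roughly two to four times as many runs for every analysis.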

Unfortunately, the advanced nodes also lack data. “Using a cost function based on power or total energy, one can use machine learning trained on existing designs on the current node to optimize new designs on the same node,” asserts Lucio Lanza, managing director of Lanza techVentures. “While a new node presents different physics constraints and may have a new structure, a new node is developed based on our understanding of older nodes. This means there is information in designs from previous nodes that can guide design in the latest node. Using what is called transfer learning, we can take machine learning models trained on previous nodes and use them as a starting point for building models for the new node. This would mean that one would not need to build new machine learning models from scratch for each node. They can start from a position of experience and this can be transferred.”
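
The idea of starting "from a position of experience" can be shown with the smallest possible model. In the sketch below, a straight-line model is trained on plentiful synthetic data for an old node, then used as the starting point for a new node that has only five samples. All data and coefficients are invented for illustration. Real transfer learning typically reuses layers of a much larger network.

```python
def mse(w, b, data):
    """Mean squared error of the line y = w*x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def fit(data, w=0.0, b=0.0, steps=20, lr=0.1):
    """Plain gradient descent on mean squared error, starting from (w, b)."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

# Synthetic data: the new node behaves similarly to the old one, but not identically.
old_node = [(i / 50, 2.0 * (i / 50) + 1.0) for i in range(51)]            # plentiful
new_node = [(x, 2.2 * x + 0.9) for x in (0.0, 0.25, 0.5, 0.75, 1.0)]      # scarce

w_old, b_old = fit(old_node, steps=2000)        # model learned on the old node
cold = fit(new_node)                            # new node, trained from scratch
warm = fit(new_node, w=w_old, b=b_old)          # new node, started from old model

print(mse(*cold, new_node), mse(*warm, new_node))
```

With the same small training budget on the new node, the warm-started model ends with a much lower error, because it only has to learn the difference between the nodes.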

This is in line with the way that engineers work. “A good designer is a good designer at 65nm, at 28nm, and I guarantee they will be a good designer at 5nm,” says Knoth. “They are good because they have learned how to adapt, to extract what is changing and identify the new problems presented by a new technology or architecture or whatever is changing. They look at what is new and understand how it relates to their current bag of tricks. That is the role that machine learning is looking to implement.”

What may change to enable greater access to data? “As SoCs got more complex and more IP reuse happened, the industry had to grapple with the fact that there are multi-way NDAs, private and public cloud etc.,” points out Knoth. “This is just another facet of the problem. If the reward is large enough, the industry will find productive ways to collaborate in a secure fashion. Nobody will give up their crown jewels or their IP, but there is a need to accelerate the pace to which you can yield on a new node. That reward is substantial.”

Results from machine learning, of course, are only as good as the data used to create a design. And to be truly effective, it has to include the design in context of use cases and systems.

“There is a tremendous movement by academia to look at implementation, where you develop software but also put hooks into the hardware,” said Gordon Cooper, product marketing manager in Synopsys’ Embedded Vision Processor family. “The semiconductor company has to build the chip, but we’re just the beginning of the pipeline.”

Early success
Foundries clearly have the necessary data and yield is a highly profitable area to address. “ML techniques have been used extensively to improve yield through wafer map failure diagnosis, equipment monitoring / tracking / diagnosis, and process optimization,” says ANSYS’ Chang. “While it takes many Monte Carlo simulations to address variability of processes or design attributes and can be very time consuming, ML techniques have been proven to predict the sample inputs needed to achieve the prediction of output variability and reduce the run time.”

It also is attracting interest in the design flow. “ML can be used to generate the interconnect for a new SoC without any intervention from a design engineer, other than setting the design goals and creating the initial physical floorplan—assuming, of course, that a solution is possible,” says Garibay. “The iterating design methodology can generate many possible interconnect candidates, implement them through to a layout rule correct design, analyze the results and make changes, looping until a solution is found or the algorithm decides that further work will not be useful without changing input parameters. It is likely that design engineers will then continue to optimize, but for many applications the algorithmically created design will be sufficient for production.”

Routing is another area that is gaining traction. “We have announced working code for routing,” says Knoth. “We improved what was happening under the hood and showed, compared to a build of our tool that did not have this capability, a 12% total negative slack (TNS) reduction. There is a lot of information that we can garner from internal regressions and working with some of our lead partners.”

What works for chips may also work for FPGAs. “We recently trained a database to determine if an FPGA placement is good or bad by analyzing various aspects, including congestion, delays, position and logic elements,” says Plunify’s Ng. “If a trained database can first assess the various placement maps to infer its results before letting the software proceed to routing, it will decrease the runtime. Taking it one step further, if a designer can pre-generate millions of placement maps using this database before running the P&R, they can save a massive amount of compilation time.”

The future
People are looking at other areas in which machine learning may provide value. “Electronic system design, verification and project management data could be leveraged to improve the overall workflow efficiency or manage project risks,” points out Raik Brinkmann, president and CEO of OneSpin Solutions. “This would require the collection and consolidation of data along the design process across multiple tools. On a smaller scale, the effectiveness of individual methods could be further improved by combining advanced data analytics with new verification workflows, such as formal verification methods. In particular, gathering performance data during runtime over several episodes allows building predictive models for fine-tuning heuristics or projecting tool runtimes and verification results. Although the underlying predictive models will be specific to the task at hand, they share the requirement for good training data during their construction.”

New techniques are constantly being developed. “An approach called Capsule Networks has been recently proposed by Geoff Hinton, who is known as the father of Deep Learning,” says Manadher Kharroubi, lead software architect for ArterisIP. “Capsules introduce a new building block that can be used in deep learning to better model hierarchical relationships inside of internal knowledge representation of a neural network. This capability is exactly what human beings and existing EDA tools leverage to analyze extremely complex designs from the bottom up. While processing an entire SoC may always be a challenging task, it is very likely that the type of partitioning that human engineers use to make the vast seas of data comprehensible will lead to a number of locally optimal solutions that, when combined, yield a near-optimal global result.”

Conclusions
Machine learning delivers the most value when data acquisition is long and expensive, and when there is a lot of data to learn from and to predict. “We see that perfect storm happening at modern nodes, and therefore, we see the biggest benefits from our machine learning methods occurring at bleeding edge processes,” says Dyck.

Those improvements will help everyone, not just those at the latest nodes. Will ML replace the experts? “They are always going to be a key part of the process to drive real innovation on the cutting edge,” says Knoth. “The designer still has a huge impact on the quality of the end product. It will always be human ingenuity that adds to end value. Smarter machine learning algorithms can help you get the job done better, but it is still the craftsmanship that an engineer applies to the part which will make the real difference.”

  • Kev

    ML needs two things for EDA – good extraction tool and a fast (analog) simulator – so you can create circuits and check if they work.

    Designing at 5nm is also a different ballgame – high variability in the manufacturing process means that you want to move off synchronous logic onto asynchronous, and drop RTL in favor of data-flow (async.) specs.

    The inefficiency of the current design processes would seem to say EDA is ripe for disruption – AI & (semi-) formal methods could wipe out Cadence and Synopsys quite quickly.

    I’d start here –

    http://0a.io/boolean-satisfiability-problem-or-sat-in-5-minutes/

    [+ Solido are only an ML company in their own marketing blurb]