Why designing chips is so energy-inefficient, and how that impacts development cost and time to market.
Chip designs are optimized for lower cost, better performance, or lower power. The same cannot be said about verification, where today very little effort is spent on reducing execution cost, run time, or power consumption. Admittedly, one is a per unit cost while the other is a development cost, but could the industry be doing more to make development greener?
It can take days for regression suites to run, and regression testing is a continuous activity. Every time a change is made in a design, all previously developed tests are rerun. Every time a bug is found, a new test case is developed to target that problem. For some designs, those test suites continue to grow for years, possibly decades, and little, if any, optimization is done.
The big question is whether all those tests are useful. Could some of them be removed without fear of a bug escape? And if so, how much time, cost, and energy could be saved? As the chip industry focuses on more efficient chips, it also is starting to take stock of the efficiency of its own processes.
At the recent Design Automation Conference, there were two panels about verification. One was entitled “Design verification engineer 2.0 — a new generation or a pipe dream,” and the other “Handling SoC verification: Changing the paradigm in verification approaches.” On the surface the two appear to have had different focuses, but both were looking at how verification may change in the future — particularly how to get rid of useless test cases that cost time, money, and power to endlessly rerun. In one of the panels this was even named Green DV, indicating there is an ethical responsibility to compress regression runs.
“We never seem to retire any verification test benches,” observed Balachandran Rajendran, CTO for EDA/Semi and unstructured data solutions at Dell EMC. “Once it gets there, it’s always there. I did some math. Suppose somebody has one test case that runs for one hour. If, as part of a regression suite, that is run each day for 365 days on AWS, that translates to about $70,000. Engineers need to become smarter about selecting test cases and test vectors.”
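As a rough illustration of how that math scales, the sketch below uses illustrative rates rather than figures from the panel; the quoted $70,000 for 365 one-hour runs works out to an effective all-in rate of roughly $190 per compute-hour, presumably folding in licenses and infrastructure as well as raw cloud time.

```python
# Back-of-the-envelope regression cost model. The rates are illustrative
# assumptions, not figures from the article.

def annual_regression_cost(num_tests, hours_per_test, runs_per_year, cost_per_hour):
    """Yearly cost of rerunning every test in every regression."""
    compute_hours = num_tests * hours_per_test * runs_per_year
    return compute_hours * cost_per_hour

# The quoted case: one 1-hour test rerun daily for a year. At an assumed
# all-in rate of about $190 per compute-hour it lands near the $70,000 cited.
print(annual_regression_cost(1, 1.0, 365, 190))        # 69350.0

# A modest 10,000-test nightly suite of 15-minute tests at $5/hour.
print(annual_regression_cost(10_000, 0.25, 365, 5))    # 4562500.0 (~$4.6M)
```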
This is more than just an economic equation. Data centers are now being targeted for wasteful practices, and in some cases they are limited by the amount of electricity available at any given time, both because supply is constrained and because generating that energy has an environmental impact. But in the case of chip design, there are plenty of areas for improving efficiency on the verification side.
“No one wants to retire an old test,” said Neil Hand, director of marketing for IC verification solutions at Siemens EDA. “Maybe you’ve reintroduced a bug escaping in the design, so the test suites just get bigger and bigger and bigger, and when you start to run 100,000 regression tests over a weekend, these are big, expensive runs to do.”
One of the keys to improving efficiency is reducing the time spent on these tests. “If we can get to where we are today, but a hundred times faster, you actually can scale your operations,” said Aman Joshi, senior director for design enablement at Western Digital. “Maybe you can do 10 times more, make bigger chips, and get things done faster.”
That would probably interest all companies. “What if verification was instantaneous?” asked Dell EMC’s Rajendran. “What if your test cases run instantaneously? Would innovation be much faster? Of course. Design would be faster. Everything would be much faster.”
But at what cost and environmental impact?
Growing verification problems
The verification profession has transformed itself several times over the years, and continues to look for ways to change and improve. It has deployed abstraction, utilized multiple models of computation, developed several execution engines, and most recently it has been looking at ways to inject machine learning (ML) into the flow.
Many people, both in industry and EDA, are hoping that ML may be able to help, but there are also concerns about that. “It is not acceptable if you need a nuclear power plant worth of power for six months to train your model,” said Dwayne Pryor, a verification consultant. “If you look at BERT, or some of these NLP models, these are on the scale of complexity for understanding a design. You can’t afford the cost or wait six months to retrain your model.”
This problem is exacerbated in chip design because data goes stale very quickly. Every time the design changes, much of the verification data that has been collected is no longer valid. Regressions are run to see if that design change caused additional issues. But a design change invalidates the verification results, and it also may invalidate any analysis done about the effectiveness of a particular test case.
Another big problem with verification is that nobody has found a definitive way to define completion. Cover points, which serve as a proxy for actual design-space coverage, are created manually. Covering the entire design space is impossible, so experts have to prioritize what gets covered. Coverage allows a team to know if it is making progress, but can it ever be an effective ML target?
There is a further challenge that verification is grappling with, namely the rapid expansion of its role. Today, verification goes well beyond determining whether the design behaves as expected. It now addresses whether it performs within desired power and performance envelopes, whether it continues to operate effectively after the circuitry has aged, and whether it fails correctly — or even whether failures will be detected. The list goes on.
“Functional verification is pretty easy in a sense,” said Siemens’ Hand. “Does the design do what it’s supposed to do? When you start getting into some of these other areas, such as functional safety, you have to ask the question, ‘Does it fail correctly?’ Is it going to behave the way it should? No matter if it is low power, whether it’s performance, whether it’s functional safety, each one of these adds another aspect of verification and a huge amount of extra work.”
There are a number of ways in which the industry is attempting to tackle the problem of verification efficiency. These include selecting the tests most likely to catch a problem, creating better test cases, defining better closure metrics, and reducing the likelihood of bug escapes.
Not every test, every time
Some companies have been looking at indirect information to determine where problems are likely to be when a design change is checked in. “Assume that Bob has just checked in a change,” said Pryor. “But this is something that’s in Sheila’s domain of expertise. Bob has never touched Sheila’s code before. The regression blew up. It’s almost certainly Bob’s fault. The point is that there are data sources that are not the design, and the design is, in some sense, the most difficult to deal with.”
Some progress is being made that is design-related. “We need to apply more intelligent learning to the analysis of test benches,” said Hand. “We should be able to identify which types of tests are the ones that are most important, or which are related to the part of the design that has changed. Then you’re not trying to verify everything every single time. You’re trying to look at the deltas between different versions. You also can apply that to IP management. If you know a configuration of an IP and you can identify that it hasn’t changed, you don’t need to do as thorough testing. You still need to do testing, but the idea is to try to eliminate as much of the unnecessary testing as possible.”
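A minimal sketch of that kind of delta-based selection, assuming a precomputed (and here entirely hypothetical) map from each test to the design modules it exercises, might look like the following. It illustrates the idea rather than any vendor's implementation.

```python
# Delta-based test selection sketch. The test-to-module map is assumed to
# exist already; building it (from coverage data, hierarchy tracing, etc.)
# is outside this sketch.
import random

def select_tests(test_to_modules, changed_modules, safety_sample=0.05, seed=0):
    """Run every test touching a changed module, plus a small random sample
    of the remainder as insurance against an incomplete mapping."""
    changed = set(changed_modules)
    targeted = [t for t, mods in test_to_modules.items() if changed & set(mods)]
    rest = [t for t in test_to_modules if t not in targeted]
    random.Random(seed).shuffle(rest)
    return targeted + rest[:int(len(rest) * safety_sample)]

test_map = {
    "axi_burst_rand":     ["axi_if", "dma"],
    "cache_evict_stress": ["l2_cache"],
    "pcie_link_train":    ["pcie_phy"],
}
print(select_tests(test_map, changed_modules=["l2_cache"]))  # ['cache_evict_stress']
```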
This is a non-trivial undertaking. “The greatest risk is often in a legacy part,” warned Mike Chin, principal software engineer and validation architect at Intel. “You will find the bug that escaped four generations of usage. We have better throughput, we have better methodologies, we have more thoroughness in our ability to execute and cover today, but legacy doesn’t mean a thing. We really should be analyzing the whole of this. The moment somebody wants to use a particular feature in a different way than a use case that you previously defined — that is a risk. It is a risk to our silicon because we haven’t covered it yet.”
Perhaps ML could provide a better ordering of tests. “We’ve shown that with reinforcement learning, you can cut down simulation cycles and regression time by 86% in one particular example,” said Sandeep Srinivasan, founder and CEO for VerifAI. “It is not always going to get such high percentage reduction, but think about how DV engineers write tests or coverage points. A lot of them can be overlapping Venn diagrams, where one test gives you no information that is meaningful, while five tests give you 99% of the information you need. From a machine learning perspective, you can recognize this very quickly, because there is no information gain, and when that happens, you stop compute, meaning you reduce the number of simulation cycles.”
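Setting the reinforcement learning aside, the no-information-gain observation itself can be shown with a simple greedy pass over per-test coverage data. The bin sets below are invented; real ones would come from merged coverage databases.

```python
# Greedy pruning of tests that add no new coverage bins -- an illustration of
# the "overlapping Venn diagram" point, not the RL flow described above.

def prune_redundant(tests):
    """tests: dict of test name -> set of coverage bins it hits.
    Returns (kept, dropped), with kept ordered by marginal contribution."""
    covered, kept, dropped = set(), [], []
    for name, bins in sorted(tests.items(), key=lambda kv: -len(kv[1])):
        new = bins - covered
        if new:
            kept.append(name)
            covered |= new
        else:
            dropped.append(name)   # no information gain; candidate to retire
    return kept, dropped

suite = {
    "t_smoke":  {1, 2, 3},
    "t_rand_a": {1, 2, 3, 4, 5, 6},
    "t_rand_b": {4, 5},            # fully subsumed by t_rand_a
    "t_corner": {7},
}
print(prune_redundant(suite))      # (['t_rand_a', 't_corner'], ['t_smoke', 't_rand_b'])
```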
The problem is that tests may be covering things that are not included in the coverage points. “It is a multi-dimensional problem,” said Chin. “But if we can figure out how to cover right, then we stand a chance of being able to retire tests, because we can see quantitatively there is data proving that we’ve already covered this particular case. AI/ML is an augmentation of what we already do, but we really have to figure out how to balance the two. We must continue to use our internal smarts, and use some sort of static analysis, which may be ML or algorithmic. Using both will help us round out that picture and figure out exactly what we need help with.”
One does not replace the other. “We should strive to find ways in which ML can improve our efficiency and make us more productive,” said Olivera Stojanovic, senior verification manager for Vtool. “We should not expect ML to do work instead of us, because then we will end up being very disappointed. We can use ML to improve our speed, to facilitate debugging, to help us not debug the same issue twice — that will be enough for the beginning.”
Better closure
One of the big problems with verification is the definition of coverage metrics. “At the end of the day, it is all about how many bugs escape into silicon,” said Chin. “There is a gap between the intent that we express, and how we measure and account for all the corner cases in the design. That is where the bugs hide. These are the bugs that my customers, or my end users are going to find.”
Verification is a risk mitigation activity. “Ultimately, that’s all it is,” agreed Hand. “We’re trying to minimize that risk. You can fully cover a design and still test nothing that’s important. Coverage is only as good as the cover points you insert. The coverage methodology we have relies on a person putting the coverage in. And the challenge is if that person misses a point, misses a critical area, you have no clue. You’ve got full coverage, but you’ve missed a critical piece of infrastructure and the design.”
Cover points do not equate to actual coverage. “It is simply unacceptable to say we came up with some coverage metric that we think is right,” said Adnan Hamid, executive president and CTO for Breker Verification. “We have to have something flexible enough that we can put in detail where we think the risks are, and we may be shallower where the risks aren’t as high.”
Even closure is a relative term. “Done is an undefinable problem,” said Pryor. “If I had perfect formal verification, I still don’t know that my spec is right. This problem has one property that is really well-suited to ML, which is that I don’t need to be perfect. I need to be better than some version of chance. I need to be better than directed random at finding non-overlapping interesting things. With reinforcement learning, the reward is how much new information a test provides, so there’s some entropy reward that you get out of the test.”
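One plausible reading of that entropy reward is to score each coverage bin a test hits by its surprisal, so rarely hit corners are worth more than well-trodden ones. The sketch below illustrates that reading only; it is not Pryor's actual formulation or any tool's implementation, and the bin names and counts are invented.

```python
# Surprisal-based reward for a test: bins that have rarely been hit in past
# regressions contribute more "new information" than bins hit constantly.
import math

def entropy_reward(bins_hit, hit_counts, total_runs):
    reward = 0.0
    for b in bins_hit:
        p = (hit_counts.get(b, 0) + 1) / (total_runs + 1)   # smoothed hit frequency
        reward += -math.log2(p)                             # surprisal in bits
    return reward

history = {"reset_during_burst": 2, "fifo_full": 950, "ecc_double_err": 0}
print(entropy_reward({"fifo_full"}, history, total_runs=1000))       # ~0.07, well covered
print(entropy_reward({"ecc_double_err"}, history, total_runs=1000))  # ~10.0, rare corner
```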
There may be better ways to define coverage points. “Even if you don’t have infinite data, we still have enough data,” said Sashi Obilisetty, group director for R&D at Synopsys. “Coupled with a domain expert, it’s possible for end users to build interesting models. We could probably build an NLP based SVA generator or something that utilizes the specification. You can do classification of log files. You can do interesting things. Process is important.”
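The log-file classification she mentions can be as simple as bucketing failure signatures by likely cause, so the same issue is not debugged twice. The log lines and labels below are invented purely for illustration; a real flow would train on a team's own triage history.

```python
# Toy failure-log classifier: TF-IDF features plus logistic regression to
# route new failures to a likely bucket. A sketch on fabricated data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

logs = [
    "UVM_ERROR scoreboard mismatch exp 0xdead got 0xbeef",
    "UVM_FATAL timeout waiting for axi response",
    "License checkout failed for simulator feature",
    "UVM_ERROR scoreboard mismatch exp 0x1 got 0x0",
]
labels = ["design_bug", "design_bug", "infrastructure", "design_bug"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(logs, labels)
print(clf.predict(["timeout waiting for ahb response on scoreboard"]))  # bucket a new log
```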
Reduction of effort
The desire to improve data usage is there. “We have been doing data collection for several years,” said Western Digital’s Joshi. “Initially this was used to optimize compute, to optimize licensing, to optimize resources. Now we are doing a lot of research and partnerships to see if we can actually move to the next generation, which is finding the bugs faster, getting to coverage faster.”
That allows for greater focus. “You want to do as little verification as possible to achieve the goals that you need to achieve,” said Breker’s Hamid. “If you can re-use the work, people can start to focus on how to find that one bug that’s going to bite me a few months from now when we tape out.”
Industry cooperation may be necessary, and that means sharing of data. “If industry can partner with each other, it will be a mutually beneficial thing,” said Joshi. “You want to build enough databases that are not proprietary, but enough for meaningful exploration. There are consortiums that are sharing data about coverage, so they will increase the amount of data and increase the opportunity for models. The RISC-V community has donated many designs. We have committed designs to that. That could become a very good test case, and you can synthesize more like that from it.”
It also can combine tools. “With RISC-V we have been improving coverage on the data cache from using the Google instruction generator,” said VerifAI’s Srinivasan. “You can keep running this forever, and only reach a particular coverage point. It needs some smart techniques, like reinforcement learning, and then you can figure out which of the sequences of instructions really mattered to optimize a given output. In this case, improve coverage on the data cache. This goes back to the large amount of data, which is better than constrained random because you’re now learning the saliency of which inputs really matter to optimize a particular outcome, rather than shooting darts.”
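One simple way to picture that saliency idea, separate from the reinforcement learning flow Srinivasan describes, is to fit a plain classifier from features of each generated stimulus to whether a target coverage event fired, then read off which features mattered. The instruction-mix features and data below are invented for illustration.

```python
# Rough saliency sketch: which stimulus features correlate with hitting a
# target coverage event? Data here is fabricated toy data.
from sklearn.ensemble import RandomForestClassifier

features = ["loads", "stores", "branches", "fences"]
X = [
    [12,  3, 5, 0],
    [ 2, 14, 1, 0],
    [ 9,  9, 0, 2],
    [ 1,  1, 8, 0],
    [11,  8, 2, 1],
]
y = [0, 1, 1, 0, 1]   # 1 = data-cache eviction corner case was hit

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
for name, importance in zip(features, model.feature_importances_):
    print(f"{name:>8}: {importance:.2f}")   # rough per-feature saliency
```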
Conclusion
Verification is highly wasteful today. Constrained random allowed the industry to make the most of scarce verification engineers by having them create models rather than individual test cases, then automatically generating tests from those models.
But those generated tests are inefficient, because their effectiveness at finding bugs is at least partially unknown. New tools are attempting to understand the connection between test cases and the design space they cover, and this should lead to better efficiency — so long as the cost of running those tools is less than the savings they deliver.