Faster Verification With AI, ML

With engines improving, the design ecosystem is looking to new approaches to improve productivity.

Tool providers have continually improved the performance, capacity, and memory footprint of functional verification engines over the past decade. Today, although the core anchors are still formal verification, simulation, emulation, and FPGA-based prototyping, a new frontier focuses on the verification fabric itself, aiming to make better use of these engines through planning, allocation, and metrics tracking.

At the same time, artificial intelligence (AI), big data, and machine learning are top of mind for every design team asking, ‘How do we make verification even more efficient given that all the core engines have improved and continue to improve; what’s the next level? When am I done verifying?’

A panel discussion I moderated at the recent Design Automation Conference in San Francisco discussed these and other topics. The panel included Jeff Ohshima, member of the technology executive team at Toshiba Memory; Paul Cunningham, corporate vice president and general manager for R&D at Cadence; David Lacey, verification scientist at Hewlett Packard Enterprise (HPE); and Jim Hogan, managing partner at Vista Ventures.

Ohshima contrasted the tremendous demand for memory expected from the consumer, datacenter, and enterprise segments with design and verification issues. “With more design complexity than ever before, the design cycle and verification cycle are taking longer and longer these days. While we are doing the new circuit design, we have to do the verification at the same time to shrink the total cycle time. This is now becoming a big issue for SSD development.”

While automation has improved development turnaround time, design complexity is still so high that more help is needed, Ohshima acknowledged.

Specifically, flash memory design has recently become more complicated, he explained, especially with the company’s transition from 2D to 3D flash. “There are huge numbers of parameters to optimize in the circuit design as well as the layout design—and quickly.” Ohshima said Toshiba Memory uses an agile design methodology to bring the flash memory device to full maturity. “A large number of iterations on the circuit and device parameters must be done to keep up with the quick design cycle, and it must be done with an agile design approach for each portion,” he said.

On the SoC side, Ohshima noted that the company’s SSD controller SoC is not built on the most advanced process technology, but “we are okay, even now, with the 20/16nm technology node; we don’t need 7nm right now. That said, some unique requirements for the SSD controller exist. The new generation reuses some blocks from the previous generation, and of course the number of bugs detected after the R&D fix is decreasing thanks to reuse. However, because of the design complexity of the next generation, especially in the enterprise application area, we have lots of blocks to run through regression simulation, and that is now becoming the critical path for our SSD design.”

Toshiba Memory hopes many new ideas will flourish under the banner of AI, machine learning, or deep learning.

Regardless of the type of chip, “[v]erification is a truly intractable problem,” Cunningham reminded the audience. “If we double the size of the chip each year, we are squaring the state space. There is no way that we’re ever done. This is a huge community/ecosystem challenge. What can we do with our tools and our solutions to somehow try to shrink that gap? You cannot possibly square the amount of compute resources [and] headcount every year. That probably isn’t going to go very far.”
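Cunningham’s “squaring” point can be made concrete under a simplifying assumption (ours, not his): treat the design as n independent binary state elements, so 2^n is an upper bound on the number of states, and read “doubling the size of the chip” as doubling n.

```latex
\[
  |S(n)| = 2^{n}
  \quad\Longrightarrow\quad
  |S(2n)| = 2^{2n} = \left(2^{n}\right)^{2} = |S(n)|^{2}
\]
```

For example, 32 state bits allow up to 2^32 ≈ 4.3 × 10^9 states, while 64 bits allow up to 2^64 ≈ 1.8 × 10^19. Reachable state in a real design is smaller than this bound, but the growth trend is the problem he describes.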

“We have all these different engines, all these different ways to go after this space. We can try to cover verification with formal, we can run traditional logic simulation, and then we’ve got different hardware architectures. Traditional simulation basically runs on an Intel x86 Xeon; we can now also run simulation on an Arm-based server, or you can run simulation using an FPGA. Simulation can also be run on a custom processor. With all of these different engines, there’s a larger question of knowing which engine to use in which case, and how to maximize overall verification throughput,” he said.

“In that capacity,” Cunningham continued, “if you think of this multi-engine, intractable vast space, all these notions of big data and analytics and what can we do with machine learning, artificial intelligence, how do we somehow go after this space more efficiently?”

He admitted that, since he took over the verification R&D team, things haven’t felt efficient. “We are burning millions of cycles in verification without actually making any difference in coverage, or we’re covering the same things multiple times. This is why there is a real opportunity as an ecosystem to try to be more efficient and keep going after this fundamentally intractable problem.”
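Covering the same things multiple times is commonly addressed with coverage-based test ranking. The Python sketch below is a minimal illustration under stated assumptions, not any vendor’s implementation: each test’s coverage is modeled as a set of hypothetical bin names, and a greedy set-cover pass keeps whichever test adds the most new bins until the remaining tests add nothing.

```python
from typing import Dict, List, Set, Tuple


def rank_tests(coverage: Dict[str, Set[str]]) -> Tuple[List[str], List[str]]:
    """Greedy coverage ranking (a set-cover heuristic).

    Repeatedly keeps the test that adds the most not-yet-covered bins.
    Tests still unpicked when no test adds a new bin are returned as redundant.
    """
    remaining = dict(coverage)
    covered: Set[str] = set()
    kept: List[str] = []
    while remaining:
        # Iterate in name order so ties go to the alphabetically first test.
        best = max(sorted(remaining), key=lambda t: len(remaining[t] - covered))
        gain = remaining[best] - covered
        if not gain:
            break  # every remaining test only re-hits bins already covered
        kept.append(best)
        covered |= gain
        del remaining[best]
    return kept, sorted(remaining)


# Hypothetical per-test coverage bins from one nightly regression.
nightly = {
    "test_a": {"b1", "b2", "b3"},
    "test_b": {"b2", "b3"},
    "test_c": {"b4"},
    "test_d": {"b1", "b4"},
}
kept, redundant = rank_tests(nightly)
print("keep:", kept)            # keep: ['test_a', 'test_c']
print("redundant:", redundant)  # redundant: ['test_b', 'test_d']
```

Greedy ranking is not guaranteed to find the smallest covering subset, but it is cheap and shows quickly which cycles buy no new coverage.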

Lacey described three major areas he focused on when looking at the verification technologies used at HPE. “First, we try to make our engineers more productive. If you look at where our costs are, engineers are our biggest expense; we want to make sure that they’re as efficient and productive as they can be, whether that’s from a tool standpoint or a methodology standpoint. Second, we spend a lot on tools and licenses, so how do we get better value from the licenses and tools that we use? Third, how do we achieve more predictability in our projects, and how do we go about achieving those goals?”

HPE’s approach to these areas starts with data. “We need data to start with, so we look at various sources of big data and data analysis to drive answers and changes in what we’re doing, in order to achieve improvements in those three areas. Then we also look at the tools and technologies that are available to us. Are we using all the technologies available to us in the licenses that we have? And it’s not just a matter of having a feature available. We have to have a smart way to utilize and consume that feature so that, as we put it in the hands of the engineers working on projects, it adds to their productivity rather than distracting from it.”

Hogan’s interest is in the investment side of verification, so he looks at opportunities in this space in terms of how he thinks the companies he has invested in can make money at it. “It’s often the case that you can be too early to market, and I’ve suffered many times with that problem. In other words, putting money into something when the market’s not ready. What I like about the verification market right now (spoken as an investor, not as a consumer or a tool provider) is that there’s a big hardware sea change, i.e., the cloud, evidenced by the presence of Google and AWS exhibiting at DAC. It’s the first time they’ve ever shown up here and exhibited, so they see this as a market that is interested in them. I also see the cloud playing a big role in allowing us to do things.”

That said, he believes the industry needs to keep its eyes on hardware changes, as well as business model changes.

The sweet spot for each engine
Given the range of verification approaches commercially available today, engineering teams need to understand exactly where each engine can and should be used, and whether that determination is more art or science.

Cunningham suggested it is probably a bit of both, and it’s definitely an ecosystem problem. “Does it depend on the type of chip that you’re doing? I think some vectors that allow us to start to untangle that are, for example: Are you verifying at the IP block level? At the component level? Are you looking at the SoC level? Are you trying to model the real world, where maybe you have the software ecosystem or the software stack? Or are you doing a bad apple type verification? An obvious case would be if the design is starting to be more stable and you’re trying to debug the software that’s working with the design. There you have a different set of requirements that I think fit a lot better with what we call the traditional prototyping market, running with FPGA-based acceleration. If you’re still really trying to flush out core functionality problems with the design itself, you have a different set of requirements that might work better with an emulation processor or just a regular logic simulator on an Intel Xeon farm.”

“Those are some examples of the differences, but what about the overall orchestration: How do you bring this idea of multi-engine coverage together? Here, there’s still a lot of work for us to do because there’s not any strong tool in this space,” said Cunningham. “This speaks to the startup situation in the industry. There are not that many spaces left where there isn’t really any decent major solution in the ecosystem, and overall multi-engine coverage management is an area that still has a long way to go in the industry.”

Lacey added that, given the different types of engines most engineering teams are working with, smarter verification means actually taking the bold step of using some of the other engines. “Even in our group it’s often tough to convince management to do something that we haven’t done before, but in some cases that’s going to be the smart choice. You can use broad guidelines for where to use the different engines and how to get very quick value from them. From that standpoint, it’s certainly a science. You can identify good areas to target the engines at, and get value from them in pretty short order. Once you have that in your flows and you’re using them, you can take it to the next level and look at how to really get that next level of value out of the different engines. How do I remove the overlap of the spaces that those engines are covering? An easy way to do it is to continue to run all your simulations and just add formal to it. Well, that’s great. It’s certainly going to flush out more bugs, but if you ask, ‘Am I getting more value and productivity out?’ it’s just adding more work to my plate, and that’s not helping my schedules. So that’s the next step you need to take: how to use formal for some areas and simulation for others, then bring the data from those different domains together and show that, ‘Yes, I’ve got a complete job here for the validation that I’m doing.’”
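The final step Lacey describes, bringing formal and simulation results together and checking for completeness, reduces at its simplest to set arithmetic. This Python sketch is hypothetical: the plan item names and the `merge_coverage` helper are illustrative only, and real flows merge tool-specific coverage databases rather than Python sets. It reports which planned items each engine closed, where the engines overlap (the duplicated effort Lacey wants to remove), and what remains open.

```python
from typing import Dict, Set


def merge_coverage(plan: Set[str], formal: Set[str], sim: Set[str]) -> Dict[str, Set[str]]:
    """Classify each planned item by which engine(s) closed it (hypothetical helper)."""
    formal_hits = formal & plan  # ignore results for items outside the plan
    sim_hits = sim & plan
    return {
        "formal_only": formal_hits - sim_hits,
        "sim_only": sim_hits - formal_hits,
        "overlap": formal_hits & sim_hits,       # closed twice: candidate duplicated effort
        "holes": plan - formal_hits - sim_hits,  # closed by neither engine
    }


# Hypothetical verification-plan items and per-engine results.
plan = {"fifo_full", "fifo_empty", "arb_grant", "err_inject", "clk_gate"}
formal = {"fifo_full", "fifo_empty", "clk_gate"}
sim = {"fifo_full", "arb_grant"}

report = merge_coverage(plan, formal, sim)
for category, items in report.items():
    print(f"{category:12s} {sorted(items)}")
print("complete:", not report["holes"])
```

A non-empty `overlap` set points at work one engine could drop; an empty `holes` set is the “complete job” signal.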

In a way, this leads to the Portable Stimulus discussion, Hogan pointed out. “The fact that you can have smarter testbenches is a big deal. That’s one of the ways to go cross domain. Think about it, if you have all these engines, wouldn’t it be great to have a stimulus or a testbench that could run across it, and are those assertions sufficient to verify what you need to do to prevent you from just simulating endlessly?”

Given the chance, verification engineers would simulate forever, he continued. In the case of autonomous vehicles, this all gets very real. “If the thing goofs up, whether it’s a hardware or software problem, it could potentially hurt somebody. So how do we know that we’ve done enough simulation? How do we know that we have the coverage?”

When it comes to adding formal verification to the mix, Lacey observed, design teams are a mixed bag. “Some logic engineers are very accustomed to their way of doing things and they don’t want to try new things. Others are more open to new opportunities, and what wins them over is finding the use cases that will show them the value in very short order.”

Cunningham offered unreachability analysis as a real-world example. “You can use a formal tool to reduce your coverage space by showing that certain parts of the design space are just not reachable, so there’s no point in trying to cover them. Another one, which is not there yet but is very close, is reset sequences. There’s a lot of verification work looking at different things around resets, but it’s within our grasp to be able to do a formal SoC-level reset proof, which can eliminate a lot of problems.”
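The unreachability idea can be illustrated on a toy state machine. Commercial formal tools prove unreachability symbolically on real RTL; this Python sketch instead runs an explicit breadth-first search over a small, hypothetical FSM (the states `IDLE`, `REQ`, `GRANT`, and `RETRY` are made up) and removes any state that cannot be reached from reset from the coverage goal.

```python
from collections import deque
from typing import Dict, List, Set


def reachable_states(transitions: Dict[str, List[str]], reset: str) -> Set[str]:
    """Explicit-state breadth-first search from the reset state."""
    seen = {reset}
    frontier = deque([reset])
    while frontier:
        state = frontier.popleft()
        for nxt in transitions.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


# Hypothetical FSM: RETRY is declared, but no transition ever enters it.
fsm = {
    "IDLE": ["REQ"],
    "REQ": ["GRANT", "IDLE"],
    "GRANT": ["IDLE"],
    "RETRY": ["REQ"],
}
state_bins = set(fsm)  # one coverage bin per declared state
reach = reachable_states(fsm, "IDLE")
print("excluded as unreachable:", sorted(state_bins - reach))  # ['RETRY']
print("coverage goal:", sorted(state_bins & reach))
```

Explicit enumeration only works for tiny examples because of the state-space growth discussed earlier, which is why production tools rely on symbolic methods.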

“Even on a more basic level,” Cunningham said, “there are static ways to check things, even without formal, just with basic structural analysis. This can be especially helpful in low-power design, where there are still a lot of issues with crossings, level shifting, isolation, and so on that you can identify without running any kind of simulation or formal at all, although there may still be some duplicated effort if you go in and review verification methodologies.”

Managing big data

With the variety of verification methods generating a tremendous amount of data, a key to success is knowing how to aggregate, manage and leverage the information.

Another question here concerns what data should be collected, Lacey asserted. “We have the simple stuff that the tools will provide for you. You use your simulator, you’re going to get coverage data out of it most likely, if you turn that on. But what other data is useful? What other data would you want to have access to so that you can understand if you are getting the most value from this tool or not. We’ve spent a lot of time identifying different types of data that we want to collect on a regular basis so that we have access to it, to be able to go back and query and analyze to be able to answer some of those questions that do arise.”

At the same time, this is where AI and machine learning will come into play. “When we talk about AI and machine learning, quite a bit of it is about how we get a lot of data, and what we do with it. The key thing is figuring out how to get the training data. Identifying training data, using a directed search sort of does the training data. It takes quite a bit of time and effort to identify the data you want to keep, and that training data ends up defining the vertical domain that you want to explore. AI isn’t so much a product as it is a feature that we need to learn how to deploy. When we get up here next year we’ll be talking about AI as a feature that we’re starting to deploy, and then we’ll see little companies emerge that have domain expertise or training expertise.”

Connected with this is the cloud, and where it plays into the storing and processing of data, as it relates to semiconductor verification.

“When we talk about the cloud, where is the expense? It turns out it doesn’t cost you anything to send stuff to the cloud. It costs you a lot to bring things back down from the cloud,” Hogan pointed out. “So, if you have a verification problem, you don’t want to be interacting at your workbench with the cloud. You want programs and applications that do all the work up in the cloud, and just send you back an answer. If you take that economic view, then you’ve got to look at opportunities where you can apply that cost model.”

Lacey concluded that working in or with the cloud will come down to cost and the productivity gained from it. “One of the major marketing ploys for cloud is that if I’m running 5,000 tests at night for my regression and it takes me all night or all day to get that done, I can throw it in the cloud, go as parallel as I want, and have it done in 10 minutes, or as long as my longest test. That has some very interesting aspects for me in terms of how it can help engineering productivity. It can help my engineering teams get answers back that quickly, but is that what I need for everything that I’m doing, or just for my volume regressions that are searching for bugs? I’m not sure I have the right answer for that, but that’s where cloud comes into play. For us, we do have a full private datacenter solution ourselves, so it’s not like we’re lacking for compute capacity. For smaller companies and companies that don’t have that, cloud certainly becomes viable and useful. But for us, it’s now a trade-off between investing in private cloud versus going to a public cloud.”
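Lacey’s “as long as my longest test” remark follows from how regression wall-clock time scales with parallelism. A minimal Python sketch, assuming hypothetical test durations and a simple longest-first greedy scheduler (a standard heuristic, not how any particular cloud or grid scheduler works): on one machine the regression takes the sum of all test times, and with at least as many machines as tests it takes only as long as the longest test.

```python
import heapq
from typing import List


def wall_clock(durations: List[float], slots: int) -> float:
    """Regression wall-clock time when tests are dispatched longest-first
    to whichever of `slots` machines becomes free earliest (LPT heuristic)."""
    if slots < 1:
        raise ValueError("need at least one machine")
    if not durations:
        return 0.0
    # Min-heap of per-machine finish times; all machines start idle at t = 0.
    finish_times = [0.0] * min(slots, len(durations))
    for d in sorted(durations, reverse=True):
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + d)
    return max(finish_times)


# Hypothetical test durations in minutes for a tiny nightly regression.
tests = [30, 12, 8, 8, 5, 5, 2]
print("1 machine: ", wall_clock(tests, 1))           # 70.0 (sum of all tests)
print("2 machines:", wall_clock(tests, 2))           # 35.0
print("7 machines:", wall_clock(tests, len(tests)))  # 30.0 (the longest test)
```

Past the point where every test has its own machine, extra capacity buys nothing, which is why splitting or shortening the longest tests matters as much as buying more parallelism.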


