Verification And The IoT

Experts at the Table, part 2: What is good enough, and when do you know you’re there?


Semiconductor Engineering sat down to discuss what impact the IoT will have on the design cycle, with Christopher Lawless, director of external customer acceleration in Intel’s Software Services Group; David Lacey, design and verification technologist at Hewlett Packard Enterprise; Jim Hogan, managing partner at Vista Ventures; and Frank Schirrmeister, senior group director for product management in the System & Verification Group at Cadence. What follows are excerpts of that conversation. For part one, see Verification And The IoT (Part 1).

SE: Can we continue to divide and conquer on verification as systems become more complex and interconnected?

Lacey: You really have to look at the verification problem from multiple levels. You start at the IP level and make sure it’s functional according to the spec. Then you integrate it, and the challenges compound as all of these different components come together in an SoC. The seams are where the real challenges are. How those components interact begins to create an exponential problem. That’s when you have to start looking at it from a system-level perspective.

SE: Can you define what you mean by system-level?

Lacey: System level really comprehends an entire SoC, as well as the OS and typically the workload that is going to run on top of that system. That becomes even more complex when you take an SoC and put it into a customer system, like a server. There is a broader definition there, because there are many more components. At that point, it becomes an even more challenging exponential problem. Then you have to start narrowing it by focusing on the use cases. How will this be used? What software is going to be run? What’s expected from this system? Only then can you get your verification arms around what you need to focus on to maintain that quality.

Lawless: We take a very similar approach, focusing on IP first and making sure it’s working as intended. That gives us lots of controllability and much faster environments to run cycles through. But for us, another big part is verification planning. We have features in place. What is our plan of attack for validating those features? It involves not just making sure the features work, but also how they react to unexpected stimulus. We have defined behaviors that we expect in those scenarios. But we won’t necessarily be able to think of every possible scenario. So we still rely on constrained random and volume regressions to explore around those areas and find out where our hardware may not be operating as we intended.
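Constrained random, as Lawless describes it, pairs rules for what counts as legal stimulus with randomization, then runs many seeds in volume regressions to reach corners nobody planned for. The following is a minimal Python sketch under stated assumptions: a hypothetical IP block whose transactions carry an opcode, an aligned address and a length. The field names, weights and coverage bins are illustrative only, and the sketch covers stimulus and coverage but not result checking. Production flows usually express this in SystemVerilog constraints and covergroups rather than Python.

```python
import random
from collections import Counter

# Hypothetical transaction fields for an IP block under test (illustrative only).
OPCODES = ["READ", "WRITE", "FLUSH"]

# Coverage bins the (hypothetical) verification plan expects to see.
PLANNED_BINS = {
    ("READ", "small"), ("READ", "large"),
    ("WRITE", "small"), ("WRITE", "large"),
    ("FLUSH", "none"),
}


def gen_transaction(rng: random.Random) -> dict:
    """Draw one random transaction that satisfies the assumed legal-stimulus constraints."""
    op = rng.choices(OPCODES, weights=[45, 45, 10])[0]
    addr = rng.randrange(0, 4096, 4)                      # constraint: 4-byte aligned, 4 KiB window
    length = 0 if op == "FLUSH" else rng.randint(1, 64)   # constraint: FLUSH carries no payload
    return {"op": op, "addr": addr, "len": length}


def coverage_bin(txn: dict) -> tuple:
    """Map a transaction to a coarse functional-coverage bin: opcode x size class."""
    if txn["len"] == 0:
        size = "none"
    elif txn["len"] <= 8:
        size = "small"
    else:
        size = "large"
    return (txn["op"], size)


def run_regression(seeds, txns_per_seed: int = 200) -> Counter:
    """Volume regression: each seed yields a different, reproducible random stream."""
    hits = Counter()
    for seed in seeds:
        rng = random.Random(seed)  # per-seed RNG so a failing run can be replayed exactly
        for _ in range(txns_per_seed):
            hits[coverage_bin(gen_transaction(rng))] += 1
    return hits


if __name__ == "__main__":
    hits = run_regression(range(20))
    holes = PLANNED_BINS - set(hits)
    print("bins hit:", dict(hits))
    print("coverage holes:", holes or "none")
```

Seeding each run separately means anything a volume regression turns up can be replayed exactly. A planned bin that never gets hit indicates a coverage hole, not a bug.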

Lacey: That’s a good point. The architect defines the spec, and you design the silicon to embody that spec. You think you’ve got everything done, and then along comes the software layer. It uses and exercises things in ways you never thought possible. That’s why the multiple layers of the system approach are so critical. You can run random tests to find every hole in the silicon, and a lot of that still goes on, but you also need to look at it from a system perspective. When you combine all of these parts into one, you can’t reach that same level of testing in a tester unless you have the right stimulus. That comes from how apps are going to use and take advantage of the silicon.

SE: But how do you do that if you don’t have all of the IP that goes in there, particularly analog? Plus, the software gets updated over the life of a product. It’s connected to devices you never expected. And if some devices are supposed to last 10 years, the security you’ve built into it is probably outdated after a few years.

Hogan: First of all, I’d like to abandon the term Internet of Things and call it Integration of Things. That’s a more accurate name for it. We have about 40 people who work on things. I was asked by the DoD to come up with a statement of work for those people. What they wanted was an engine that can learn acceptable behavior and then respond to any behavior it sees, a stimulus, that falls outside the expected norm. Then it has to take some action. Typically, the action is to shut it down and do a hard reboot. But there are going to be on-chip verification engines that ship with the chip. In other words, we need the capability to learn, retain that learning, and then use it to deal with threats that morph and evolve over the lifetime of a chip. This is especially true for the DoD.
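The loop Hogan describes has three steps: learn what normal looks like, flag stimulus outside that norm, then act. It can be illustrated with a much simpler statistical stand-in than the learning engine he has in mind. This sketch watches a single hypothetical numeric metric (for example, a request rate). It keeps a running mean and standard deviation using Welford’s online algorithm and flags any sample more than k standard deviations from the learned mean. The class name, the warm-up length, the threshold and the "reboot" action are all assumptions made for illustration. It is not the DoD engine.

```python
import math


class BehaviorMonitor:
    """Hypothetical monitor: learns a running mean/variance of one metric
    (Welford's online algorithm) and flags samples more than k standard
    deviations from the learned norm."""

    def __init__(self, k: float = 4.0, warmup: int = 30):
        self.k = k            # how far outside the norm counts as anomalous (assumed)
        self.warmup = warmup  # samples to learn from before enforcing anything
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0         # running sum of squared deviations from the mean

    def _learn(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        # Sample standard deviation; zero until at least two samples are seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def observe(self, x: float) -> str:
        """Return 'ok' or 'reboot'. Only in-norm samples update the learned model,
        so anomalous input cannot gradually redefine what 'normal' means."""
        if self.n < self.warmup:
            self._learn(x)
            return "ok"
        if abs(x - self.mean) > self.k * self.std():
            return "reboot"   # stand-in for the shut-down-and-hard-reboot response
        self._learn(x)        # retain what normal behavior looks like over time
        return "ok"


if __name__ == "__main__":
    mon = BehaviorMonitor()
    for i in range(50):
        mon.observe(100 + (i % 5))   # normal traffic between 100 and 104
    print(mon.observe(500.0))        # far outside the learned norm
```

Refusing to learn from flagged samples is a deliberate choice. It keeps a slowly escalating attack from shifting the baseline, but it also means the monitor only adapts to drift that stays within k sigma.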

Lacey: Software tends to be our friend when it comes to hardware, because we always put extra capabilities and debug engines in our chips. If we find a problem when we get our silicon back, often that problem becomes a ‘feature’ that we use software to work around. That usually happens in the lab before we ship our product, but if we find something in the field when a customer is using it, we can use similar approaches. Software then provides the upgrade path. The other way software helps our products is that many of our chips now include embedded processors. So software is part of our chips, which gives us an easy path to provide upgrades and extra features, or to adapt to changing environments if these devices need to live longer.

Lawless: We were working on a project involving smartphones. The way smartphones deal with problems is through software filters. I had to back out 14 versions of the OS to find one that didn’t have a software filter. Software extends the platform, and it’s a great way to do that, but it comes at a cost. It costs you energy. Sooner or later you want to move that into hardware.

Schirrmeister: One of the interesting issues here is defining the safe point. What happens if something bad occurs? A safe shutdown is very different for a phone, an airplane, or an airbag. Having that definition is crucial, and it becomes increasingly complex when you plug the device into the system environment.

SE: Does the definition of what is good enough before sign-off change here?

Lawless: We don’t love that term because it implies lower quality. But from a vendor perspective, you’ve got an array of solutions. Each one has a particular purpose and tolerance. You’ve got to understand where those fit. You can validate until the cows come home, but you would miss the window and you would never make any money because you couldn’t get it out the door. There is a tradeoff that has to happen. We’re really trying to understand the use models and meet the quality expectations of customers—and even overachieving wherever possible. The term isn’t necessarily ‘good enough.’ It’s ‘appropriate’ for every application that’s out there. But those will vary.

Lacey: A lot of it is about risk tolerance. You take a risk and maybe ship it out early. We might take a few more risks in pre-silicon validation, hopefully get things into the lab, and maybe have the chance to get a product out earlier. We never look at things as good enough. We still want to deliver high quality. But it also comes down to the different features. If we have a node controller chip where the coherency doesn’t work, we cannot ship that. It has to work. But there may be error-logging features that help us debug and understand what went wrong in a huge, complex system. If one of those doesn’t work exactly as we wanted, we may be okay with sending it out as long as we understand how that will affect our debugging capabilities. It does vary depending on the feature.

Schirrmeister: If you look at verification management, ‘good enough’ means having something come back that can be made workable with software. The problem becomes a feature, and with software and a working set of errata you can make the chip work. In the verification plan, you have a confidence level. You never have 100% confidence. But have you covered all of the items that matter for what you’re taping out? That includes things like power, which needs to work. That’s why people run power in emulation. It needs to boot the software. It needs to run the basic tests. That’s why people bring OpenGL forward into the silicon phase. It’s becoming a huge verification planning challenge. And then there’s this moment of truth when you’re ready to tape out, but you know you haven’t covered all the bases. You have just covered enough to get to production.
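Lacey and Schirrmeister describe a sign-off rule that can be sketched as a check over a verification plan. A must-work feature, such as coherency on a node controller, blocks tape-out if it fails. A failing secondary feature, such as error logging, can ship if software covers it, and it is then recorded as an erratum. The plan items, field names and the rule itself are hypothetical simplifications. Real plans also weigh coverage and confidence levels, and Lacey’s "we understand the impact" criterion is reduced here to "a workaround exists."

```python
from dataclasses import dataclass


@dataclass
class PlanItem:
    name: str
    must_work: bool        # e.g. coherency on a node controller: cannot ship if broken
    passing: bool          # current verification status for this item
    workaround: str = ""   # software workaround, if one exists (hypothetical field)


def signoff(items):
    """Return (ready, blockers, errata) for a hypothetical tape-out decision.

    A failing must-work item always blocks. A failing optional item blocks only
    if there is no software workaround; otherwise it is recorded as an erratum.
    """
    blockers = [i.name for i in items
                if not i.passing and (i.must_work or not i.workaround)]
    errata = [f"{i.name}: {i.workaround}" for i in items
              if not i.passing and not i.must_work and i.workaround]
    return (not blockers, blockers, errata)


if __name__ == "__main__":
    plan = [
        PlanItem("coherency", must_work=True, passing=True),
        PlanItem("power_management", must_work=True, passing=True),
        PlanItem("os_boot", must_work=True, passing=True),
        PlanItem("error_logging", must_work=False, passing=False,
                 workaround="collect logs through firmware"),
    ]
    ready, blockers, errata = signoff(plan)
    print("ready to tape out:", ready)
    print("blockers:", blockers)
    print("errata:", errata)
```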

Hogan: It’s really down to what your confidence level is and what you are willing to tolerate. That’s a statistically based concept that has powered our industry for decades. But that’s brute force. What’s acceptable? Two-sigma? Four-sigma? Seven-sigma? It depends on the application. The good news is the industry is developing deep learning and machine learning engines. Those are great platforms to figure these things out. Wherever we can deploy deep learning, it takes us beyond the statistical realm and allows us to have a higher confidence level so we can make an informed decision. We may be able to skip ‘this’ and catch it in a software filter to extend the platform. But what kind of risk are we taking? That’s one of the most exciting things coming in the next year or two. You’ll start seeing deep learning concepts coming into the verification tools and suites. We’re deploying them on a number of projects already.
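If the measured quantity is assumed to be normally distributed, Hogan’s sigma levels map to concrete escape rates. That assumption is ours, not the panel’s. The sketch computes the two-sided probability that a value lands more than k standard deviations from the mean. It uses the identity P(|Z| > k) = erfc(k / √2), where erfc is the complementary error function.

```python
import math


def two_sided_tail(k: float) -> float:
    """Probability that a normally distributed value lies more than k sigma from the mean."""
    return math.erfc(k / math.sqrt(2))


if __name__ == "__main__":
    for k in (2, 4, 7):
        p = two_sided_tail(k)
        # Express the tail both as a probability and as "1 in N" events.
        print(f"{k}-sigma: P(outside) = {p:.3g}  (about 1 in {1 / p:,.0f})")
```

Each step from two to four to seven sigma cuts the expected escape rate by orders of magnitude, not by a linear amount. This is one reason the acceptable level depends so heavily on the application.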

Related Stories
Verification And The IoT (Part 1)
Application-specific verification, and why quality may vary from one market to the next; why different models are ready at different times.
Rethinking Verification For Cars
Second of two parts: Why economies of scale don’t work in safety-critical markets.
2017: Tool And Methodology Shifts (Part 2)
System definition to drive tool development, with big changes expected in functional verification.
System-Level Verification Tackles New Role (Part 2)
Panelists discuss mixed requirements for different types of systems, model discontinuity and the needs for common stimulus and debug.
Grappling With IoT Security
Updating connected devices creates a whole new challenge as threats continue to evolve.