Taking The Mystery Out Of IP, SoC And Automotive System Debug

What’s wrong with my car, and why is that relevant to semiconductor design?


I recently had to take my car to the dealership because the gas-saving “auto-shut-off-while-stopped” feature wasn’t working. The dealer explained that it took two days to debug because the feature “touched on many systems” in the car. In the end, they realized the battery wasn’t fully charged and blamed it on my short 10-mile commute.

Whether that was an honest answer or an example of low-power design gone bad can be discussed over a beer. The more interesting point is how complicated the modern car has become, and how difficult it is to find a faulty part in an otherwise known-good system. Now imagine trying to debug the same system when many parts thought to be good are failing due to hardware and/or software design errors. Oh, and you’re debugging on a system running at one-millionth of real-time speed. Now you know what it’s like to be an SoC verification engineer…

Looking back at my rapidly depreciating car, we start to understand why mechanics are referred to as “factory-trained technical specialists.” They have their toolbox full of diagnostic equipment (OBD-II reader, voltmeter, diagnostic manual) and built-in diagnostic software that hopefully identifies faults. Verification engineers also have a toolbox (simulator, test bench, probes, crystal ball, Gentleman Jack) – only they don’t have the benefit of an otherwise known-working system, much less in-depth training and built-in diagnostics. That’s the real challenge facing verification engineers – a lack of deep design knowledge coupled with a lack of visibility into the design.

So how do we fix this? Well, training can help a little … but today’s SoCs are a conglomeration of many, many specialized IPs. No one can be expected to become an expert on all of them. What about built-in diagnostics? That can help – but only after the SoC is somewhat functional, and only after the tests are written. And unfortunately, unlike the car, we aren’t looking for simple faults. We’re looking for corner-case bugs that typically no one thought to test for. So diagnostics can help, but they aren’t the answer.

So where do we go now? We need a built-in solution that combines design knowledge with visibility, and that can automatically detect when things go wrong – even if we don’t know what wrong looks like.

The good news is: we know what right looks like and from that we can infer what wrong looks like. That leaves design visibility. Fortunately, that technology – assertions – has been around for a long time.

Unfortunately, it falls short. Assertions are typically written by hand. Even when the experts write them (if only they had the time), they focus only on what they think might fail. That leaves all the corner-case bugs no one thought to check for to surface at the worst possible time (usually right before annual bonuses are calculated). This lack of comprehensive assertions has, in fact, been the downfall of “assertion-based verification.”

Finally, there’s a new kid on the block that can help: automated assertion synthesis. Putting it all together, we can build a methodology that works as follows:

  • Capture the known-good operation of the design. Here we monitor the behavior of the individual IPs as they are individually simulated, and describe this behavior in the form of properties. If IP-level verification was reasonably comprehensive, these properties represent the “known-good” state space of the design.
  • Use the “known-good” to infer the “known-bad” state space. Here’s the clever part: we effectively invert each property, synthesizing an assertion. Collectively, the assertions flag any state outside the known-good state space.
  • Embed the automatically synthesized assertions. By synthesizing IP-level assertions using SystemVerilog Assertions, and bundling those with the IP’s RTL, we can effectively create the “executable specification” we’ve all dreamed of. We’ve now captured the designer’s knowledge and embedded it into the design itself.
  • Repeat for each IP under development (or not). We repeat the steps above for each IP under development. Where possible, we can do the same for any legacy or 3rd-party IP as well. Have you ever wondered how well verified those parts are? Now you can find out…
  • Automatically detect configuration errors, IP-coverage holes, and bugs. Here’s where the methodology starts to shine. When running SoC-level simulation, so long as each IP operates within its known-good state space, none of the assertions we’ve embedded will fire. This means that up to this point, the entire process has been fully automated. No engineer need touch anything!
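
The steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any particular assertion-synthesis tool works: the signal names (`req`, `gnt`, `mode`), the two property templates, the trace format, and every function name are made up for the example, and a real tool would mine far richer temporal properties from simulation. The sketch mines properties from IP-level traces, renders them as SystemVerilog-style assertion text, and then checks an SoC-level trace against them.

```python
from itertools import permutations

# Hypothetical setup: a trace is a list of cycles, and each cycle is a dict
# mapping a signal name to its value. All names here are illustrative.

def mine_properties(traces):
    """Return simple properties that held on every cycle of every trace."""
    cycles = [cycle for trace in traces for cycle in trace]
    signals = sorted(cycles[0])
    seen = {s: frozenset(c[s] for c in cycles) for s in signals}
    # Template 1: the set of values each signal took in IP-level simulation.
    props = [("in_set", s, seen[s]) for s in signals]
    # Template 2: same-cycle implication "a high implies b high" (1-bit only).
    one_bit = [s for s in signals if seen[s] <= {0, 1}]
    for a, b in permutations(one_bit, 2):
        when_a = [c for c in cycles if c[a]]
        if when_a and all(c[b] for c in when_a):
            props.append(("implies", a, b))
    return props

def to_sva(prop, clk="clk"):
    """Render a mined property as SystemVerilog-style assertion text."""
    kind, a, b = prop
    if kind == "in_set":
        values = ", ".join(str(v) for v in sorted(b))
        return f"assert property (@(posedge {clk}) {a} inside {{{values}}});"
    return f"assert property (@(posedge {clk}) {a} |-> {b});"

def holds(prop, cycle):
    """Evaluate one property on one cycle of a trace."""
    kind, a, b = prop
    if kind == "in_set":
        return cycle[a] in b
    return (not cycle[a]) or bool(cycle[b])

def check(props, trace):
    """Return (cycle index, property) for every property that fails."""
    return [(i, p) for i, c in enumerate(trace) for p in props if not holds(p, c)]

# Known-good behavior captured while the IP was simulated on its own.
ip_traces = [
    [{"req": 0, "gnt": 0, "mode": 0},
     {"req": 1, "gnt": 0, "mode": 1},
     {"req": 1, "gnt": 1, "mode": 1}],
    [{"req": 0, "gnt": 0, "mode": 2},
     {"req": 1, "gnt": 1, "mode": 2}],
]
props = mine_properties(ip_traces)
for p in props:
    print(to_sva(p))

# SoC-level run: a grant without a request, then a mode value never seen before.
soc_trace = [
    {"req": 1, "gnt": 1, "mode": 1},
    {"req": 0, "gnt": 1, "mode": 1},
    {"req": 0, "gnt": 0, "mode": 3},
]
violations = check(props, soc_trace)
for cycle_index, p in violations:
    print(f"cycle {cycle_index}: violated {to_sva(p)}")
```

In this toy run nothing fires on the IP-level traces themselves, while the SoC-level trace trips two assertions: the grant-without-request cycle and the unseen `mode` value. Note that the mined properties are only as good as the traces they came from, which is why a firing can mean a coverage hole rather than a bug.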

But what happens if the design steps outside the “known-good” state space? That means we need to investigate. Typically, one of three situations arises:

  1. We’ve incorrectly configured the IP. This can be caused by a design integration error, or even a software (aka firmware) error. Either way, it will be flagged automatically. These are the most common failures, and include flipped interface wires, incorrect handshaking, and unsupported modes. The good news is that all of these can be debugged by the SoC integration team without having to involve the IP developers.
  2. We’ve found an IP-level verification coverage hole. This means we found a corner-case scenario that was never thought of. It doesn’t mean it’s a bug, but it does mean we should ping the IP design team to be sure. That brings us to:
  3. Corner-case bug. Some of those coverage holes can be outright bugs, and finding them is the real value of this flow.

So there you have it…a new methodology that automatically detects coverage holes, configuration errors and design bugs. And though the methodology may not give you a free cup of coffee while you wait for it to detect a problem, you won’t be charged $1,000 every time it does.