What tool works best for a specific verification task may be clearer in the marketing literature than in the real world.
Emulation, simulation, FPGA prototyping and formal verification have very specific uses on paper, but the lines are becoming less clear as complexity goes up, more third-party IP is included, and the number of use cases and interactions of connected devices explodes.
Ironically, it is not the most complex SoCs, such as those used in smartphones, where the lines are blurring most. The bigger challenge appears to involve less complex IoT applications, as well as chips developed at established process nodes that are finding their way into safety-critical applications such as cars and medical devices.
The biggest SoCs are still developed by the largest chipmakers and systems vendors. They have the most diverse and best-trained engineering teams, and for the most part there is no shortage of emulators, simulators, specialized simulators, FPGA prototyping equipment or formal tools. They also have the most highly developed internal methodologies, which are essential to bring these chips to market. These devices typically are developed at the latest process nodes, where investment is higher and better coverage and shorter verification times are considered money well spent. It's also where the stakes are so high that a fatal design flaw can cripple a company's competitiveness.
Within this space, the ability to easily shift from one verification engine to another is essential, and all of the big EDA companies have invested heavily in making this possible for their largest customers.
“For all of the big guys—the top 25—if they can afford all four core engines, which is emulation, simulation, formal and FPGA, then seamlessness becomes very important,” said Frank Schirrmeister, group director for product marketing of the System Development Suite at Cadence. “There is a class of projects that is happy with simulation or a subset of emulation or FPGA prototyping. The additional effort of using it isn’t in the budget. I would love everyone to have all four engines, but that’s not realistic.”
Inside many midsize and smaller chipmakers—including many developing chips for IoT applications—the picture is remarkably different. Budget limitations, smaller design teams, and the very high cost of some of these tools mean that verification engineers generally must use whatever is available to them.
“For large companies that make their own SoCs, it’s difficult enough,” said Krzysztof Szczur, technical support manager in Aldec’s Hardware Product Division. “But other companies and engineers develop IPs and they may use different tools.”
Which tools get used depends as much on what engineers are comfortable using as on what is available. But even where advanced tools are available, verification knowledge may be limited. Training in how to use verification tools more effectively is improving, but it has been an ongoing problem for a decade or more. EDA companies have set up a variety of training programs and online courses, and have donated software to universities around the globe. But the complexity of tools and tool choices continues to rise with the complexity of designs. While some mundane tasks are now almost fully automated, there are enough new permutations—such as advanced packaging, increased connectivity, new features such as embedded vision, and an increasing focus on security at all levels—to stump even the most experienced verification engineers.
This is challenging enough in large organizations. But even midsize and small chipmakers are now encountering some of these same verification issues, and the resources and training in those organizations are much spottier.
“If you find a bug in emulation or prototyping, do you think the people who are running the test cases are savvy enough to be able to debug?” asked Rajesh Ramanujam, product marketing manager at NetSpeed Systems. “Are you going to get them to understand the FPGA tools? The skill set of the engineers involved varies at different stages. These are practical things we don’t talk about.”
On top of that, using one verification engine well is a challenge. Being able to do that with multiple engines is rare. “In any organization you will find one or two people that are interested in everything,” said Steve Bailey, director of emerging technologies for Mentor Graphics’ design verification technology group. “They can pick up data and can be good across both. But it’s not common. Everyone is focused on their job.”
Challenges with IP
Third-party IP complicates verification for a couple of reasons. First, much of this IP is a black box, so even if it is well characterized by the IP vendor, visibility into that IP and how it will work in a complex, heterogeneous system is limited. Second, usage models vary greatly for any IP, so a tool that identifies a problem in one use case may completely miss a problem in another.
This is particularly true for most simulators, which run out of gas as more of the chip needs to be verified together. While companies such as Ansys can do full system multi-physics simulation, the bulk of simulators in use at chip companies are much more limited in scope. That makes it far harder to do anything but divide-and-conquer verification of a single block. Still, it’s not uncommon for problems to show up when IP is used in the context of a system or in the context of a use model that was never considered by the IP vendor when it was characterizing that block.
“It’s less about divide and conquer and more about a system view,” said Navraj Nandra, senior director of marketing for Synopsys’ DesignWare analog and MSIP solutions group. “How you configure IP is by partitioning between blocks. So if you have a high-speed memory with a DDR controller, it’s a system-level discussion. But if you’re an IP company, you may not have system expertise in-house.”
Nandra noted that different use models also create new corner cases, and there are no clear rules for how to verify those. “So if you look at Type C, which is a new USB connector, you find there also are alternative modes for audio and video. It gets very complex, and there are a lot of configurations for the end applications. But there is no real standard that our customers are using to solve this.”
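Even a handful of independent configuration options multiplies quickly into a large space of end-application setups to verify. The Python sketch below is purely illustrative; the option names are made up rather than the actual Type-C alternate-mode parameters, but it shows how the count of distinct configurations grows as options are crossed.

```python
from itertools import product

# Hypothetical configuration axes for a connector IP (names are illustrative,
# not the real USB Type-C parameter set).
options = {
    "orientation":   ["normal", "flipped"],
    "data_role":     ["host", "device"],
    "alt_mode":      ["none", "displayport", "audio"],
    "power_profile": ["5V", "9V", "15V", "20V"],
    "cable_type":    ["passive", "active"],
}

# Every combination is, in principle, a distinct use case to verify.
configs = list(product(*options.values()))
print(f"{len(configs)} distinct configurations")  # 2*2*3*4*2 = 96

# Adding one more three-way option triples the space again, which is why
# there is pressure for a standard way to prune or prioritize these cases.
```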
There also is no standard way of using standard IP in complex systems. While major IP vendors do a thorough job characterizing the IP they sell, it still may not get used in the way they recommend.
“It’s the interaction of things that becomes challenging,” said ARM’s director of models technology. “You can just debug any single one. We had a customer problem once. When the memory latency was shortened by a cycle, the performance went down by half. That involved a setting between the processor, the interconnect and the memory. And it was only one setting, in an exact combination, that did that. Unfortunately, it’s not just a single problem.”
Coverage issues
All of this leads back to coverage, which remains one of the thorniest problems in verification. No verification approach provides 100% coverage in even moderately complex chips. Moreover, the problem is getting worse as chips become more complex and companies begin using electronics in safety-critical applications.
“In safety-critical designs, it’s important to have full design coverage, but it’s also important to have flexibility in requirements and results from different states,” said Aldec’s Szczur. “That also needs to be addressed in this environment.”
Achieving that coverage, and a sufficient level of confidence at signoff, always has been an issue. In general, coverage trails complexity by at least one or two steps. Verification tools have become much more efficient, and there are suites of tools available from EDA vendors that are fully integrated. But there are so many possible permutations and interactions, particularly when software is added into the mix, that the problem will never be fully under control. Bugs will always slip through, and engineering change orders will create issues in places that were never considered earlier in the design flow, when they would have been easier to fix.
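One way to see why coverage trails complexity is to count cross-coverage bins against the tests that actually hit them. The toy Python model below is not any vendor's coverage engine; it simply crosses a few hypothetical stimulus parameters into bins and reports how many a random regression touches.

```python
import random
from itertools import product

# Toy cross-coverage model: each tuple of parameter values is one bin.
burst_len   = [1, 4, 8, 16]
cache_attr  = ["bufferable", "non-bufferable", "write-through", "write-back"]
outstanding = [1, 2, 4, 8]
power_state = ["on", "retention", "off"]

bins = set(product(burst_len, cache_attr, outstanding, power_state))

# A "regression" of random tests, each exercising one combination.
random.seed(0)
hit = set()
for _ in range(300):
    hit.add((random.choice(burst_len), random.choice(cache_attr),
             random.choice(outstanding), random.choice(power_state)))

print(f"{len(hit)}/{len(bins)} bins covered "
      f"({100.0 * len(hit) / len(bins):.1f}%)")
# Even 300 tests leave holes in a 192-bin cross; real designs cross far
# more parameters, and software state multiplies the space further.
```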
“What’s fascinating about the Portable Stimulus is that it’s a new class of verification that people didn’t do before,” said Schirrmeister. “What the Portable Stimulus group in Accellera does is look at test cases and ask people, ‘How did you do this in the past?’ They say, ‘We didn’t, because it would have taken the key architect who understands how all these things fit together to write this test case, and it would have taken him three weeks. We couldn’t afford to do that.’ Portable Stimulus exposes a new class of bugs that people didn’t have time to look for in the past. It’s all additive. You still need to verify IPs and subsystems, and now you have a new set of bugs.”
How to solve this isn’t clear, despite years of analyzing these problems and trying to come up with effective solutions.
“The challenge begins with visibility,” said Mentor’s Bailey. “It’s impossible at an SoC or system level for an engineer, or even a group of engineers, to sit down and figure out all the coverage models that are going to be relevant. Maybe for a simple IoT chip that will work, but not for a mobile SoC or server chip or chip of any size. You need to find a way to get better visibility into your design when you’re using a verification/validation suite. And then you need to visualize those results in a way so you can determine if there’s a problem there.”
Bailey said this is especially problematic with black boxes because all of the I/Os may look like they’re working fine. “It’s the next level of coverage information that you’re providing, and you might find your regression suite or validation suite is not exercising what you thought it was. You’re not creating a lot of stress conditions. You’re not seeing things that will give you a high level of confidence that you don’t have any bugs here. You also find out you have things that look like a performance issue, and you need to figure out whether it’s a performance issue or not. Performance applies to power analysis, as well. A lot of power inefficiencies will come from the software architecture running on the hardware.”
This also makes it more difficult to choose the right verification engine. Each has its strengths and weaknesses, but it’s not always clear what the tradeoffs are, particularly as more analog content is added into chips. “The big question we’ve been asking is how you partition IP so that more of the IP is digital,” said Synopsys’ Nandra. “With digital there are a lot of configurability options. You can program current up and down at small geometries. So if you have a high-speed SerDes, you can partition that into an analog block with the equalization done in the digital domain.”
That helps with verification, because as more of the verification is shifted left, fixes can be made to designs more quickly. There are other solutions in the works, as well.
“One of the things we’re working on is to make it easier for the system-level verification guys to figure out exactly what happened,” said Ramanujam. “If you have an IP, you also need tools that are compatible in a way to figure out if something went wrong. If something happened at the boundary, protocol checkers are required. You need something for the user to figure out exactly what happened, or to back-trace from there. Today you run software, you have a deadlock, and you have no idea what happened. The customers of the chip don’t have any idea how to debug all of these things.”
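A protocol checker at an IP boundary can be as simple as watching a valid/ready-style handshake and flagging a request that is never accepted, which is often how a system-level deadlock first becomes visible. The Python sketch below is a conceptual monitor over a recorded signal trace, not code from any of the tools discussed; the signal names and the stall threshold are assumptions.

```python
# Conceptual boundary monitor: flag requests that are never accepted.
# Trace format and signal names (valid/ready) are illustrative assumptions.
def check_handshake(trace, stall_limit=16):
    """trace: list of (cycle, valid, ready) samples at an IP boundary."""
    stalled_since = None
    violations = []
    for cycle, valid, ready in trace:
        if valid and not ready:
            if stalled_since is None:
                stalled_since = cycle
            elif cycle - stalled_since >= stall_limit:
                violations.append(
                    f"cycle {cycle}: request stalled since cycle "
                    f"{stalled_since} -- possible deadlock at this boundary")
                stalled_since = None   # report once per stall window
        else:
            stalled_since = None       # accepted or idle resets the timer
    return violations

# Example trace: a request held high while ready never returns.
trace = [(c, True, False) for c in range(40)]
for msg in check_handshake(trace):
    print(msg)
```

In practice a checker like this sits alongside back-trace and logging so the system-level verification team can see which boundary stalled first, rather than just observing that the software hung.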
Conclusion
Verification traditionally has been the last frantic push before a chip reaches tapeout. Many companies have contracted verification engineers for the last push in order to remain on schedule. But as complexity increases, even that may not be enough.
Verification continues to improve, and the amount of money being thrown at the problem across the verification ecosystem continues to increase. But even if all of the top verification experts had access to all of the best verification engines, it’s likely they would still miss some bugs. Nothing is perfect when it comes to technology, and nowhere is this more frustratingly apparent than in the verification world.
Related Stories
Rethinking Verification For Cars
Standards for verifying these safety-critical devices, new test methodologies, and far more scrutiny about how all of this happens.
System-Level Verification Tackles New Role
Experts at the table, part 3: Automotive reliability and coverage; the real value of portable stimulus.
Gaps In The Verification Flow
Experts at the Table, part 1: The verification task is changing, and tools are struggling to keep up with those changes and the increase in complexity. More verification reuse is required.
Verification Engine Disconnects
Moving seamlessly from one verification engine to another is a good goal, but it’s harder than it looks.