Experts at the table, part 3: Automotive reliability and coverage; the real value of portable stimulus.
Semiconductor Engineering sat down to discuss advances in system-level verification with Larry Melling, product management director for the system verification group at Cadence; Larry Lapides, vice president of sales for Imperas Inc.; and Jean-Marie Brunet, director of marketing for the emulation division of Mentor Graphics. Tom Anderson, vice president of marketing for Breker Verification Systems, provided additional content after the event. In part one, panelists discussed the differences between block-level and system-level verification and the different thinking that each requires. In part two, panelists discussed how the requirements for system-level verification can change with design type, model discontinuity, and the need for common stimulus and debug. What follows are excerpts of that conversation.
SE: Is system-level debug a different kind of problem requiring a different skill set?
Lapides: Seeing things at this level, and handling several different platforms integrated together, requires engineers who can grasp the scope of the platform, the tools, and the problem. It typically takes a long time for them to develop that kind of system-level viewpoint.
Brunet: It is not trivial, and we see the problems today with testbench creation. Not every testbench can be accelerated on an emulator. Users think it is push-button, that they can take a testbench from simulation to emulation and expect it to run 100X or 1,000X faster, and it doesn't always happen that way. They have to follow best practices about how to modify the testbench so that they can achieve that level of acceleration.
Melling: You do find system-level engineers in the large chip companies who are the go-to people. When the silicon comes back, they are the ones in the lab bringing it up. As vendors, we are trying to understand their problem and divide it so that more people can have that kind of view and understand the role they have and how it impacts the system. It has been an elitist group that has performed that task in the past.
Brunet: I see an interesting analogy to what happened in the back end 20 years ago. Today we have a lot of integrated tools: platform simulation, emulation, and prototyping with common debug, plus the Portable Stimulus Working Group. We went through this kind of transition 20 years ago with the back end. We had the timing guys running static timing analysis, the place guys, the route guys, the extraction guys. Now we have physical synthesis and everything is integrated together. We are going through exactly the same transformation, except it is a little more painful. The common debugger is perhaps why it has taken so long.
Lapides: Is there an issue with the size of the state space? Naively I would argue that the back-end problem is constrained in size and software guys have no physical constraints. This will make it harder to solve.
Brunet: It is more difficult and there are more users. With ‘shift left’ we have to encompass software and so now we have 10 times the number of users and that makes it take even longer.
Lapides: It is more than that because there are different layers of software.
SE: Portable Stimulus will enable us to go from use cases to a variety of execution engines. That does not mean that all of the test cases that could be created are equally suitable for each engine.
Melling: Certainly not the same test. You want to size them for the different platforms. Horizontal reuse is all about taking advantage of the available resources. It has to enable the generation of a test that can run at full speed on the target platform. You don't want the stimulus to be the speed limiter of the test; you want the platform to be the speed limiter. The other key is to take advantage of the cycles you have available on that platform. If I can run a million cycles on an emulator, I can get a lot more coverage than I could running in simulation, where I may only be able to get through one hundred cycles of the test.
Anderson: Let’s face it, ‘portable stimulus’ is a bad name. The stimulus itself is not portable. What’s important, as I mentioned earlier, is a model of verification intent that can be used to generate appropriate test cases for all the different platforms. It’s not just the length of the test cases; it’s transactional testbenches versus embedded C code, how you access I/O, how you check results, how you measure performance, and more.
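A model of verification intent along the lines Anderson describes is what the Accellera Portable Test and Stimulus Standard (PSS) defines. The sketch below is illustrative only, written in the style of the PSS language; the component, action, and field names are hypothetical, and a real model would add resources, data flow, and platform-specific exec blocks.

```
// Hypothetical PSS-style model of verification intent.
// The tool, not the engineer, turns this into platform-appropriate
// test cases (UVM sequences for simulation, embedded C for emulation
// or silicon), which is why the scenario itself carries no
// platform-specific code.
component dma_c {
    action mem2mem_xfer {
        rand bit[32] size;
        constraint size_c { size in [64..4096]; }  // bounded transfers
    }
}

component pss_top {
    dma_c dma;

    action test_scenario {
        activity {
            repeat (4) {
                do dma_c::mem2mem_xfer;  // scheduling, randomization, and
            }                            // code generation are tool tasks
        }
    }
}
```

In this style, the same scenario can be retargeted by regenerating, for example, a long-running variant for an emulator and a short transactional variant for simulation, which matches the point above about sizing tests for each platform.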
Brunet: Portable stimulus is an interesting technology race. There is the race at the bottom with the engine. This is a capacity, performance and feature race. Portable stimulus and system-level analysis is a new race and we shall see which vendors provide the best solution. It is changing the game as we go from one generation to the next. And for the user, they have to learn how to create stimulus correctly targeted towards the problem.
Lapides: I don't see that. Fifteen years ago, stimulus was the easy part. You could generate testbenches and create stimulus, and people could do that fairly well.
Brunet: How do you decide if the testbench is effective? That is the race.
Lapides: If that is the race, then it comes down to metrics. Functional coverage at the system level may not be the correct metric. What is the correct metric? Nobody seems to have the answer.
Brunet: Not yet.
Lapides: The process to come up with the metrics at the system level will have to be a collaboration between customers. Who will collect together the right set of customers to help the methodology?
Melling: We need to learn what use-case coverage means. How do you do coverage on performance? We have to answer that for all of the activities, and that is part of what portable stimulus is trying to tackle. We have to bring metric-driven approaches into the system-level world. Even though it talks about vertical reuse, the point is moving up the ladder; we are not trying to fix things at the IP level, where everything is fine. It all starts from the same thing: you need a verification plan, only now it is at the system level.
SE: Use cases appear to have an advantage in that at the block level every line of code and every bug was considered equal, but with use cases we have a chance to prioritize and define levels of coverage. Would this give us a more controlled verification flow?
Brunet: I wish we were that smart.
Melling: There is still a lot of trial and error. The things that customers care about in their SoC verification today are what cost them the last time around. Where they got burned is where they focus in the next round.
Brunet: They will adopt new methodologies when they get burned but will not do risk reduction. I do not think we will get to the level of smartness that you just described.
Lapides: It would seem to be analogous to different safety levels. You could conceptually see different priorities but we are a long way from that.
Brunet: If they see bugs they will fix them. I do not see them making this kind of priority.
Melling: I see priorities coming from pain points.
SE: How is that reflected in the use cases?
Melling: That is the key: being able to identify areas such as power management or I/O coherence and saying we want to do more testing of those areas. Portable stimulus brings the necessary technologies together so the tool can build the necessary tests. The time between identifying something that you want to do and being able to generate a test for it is much shorter. That is where the benefits will come.
SE: One advantage of having more transistors available is to make hardware more tolerant of errors and provide ways to work around problems. If the whole platform becomes software-definable, then the hardware is no longer the limiting part of the platform. However, that has power and performance challenges.
Lapides: We are heading in that direction, but there are still unique differentiators on the hardware side for most chips. The software has to work closely with the hardware, and there is no commodity chip for ADAS or anything like that. The Tesla approach also has some security issues related to updates, and then there is software maintenance and the need for continuous updates, which is expensive.
Brunet: The way in which Tesla handles problems is very different from GM or Ford. Tesla has less legacy, and from the beginning they planned for an embedded-software approach, which makes them more receptive to this type of update.
Lapides: There is a Silicon Valley mindset, which is that schedule comes first and quality second. But we are talking about a car, and it has to work. Some of the companies that we consider next generation may be ignoring history at their own risk and, in the end, possibly my risk.
Brunet: But Tesla does have high quality. They are doing software updates on a monthly basis.
Melling: They probably started with a verification mentality in the software. Web companies are about speed to release and getting more eyeballs, with quality a secondary concern. But in automotive and other places where there are huge liabilities…
Anderson: We are certainly seeing some movement in this direction, such as the same many-processor chips being targeted at both servers and software-defined networking. But that’s not the same as counting on software to fix everything. Thinking that you can always fix it in software may lead to a lack of focus on quality. If you’re talking about a self-driving car or an implanted medical device, that’s a scary prospect. Moving functionality from hardware to software should not be an excuse to do less verification.
SE: What are the biggest opportunities and challenges?
Lapides: In his DVCon keynote, Walden Rhines looked at the engines, went to virtual prototyping first, and said this was the biggest opportunity. Virtual prototyping has been around for quite a while and is still not mainstream. Maybe we have 10% market penetration, but probably not even that. That is a huge opportunity, but we have to do a better job of addressing some of the system-level verification challenges and solving the last problem the customer had on the SoC. They cannot do that today.
Brunet: A customer said that his verification task used to be 80% simulation and 20% using some kind of hardware. Now it has been reversed. The verification pie has increased, and most of the 80% is done with emulation today. The footprint for emulation and FPGA prototyping is getting larger.
Anderson: We’ve talked some about simulation being replaced by virtual platforms, emulation, or FPGA prototypes. Clearly there are opportunities to use all these platforms together in a much more streamlined way, leveraging the advantages of each. But the challenge then becomes how to make this process seamless. Of course I’m proposing portable stimulus as the answer. A common model of verification intent that can be used to generate test cases for all the platforms, combined with a common debug environment and common coverage metrics, is required. We see this as the next big thing in system-level verification and are putting a lot of effort into helping Accellera create a standard in this area.