Configurability causes an explosion in verification complexity, but the upside is verification engineers are gaining in stature.
Experts At The Table: The pressure on verification engineers to ensure a device will function correctly has increased exponentially as chips become more complex and heterogeneous. Semiconductor Engineering sat down with a panel of experts, including Josh Rensch, director of application engineering at Arteris; Matt Graham, senior group director for verification software product management at Cadence; Vijay Chobisa, senior director for product management for the Veloce hardware-assisted verification platform at Siemens EDA; and Frank Schirrmeister, executive director for strategic programs for systems solutions at Synopsys. What follows are excerpts of that discussion.
L-R: Arteris’ Rensch; Cadence’s Graham; Siemens’ Chobisa; Synopsys’ Schirrmeister.
SE: How does chip customization impact verification, especially when it involves multiple chiplets?
Rensch: It’s not pleasant. You infinitely grow the state space whenever you sit there and say, ‘Hey, I want to do this, but I want to change a little bit here, and we want it configurable.’ The state space grows because every marketing guy, when they find out, ‘Oh, you can do X,’ asks, ‘If you do 10% over to the left, can you do Y, Z, and A at the same time?’ And they want to change those things. From a verification standpoint, which is my background, it’s awful. It just grows the state space and makes the verification much more involved. Verification already takes three times as long as the design, and making the state space effectively infinite only makes that worse.
Graham: IP providers keep providing all of this customizable IP that grows that space infinitely. Every device revision that has ever been done has been slightly different from the previous one, driven by marketing or some customer requirement, or both. But in the last little while, the thing customers have really been challenging me with, at least, is this idea of infinite configurability, like, ‘We’re going to do nine variants of this thing we’re working on. We’re going to try to do them all in parallel, and they’re all some parameter difference apart.’ This one’s got 16 widgets, this one’s got 8 widgets, and something else has got 12, and it’s really the need to do all of those in parallel that has presented the challenge. From a tooling, flow, and methodology standpoint, that means we’re constantly challenged to figure out, from the design side, how to do just enough extra. And can I figure out, from the verification side, just enough extra so that we can do all those things in parallel with, of course, not nine times the resources? To me, customization has moved in that direction, where everything has to be parallelized and is not quite infinitely configurable or changeable, but not far from it.
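To put a rough number on that kind of parameter space, the short sketch below enumerates the configurations of a hypothetical parameterized block. The knob names and value lists are invented for illustration and are not any vendor’s actual IP; the point is only that a handful of knobs multiplies into far more legal configurations than the handful of variants a team actually builds.

from itertools import product

# Illustrative sketch: a few configuration knobs on one parameterized block.
# Knob names and values are hypothetical.
config_knobs = {
    "num_widgets":    [8, 12, 16],      # the per-variant difference mentioned above
    "data_width":     [32, 64, 128],
    "ecc_enabled":    [False, True],
    "num_channels":   [1, 2, 4],
    "low_power_mode": [False, True],
}

all_configs = list(product(*config_knobs.values()))
print(f"{len(all_configs)} legal configurations from {len(config_knobs)} knobs")
# Prints: 108 legal configurations from 5 knobs. A team shipping nine variants
# still has to decide how the other ~100 configurations are covered, sampled,
# or explicitly excluded from the verification plan.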
Schirrmeister: Chip customization is a nightmare for verification, but it’s also a necessity. What a lot of people now do is channel their inner Superman and say, ‘With great customization comes great responsibility to verify.’ That’s where this nightmare comes from. Taking a step back and thinking about what that really means, there are different elements here. In the wider sense, you can do things with FPGAs, which by definition are customizable chips. I would consider that a different nightmare, because there are two verification steps. First, does the FPGA do FPGA stuff? That’s verification step one. That’s what the FPGA vendors do. Then, the process of verifying the design that goes onto the FPGA is much more hands-on. Sometimes it’s much less structured. For the last 30 years of my career we have hoped that, with the complexity of FPGAs, people would be more structured in their verification. And guess what they do? They build it, they burn it, they switch it on, and they see whether it works. I’m exaggerating, of course. That’s the far end of customization, which happens at runtime. Then you have this emerging area of what I would call software-programmable customization of the chip at runtime, and that’s a whole new level of verification. Then we come to software and hybrid emulation in verification. That’s an important one, because if you have a system that is heterogeneous, and you have a compute system with, say, four 16-core clusters, then you suddenly have a number of cores and you start doing software/software co-design. So how do you move things around? And then there is heterogeneity in there. Another aspect to mention is the in-between level of chip customization, which is this great trend in the customizable processor world. It’s not really new, but now it’s institutionalized with RISC-V International and all the good work going on there. But it has been around for a long time. The terms ‘software/software co-design’ and ‘sea of processors’ were coined by the Tensilica folks and the ARC folks for configuring those processors. So now you do the chip configurability at the ISA level, and that gives you much more control over the more detailed, lower-level items.
Chobisa: I am coming at this from a different angle. Yes, companies that are designing IP or chips intend to create variants of those chips for a range of target applications. They want to make sure the silicon designs are fine-tuned for the target workload or application, and they want to do it effectively. What we are seeing in the industry is not just designing a chip and creating variants. Yes, companies are doing that, and it creates a lot of burden on the verification guys, and that is well understood. What we also are seeing is that the way chips are architected and designed is different than it was 10 years ago. Everything starts and ends with the software. It is the software workload and application that defines what hardware architecture is needed. It is the software team that is feeding the information to the hardware team so they can figure out the right architecture for the targeted workloads. There are two aspects to consider. One is the design of the chip, where we are saying, ‘You design one chip, and you create variants of that chip for a particular application.’ But more than that, we are seeing permanent, long-term strategies at customers like Meta, Google, Amazon, and Microsoft, where these system houses are doing custom designs. Those custom designs are targeted at a particular application, and the architecture of that design is being defined by the needs of the software workload. What are the characteristics of the full workload that will be run? What kind of software will be executed? Is what we are seeing in the design right? How does this workload affect how the chips are architected? What kind of customization is happening based on the target workload? Is it going to the data center? Is it an edge application? Is it a custom accelerator? What are the critical KPIs: power, performance, safety, security, and cost?
SE: For years we’ve heard the term ‘design for verification.’ In the design flow, are they really thinking about the verification path and the potential problems they’re creating when they do this customization? Or is it, ‘It’s not my problem, it’s their problem?’
Chobisa: There are still companies that design one chip and create variants of that chip to fit different needs. But the more important and prominent trend we see is custom chips or purpose-built accelerators that execute a particular software workload or application. That’s what Amazon, Microsoft, Google, and Meta are doing. System houses are designing their own silicon because general-purpose chips, and variants of those chips, do not do exactly what they need on the key KPIs. Off-the-shelf components may not be a fit to deliver a competitive platform. They are not able to get the cost right. They are not able to get power and performance right. That’s where we see customization happening. Customization is happening at the concept level, and it is driving architectural decisions.
Rensch: I was a verification engineer back when executives would tell us, ‘You’re just risk mitigation. You don’t really matter.’ I’m old enough to remember those days. But I have a customer right now for whom verification rules the roost, and it’s a very interesting dynamic when I go in and talk to them and work with them. The verification people tell the designers what they can do. Verification is driving design. I once worked with an engineer who designed a 300-state state machine, and anybody who knows anything about verification knows that is very, very bad. It’s not just bad. It’s not even very bad. It’s very, very bad, because of all the transitions, and I had him re-do it. He wanted to know why, and I told him it was because I couldn’t validate it. With DO-254, and being on planes, you have to be able to validate things to a certain level of comfort, and you can’t do that with a 300-state state machine. As I said, I have a very big customer that brings its verification teams into the architecture meetings to say, ‘Nope, I understand what you’re trying to do here.’ So the verification teams are getting more and more seats at the table.
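A rough way to see why the transition count, rather than the state count, drives the effort: the sketch below (purely illustrative, not tied to any particular design or to DO-254 tooling) computes the upper bound on transition-coverage bins as the state count grows. The bound grows roughly with the square of the number of states.

# Illustrative sketch: upper bound on transition-coverage bins for an FSM.
# With N states there are up to N * (N - 1) state-to-state arcs to witness,
# before even considering the input conditions that trigger each arc.
def max_transition_bins(num_states: int) -> int:
    return num_states * (num_states - 1)

for n in (8, 32, 300):
    print(f"{n:>3} states -> up to {max_transition_bins(n):,} transition bins")
# Prints 56, 992, and 89,700. Real designs only allow a subset of those arcs,
# but proving which arcs are unreachable is itself verification work.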
Schirrmeister: I wouldn’t go as far as saying they’re running the show, but I’m totally with you. They’re in the room where it happens, because early on you need to decide whether the things you want to implement are verifiable and are really necessary. In the end, it’s an ROI question. How many new things do you design in versus what you’re actually able to verify and later validate? I’m looking at the software space in that context, as well. Also, in the days of Verification 2.0 and so forth, there were requirements given to the EDA vendors. One of the new things happening is that the verification flow is growing up to mimic the design flow. I don’t know whether I would call that design for verification, but now there is much more modularity in verification. There always has been some. You had unit verification, you had subsystem verification, you had SoC verification. So now, in this very complex world of chiplets, where everything is disaggregating, the verification flow is following that trend, as well. In the hardware world, for the longest time, we were doing things like, ‘Oh, who can build the biggest monolithic verification blob and design blob, map it, and run it?’ Because of the complexity of designs, the verification flows our customers are running now mimic the disaggregated design flow and are becoming much more modular, as well, especially when it comes to things like the hardware-assisted stuff. Is that design for verification, the way design for test puts specific test doodads into the design? Probably not. But it’s the same DNA of designing with verification in mind.
Graham: I absolutely agree. We’re probably not designing for verification, but there are certainly a lot of verification folks saying something to the effect of, ‘We’re going to need to be able to do whatever it is, so you’re going to have to design and modularize accordingly so that we can do what we need to do, whether that’s segregate for hardware/software, or segregate for the use of simulation versus emulation or a hybrid platform.’ The verification flow, along with the design flow, continues to evolve into this very modular thing. There’s a move toward verification for purpose, as well, and that scope is growing broader and broader, shifting left and shifting right. When I first started doing verification, it was very focused on getting my block or my thing to be bug-free, or hitting the coverage I needed to hit, or stopping test failures. Now there’s a much more zoomed-out view of, ‘We’re going to need this to be portable from simulation to the hardware, to post-silicon.’ Things like portable stimulus are obviously born out of that necessity. Necessity is really the mother of invention for a lot of this stuff, and portable stimulus demonstrates that need in one place. The prevalence of hybrid platforms illustrates it, as well. The conversations are growing wider and wider, such that when we sit down it’s not exclusively about how we can make the simulator a little bit faster, or how we can make the hardware a little bit bigger, a little bit faster, a little bit better, a little bit cheaper. Those are still considerations, but more and more we’re having conversations about how we can make the end-to-end thing go better.
Schirrmeister: I remember my last design in the 1990s, when a friend called me in and said, ‘Hey, we need to deliver this IP. You’re the only one who really knows VHDL. I need you in at 2 a.m. This needs to be delivered by 6 a.m. Help me with this test bench.’ This was in the day of all-manual test benches. At one point we got constrained random, and this is where the Verisity folks started. You could automate that. That was IP module verification in the ’90s and early 2000s. Fast-forward 30 years, and now you have disaggregated systems of chiplets, or SoCs with a lot of blocks. It’s not possible for one person to understand, or even write, the test plan for all of it. So for test automation, or system test generation, whatever you want to call it, Matt already brought up portable stimulus as being critical. We’ve had discussions about what you actually verify, and about what you don’t know you have not verified. If somebody now uses the wrong knobs in silicon, what happens to that unverified space? We’re trying to fill that gap again, like we did with constrained random, with much more test automation at the subsystem and SoC level. And in that sense, all of that is in the realm of designing for verification. Keeping verification in mind while you’re designing all of this is becoming much more intertwined with the design itself.
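For readers who have not seen the shift from hand-written test benches to constrained random up close, the minimal sketch below shows the basic idea in Python rather than SystemVerilog, which is what production flows actually use. The transaction fields, value ranges, and the 4KB-boundary constraint are hypothetical, meant only to illustrate describing legal stimulus and letting seeded randomization explore it.

import random

# Illustrative constrained-random stimulus sketch. Instead of hand-writing each
# test, describe what a legal transaction looks like and let randomization
# explore the space. Fields and constraints here are hypothetical.
def random_bus_transaction(rng):
    burst_len = rng.choice([1, 4, 8, 16])        # beats per burst
    addr = rng.randrange(0, 2**32, 4)            # word-aligned address
    # Constraint: a burst must not cross a 4KB boundary; re-randomize until legal.
    while (addr % 4096) + burst_len * 4 > 4096:
        addr = rng.randrange(0, 2**32, 4)
    return {"addr": addr, "burst_len": burst_len, "write": rng.random() < 0.5}

# A fixed seed makes a failing regression reproducible for debug.
rng = random.Random(2024)
for _ in range(3):
    print(random_bus_transaction(rng))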
Chobisa: Verification is not just about doing things right. Chip functionality alone is not enough. Today’s challenges have stressed system-level verification teams to the point where functional correctness by itself is no longer good enough. This is because chip designs need to run software workloads and also deliver, on schedule, the required functionality within the given power, performance, safety, and cost specifications. There are many other aspects, like power, performance, safety, security, and testability, and the list goes on. These are very important parameters. For example, maybe your chip is functional, but it’s not delivering the performance you need. Or your chip delivers the functionality and performance, but the power is 2X what was specified. You can’t use that chip. There are many aspects beyond functionality that are very important, and they must be verified at the system level, where you are running in the target application environment with a target workload to evaluate how the chip behaves. This is why verification has become so complex. You need to put the entire system together. You need to make sure the environment where you run the software application is clean, so you can run those workloads and debug them effectively. We are debugging at the system level, where you have hardware, software, and the interactions between the two. It’s hardware/software co-verification and co-validation at the system level. That’s why verification has become very complex.
Related Reading
Improving Verification Performance
Verification tools are getting faster and capacity is increasing, but they still can’t keep up with the problem space. Verification is crossing more silos, requiring expanded skill sets.