IP can no longer be developed and verified outside of the context of the system in which it is expected to operate — unless we develop new verification methodologies.
Experts at the Table: Semiconductor Engineering sat down to discuss the state of functional verification with Mohan Dhene, director for architecture and design at Alphawave Semi; Andy Nightingale, vice president for product management and marketing at Arteris; Dinesha Rao, senior group director for software engineering at Cadence; Chris Mueth, new opportunities business manager at Keysight; Gordon Allan, director of verification IP products at Siemens EDA; and Frank Schirrmeister, executive director for strategic programs and system solutions at Synopsys. What follows are excerpts of that discussion. Part one is here.

SE: System-level verification is often left until the end, but does shift left mean that hardware/software verification is becoming more important? And will that mean more reliance on things like virtual prototypes?
Dhene: General-purpose design is being replaced with custom design with specific workloads. This means that someone who is verifying hardware should also understand the software that runs on that hardware. Not having these requirements known is going to be a problem. If you look at the latest SoCs with AI/ML technology, most of them have the requirement of software being validated during silicon development. This puts pressure on tools to improve, and methodologies and workflows to be reworked and rebuilt. This is putting a lot of stress on teams, and everyone is being stretched a bit thin. In terms of tools and methodologies, it’s not that we are not innovating. There is a lot of innovation that’s happening. If you look at the space of software/hardware co-design, emulation with a focus on hardware debug, and prototyping with a focus on software debug, these have all made significant progress. Similarly, in the formal verification space, in this era of domain-specific architectures, there’s an explosion in processor design. Pretty much all the formal solutions are scaling to address this space. I’m cautiously optimistic. It’s just that the requirements are scaling at such a high pace, and we are in catch-up mode.
Nightingale: Verification is scaling in different dimensions, at different rates, because of the software. You’ve got virtual platforms for early software bring-up, which are absolutely crucial. That has scaled up, and you can get onto an AWS server, grab a virtual platform, dump an OS on it, and get it to boot. You can interact with all the elements in the design and see how the different bits of software interact together, which is great. However, what gets missed is the fact that we haven’t shifted left enough, fast enough, in order to catch the protocol-level errors or the interaction issues between pieces. The verification community needs to shift left a little bit more so the software that’s making a transaction somewhere is not missing the fact that it’s invalidating a protocol. The software runs, everybody’s happy, and they move on. On the hardware side, you just sent a potentially invalid protocol, but it is not seen at that abstraction level, or the arbitration mechanism has been violated in some way and not flagged. Your software runs on the virtual platform. You implement that in the hardware, and you get an arbitration issue: ‘Why has my software performance tanked?’
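As an illustration of the abstraction gap described here, the following is a minimal Python sketch, not anything the panelists presented. It assumes a hypothetical valid/ready handshake in which a request, once raised, must hold with a stable payload until it is accepted. The `Cycle` record, both functions, and both traces are invented for this example. The transaction-level view sees identical results for the two traces, while the signal-level check flags one of them.

```python
from dataclasses import dataclass


@dataclass
class Cycle:
    """One clock cycle on a hypothetical valid/ready write channel."""
    valid: bool
    ready: bool
    addr: int = 0
    data: int = 0


def functional_result(trace):
    """Transaction-level view: apply every accepted transfer to memory."""
    mem = {}
    for c in trace:
        if c.valid and c.ready:
            mem[c.addr] = c.data
    return mem


def handshake_violations(trace):
    """Signal-level view: a stalled request must stay valid and unchanged."""
    violations = []
    pending = None  # previous cycle, if it was a request still waiting on ready
    for i, c in enumerate(trace):
        if pending is not None:
            if not c.valid:
                violations.append((i, "valid dropped before ready"))
            elif (c.addr, c.data) != (pending.addr, pending.data):
                violations.append((i, "payload changed while stalled"))
        # Track this cycle only if it is a request that has not been accepted.
        pending = c if (c.valid and not c.ready) else None
    return violations


# Assumed example traces: both end with the same accepted write.
clean = [Cycle(True, False, 0x10, 7), Cycle(True, True, 0x10, 7)]
buggy = [Cycle(True, False, 0x10, 7), Cycle(False, False), Cycle(True, True, 0x10, 7)]

if __name__ == "__main__":
    print(functional_result(clean) == functional_result(buggy))
    print(handshake_violations(clean))
    print(handshake_violations(buggy))
```

Software that only compares memory contents would pass on both traces. Only a monitor at the signal level sees that the second trace withdrew its request before it was accepted.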
Mueth: Because we haven’t seen it before, and because the verification methods that were used before were sufficient. Now we’ve hit a tipping point on the complexity where you’re starting to see some cracks. Now we have to scale the verification at the same rate.
Schirrmeister: For IP verification, the list of requirements itself is complicated, but it’s working reasonably well.
Nightingale: That is a bounded problem.
Schirrmeister: Then the next one, because complexity moved up, the design moved up, as well. Now you have an Arm CSS, or any sub-system, and you have three different versions of PCI Express. That’s your sub-system. Now, that sub-system ends up on a chiplet, on an SoC, some multi-something which becomes a multi-die. Then I have the actual multi-die system within its context. The challenges that you are working on happen after 10 minutes of real-time. Emulation won’t help you there. You really want to be smart about how you find that coherency issue. Coherency is one of the really mean problems that happens after quadrillions of cycles. The verification problem we are talking about is really at that integration point. Then you have system issues and thermal issues.
Mueth: Your chip works fine until you integrate it, and that adds another dimension.
Schirrmeister: The 15% from the survey partly comes from these issues at the system-level where verification was not done early enough. It hasn’t been shifted left. But you really need to consider the scope of which type of error you actually find.
Dhene: IP reuse, which we keep telling people about, is being challenged, in the sense that IP is developed based on the requirements of a particular system. When it is re-used, especially in different sub-systems, that’s when all the challenges come. That’s where it is put under a lot of stress. Reuse has to be looked at, because it takes a lot of effort to get that IP working in different systems.
Rao: From the methodology perspective, the traditional method of verifying an SoC starts at the IP level, then sub-system level, and then full-chip level. That’s typically our workflow. But now, with different engines working on different data sets at the same time, like Arm CSS, the way it works is different. With chiplets there are many different types of transactions happening within a system, and that makes it very difficult to simulate at the IP level. You target IP for IP-level test cases. But you need to bind these test cases with the actual SoC spec. That is your real target. But the SoC spec doesn’t cover all the transactions you need because the spec will only tell you what it is supposed to do. But in what scenario, and what critical scenario, is very difficult to predict at that time. If you have multiple power domains, then with some of the dynamic power saving utilities that you created, the design you have created, and all these channels that you have implemented based on some transactions happening somewhere, you miss one of them in your spec. The DV engineer won’t be able to generate that test case. It’s not possible. The spec needs to exactly specify all the scenarios, which is very difficult. That’s why you have emulation, where you can run your system-level test cases. You will be able to do certain things, but the spec to silicon schedule is shrinking. In the past, the schedule for each chip used to be 2 to 2.5 years. That is shrinking. Now we are talking about a 9-month to 15-month tape-out schedule. That’s very aggressive. Even if you have an army of people verifying it, some features will get through the gap, and that’s where we are finding the issues. We need a solution that basically makes your spec as complete as possible. Right now, that is very difficult. That’s why we need a tool that can come up with the spec, and then accordingly create a verification environment for that. 
That is basically creating a testbench or verification strategy at different levels — IP level, sub-system level, to full chip and multi-chip. Multi-die is coming into the picture, and that complexity is growing exponentially. We need an AI-based approach, and that is at the upfront spec level to verification. That gap is where we see the issues.
SE: That’s scary because the design is also going to be driven from the spec. Now, if you are driving the testbench generation automatically from the spec, that has violated the principle of verification. You need two things that are independently developed and compared against each other.
Schirrmeister: They shouldn’t be the same.
Mueth: Independence, but the spec is supposed to be a source of truth.
Allan: The granularity is changing with chiplet economics. Assuming that continues on the same trajectory, we will have a wholesale shift in the granularity of our architectural decisions. For maybe 80% of the SoCs or SiPs, they will be re-using more trusted elements. The trust will be built in, the security will be built in, the bounded IP verification will be a requirement for that composition. But once we get there, we will have a different set of architectural decisions to make, at a different level, that will take us to the next level. Now you’re right about tools. AI can equip us to deal with that complexity. But in some respects, we’re dealing with it by changing the game, changing the granularity, embracing chiplets as we embraced ICs on a printed circuit board 40 years ago. The LEGO brick has a different form. As EDA, we are investing in that, but that’s a long-term investment. We also need to take care of low-hanging fruit now. We are in this complexity cycle, and we’re taking care of things such as the last 10% of coverage closure. That’s a problem that we’ve solved for customers, pushing the boundaries of today’s problem. That’s a productivity problem that customers are hitting, pushing the boundaries of today’s technology. We’re investing in low-hanging fruit for today’s problem, but we are also acknowledging we’re in this upcycle of a cyclical industry. The next 10 years are going to be so exciting with what is to come, and it’s on all of us to invest in that, so that we’re ready for it.
Schirrmeister: We had this discussion 20 years ago, saying verification engineers need to be different from designers. We need to have separation of concerns.
Allan: We created the problem.
Schirrmeister: Many researchers are talking about automatically generating RTL from spec. I share the concern that this is not correct by construction, because the way we create the spec is never so dense and correct that everything the spec says is golden. How do we get out of it? We just created a new race. But both races are AI enabled. We are creating more designs with AI agents. Now we need to have another layer of verification to make sure that what that creates, within the system context, is verifiably correct, and for that, we need a different …
Nightingale: An independent party.
SE: We certainly need a step change if we’re going to keep up with what AI is doing on the design side. And we can’t use exactly the same thing for verification.
Schirrmeister: From the survey, we need to know about the breakdown of the 15%, potentially by technology nodes. Are those 3nm designs? Are they 7nm designs? This concentrates on the most advanced designs. Those are the trailblazers. We need to worry about those. We need to figure out methodologies to verify those and make sure they work. But is there a class of designs, a bifurcation in verification? Is there a set of designs that are not the data center mega designs? What about smaller designs? And what about the sub-systems? We need automation to close the gap: automation for some of those components, like processor verification, and like the sub-system verification. There are automated techniques out there that work. We have invested heavily in processor verification. With the number of processor variants emerging, we invested in automatically verifying those processor variations. But that is a very limited scope. How does that processor work if I integrate 64 of them, connected by a NoC? How do I figure out coherency issues? That becomes the next-level problem. Whenever I go to a customer and talk about automation technologies, the IP guys are very happy. It helps them automatically get more test cases, just like constrained random helped us. The problem with coherency is, if I have automated tools that generate lots of tasks, the bug is so difficult to find that this tool runs for days without finding any bugs. We have different scopes to consider for verification. I’m hopeful on the IP side and the sub-system side, but for the system side, there’s a lot of work to be done.
Dhene: Most of the time, when we look at bugs, the IPs are very well tested, and then they come into the system. If you start running a similar set of SoC tests on those, which are all well-defined, you will not find entry-level bugs in any of these IPs. You need to look at tools like PSS, where you can automate test generation and create thousands of tests to get a jump start on the SoC. That’s where you start looking at issues that are boundary issues, or that are hiding within the cracks in your design. These are new technologies that I feel the industry should start looking into — more and more adoption of technologies like portable stimulus, which really accelerates verification.
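The idea of generating many tests from a compact scenario description can be sketched in a few lines of Python. This is a toy analogy only: it is not PSS syntax, and the action names and the single power-domain legality rule are hypothetical choices made for illustration. The generator enumerates every legal ordering of actions, then picks out the ones that cross a power-down/power-up boundary, which is the kind of scenario a hand-written directed test list tends to omit.

```python
import itertools

# Hypothetical actions on one power domain; names are illustrative only.
ACTIONS = ("cpu_write", "cpu_read", "dma_copy", "power_down", "power_up")


def is_legal(scenario):
    """Assumed rule: the domain starts powered, nothing except power_up may
    run while it is off, and power_up is only allowed when it is off."""
    powered = True
    for action in scenario:
        if action == "power_up":
            if powered:
                return False
            powered = True
        elif not powered:
            return False
        elif action == "power_down":
            powered = False
    return True


def generate(length):
    """Enumerate every legal scenario with exactly `length` actions."""
    return [s for s in itertools.product(ACTIONS, repeat=length) if is_legal(s)]


def boundary_scenarios(scenarios):
    """Keep scenarios that do real work after a power-down/power-up cycle."""
    picked = []
    for s in scenarios:
        if "power_up" in s and s.index("power_up") < len(s) - 1:
            picked.append(s)
    return picked


if __name__ == "__main__":
    tests = generate(4)
    print(len(tests), "legal scenarios of length 4")
    for s in boundary_scenarios(tests)[:5]:
        print(" -> ".join(s))
```

A real portable stimulus flow would express the actions, resources, and constraints in a declarative model and let a tool solve for legal scenarios. Exhaustive enumeration is used here only because the example is tiny.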