Simplifying And Speeding Up Verification

Experts at the Table: The impact of AI, chiplets, and more precise interconnects.


Semiconductor Engineering sat down to discuss what’s ahead for verification with Daniel Schostak, Arm fellow and verification architect; Ty Garibay, vice president of hardware engineering at Mythic; Balachandran Rajendran, CTO at Dell EMC; Saad Godil, director of applied deep learning research at Nvidia; Nasr Ullah, senior director of performance architecture at SiFive. What follows are excerpts of that conversation. Part one of this conversation can be found here.

SE: With RISC-V, which is a new instruction set architecture, what’s changed in terms of the tools and approaches? Are these the same as everyone else is using?

Ullah: In my previous job, we used AI for the simpler stuff and in coverage areas where you could have the most impact. These were kind of like machine learning algorithms. We had folks approach us to do more complex stuff with AI. The problem was that the software was so complex that getting someone to really understand what we were doing was difficult. It was a very difficult process to try to get some of these more complex mechanisms in. That said, at SiFive we have found that open source brings challenges and opportunities. The SiFive model is that we provide technology based on RISC-V, and our customers can take that and make modifications. They can add instructions and different things to come up with their own mechanism. So one problem that comes up is that we need a verification methodology that is a continuum. It needs to go from us to our customers. And that results in us having to work with every single EDA vendor, primarily because everybody uses something different. In addition to that, a lot of the open source development is done with open-source tools. For example, Verilator, which is a very fast Verilog simulator, and FireSim, which is a very fast emulation platform — these are open source tools that came out of UC Berkeley and other universities and are used extensively. Making those work with standard commercial tools will take a standards effort. We’re looking into that. We have to use the same tools that we’ve used before to be much more thorough, but we also have to figure out how we can integrate all these open source methodologies in there. We use Chisel, which helps generate RTL. So there are some differences. It has to be generated and tweaked to do all the things we want to do, and with a very complex chip design that can be problematic. The problem just increases in having interoperability between all the different methodologies. That has become a very big problem in our area.
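A minimal sketch of what that kind of tool interoperability can look like at the flow level, assuming a Python-based regression wrapper: one entry point that happens to invoke Verilator here, but could equally dispatch to a commercial simulator. Verilator and the flags shown are real; the file names, directory layout, and pass/fail convention are assumptions for illustration, not SiFive's actual methodology.

```python
# Hypothetical wrapper: run one test through Verilator so that open-source
# and commercial simulators can sit behind the same regression entry point.
# File names (top.v, tb_main.cpp) and the exit-code convention are assumptions.
import subprocess
from pathlib import Path

def run_verilator_test(rtl: Path, tb: Path, build_dir: Path) -> bool:
    """Compile the RTL with Verilator and run the C++ testbench once."""
    build_dir.mkdir(parents=True, exist_ok=True)
    # --cc generates a C++ model, --exe links the user testbench,
    # --build runs make; these flags exist in recent Verilator releases.
    subprocess.run(
        ["verilator", "--cc", str(rtl), "--exe", str(tb),
         "--build", "-Mdir", str(build_dir)],
        check=True,
    )
    # Verilator names the binary V<top>; assume the top module matches the file stem.
    sim = build_dir / ("V" + rtl.stem)
    result = subprocess.run([str(sim)])
    return result.returncode == 0        # assume the testbench exits 0 on pass

if __name__ == "__main__":
    ok = run_verilator_test(Path("top.v"), Path("tb_main.cpp"), Path("obj_dir"))
    print("PASS" if ok else "FAIL")
```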

Garibay: Coming back to AI, classical machine learning techniques from the ’80s and ’90s were developed in a time of compute scarcity, and they’re severely constrained. The environment we’re working in today isn’t necessarily constrained in that way. But when you look at the size of the problems, you can’t do deep learning type things. If you look at the types of problems that are amenable to deep learning — teaching a computer mahjong or chess, or teaching natural language processing with unstructured data — those are incredibly simple problems compared to a 20 billion transistor chip if you think about the number of states and the number of interdependent states. So being able to do deep learning on the number of states in one of our big chips is currently not possible. There’s a lot of room for machine learning techniques, but deep learning will continue to be challenging.
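To make the scale argument concrete, here is a rough back-of-envelope comparison in Python. The flip-flop count and the chess figure are assumed orders of magnitude for illustration, not measurements of any particular design.

```python
# Back-of-envelope comparison of state-space sizes (all figures are rough
# illustrative assumptions, not measurements of any specific chip).
import math

chess_states_log10 = 47          # commonly cited order of magnitude for chess positions
flops_in_big_chip = 100_000_000  # assume ~100M state bits in a 20B-transistor SoC

# Each flip-flop doubles the number of states the design can in principle hold.
chip_states_log10 = flops_in_big_chip * math.log10(2)

print(f"Chess positions : ~10^{chess_states_log10}")
print(f"Chip state space: ~10^{chip_states_log10:,.0f}")
print(f"Difference      : ~{chip_states_log10 - chess_states_log10:,.0f} orders of magnitude")
```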

Rajendran: Those are simple problems, but they have high value. The end goal is to figure out whether a verification cycle can really find a bug. I have friends who told me they never retire any verification test. Whatever they wrote some years ago never gets retired. Even if a test is no longer adding coverage, it’s still there. In EDA, every company does the same thing. There’s never a process to figure out, ‘Hey, is this test really finding a bug?’ We need to start retiring test cases. We can throw 40,000 cores and VCS licenses at the problem, but it’s not an efficient use of our time. We’re still not there at Dell, but that’s in the pipeline. It’s something we want to go target.
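A minimal sketch of what such a test-retirement pass could look like, assuming a regression database that records each test's unique coverage contribution and the last time it exposed a real failure. The record fields, retention window, and example tests are all hypothetical.

```python
# Hypothetical test-retirement pass: flag regression tests that have neither
# contributed unique coverage nor found a bug within a retention window.
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

@dataclass
class TestRecord:
    name: str
    unique_coverage_points: int    # coverage hit by this test and no other
    last_failure: Optional[date]   # last time this test exposed a real bug

def retirement_candidates(records: list[TestRecord],
                          keep_window: timedelta = timedelta(days=365)) -> list[str]:
    cutoff = date.today() - keep_window
    stale = []
    for rec in records:
        found_bug_recently = rec.last_failure is not None and rec.last_failure >= cutoff
        if rec.unique_coverage_points == 0 and not found_bug_recently:
            stale.append(rec.name)
    return stale

# Example: two of these three tests still pay their way.
history = [
    TestRecord("smoke_boot", unique_coverage_points=12, last_failure=None),
    TestRecord("legacy_dma_corner", unique_coverage_points=0, last_failure=date(2018, 3, 1)),
    TestRecord("cache_evict_stress", unique_coverage_points=0, last_failure=date.today()),
]
print(retirement_candidates(history))   # -> ['legacy_dma_corner']
```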

Schostak: It depends what kind of test you’re talking about. If you look at the verification environments, an awful lot has changed. There are always the classical directed tests that are used to assess some level of compliance with the Arm architecture: is this part of Arm’s instruction set architecture? We have architectural licensees, and they have to have some confidence that what they’ve implemented and what we’ve implemented in the base architecture will work together. If you’re looking at the classic constrained random, unit-level, or even emulation, you’ll see progress over time there. I’m less worried about not removing tests that aren’t useful. It’s more about restating the problem in a different way. If you’re running constrained random, how do you know when to stop? You can’t build a coverage model that expresses everything about a design when you didn’t necessarily name everything that is interesting about that design. And it’s that type of saturation problem which I think is difficult. That rephrases when to get rid of a test as, ‘I’ve got all of these different tests to run, and which subset is going to find all of the bugs?’
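One way to restate "when do we stop?" as a saturation check is sketched below, assuming the regression flow can report how many previously unseen coverage bins each batch of constrained-random tests added. The window and threshold are illustrative, not an Arm methodology, and the caveat above still applies: saturation is only measured against the bins someone thought to name in the coverage model.

```python
# Hypothetical saturation heuristic for constrained-random regression:
# stop when recent batches stop discovering new coverage bins.
def coverage_saturated(new_bins_per_batch: list[int],
                       window: int = 5,
                       min_new_bins: int = 3) -> bool:
    """True if the last `window` batches together added fewer than
    `min_new_bins` previously unseen coverage bins."""
    if len(new_bins_per_batch) < window:
        return False                      # not enough history to judge
    return sum(new_bins_per_batch[-window:]) < min_new_bins

# Example: discovery rate tailing off over successive nightly batches.
history = [412, 188, 73, 20, 6, 1, 0, 0, 1, 0]
print(coverage_saturated(history))        # -> True
```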

SE: What happens when we move into 5nm, 3nm designs? A lot of new chips are not being made in the billions of units. Time to market is extremely tight, and many of these chips are highly customized.

Ullah: You’re right, the volumes are a lot less now. A cell phone maker can sell hundreds of millions of units, and it may have 100,000 cores to use. We could run all the old tests and new tests. But in a startup there are far fewer resources and far fewer customers. We need to start going back to some of the old methodologies for the classical AI stuff to help us. Or we go into another mode where we don’t buy all of these cores. We go into a subscription model where, for $1 a day, you can get a core. It will have to be cheaper, and people will need to feel comfortable doing that — not just for cores, but for emulation and for all the other resources. But we have to make some changes, because you’re not going to have many companies coming up with this huge set of resources anymore. Whether we put more software in there or have another different mode of using hardware, we need to make this change. And with all these new technologies, it’s only going to get worse.

Garibay: The problems you point out are somewhat orthogonal because the only chips that can go to 3nm are ones that are going to ship in tens of millions of units. That’s the only way you’re going to be able to fund the mask set, much less the design. Even if the design cost was zero, the mask set itself is prohibitive. But at 28, 22, and even 16nm, you’re starting to see a really interesting trend toward more bespoke chips. That’s the whole model of enabling more innovation to happen. We’re even starting to see more bespoke analog chips, which I never thought we’d see. Being able to verify each of those things efficiently enough is kind of going back to the really old days — the ’90s — when we had a thriving ASIC business in the world and there were a lot of custom chips. But those weren’t that complicated. Now they’re custom and complicated.

SE: But when you think about a 3nm chip, the reality is that it’s going to be a 3nm digital chip packaged with something else, because even if you shrink the entire chip down to 3nm, it will not get sufficient power and performance benefits to justify the expense of scaling. The chiplet model is one such approach. But no matter how this is done, you need to verify chips in a package. How does this change things?

Garibay: I see it as a godsend because finally it adds another level of modularity. Now we can say, ‘There’s a chiplet boundary and we’re going to verify that chiplet in isolation. It has super-well-defined interfaces.’ That’s a solvable problem, and once solved you can almost put it on the shelf. There may be some interesting problems you find when you plug them together with other things. But we’ve gotten to the point where we’re doing these massive many-billion transistor chips. Even though we did modules and stuff like that, designers will run a wire from here to there to solve a problem while creating another one. The hard boundaries of chiplets are going to enable a lot of innovation because it makes the problem solvable to some extent.

Godil: It’s very interesting that we have this modularity now. This modularity opens up a new kind of frontier, a new dimension to verification. Even with the best-designed interfaces, you’ll always have a risk of functional bugs, right? So you always have to verify these things together. But now that you’ve modularized it, you could have some components that are available earlier. You could look at using that in an emulation platform, connecting it directly and actually simulating and verifying your chip before you tape it out. We’re going to see a lot more of that as chips are built in different stages. As those stages become available in silicon, you’re not going to be simulating them anymore or running them on an emulation platform. You’ll be plugging that chip in directly and verifying it with the rest of the modules that are yet to come. That opens up a whole new possibility of tools and support that we haven’t really thought through and haven’t built up yet.
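A sketch of the kind of testbench abstraction that could make that staging possible, with every name hypothetical: the system-level test drives each chiplet through one transaction-level interface, and whether that interface is backed by an RTL simulation or by first silicon on a bench is hidden behind it.

```python
# Hypothetical testbench-side abstraction: each chiplet is driven through the
# same transaction interface whether it is an RTL model or real silicon.
from abc import ABC, abstractmethod

class ChipletPort(ABC):
    """Transaction-level view of one chiplet's external interface."""
    @abstractmethod
    def send(self, payload: bytes) -> None: ...
    @abstractmethod
    def receive(self) -> bytes: ...

class RtlSimChiplet(ChipletPort):
    """Backed by a simulator or emulator process (transport details elided)."""
    def send(self, payload: bytes) -> None:
        print(f"[sim] driving {len(payload)} bytes onto the simulated interface")
    def receive(self) -> bytes:
        return b"\x00"   # placeholder response from the simulated model

class SiliconChiplet(ChipletPort):
    """Backed by first silicon on a lab board (transport details elided)."""
    def send(self, payload: bytes) -> None:
        print(f"[lab] sending {len(payload)} bytes over the bench link")
    def receive(self) -> bytes:
        return b"\x00"   # placeholder response read back from hardware

def system_test(chiplets: list[ChipletPort]) -> None:
    """The same system-level test runs regardless of what backs each port."""
    for port in chiplets:
        port.send(b"hello")
        port.receive()

# Early in the project everything is simulated; later, silicon is swapped in
# behind the same interface as it becomes available.
system_test([RtlSimChiplet(), RtlSimChiplet()])
system_test([SiliconChiplet(), RtlSimChiplet()])
```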

Garibay: It’s very similar to what we’re doing today with system-level simulation with PCIe. All the chips that interface with PCIe are well-defined. You design to that standard, and then you hook them up and we emulate with PCIe VIP interfacing to Chip 1, Chip 2, Chip 3, Chip 4. We don’t do a lot of simulation of all those chips together. We assume that checking them individually against a standard interface is sufficient, and you can just model them together in a software model to get the system.

Schostak: It doesn’t make a lot of difference for Arm’s model, because Arm has always been an IP provider. Arm keeps its changes where our internal boundaries are and where our partners interface with us. From the actual verification we apply, it doesn’t change a lot internally, other than doing a bit more in the chip flows to say these things will definitely work together. When outside partners take this IP and stitch it together, it generally will work, but Arm doesn’t guarantee that stitching. You can introduce bugs there. But going back to PCI Express being a well-defined standard, from some of the experiences we’ve had there are ambiguities in the standard. If you build a perfectly valid interpretation, which might be verified using VIP, and another company builds a perfectly valid interpretation of PCI, when you try to put those two bits of IP together they still may not work together properly. So when defining interfaces or entry points, we will be making sure they are more precise than some of the interfaces that we see now.

Godil: I wanted to expand on that a little bit. This question of whether we can get away with VIPs for the interfaces, versus actually simulating with the other chip, ties into PCIe verification done with VIPs. With regard to chiplet APIs, are we going to have time to develop mature interface standards like PCIe, which took a long time to settle down? Or will innovation drive us to go faster? If we say that we’re only going to use the VIPs, we’re restricting ourselves to well-defined, mature interfaces, which will just limit innovation. I’d like to see verification flows enable us to go with less-defined APIs and be able to innovate on the APIs and the interfaces that we have, but then make up for that by allowing us to verify them even though we don’t have all the other VIPs. For example, when we are building our GPUs, we are always going with the state-of-the-art PCIe technologies. When we did Gen 3, there was not a chip that had Gen 3 available. If you went to any of the EDA tool providers, they were all planning on building Gen 3 VIP, but none of them had it. So we had to build it all on our own. Even with new versions of PCIe, it’s not like it’s a solved problem. It is really uncharted territory, and it’s a lot of work to figure out how to make your chip work with a new standard that nobody has worked with. With chiplets you’re going to see more innovation on the interfaces, and you’re not going to limit yourself to standardized ones. But again, the tools and the flows have to catch up. If you don’t have a way of verifying them, then nobody’s going to use them. There’s an opportunity for us on the verification side. If we can provide a way to enable verifying these new interfaces, then we’ll see a lot more innovation.
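As a toy illustration of where "build the VIP yourself" can start when no commercial model exists yet, the sketch below is a passive monitor that checks one handshake rule on a hypothetical chiplet link. The signal names and the rule are invented for the example; a real VIP for something like PCIe Gen 3 is vastly larger and covers the full protocol stack.

```python
# Toy passive checker for a hypothetical chiplet link: every request that is
# accepted (valid and ready both high) must get a response within a timeout.
# Signal names and the rule itself are invented for illustration only.
class LinkMonitor:
    def __init__(self, timeout_cycles: int = 64):
        self.timeout = timeout_cycles
        self.outstanding: dict[int, int] = {}   # tag -> cycle the request was accepted
        self.errors: list[str] = []

    def on_cycle(self, cycle: int, req_valid: bool, req_ready: bool,
                 req_tag: int, rsp_valid: bool, rsp_tag: int) -> None:
        # Record accepted requests and flag tag reuse while still outstanding.
        if req_valid and req_ready:
            if req_tag in self.outstanding:
                self.errors.append(f"cycle {cycle}: tag {req_tag} reused while outstanding")
            self.outstanding[req_tag] = cycle
        # Match responses against outstanding requests.
        if rsp_valid:
            if rsp_tag not in self.outstanding:
                self.errors.append(f"cycle {cycle}: response for unknown tag {rsp_tag}")
            else:
                del self.outstanding[rsp_tag]
        # Report each timed-out request once.
        timed_out = [tag for tag, start in self.outstanding.items()
                     if cycle - start > self.timeout]
        for tag in timed_out:
            self.errors.append(f"cycle {cycle}: tag {tag} timed out waiting for response")
            del self.outstanding[tag]

# In use, the monitor would be fed every cycle from a simulation, emulation,
# or captured silicon trace, and its error list checked at the end of the test.
```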

Ullah: Right now we give our customers a chip. We also go all the way to tape-out. But what I’m seeing is that they make changes to the chip, and they put different pieces in there. Interoperability needs to be very precise. That could be problematic if you expect all these things to work together. Maybe a solution is for us to give them something that is golden, whether it be RTL or VIP or a virtual model. Our customer then can make sure that everything works properly. We can’t do it ourselves. It has to come from the customer. But we have to make sure that those things are all interoperable and work with whatever tool set the customer uses. That is becoming a bigger problem. We’re having to do a lot more collaboration with all the EDA vendors now than before to make sure everything works together. There has to be a joint partnership between us, the chipmakers, and the folks who are going to put the product out to the customer.

Rajendran: I’m a big fan of the emulator platform. The way development used to work is you would take a golden copy or golden version of something and hand it over from one team to another and say, ‘Hey, this is good to go. Use it.’ That kind of cycle is not going to work if you’re planning to deploy and create a product almost every 1.5 to 2 years. You have to do things differently. That ties into the culture of an organization. I’m a huge fan of emulators, though how you use them varies from company to company. It’s not just for emulation. We also use it for system acceleration and simulation acceleration. But there are a lot of software components going into chips. There’s a power benefit in going to the lower nodes, but the software drains a lot more power out of it. You can implement solutions to certain problems in your chip, or you can implement them in software. It’s easier to implement in software, but it’s going to cost you more in terms of power. So coming back to the design, you need a platform that’s able to verify holistically as every piece of that design keeps changing. Emulation is a key component, whether big or small. Right now, only the big companies have emulators. The small companies don’t. But that seems to me to be the wrong approach. It’s okay for them to buy 30,000 or 40,000 cores, but you can run simulations much faster on an emulation platform.


