Experts At The Table: Performance Analysis

Second of three parts: Isolating performance issues in complex SoCs; rethinking methodology and the design flow; the impact of more IP; new ways to divide and conquer.


By Ed Sperling
Low-Power/High-Performance Engineering sat down with Ravi Kalyanaraman, senior verification manager for the digital entertainment business unit at Marvell; William Orme, strategic marketing manager for ARM’s System IP and Processor Division; Steve Brown, product marketing and business development director for the systems and software group at Cadence; Johannes Stahl director of product marketing for system-level solutions at Synopsys; and Bill Neifert, CTO of Carbon Design Systems. What follows are excerpts of that conversation.

LPHP: How do we deal with all the complexity in SoC designs and still improve performance?
Brown: You want to be able to bring together software and hardware as early as possible, before you commit to silicon, and look at the signatures, the performance, and the power, and shorten the cycle back to the architectural model and decision-making so teams have a chance to deliver a complete system.
Orme: The amount of verification that you need to do is so large that the turnaround time in cycles can be huge. You need to deal with this at different stages of the design flow and different levels of abstraction so you can analyze the problem of that particular segment of the design flow in a reasonably fast and efficient way. At each step through the flow you simplify it in a different way, so what you are modeling addresses the problem the person is trying to address. You have to start at the architectural level and move toward the system and then the implementation. That way you simplify the system in a different way and you don’t have to go back to the beginning again.
Kalyanaraman: To achieve performance you really need to pay attention to detail. When is performance given attention in the whole cycle and who is the one defining performance goals for the chip? It’s probably the architect. But are we getting a clear direction? Is there a performance test plan? Typically that’s done at the end when you find all the low-level bugs and the system is stable enough to address performance. But that’s so late, and it’s usually addressed when you’re looking for one deep bug. This problem has to be pushed way up front so the architect can look at this. If we can achieve that change, we will be much better off. We almost feel as if the only value that verification is adding is a means of recreating a scenario that will be found in silicon later.

LPHP: This has been talked about for several years, where you move some of the verification further up in the process. Why is this just showing up now?
Kalyanaraman: ASIC teams are very focused on this. But if you talk to SoC teams, it’s very different. You’re getting 15 to 20 IPs from different companies and putting all of this together. We know how these pieces are put together functionally, but nobody analyzes or extracts all the information.

LPHP: Can you integrate all these pieces effectively by raising the level of abstraction?
Orme: It is possible to abstract away all the IP you’re using for the infrastructure. There are SPICE models and verification IP, and you can concentrate on the memory subsystem and the interconnect subsystem. But how are you going to maintain service and performance under lots of use cases and scenarios—maybe scenarios that haven’t been dreamt up yet? If you design it well, you should be able to design a chip to be tolerant of different software and use scenarios.
Stahl: If you bring in different pieces of IP, you need to understand what the IP is doing. We’ve seen different ways to do that in an abstract model. You can sample what the IP does in a previous design so you know what the traffic will look like. Then you look at the block and create a high-level way to drive traffic. And the most sophisticated way is to have a model of the application and execute this application for performance, not just functionality. Those are the three levels customers use to mimic what the IP is doing. Then you have the right modeling levels that will tell you enough about the core architecture.
Neifert: You’re talking about this from a traffic-generation standpoint around an accurate interconnect in a subsystem?
Stahl: Yes.
Neifert: We’re all in agreement that needs to be modeled accurately. When it comes to the other external sources, you can certainly get good representative traffic from the traffic generators. We’ve all got various solutions for doing that at the periphery. One thing I’ve seen, though, is that this doesn’t always catch some of the cases you need. We’ve helped customers avoid costly mistakes when they start tying in the real models and seeing the performance didn’t quite match what they thought it would because the arbitration didn’t work that way. That’s not how they modeled it. Having the ability to get accurate models for the other components up and running as quickly as possible is the key to doing all of this. The reason people want all the other abstracted models around the outside is that it’s the easiest way. You want to give people example systems with processors and IP so they can start trying stuff out. Ideally, you make all your decisions with 100% accuracy. You can’t always do that, but as much as you can tie that back to accuracy and the earlier you can make these decisions, the less time you have to wait before solving these problems in verification—and we all agree that’s the wrong time to solve them.
Brown: The customers are not clear on the different levels of abstraction and accuracy—or what combination—they need to commit to. The term ‘virtual prototype’ is thrown around, and there are multiple different ways to apply that. Then there are virtual models with RTL cycle-accurate simulation or emulation. What are the experiment points that need to be committed to in a project plan? There is a tradeoff among schedule, cost, and the accuracy of decisions that change the architecture.
Stahl: I fundamentally disagree that the best way to approach this is to put together the most complete model and then run software on it. There are two reasons. First, if you are putting together the most accurate model, you might as well do that at RTL. Second, putting the software on this very accurate representation is a humongous task. The software needs to be verified, and the architect doesn’t want to wait for verified software in order to debug. Later on, as the software and RTL become available, you run the final validation of the software. But early on, the traditional approach of running software on an accurate representation is too difficult. Maybe you can do it for a small project, but for the most complex systems it’s too hard.
Neifert: The architect certainly doesn’t want to spend the time writing the software, and many times he can’t write the software. But he’d love to see the results of running GLBench. We’ve had to evolve from the model perspective to the system perspective. It isn’t about the latest A57 model. It’s the A57 system with the memory, the software, Linux, and the imported benchmarks. The last thing you want to do is tell the architect they have to configure the system, write the software and spend several months doing this before you can see the value. You want to get representative traffic with enough accuracy to make decisions.

LPHP: The discussion seems to be focused on complexity affecting performance. Is that correct?
Stahl: It affects performance, but so does the method used to get performance results.

LPHP: But it’s also a matter of having to deal with so much first that getting to performance optimization isn’t always easy. How do we solve that?
Orme: Complexity is bound to increase. You’re looking at ways to manage it. We will always keep pace with it. But it fundamentally comes down to abstraction and subdividing problems. Within ARM, we are our own customer. The architect, the designer, the people doing the performance analysis are using every tool they can get hold of to get things done fast enough. We have guys who are writing bits of code that will run on silicon, emulation, or RTL simulation. If it runs slower, we’ll use smaller bits. There will be traffic generators and models. They will use what they can get hold of in a short enough time. The answer to the problem is whether you can get it running fast enough to get some answers to the questions you’re asking.
Kalyanaraman: It’s really hard to say, ‘This is the way we get to the solution.’ Everyone is focusing on a lot of small things. If there is a problem with interconnects, you also have to look at the system and how everything is put together. If you just take IP and try to exhaustively permute every transaction out of the IP, it’s an impossible problem to solve. So you solve the interconnect problem and then solve each piece by going with a subset of the model that you know you can sign off. One aspect of solving this is the focus. Trying to shift the focus from being reactive to seeing if we can do things up front—that’s really the way to approach it.
Neifert: If you compare today’s designs with the average SoC of 10 years ago, things that used to be single blocks are now entire subsystems. Instead of trying to do everything together, you break it down. There are entire teams developing subsystems. This is how they’re doing it inside the entire SoC team, as well.