Experts At The Table: ESL Reality Check

Second of three parts: Stacked die and power; design exploration; standard models and interfaces; the cost of developing models; creating an interoperable ecosystem.


By Ed Sperling
System-Level Design sat down to discuss electronic-system-level design with Stephen Bailey, director of emerging technologies for the design verification technology group at Mentor Graphics; Michael McNamara, vice president and general manager of Cadence’s System-Level Division; Ghislain Kaiser, CEO of DOCEA Power; and Shawn McCloud, vice president of marketing at Calypto. What follows are excerpts of that conversation.

SLD: What changes in 3D stacking?
McNamara: The design constraints are still there, but they’re different. There’s a particular ratio that is now fixed, as opposed to something you can design. You’ll get something with a memory where the ‘and’ instruction on the processor completes in x amount of time and fetching from memory is 15x or 20x or whatever. That’s fixed, so it’s not a parameter of your design.
Bailey: And more important, the power consumption is much lower. It does relax a lot of those constraints, so people can look at architectures in ways they couldn’t look at them before. You have multiple paths to memory, for example, and that will bring a lot of pressure into system-level design even before you get to implementation because you have to analyze all those factors together and then implement the actual architecture.
Kaiser: At each technology node complexity is increasing. At 45nm, we are able to design multicore chips and to integrate a lot of blocks together. The software complexity is increasing, and all of this makes power consumption much more complex. The system-level challenge for this complexity is twofold. One part is verification. The second is system exploration, which is a big opportunity for system-level design. For that exploration, the 3D challenge is another leap in complexity. But it also offers many degrees of freedom, because you can integrate a block in a certain stack in a certain technology node, or move that stack to another level and another technology node. At the system level, the opportunity is to explore and then have a virtual prototype for the hardware and for the software to allow for verification.

SLD: Do all the models become standards that everyone has to work with? To write an IP block does it have to fit into one standard model or another?
McNamara: That’s definitely happening. If you look at the Android stack, for example, if you can build a device that can run that software then you can go to market. If it doesn’t run that software, you can’t go to market. We’ve had that before with Windows 95. If you could build a laptop that could run Windows 95, you could go to market. But it wasn’t like Android, which is crushing companies. Whole OSes like Symbian are going away. There is a huge opportunity where you have IP that’s doing graphics or GPS, and you would like it to be included in the next phone from somebody. How can you win that socket? It has to run the software. You can’t run software on RTL, but you can on a transaction-level model that’s abstract enough. In any space, you introduce system-level design if you’re really organized—or when you have to. If you could build a building without caring about what’s above you or below you, then you don’t need system-level design. Where you need it is where the design at one level is massively affected by what happens above, or vice versa. You may need to represent power at the system level because software is going to have such a huge effect on power. And power is really being consumed by transistors. It actually proves the point of the interfaces. You have to have a standard modeling language that allows you to communicate across these different levels of abstraction. If you have a custom, unique model you can’t trade off A vs. B.
Bailey: But if the interfaces are standard, as with the Android stack, or with AXI and the ARM processor core—which is a de facto standard—that’s what allows all this re-use of complex functionality. An IP block or subsystem can be re-used because it connects up to a standard bus interface that people can design to and integrate with. It allows you to create models at different levels of abstraction that interface to that bus standard and provide what details are necessary.
McCloud: The biggest cost of moving to any new level of abstraction is the model. Developing new models takes a tremendous amount of money and effort, so you want to make sure they have some longevity. There has been a lot of progress in that space. The TLM 2.0 standard has tried to isolate function, power and performance, which are all parts of the TLM model, and that’s really key. That’s why the TLM model for virtual prototyping is doing very well. One of the challenges today, when you go to HLS, is that it’s typically a different model. But TLM platforms need to be able to execute at 200MHz to 300MHz. A high-speed model, which is necessary for software development, and a model with enough information to be synthesizable are two different things. That’s driving some of the movement around TLM synthesis standardization, where you attempt to standardize the interfaces that are being used and maintain the core, so you don’t have to rewrite that.
McNamara: That’s a key point, along with power. We need to work power into this.
Kaiser: When you move to a higher level of abstraction the price you have to pay is the modeling cost. But if you have standards that make it possible to build a library and exchange models between systems, then the business moves from a service business to an EDA tools business. Power is really a missing part in all of this. We have emerging standards such as SystemC TLM for the functional part, but that’s only a partial answer to the problem. There are several timing approaches for TLM models, and those are different from the software, and it’s different from HLS applications. There is not really an interoperable ecosystem. Power is the missing part, and it’s the next step. If we want to explore a system and optimize the power we need to be able to exchange models and add interoperability between these models.

SLD: While we need to be able to build incredibly complex devices, we also need to be able to decouple the parts for re-use. Isn’t that heading in different directions?
McNamara: That’s where the standards come in. You can’t design a system, uncouple its parts and then recouple them unless there’s a standard way of hooking them up. When you see software updates soon after a device is released, that means there’s a missed opportunity. Even vertically integrated companies get this wrong. This is all about parasitic abstraction. But to get the power data you have to pull it out of the RTL, and that’s not very good for that. So you really have to go down to the gate level and you can start doing math and extract that information, take it up to the RTL and the TLM level and then to the system level. There needs to be a standard for this. There are various efforts under way to tell you here’s what it does, here’s how much power it’s going to use, how big or small it can be, what’s in software.
Bailey: That’s a very difficult problem. To characterize you have to go all the way down to the gate level. I’ve heard from customers that they can do RTL power analysis, but what comes out of synthesis can change RTL power analysis and pretty much invalidate it.
McCloud: But the RTL tool does some great optimization. You definitely want to do that.
Bailey: Yes, but then you bring it up in a 200MHz or 300MHz virtual prototype and try to do power analysis. That means you have all this detailed information you have to abstract. How much power did it consume? Otherwise you’re doing a gate-level simulation while you’re trying to run software and it’s just not going to work.
McCloud: It’s possible to pull that up into the TLM level. Even at the RTL level, tools are doing a reasonable job of doing power estimation within 10% to 15% of the gate level. The key to that is to predict a lot of what RTL synthesis will do. Going up to the TLM level, you can get 30% accuracy by using statistical methods to model the power on a per-transaction basis.
McNamara: You can get to the point where you can probably assume this design will use more power than that design.
McCloud: When you’re making decisions between software running MPEG-4 decompression on a processor or a dedicated decompression pipeline in hardware, these can be 20x to 100x differences in power.
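As a rough illustration of the per-transaction statistical approach McCloud mentions, the Python sketch below averages detailed energy samples for each transaction type and then applies those averages to a transaction trace from a fast TLM run. The transaction names, energy values and function names are all hypothetical and chosen only for illustration; this is not how any particular tool works.

```python
from collections import defaultdict
from statistics import mean


def characterize(samples):
    """Average measured energy (in nanojoules) per transaction type.

    `samples` is an iterable of (transaction_type, energy_nj) pairs, assumed
    here to come from a slow but detailed gate-level power run.
    """
    by_type = defaultdict(list)
    for kind, energy_nj in samples:
        by_type[kind].append(energy_nj)
    return {kind: mean(values) for kind, values in by_type.items()}


def estimate_energy(trace, energy_per_txn):
    """Estimate the total energy of a trace by summing per-transaction averages."""
    return sum(energy_per_txn[kind] for kind in trace)


# Hypothetical calibration data: (transaction type, energy in nJ).
gate_level_samples = [
    ("read", 2.0), ("read", 2.4), ("read", 2.2),
    ("write", 3.0), ("write", 3.4),
]
model = characterize(gate_level_samples)

# Hypothetical transaction trace recorded from a fast virtual prototype.
trace = ["read", "read", "write", "read"]
total_nj = estimate_energy(trace, model)
print(f"estimated energy: {total_nj:.1f} nJ")
```

Because the model keeps only an average per transaction type, it runs at virtual-prototype speed but cannot capture data-dependent variation, which is one reason accuracy at this level is coarser than at RTL.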
McNamara: If you look at where the power is being used in a device with a plug, we can do more optimal things, but about half the power there is just leakage. With the early implementations of the 20nm nodes, leakage was just killing us. On top of that, about 90% of the active power is just driving a screen. The other 10% is the electronics, and about half of that is fetching things from memory. Software that remembers what it just read can save a lot of power. Then you look at what’s left—about 2.5% of the power actually going through the circuitry—and about half of that is in the processor and the other half is in the logic. So all told, we’re talking about 1.25% of the power that we’re affecting by choosing the size of some transistor for a device that’s plugged into a wall. With a cell phone it’s different.
McCloud: And that’s why the original application of ESL was in the consumer market. That’s where the greatest time-to-market pressure is. It has to be small area and low power.
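The arithmetic in McNamara's breakdown for a plugged-in device can be checked with a few lines of Python. This is only a sketch: the fractions are the rough figures quoted in the discussion, the variable names are made up for illustration, and it assumes the 90%/10% split applies to the active power left after leakage, which is the reading that reproduces the quoted 2.5% and 1.25%.

```python
# Rough fractions quoted in the discussion for a device plugged into a wall.
leakage_fraction = 0.5         # about half of total power is leakage
screen_share_of_active = 0.9   # about 90% of active power drives the screen
memory_share = 0.5             # about half of the electronics' power is memory fetches
processor_share = 0.5          # the remainder splits evenly: processor vs. logic

# Assumption: each share applies to whatever is left after the previous step.
active = 1.0 - leakage_fraction
electronics = active * (1.0 - screen_share_of_active)
circuitry = electronics * (1.0 - memory_share)
logic = circuitry * (1.0 - processor_share)

print(f"electronics: {electronics:.2%} of total power")
print(f"circuitry:   {circuitry:.2%} of total power")
print(f"logic:       {logic:.2%} of total power")
```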