Experts At The Table: Coherency

Second of three parts: The need for better hardware-software co-design; what’s missing today; transaction-level tradeoffs; ecosystem coherence; hard choices for supporting coherency protocols.


By Ed Sperling
System-Level Design sat down to discuss coherency with Mirit Fromovich, principal solutions engineer at Cadence; Drew Wingard, CTO of Sonics; Mike Gianfagna, vice president of marketing at Atrenta, and Marcello Coppola, technical director at STMicroelectronics. What follow are excerpts of that conversation.

SLD: How do we bring software more in line with the hardware to improve coherency?
Gianfagna: If we can figure out a way to do co-design better, where the software can be modeled against a version of the hardware with sufficient detail such that you can change the hardware and see an impact on software, you can start to solve this problem. There’s not a lot of that going on today.
Wingard: It’s done today at a level that’s a bit more hardware agnostic. Will that tradeoff help with a better understanding of the memory system? Probably not. They’ve abstracted the memory system.
Gianfagna: Yes, and that makes it invisible. People can abstract things to a level of detail that is instruction-set accurate. But that doesn’t matter; it doesn’t illuminate the problem.
Coppola: It’s clear there is a big impact on the software side. This impact is also in how the software is written and how we can share that information with the processor. Today there is no information shared between the GPU and CPU. We need to study this problem and come up with real data. When we have that real data, we will know how to write software better and what impact it will have.
Wingard: What’s surprising to me, but consistent with what you’re saying, is that companies that have been doing SoCs really don’t feel like they have the right models to deal with coherence. The high-level models they’ve been building don’t have the right level of detail to do this. And the low-level models run far too slowly to even think about doing this. And the software interaction means that no matter what model they choose they’ve got the wrong software.

SLD: Isn’t this a silo effect?
Wingard: Yes, that’s one way to think about it. We have these rules about how to do different levels of abstraction. Cache coherence cuts across those assumptions. Some of the things we didn’t think we had to do at this level we now have to do in order to get the benefits.
Fromovich: It’s more about performance at the very early stage. Even in the processor, what kind of configuration do you need to choose and how will that impact your overall system? And do you have the right early system model in place to make architectural decisions? We hear our customers asking us for help and for tools at a very early stage.
Coppola: I need to be able to decide whether a transaction is correlated. There may be a big tradeoff for final performance. And if you make the transaction cacheable you might not be able to exploit the other benefits of the cache.

SLD: Do we need coherency across the entire ecosystem? Do we need business coherency for technology coherency?
Wingard: In order to interoperate in one of these coherent systems, if I want to cache data I’m going to have to have ACE-compliant cache support. One thing that’s different about coherency protocols is that it’s been relatively easy for third-party vendors to live in a protocol-neutral world. They may support AMBA or OCP-IP. Coherency protocols have so much more semantic content in them, and there are so many more implications, that it’s really unlikely you will do this for more than one family of protocols. One thing it does for the IP providers is force them to make a hard choice.
Fromovich: They have to pick a protocol.
Wingard: Yes, and they really have to lock down on it—to the exclusion of others.
Fromovich: In AMBA you can interconnect with up to 100 other protocols.
Wingard: The question is how quickly and for which applications. As an IP provider you care about the applications and how heterogeneous it needs to be and the market opportunities. But scalability and heterogeneity are high on our list of opportunities for adding value.
Gianfagna: If there isn’t consistency in the specs across the industry then you’re limiting growth in the market. A simple example of that is power formats. Do we want to agree that two formats are more efficient than one? I don’t think so. It’s a smaller version of this problem. People who don’t normally like to collaborate need to work together. Otherwise you’ll have to make choices as an IP vendor, and those choices are going to limit your market.
Wingard: ARM plays the role of benevolent dictator in this. Their openness around the AMBA protocol has been a benevolent gesture. The original purpose was to help their customers get to silicon faster to ship ARM cores. At some level the ACE protocols aren’t much different. But there are more things at play here, including CPU-GPU coherence. They’ve been talking a lot about that this year with respect to Mali. A second vector is big.LITTLE, which allows you to run an application on the lowest-power processor and still get the job done on time. And then the third vector is to enable licensees to compete for servers against Intel boxes. All those things drive in the direction of maybe a little less openness. It will be interesting to see how the customer base reacts to that. There are more and more constraints on the ecosystem, and we have to follow what ARM does. But there’s an interesting question about how far we go down that path.

SLD: Do subsystems solve the coherency problem or do they exacerbate it?
Wingard: Subsystems drive the need for coherence. Of course, it also depends how you define a subsystem. Is a quad-core Cortex-A15 cluster a subsystem? It has a bunch of processors in it, memory in the form of cache, and I/O to the outside world. It also has a bunch of software, but it’s not self-contained. By definition, its function is to take any kind of software anyone wants to throw at it and execute it. It doesn’t meet my definition of a subsystem because I need to be able to treat it like a black box.

SLD: But doesn’t a black box cause a problem with coherency?
Wingard: If you look at audio and video and communications subsystems, performance means the global memories inside those subsystems may need to be cached. If it’s just an IP core it probably doesn’t need to be cached, but with a subsystem it does. A GPU is a great example. Right now it doesn’t need to be coherent with the main CPU. There are an inordinate number of cycles in software that are spent copying data from cacheable space to non-cacheable space. It’s a big drain on CPUs.

SLD: Does it improve with 2.5D and 3D?
Coppola: At this point we may have level-3 cache. Stacking in 3D could have an impact on this architecture, particularly on memory with Wide I/O. But 3D stacks also may complicate our lives because we have more choices.
Gianfagna: I agree. It provides us with more options for implementation. That impacts the architecture of the hardware.