Experts At The Table: Coherency

First of three parts: Defining different types of coherency; ecosystem challenges; heterogeneous vs. homogeneous cores; software issues.


By Ed Sperling
System-Level Design sat down to discuss coherency with Mirit Fromovich, principal solutions engineer at Cadence; Drew Wingard, CTO of Sonics; Mike Gianfagna, vice president of marketing at Atrenta, and Marcello Coppola, technical director at STMicroelectronics. What follow are excerpts of that conversation.

SLD: What’s driving coherency and what sort of issues are you encountering with it these days?
Coppola: Power is the key element behind all of this. Cache coherency is a way to reduce power. But at the same time, it requires some improvements in software, because that affects coherency.
Gianfagna: Coherency means something different for me. With all the complexity there’s a move to a higher level of abstraction and the need to deal with more information about the chip earlier in the process. If you wait until two weeks before tapeout, it isn’t going to work. You’re out of luck. By dealing with more information earlier in the process, there’s a thread that runs through the process from the very highest level to when you do the tapeout. It has to do with models, accuracy and consistency, and from a broader level we call that coherency. Is the model you use at a higher level, and the decisions you make about the design at that level of abstraction, still consistent and coherent? Are you still pointed in the same direction, or has something changed your information flow about how the design will work?

SLD: You’re looking at it from the standpoint of system coherency?
Gianfagna: It’s even broader than that. It’s model coherency across the full spectrum. It runs from the highest level, where you’re trying to figure out which operating systems and processors to use, all the way down to whether the design is lithography-friendly. There are models and information that drive decisions all the way from that starting point to the end point, and if the models aren’t consistent and coherent at every step of the way the system fails and we start working at a different level of abstraction. That will typically be too low, it will cause problems, and it will slow growth.
Wingard: I would call that design consistency—how we make sure the process is coherent so what we thought we had and what we end up with are the same. That’s a very important challenge. We’re equally interested in the cache coherency problem. It’s important that the data the processors are caching is well understood by the software and that it’s consistent with the image in main memory that other components may interact with. What’s changed, particularly with the introduction of the ACE protocol from ARM, is that people are openly talking about cache coherency across a broad cluster. That adds a set of interesting and new challenges. Certainly people who have been using design techniques in servers have been dealing with coherency for a long time. Those tend to be much more homogeneous systems. What gets much more interesting is when we have a consumer type of SoC with a much wider variety of nodes that are optimized very carefully for their task. Then you overlay on top of that requirements for hardware coherency management, and it becomes a much more challenging problem. Computer hardware history only takes us so far. Heterogeneous coherency is not something anyone really thought was an important problem. It’s now becoming interesting.
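Wingard’s definition, that the data a processor caches must stay consistent with the main-memory image other components see, can be sketched as a toy snooping protocol. This is purely illustrative (a simplified MESI-style model, not ACE or any vendor’s design); the cache and bus classes here are invented for the sketch:

```python
# Toy MESI-style snooping model: what keeping cached data consistent
# with the main-memory image means mechanically. Real protocols (MESI
# variants, ARM ACE) add many more states, transactions and races.
MODIFIED, EXCLUSIVE, SHARED, INVALID = "M", "E", "S", "I"

class Cache:
    def __init__(self, name):
        self.name = name
        self.lines = {}                     # address -> (state, value)

    def state(self, addr):
        return self.lines.get(addr, (INVALID, None))[0]

class Bus:
    """Snooping bus: every access is broadcast to all attached caches."""
    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def read(self, requester, addr):
        value = self.memory.get(addr, 0)
        shared = False
        for c in self.caches:
            if c is requester:
                continue
            st, v = c.lines.get(addr, (INVALID, None))
            if st == MODIFIED:              # snoop hit on dirty data:
                self.memory[addr] = v       # write it back first
                value = v
            if st != INVALID:               # everyone downgrades to Shared
                c.lines[addr] = (SHARED, value)
                shared = True
        requester.lines[addr] = (SHARED if shared else EXCLUSIVE, value)
        return value

    def write(self, requester, addr, value):
        for c in self.caches:               # invalidate every other copy
            if c is not requester:
                c.lines.pop(addr, None)
        requester.lines[addr] = (MODIFIED, value)

mem = {}
bus = Bus(mem)
cpu_a, cpu_b = Cache("A"), Cache("B")
bus.caches = [cpu_a, cpu_b]

bus.write(cpu_a, 0x100, 42)    # A holds the line Modified; DRAM is stale
seen = bus.read(cpu_b, 0x100)  # snoop forces write-back; both end Shared
```

The read path shows why this gets hard in a heterogeneous system: correctness depends on every agent on the bus observing and reacting to everyone else’s traffic, which is exactly the system-level property the panelists say cannot be verified port by port.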
Fromovich: There’s a verification challenge on top of that. When we try to verify this kind of interconnect, we need to handle coherency in the hardware. It’s one of the hardest verification challenges. You cannot solve this in the traditional way. You need a new view of the problem and new tools to handle this kind of challenge.

SLD: The other side of this is that design now involves components of a system that were not internally developed. How does that affect coherency?
Coppola: We can develop a lot internally, but IP from other companies adds a lot of value. Verification is really the difficult part, because every time we create a new platform we change the architecture. This is a challenge for the ecosystem. Modeling is one of the keys. So to summarize, today we cannot re-use what has been done; we need to invent something new, and we can get benefits from heterogeneity.

SLD: But heterogeneity does add some problems for coherency, right? We’re looking at coherency across an ecosystem.
Fromovich: This is what ARM is trying to solve with its new ACE (AXI Coherency Extensions) spec to manage coherency in the system. Hopefully if you follow the spec you can maintain coherency.
Wingard: But there’s a challenge there, because like all AMBA specs it describes a point-to-point protocol with some implied system model. From our perspective the challenge is really in the system. There’s not a lot of work that has been done in truly heterogeneous coherence. We can understand the port behaviors as being correct, but the challenging part of heterogeneous coherency is looking at the system-level view of it. In that way, cache coherence does not lend itself to the divide-and-conquer approach that we’ve been trying to take for the rest of the design-verification challenge for the past 15 years. Cache coherency is only something you can verify when you’ve put the whole system together.
Gianfagna: It’s a global issue, and that’s a big problem.
Wingard: You have to have a system-level view for all the caches, and it has to be compliant with the specification. But there’s more to it than that. The degrees of freedom you have in building your system coherence manager are very high, and the tests needed to verify them are many and varied. No one in the supercomputing space today will implement cache coherence without formal verification as a root requirement. You can’t cover enough state space to verify the whole thing. You need many, many techniques. That’s a big change from what most SoC developers are used to using.
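A back-of-the-envelope count makes Wingard’s state-space point concrete. Even before protocol invariants prune anything, one cache line tracked by n caches, each in one of the four MESI states, already gives 4^n raw combinations (the numbers below are this naive upper bound, not a measured figure for any real design):

```python
# Naive state-space count behind "you can't cover enough state space":
# each tracked line in each cache can sit in one of four MESI states,
# so the raw combination count is states ** (caches * lines). Protocol
# invariants prune many, but in-flight transactions and buffers add
# dimensions this count ignores, so the real space is worse in practice.
def raw_states(caches, lines, states_per_line=4):
    return states_per_line ** (caches * lines)

print(raw_states(4, 1))    # 256 combinations for a single line, 4 caches
print(raw_states(8, 64))   # astronomically many: why simulation alone fails
```

This is why the formal techniques Wingard mentions are a root requirement rather than a nice-to-have: directed or random simulation samples a vanishing fraction of such a space.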

SLD: Most of this has been done for cache coherency, but there are other types of coherency, as well.
Wingard: I/O coherency is much easier. If a transaction is routed based on its attributes, I just send it to someone else—the coherence manager, which does the hard work.
Fromovich: It’s a subset.
Wingard: The heterogeneity here doesn’t bother us. The jury is still out concerning ACE, which claims to be able to handle two different processors with different cache line sizes. They believe they’ve solved that problem. I don’t know if they’ve proved it to anyone, though.
Fromovich: Nobody ever tried that.
Wingard: And right now people think that’s a risky thing to try with ACE. But at some point, ARM is going to change its cache line. Anyone who’s building a subsystem today and trying to match that 64-byte cache line will either have to change or use that cache coherency manager. We’re going to see a lot more examples of this, because what’s behind all this hardware getting used is software. How quickly do people start to flick the switches so these subsystems get mapped into a cache-coherent space? One of the big benefits of going to a more coherent system is that the software guys don’t have to write extra code. But the real question is how quickly software starts to use it. Most people using Linux on a multi-core chip today flip a switch in the Linux kernel so that every page the OS allocates is in shared space. At that point, all of Linux becomes cache-coherent. That’s a big image, and the actual amount of sharing is going to be far, far less than that.
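The "extra code" that hardware coherency removes is cache-maintenance code. A toy model of a non-coherent system shows the burden software carries otherwise (the classes and addresses here are invented for illustration; real systems do this with cache clean/invalidate instructions or kernel DMA-mapping calls):

```python
# Toy non-coherent system: a write-back CPU cache in front of DRAM that
# a device also reads. Without hardware coherency, software must clean
# (write back) the cache before handing a buffer to the device. That
# explicit maintenance step is the extra code hardware coherency removes.
class NonCoherentCPU:
    def __init__(self, dram):
        self.dram = dram
        self.cache = {}

    def store(self, addr, value):
        self.cache[addr] = value             # hits the cache; DRAM now stale

    def clean(self, addr):
        self.dram[addr] = self.cache[addr]   # explicit cache maintenance

def device_read(dram, addr):
    return dram.get(addr, 0)                 # the device sees only DRAM

dram = {}
cpu = NonCoherentCPU(dram)
cpu.store(0x80, 7)
stale = device_read(dram, 0x80)   # device reads stale memory: gets 0
cpu.clean(0x80)                   # software does the maintenance by hand
fresh = device_read(dram, 0x80)   # now consistent: gets 7
```

With hardware coherency, the `clean()` step disappears: the interconnect snoops the dirty line on the device's behalf, which is exactly why mapping everything into shared space is so tempting for software, even when the actual sharing is small.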
Gianfagna: The hardware-software interaction is interesting. The fact that this problem doesn’t lend itself to divide and conquer is going to make it more difficult. It all gets back to the accuracy of the model and how fast it’s going to run. The thinking is that we can solve this by brute force, but it doesn’t work, which is where the hardware-software interaction comes into the design methodology. The problem is that software guys are running code against a fixed architecture as opposed to writing a spec that drives the architecture to respond to that spec. The arrow points the other way.

SLD: What kind of software?
Gianfagna: This problem is pervasive. You can go all the way from the application to middleware to the operating system. It’s probably more interesting at the high levels of abstraction, because that’s where the problem becomes bigger and nastier. At the low levels it’s probably not such a big problem because you localize a lot of that. But when you talk about application code that cuts across the entire system architecture and you have a heterogeneous environment, that’s where it gets to be ugly. You can’t model that with brute force. The verification problem becomes very challenging.
Wingard: The subsystem software doesn’t look like firmware or a driver. It talks to a driver. But who does its memory allocation, and in what kind of space? In some platforms, it may be up to the OS. In others, it boots before the OS is running.