Coherently Incoherent: Dealing With Complexity

For cache coherency to be widely adopted, customers need to see a clear benefit from adding more complexity to an SoC.


By Frank Ferro
I was a bit frustrated this weekend after installing a digital light timer. Yes, a light timer. As an engineer, I expected this to be no big deal, and for the most part I installed it without shocking myself or running into other major problems. This timer had all the bells and whistles. It knows about time zones and adjusts daily for dawn and dusk. It even adjusts for daylight saving time. The problem came when I tried to program it. It took me two days to get it right (I actually had to read the instructions)! How did a function as simple as a switch become so complicated?

I had a similar thought last week as discussions intensified around the need for embedded consumer SoCs to support hardware cache coherency. How did connecting one core to another (essentially a switch and a router) become so complicated? Much of the discussion has been sparked by the recent introduction of ARM's new CCN-504 cache-coherent interconnect. Although this IP targets high-end computing platforms, it is clear that these types of coherent networks will be needed for lower-performance applications as well. Cache coherency is not new in embedded SoCs; it has been used within computing clusters for some time. Keeping memory coherent within the computing cluster, however, has been the CPU vendor's problem alone, because it did not affect other memory transactions in the system. What is new is that other processors in the system, such as GPUs and DSPs, will also need coherent access to memory.

Why? I don't need to reiterate the increased computing demands and bandwidth challenges in today's mobile SoCs. Advanced features in today's on-chip networks, such as QoS and virtual channels, were introduced to maximize system bandwidth and concurrency, reducing the performance penalty when multiple processors compete for system memory. Even so, anything that minimizes accesses to off-chip system memory, with its long latencies, has real performance advantages. For this reason, other processors such as the GPU and DSP can have their own local caches to maximize performance, but these caches may need to keep a consistent view of memory with the other heterogeneous processors in the system. Reducing the number of external memory accesses also saves power, which is critical for mobile SoCs.
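To make the "consistent view of memory" problem concrete, here is a minimal toy model, not any real interconnect or ARM protocol. It assumes two agents (a hypothetical GPU and CPU), each with a private write-back cache in front of one shared memory, and a greatly simplified snoop that only looks for a dirty copy in a peer's cache. The class and function names are illustrative only.

```python
# Toy model (hypothetical, not a real interconnect): agents with private
# write-back caches sharing one backing memory. Ownership and state
# transitions (e.g. MESI/MOESI) are deliberately not modeled.

class Agent:
    """A processor with a private write-back cache (address -> value)."""

    def __init__(self, name, memory):
        self.name = name
        self.memory = memory   # shared backing store (stands in for off-chip DRAM)
        self.cache = {}        # private cache: address -> value
        self.dirty = set()     # addresses modified locally but not yet written back

    def write(self, addr, value):
        # Write-back policy: update only the local cache and mark the line dirty.
        self.cache[addr] = value
        self.dirty.add(addr)

    def read(self, addr, peers=()):
        if addr in self.cache:
            return self.cache[addr]
        # Simplified snoop: if a peer holds a dirty copy, take the value from it
        # instead of going to external memory.
        for peer in peers:
            if addr in peer.dirty:
                value = peer.cache[addr]
                self.cache[addr] = value
                return value
        # Miss everywhere: fetch from (slow, power-hungry) external memory.
        value = self.memory[addr]
        self.cache[addr] = value
        return value


def demo(snoop):
    memory = {0x100: 0}
    gpu = Agent("GPU", memory)
    cpu = Agent("CPU", memory)
    gpu.write(0x100, 42)              # GPU updates a shared buffer in its own cache
    peers = [gpu] if snoop else []    # with snooping, the CPU's read checks the GPU's cache
    return cpu.read(0x100, peers)


print(demo(snoop=False))  # 0  (stale value read from memory)
print(demo(snoop=True))   # 42 (up-to-date value supplied by the GPU's cache)
```

Without any coherency mechanism the CPU silently reads stale data; with the snoop it gets the current value and, as a side effect, avoids an external memory access, which is where the performance and power arguments above come from.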

Another layer of complexity. Supporting cache coherency means the on-chip network has the added task of determining whether a data transaction is coherent or non-coherent. A coherent transaction must be directed to a coherency network that manages the progress of these shared transactions; a non-coherent one can pass directly to memory. Supporting coherency brings many new and challenging architectural decisions for the SoC design team, including: how many computing clusters to support; whether other cores will be fully coherent or I/O coherent; how to mix coherent and non-coherent masters; which type of coherency scheme to use (snooping or directory-based); and how well the system will scale. These are all critical questions, and the answers vary widely today depending on the customer, the application and the processor used.
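The routing decision described above can be sketched as follows. This is a hypothetical model, not how ACE, OCP or any vendor's network actually encodes it: the `coherent` flag on a transaction, the `route` function and the directory-style `CoherencyManager` are all assumptions for illustration. The manager only tracks which masters may hold a copy of each line and reports which copies a write would invalidate.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    master: str      # issuing master, e.g. "CPU0", "GPU", "DMA" (illustrative names)
    addr: int
    is_write: bool
    coherent: bool   # hypothetical attribute marking a shared (coherent) access


class CoherencyManager:
    """Hypothetical directory-style manager: tracks which masters may cache each line."""

    def __init__(self):
        self.sharers = {}  # line address -> set of masters holding a copy

    def handle(self, txn):
        holders = self.sharers.setdefault(txn.addr, set())
        if txn.is_write:
            # A write must invalidate every other cached copy before it proceeds.
            invalidated = sorted(holders - {txn.master})
            self.sharers[txn.addr] = {txn.master}
            return f"coherent write by {txn.master}; invalidate {invalidated}"
        holders.add(txn.master)
        return f"coherent read by {txn.master}; sharers {sorted(holders)}"


def route(txn, manager):
    """Interconnect decision: coherent traffic goes to the manager, the rest straight to memory."""
    if txn.coherent:
        return manager.handle(txn)
    kind = "write" if txn.is_write else "read"
    return f"non-coherent {kind} by {txn.master}; direct to memory"


mgr = CoherencyManager()
print(route(Transaction("CPU0", 0x40, is_write=False, coherent=True), mgr))
# coherent read by CPU0; sharers ['CPU0']
print(route(Transaction("GPU", 0x40, is_write=False, coherent=True), mgr))
# coherent read by GPU; sharers ['CPU0', 'GPU']
print(route(Transaction("GPU", 0x40, is_write=True, coherent=True), mgr))
# coherent write by GPU; invalidate ['CPU0']
print(route(Transaction("DMA", 0x80, is_write=True, coherent=False), mgr))
# non-coherent write by DMA; direct to memory
```

A directory is only one of the options listed above: a snooping scheme would instead broadcast each coherent request to every cache, which is simpler but scales less well as the number of clusters grows.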

Given ARM's large share of the mobile processor market, its ACE (AXI Coherency Extensions) specification is the closest thing we have to a standard. Implementations today include a mix of coherent and non-coherent networks connected via ACE and ACE-Lite (for I/O coherent data) ports. In addition, OCP 3.0 provides a protocol specification for coherent data transfers, and some companies have developed their own proprietary coherency networks. As these networks evolve, I would expect to see better integration of coherent and non-coherent networks. I would also expect to see networks that offer more scalability, moving easily from one or two computing clusters to systems that support many more.

For cache coherency to be adopted on a wider scale, customers need to see a clear benefit from adding this level of complexity to the SoC. The ability to run simulations with test cases that demonstrate performance and power benefits will be very important. In addition, SoC designers are still uncertain about the system requirements discussed above and about when to introduce hardware coherency. One thing is certain, however: supporting hardware cache coherency in embedded SoCs offers designers potential benefits, along with challenges that are not for the faint of heart. This is definitely not a simple switch.

—Frank Ferro is director of marketing at Sonics.