Got System Cache?

Last level cache to the rescue.


Similar to the world we live in, a coherent SoC system has truly become a hodgepodge of often conflicting desires, wants, and needs. While some traffic flows are highly sensitive to CAS latency, others have rigid coherent bandwidth requirements, and still others are more concerned with the real-time needs they must meet to fulfill their tasks. These requirements vary vastly, from “must haves” to “best-effort.” Finding the right balance and a common language to describe these wants not only makes communication easier but also aids in developing solutions that are optimized and flexible, just like in today’s world.

Necessity is the mother of invention. In the interconnect of complex heterogeneous SoCs, the Last Level Cache (LLC) fulfills several necessary roles. Its major role is to make sure all the various traffic needs are catered to in a seamless manner. This reduces the unpleasant “choppy” or uneven performance that can occur when a system attempts to cope with rapidly shifting and opposing requirements with impulsive and drastic reactions.

I love analogies! I live in Austin, TX, known to most as “The Live Music Capital of the World,” but the other industry that has taken off here is the incredible food scene. One thing this city makes you indulge in is food and what goes into creating it. For example, I have been here long enough to understand the making of a delectable bisque.


If you are familiar with the process of making a good bisque (or have at least tasted one), you would know that a multitude of ingredients go into it. We rarely comprehend the complexity well enough to appreciate the role that each ingredient plays in making it flavorful. Still, without the use of a strainer at the end, you wouldn’t get that luscious velvety finish that transforms it from plain flavorful to purely divine. The job of the strainer is to seamlessly convert the coarse, grainy, lumpy mixture into a silky and smooth texture. Voilà! Similarly, in a complex SoC, it’s the LLC that smooths out all the conflicting temperaments of the various traffic.

Pegasus Last Level Cache from NetSpeed Systems is a highly customizable and configurable last level cache that eliminates memory bottlenecks and boosts overall system performance.

Let us discuss some examples of situations where an LLC would be beneficial.

Memory, the critical resource: Memory is “the holy grail” of system resources. It has a profound impact on system performance, both in the amount of memory used and shared by various applications and in its bandwidth and latency requirements. Memory performance lags far behind processor performance, largely due to advancements in parallel computing such as multicore processors and GPUs. Current techniques focus on solving one factor of this resource problem at the expense of others:

• Creating a shorter direct path for lower latency comes at the expense of slower clocks and hence lower bandwidth.
• Other techniques aim at aggressive pipelining for higher fmax, at the cost of dramatically higher latency.

But none of these solutions even attempts to address the varied functional coherency requirements that are an integral part of a heterogeneous SoC.

An LLC can be customized to boost system performance by eliminating system bottlenecks and lowering critical latencies. Solving or even eliminating congestion from various heterogeneous masters in a SoC is a vital part of a scalable solution. An LLC achieves this by handling accesses locally and hence improving the memory efficiency.
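To put rough numbers on what handling accesses locally buys, here is a back-of-the-envelope sketch in Python. The function names and the latency figures in the usage note are illustrative assumptions, not measurements of any particular LLC or DDR controller, and the model ignores write-backs and queuing effects.

```python
def average_access_latency(hit_rate, llc_latency_ns, dram_latency_ns):
    """Average latency seen by a master when an LLC sits in front of DRAM.

    Hits are served by the LLC. Misses pay the LLC lookup plus the DRAM access.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    miss_rate = 1.0 - hit_rate
    return hit_rate * llc_latency_ns + miss_rate * (llc_latency_ns + dram_latency_ns)


def dram_accesses(total_accesses, hit_rate):
    """Accesses that still reach DRAM (misses only; write-backs are ignored)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return round(total_accesses * (1.0 - hit_rate))
```

With an assumed 20 ns LLC and 100 ns DRAM, a 50% hit rate brings the average from 120 ns (every access misses) down to 70 ns and halves the traffic that reaches memory, which is where both the latency and the efficiency gains come from.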

Power Reduction: A by-product of higher memory efficiency is power reduction, which is achieved by making intelligent decisions on managing, optimizing, and accessing the memory. A configurable LLC can be tailored to the specific characteristics of the DDR controller and, more importantly, can augment the optimizations of the controller by making smart decisions within the LLC.

Coherent vs Non-Coherent: An LLC can be built as a coherent cache, aimed at reducing latency for coherent cache accesses by removing unnecessary lookups. This improves the latency for coherent accesses without compromising the non-coherent traffic, which physically bypasses the coherent LLC. Other systems, however, are more concerned with improving bandwidth for traffic with high locality. This can be achieved with a non-coherent LLC that caches all accesses to a specific address range, independent of the coherent characteristics of the traffic. Another advantage of a non-coherent LLC is that it does not need to be flushed during transitions, since it caches all traffic to that range.

Reuse Cache as RAM: We all understand the cost of RAM. The data array in the LLC is not an insignificant piece of real estate, and it should never go to waste. While some applications benefit from a large LLC, other applications are better off with a directly addressable RAM. Having a runtime configurable option to camouflage the cache as a scratchpad RAM is key to maximum utilization of the cache in terms of both silicon resources and performance.
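One way to picture the cache-as-RAM option is a data array whose ways can be reassigned at runtime. The toy model below is only a sketch under that assumption: the class name, the way-granular split, and the scratchpad base address are all hypothetical and do not describe any specific product's register interface.

```python
class ConfigurableLLC:
    """Toy model: an LLC data array whose ways can be repurposed as scratchpad RAM."""

    def __init__(self, num_ways, way_size_bytes, scratchpad_base):
        self.num_ways = num_ways
        self.way_size_bytes = way_size_bytes
        self.scratchpad_base = scratchpad_base  # hypothetical base of the RAM window
        self.scratchpad_ways = 0                # all ways start out as cache

    def set_scratchpad_ways(self, ways):
        """Reassign `ways` ways from cache duty to directly addressable RAM."""
        if not 0 <= ways <= self.num_ways:
            raise ValueError("ways must be between 0 and num_ways")
        self.scratchpad_ways = ways

    @property
    def scratchpad_bytes(self):
        return self.scratchpad_ways * self.way_size_bytes

    @property
    def cache_bytes(self):
        return (self.num_ways - self.scratchpad_ways) * self.way_size_bytes

    def is_scratchpad_address(self, addr):
        """True if `addr` falls inside the currently exposed scratchpad window."""
        return self.scratchpad_base <= addr < self.scratchpad_base + self.scratchpad_bytes
```

For example, a 16-way array with 64 KiB per way that gives up 4 ways exposes 256 KiB of scratchpad and keeps 768 KiB of cache, so the same silicon serves both kinds of application.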

Workload-based power reduction: The natural hierarchy and structure of the LLC RAM banks lend themselves easily to selective power-down schemes. Workloads that have a much higher requirement for conserving power would benefit from being able to shut down part of the LLC (including its data array) or the LLC in its entirety. This is another valuable runtime feature at one’s disposal to save the last pinch of spare power in the system, especially in mobile applications.

Not all slave agents are created the same: One solution doesn’t fit all. The usage of a DDR can be very different from that of an on-chip RAM or a flash peripheral. Having multiple LLCs, based on address ranges, gives the system architect the ability to customize each based on the slave characteristics. This provides a more consistent behavior without having to switch back and forth to accommodate varying requirements in the different address ranges.
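The idea of one LLC per address range can be sketched as a simple lookup table. Everything in this example is an assumption made for illustration: the instance names, address ranges, and capacities are invented, and a real interconnect would do this decode in hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LLCInstance:
    name: str
    base: int          # first address covered by this LLC
    size: int          # size of the covered range in bytes
    capacity_kib: int  # cache capacity chosen for this slave

    def covers(self, addr):
        return self.base <= addr < self.base + self.size


def select_llc(addr, instances):
    """Return the LLC instance responsible for `addr`, or None to bypass."""
    for inst in instances:
        if inst.covers(addr):
            return inst
    return None


# Hypothetical address map: a large LLC in front of DDR, a small one for flash.
ADDRESS_MAP = [
    LLCInstance("llc_ddr", base=0x80000000, size=0x40000000, capacity_kib=4096),
    LLCInstance("llc_flash", base=0x10000000, size=0x01000000, capacity_kib=256),
]
```

Because each instance only ever sees traffic for its own range, its capacity and policies can be tuned to that slave alone, and addresses outside every range (on-chip RAM in this made-up map) simply bypass the caches.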

Heterogeneous allocation needs: Similar to slave agents, not all masters are built the same. If the traffic characteristics are not consistent, then why would the allocation needs be? A fixed allocation causes inconsistent and unpredictable behavior when masters with different traffic characteristics share the cache. The LLC is the perfect vehicle for partitioning resources and allocation policies by master, at runtime, so that the application developer can divide the LLC resources amongst the various users. This not only guarantees resources to applications with stringent performance requirements but also keeps requesters from trampling each other’s sets and ways, in turn reducing unnecessary evictions.
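A minimal sketch of per-master partitioning, assuming a way-based scheme: each master may hit in any way but may only allocate into the ways assigned to it. The class, the LRU replacement, and the CPU and GPU way assignments in the usage note are hypothetical choices for illustration, not a description of a specific LLC.

```python
class PartitionedSet:
    """One cache set whose ways are divided among masters for allocation."""

    def __init__(self, num_ways):
        self.tags = [None] * num_ways
        self.last_used = [0] * num_ways  # 0 means the way has never been used
        self.clock = 0

    def access(self, tag, allowed_ways):
        """Return True on a hit. On a miss, allocate only within `allowed_ways`."""
        self.clock += 1
        if tag in self.tags:
            # Hits are served from any way, whoever allocated the line.
            way = self.tags.index(tag)
            self.last_used[way] = self.clock
            return True
        # Miss: evict the least recently used way this master is allowed to use.
        victim = min(allowed_ways, key=lambda w: self.last_used[w])
        self.tags[victim] = tag
        self.last_used[victim] = self.clock
        return False
```

Suppose a CPU is given ways 0 and 1 and a streaming GPU ways 2 and 3. The CPU's line survives any number of GPU misses, because the GPU can only evict within its own ways. If both masters are allowed all four ways instead, a burst of four GPU misses is enough to evict the CPU's line, which is the trampling that partitioning avoids.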

In the end, an LLC is not just an L3, but much more than that. Do you have memory bottlenecks? Are the system resources being shared and reused? Is your system behavior temperamental? Do you have enough RAM? Need further power reduction? If your answer is yes to any of these, then it is high time you consider an LLC and give your system that extra gear that it needs. Think of it this way—does your kitchen only have pans, knives, and utensils? Probably that, plus a careful and judicious selection of other kitchen tools to make the job effortless and the result exceptional.


