AI, Performance, Power, Safety Shine Spotlight On Last-Level Cache

Overcoming memory limitations in automotive systems.


Memory limitations on performance, always important in modern systems, have become an especially significant concern in safety-critical automotive applications that use AI methods. On one hand, detecting and reporting a potential collision or other safety problem has to be very fast. Any corrective action is constrained by physics and has to be taken well in advance to avoid the problem.

AI and Memory
On the other hand, AI is used to assess the possibility of a collision, and that AI is enormously memory-hungry: it consumes high-resolution color video images and runs neural network algorithms on those images. However the AI is organized, whether feature-map-constant or output-map-constant, very large amounts of data need to be kept in memory during these calculations. Going to DRAM is very slow and power-hungry, so AI depends heavily on effective caching to minimize off-chip accesses.
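To see why the hit rate matters so much, here is a minimal sketch of the standard average-access-time calculation. The latencies are illustrative placeholders, not measurements of any real device, and the function name is made up for this example.

```python
def average_access_time_ns(hit_rate, cache_latency_ns, dram_latency_ns):
    """Average latency per access.

    Every access pays the cache lookup; only misses also pay the DRAM trip.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    miss_rate = 1.0 - hit_rate
    return cache_latency_ns + miss_rate * dram_latency_ns


# Assumed, illustrative latencies (not taken from any datasheet).
CACHE_LATENCY_NS = 10.0
DRAM_LATENCY_NS = 100.0

for hit_rate in (0.5, 0.9, 0.99):
    latency = average_access_time_ns(hit_rate, CACHE_LATENCY_NS, DRAM_LATENCY_NS)
    print(f"hit rate {hit_rate:.2f}: about {latency:.1f} ns per access")
```

With these assumed numbers, each improvement in hit rate removes a proportional share of the DRAM penalty, which is the effect a last-level cache is there to deliver. The same reasoning applies to energy per access.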

Given the complexity of modern accelerators, caching can be inserted at multiple levels, just as it is in processors. The last of these levels, between the off-chip memory controller and the rest of the SoC, is the last-level cache. A special challenge for this cache is that it doesn’t just serve the AI accelerator (or accelerators). It’s also the last level for the CPU cluster, the GPU and any other functions in the SoC. Since what makes for effective caching can be very application-specific, this last-level cache has to be very flexible, even supporting run-time configurability.

CodaCache IP
The Arteris IP CodaCache IP was designed to meet this need. First, each instance can store up to 8MB of data. Tags and data are organized in banks, allowing for parallelism in accesses, a common expectation in AI operations. The cache memory can be partitioned into up to 16 ways, so that independent applications can each work with their own subset of the cache without suffering evictions forced by another application. Importantly, partitioning affects only eviction. To preserve the integrity of the memory model, any application can still read or write any address in the cache.
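The idea that partitioning restricts only eviction, not lookup, can be sketched with a toy set-associative cache model. This is not CodaCache's implementation: the class, method and requester names are hypothetical, and LRU replacement is an assumption, since the replacement policy is not described here.

```python
class WayPartitionedCache:
    """Toy model: lookups search every way, evictions stay in the requester's ways."""

    def __init__(self, num_sets, num_ways, line_bytes=64):
        self.num_sets = num_sets
        self.num_ways = num_ways
        self.line_bytes = line_bytes
        # tags[set][way] holds the cached line number, or None when empty.
        self.tags = [[None] * num_ways for _ in range(num_sets)]
        # last_used[set][way] is a timestamp for LRU; 0 means never used.
        self.last_used = [[0] * num_ways for _ in range(num_sets)]
        self.clock = 0
        self.allowed_ways = {}  # requester name -> ways it may allocate into

    def assign_ways(self, requester, ways):
        ways = sorted(set(ways))
        if not ways or any(w < 0 or w >= self.num_ways for w in ways):
            raise ValueError("ways must be a non-empty subset of the cache ways")
        self.allowed_ways[requester] = ways

    def access(self, requester, address):
        """Return True on a hit; on a miss, allocate the line and return False."""
        self.clock += 1
        line = address // self.line_bytes
        index = line % self.num_sets
        # Lookup is not partitioned: a hit in any way counts for any requester.
        for way in range(self.num_ways):
            if self.tags[index][way] == line:
                self.last_used[index][way] = self.clock
                return True
        # Miss: pick the least recently used way among this requester's ways only.
        candidates = self.allowed_ways[requester]
        victim = min(candidates, key=lambda w: self.last_used[index][w])
        self.tags[index][victim] = line
        self.last_used[index][victim] = self.clock
        return False


# One set with four ways: the accelerator owns ways 0-1, the CPU owns ways 2-3.
cache = WayPartitionedCache(num_sets=1, num_ways=4)
cache.assign_ways("npu", [0, 1])
cache.assign_ways("cpu", [2, 3])

cache.access("npu", 0x000)   # accelerator loads two lines of weights
cache.access("npu", 0x040)
for i in range(2, 100):      # CPU streams through many other lines
    cache.access("cpu", i * 64)

print(cache.access("npu", 0x000))  # weights survived the CPU traffic
print(cache.access("cpu", 0x040))  # CPU can still hit on the accelerator's line
```

In this model the CPU's streaming traffic only ever evicts lines from its own two ways, so the accelerator's lines stay resident, while the final access shows that a requester can still hit on a line held in a way it does not own.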

For AI applications, this handling of partitioning and eviction is important, for example for the weights in weight-constant architectures. Because other applications cannot evict them, those weights stay resident in cache for as long as possible.

Sometimes you need extra unstructured scratchpad memory, for example to hold feedback state data for a recurrent neural network. CodaCache lets you configure this at boot time or at runtime.

AXI Connectivity, Safety
While CodaCache’s capabilities are intrinsic to some of our NoC products, we first built it to serve very general needs, whether or not the design team was using our NoCs. The IP connects directly to AXI-compliant on-chip interconnect and memory controller interfaces and can be configured through an APB port.

Most important of all for the automotive market, this cache IP is now designed to be ISO 26262 compliant as a Safety Element out of Context (SEooC). The memory uses ECC, the AMBA interfaces use parity protection or other methods, and the full block has been analyzed using FMEDA (Failure Modes, Effects, and Diagnostic Analysis), as required by the standard. It comes with safety manuals to help integrators develop and prove their own safety compliance.
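As a reminder of what parity protection buys on an interface, here is a minimal sketch of even parity computed per byte. The per-byte granularity and all function names are assumptions for illustration; the actual protection scheme on the CodaCache interfaces is not detailed here.

```python
def parity_bit(byte):
    """Even-parity bit for one byte: 1 when the byte has an odd number of 1 bits."""
    return bin(byte & 0xFF).count("1") % 2


def protect(data):
    """Pair each byte with its parity bit, as a sender would before transmission."""
    return [(b, parity_bit(b)) for b in data]


def corrupted_indices(protected):
    """Return positions whose stored parity no longer matches the byte."""
    return [i for i, (b, p) in enumerate(protected) if parity_bit(b) != p]


beat = protect(b"\x12\x34\x56")
print(corrupted_indices(beat))        # nothing flagged on a clean transfer

byte, parity = beat[1]
beat[1] = (byte ^ 0x08, parity)       # inject a single-bit fault
print(corrupted_indices(beat))        # the faulty byte position is flagged
```

Parity detects any odd number of flipped bits in a protected unit but cannot correct them, and an even number of flips goes unnoticed. That limitation is one reason memories typically use ECC rather than plain parity.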

Safety compliance is a recent addition to this IP, and the active interest from companies in the automotive market underscores the need to reduce power consumption and system latency in ADAS and autonomous driving systems. The CodaCache last-level cache is a key ingredient in meeting these requirements.

You can learn more about CodaCache here.
