How automotive AI defies the conventional wisdom on AI hardware markets.
Arteris IP functional safety manager Stefano Lorenzini recently presented “Automotive Systems-on-Chip (SoCs) with AI/ML and Functional Safety” at the Linley Processor Conference. A key point of the presentation was that conventional wisdom on AI hardware markets is binary. There’s AI in the cloud: big, power-hungry, general-purpose. And there’s AI at the edge: small, low-power, with limited application-specific features. Automotive AI doesn’t really fit either category. To power ADAS and autonomous driving functions, these chips are extremely application-specific, require more performance than typical edge AI, run at low power (though not as low as IoT edge chips), and must be as low cost as possible. They also add a new angle: low latency, because safety demands fast and deterministic response times. Add to all that the functional safety requirements of ISO 26262, inside the AI structure as much as everywhere else. Bottom line: automotive AI SoC architectures are unique beasts.
The first obvious difference is that automotive AI SoCs are heterogeneous, much more so than the SoCs we know from other applications, even other edge devices. This is because the ADAS or autonomous driving runtime software is closely coupled to the chip hardware, and custom processing element types are used to accelerate specific parts of the algorithms. You’ll also find on-chip memories placed within the topology to match the expected dataflow of these algorithms. In contrast, data center AI devices are usually designed to run multiple industry-standard AI algorithms and benchmarks. They generally tile many copies of the same homogeneous processing element in a regular topology like a mesh or ring, often resulting in large, reticle-spanning monsters. They’ll have some control and I/O too, but the AI accelerator subsystem overshadows the other logic. An automotive AI SoC, by contrast, must be more self-contained, with sensing, computer vision, and I/O interfaces, and must respond quickly, with plenty of CPU horsepower to run complex, often virtualized, software loads.
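To make the contrast concrete, here is a minimal Python sketch of the two styles. Every processing element (PE) name and count below is invented for illustration and not drawn from any real chip:

```python
# A minimal sketch contrasting the two styles of AI SoC described above.
# All PE names and counts are illustrative, not taken from any real chip.

from collections import Counter

# Datacenter-style accelerator: tile one PE type in a regular mesh.
datacenter_pes = ["tensor_pe"] * (8 * 8)  # 8x8 homogeneous mesh

# Automotive-style ADAS SoC: many PE types, each mapped to a stage of the
# perception pipeline, plus local SRAM placed near the expected dataflow.
automotive_pes = (
    ["safety_cpu"] * 4        # lockstep-capable CPU cluster for control code
    + ["vision_dsp"] * 6      # computer-vision kernels (warping, filtering)
    + ["npu_conv"] * 8        # convolution-heavy neural-network layers
    + ["npu_fc"] * 2          # fully connected layers
    + ["isp"] * 2             # raw camera sensor processing
    + ["local_sram"] * 12     # on-chip buffers tuned to the dataflow
)

print("datacenter:", Counter(datacenter_pes))
print("automotive:", Counter(automotive_pes))
```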
The network-on-chip (NoC) interconnect is the logical and physical instantiation of these SoC architectures, turning PowerPoint block diagrams into a real chip. Here’s a fun way to imagine how an automotive AI NoC architecture evolves from a “vanilla” AI chip architecture. (Note: this is definitely NOT the way real architects work!) Start with an “AI SoC baseline topology”: an array of homogeneous processing elements (PEs) connected by a mesh network, typical of datacenter AI accelerators. Now start hacking it. Not all of those PEs are needed, because this chip has specialized requirements, so drop the unwanted PEs and replace the rest with many different kinds of PEs mapped to critical algorithmic and dataflow functions, as needed. Stuff in different types of on-chip memory for fast, low-power accesses, because off-chip memory adds unsafe latencies and is brutal on power consumption. Then cut out unnecessary connections between the mesh routers, add lighter-weight routers so hops are not forced, and reclaim any white space left by all this hacking.
By the time we’re done, the network-on-chip is a highly complex structure – not a mesh, more like a tree, a very overgrown tree.
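For the curious, here is a toy Python sketch of that walk-through, pruning a baseline mesh down to a tree-like backbone. The mesh size, the removed PEs, and the link-pruning rule are all made up for illustration; real NoC design is far more involved:

```python
# Toy illustration of the "hacking" walk-through above: start from a
# regular mesh, drop unneeded PEs, then prune links down to a tree-like
# backbone. Purely illustrative.

import itertools

N = 4  # 4x4 mesh of routers, one PE per router

# Baseline: links between orthogonal neighbors (Manhattan distance 1).
nodes = set(itertools.product(range(N), range(N)))
links = {
    frozenset((a, b))
    for a in nodes for b in nodes
    if abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1
}

# Step 1: remove PEs the application doesn't need, and their links.
unwanted = {(0, 3), (1, 3), (3, 0), (3, 3)}
nodes -= unwanted
links = {l for l in links if not (l & unwanted)}

# Step 2: specialize the survivors (illustrative type assignment).
pe_type = {n: ("local_sram" if (n[0] + n[1]) % 3 == 0 else "custom_pe")
           for n in nodes}

# Step 3: cut links the expected dataflow never uses; here a spanning
# tree stands in for dataflow-driven link pruning.
adj = {n: [] for n in nodes}
for l in links:
    a, b = tuple(l)
    adj[a].append(b)
    adj[b].append(a)

root = min(nodes)
seen, tree_links, frontier = {root}, set(), [root]
while frontier:
    cur = frontier.pop()
    for nxt in adj[cur]:
        if nxt not in seen:
            seen.add(nxt)
            tree_links.add(frozenset((cur, nxt)))
            frontier.append(nxt)

print(f"{len(nodes)} PEs, mesh links: {len(links)}, tree links: {len(tree_links)}")
```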
Without care, automotive AI SoCs can be a huge power drain as they move large volumes of data through a network of complex calculations. But good power management can be more fine-grained than all-on or all-off. Processing elements and memories can be clock-gated when parts of the network are temporarily inactive. The interconnect itself can be a big power consumer: wires don’t scale as well as transistors in advanced process technologies, and these may be long wires. Intelligent clock gating, possibly semi-autonomous within the interconnect, can gate clocks in individual network elements to minimize this power draw.
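Here is a small behavioral sketch, in Python, of what semi-autonomous clock gating in a network element might look like. The idle threshold and traffic pattern are invented for illustration:

```python
# Behavioral sketch of autonomous clock gating inside the interconnect:
# each network element gates its own clock after a run of idle cycles and
# wakes on the next traffic. Threshold and traffic pattern are illustrative.

IDLE_THRESHOLD = 4  # cycles of inactivity before gating (made up)

class GatedRouter:
    def __init__(self):
        self.idle_cycles = 0
        self.clock_on = True
        self.active_cycles = 0   # cycles burning clock-tree power

    def tick(self, has_traffic: bool):
        if has_traffic:
            self.clock_on = True       # wake on traffic (wake latency ignored)
            self.idle_cycles = 0
        else:
            self.idle_cycles += 1
            if self.idle_cycles >= IDLE_THRESHOLD:
                self.clock_on = False  # gate the clock to this element
        if self.clock_on:
            self.active_cycles += 1

# Bursty traffic: 10 busy cycles, then 90 idle.
router = GatedRouter()
for cycle in range(100):
    router.tick(has_traffic=cycle < 10)

print(f"clock active {router.active_cycles}/100 cycles "
      f"({100 - router.active_cycles} cycles gated)")
```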
Architects will boost PE performance by pushing interface bus widths beyond standard limits, for example to 8k bits where the AXI specification caps the data bus at 1,024 bits. Teams will build custom interface logic or adapt standard interfaces to meet this need. Everything is subservient to meeting performance and latency goals, through extreme parallelism if needed.
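As a rough illustration of that parallelism, the Python sketch below presents a logical 8,192-bit beat by striping it across eight 1,024-bit channels. The widths match the example in the text; the striping scheme itself is an assumption, not a description of any particular product:

```python
# Sketch of "extreme parallelism": present a logical 8,192-bit datapath by
# striping each beat across eight 1,024-bit channels (AXI4 caps the data
# bus at 1,024 bits). The striping scheme is illustrative.

LOGICAL_BITS = 8192
CHANNEL_BITS = 1024
N_CHANNELS = LOGICAL_BITS // CHANNEL_BITS  # 8 parallel channels

def stripe(beat: bytes) -> list[bytes]:
    """Split one logical beat into per-channel slices."""
    assert len(beat) * 8 == LOGICAL_BITS
    step = CHANNEL_BITS // 8
    return [beat[i * step:(i + 1) * step] for i in range(N_CHANNELS)]

def merge(slices: list[bytes]) -> bytes:
    """Reassemble the logical beat on the receive side."""
    return b"".join(slices)

beat = bytes(range(256)) * 4        # 1,024 bytes = 8,192 bits
assert merge(stripe(beat)) == beat  # lossless round trip
print(f"{N_CHANNELS} channels x {CHANNEL_BITS} bits = {LOGICAL_BITS}-bit beat")
```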
In a similar vein, designers also tweak switches and other logic within the network to optimize performance, area, and power for their expected system-level use cases.
Automotive AI chips must meet ISO 26262 requirements at the system level, which includes semiconductor IP elements like processing subsystems/PEs, embedded memories, and the NoC interconnect. Failure modes, effects, and diagnostic analysis (FMEDA) is used to quantitatively prove that the functional safety mechanisms within the chip provide the diagnostic coverage required to meet the engineering team’s target ISO 26262 automotive safety integrity level (ASIL). Examples of functional safety mechanisms include error-correcting codes (ECC) to protect internal SRAMs and in-flight data; hardware redundancy and lockstep, including duplication and triple modular redundancy (TMR); and advanced checker, built-in self-test, and reporting logic. Examples of NoC safety mechanism implementations include ECC protection of FIFOs, hardware duplication and checking for network interface units at the edge of the NoC, and more.
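To give a flavor of how duplication and checking works, here is a toy Python model of a lockstep network interface unit with fault injection. The packet format and fault model are invented for illustration:

```python
# Behavioral sketch of one safety mechanism named above: hardware
# duplication and checking at a NoC network interface unit. Two copies of
# the packetization logic run in lockstep; a comparator flags divergence.
# Packet format and fault model are invented for illustration.

def packetize(addr: int, data: int) -> int:
    """Toy network-interface logic: fold address and data into a packet."""
    return (addr << 32) | (data & 0xFFFFFFFF)

def lockstep_niu(addr: int, data: int, inject_fault: bool = False) -> int:
    main = packetize(addr, data)
    shadow = packetize(addr, data)
    if inject_fault:
        shadow ^= 1 << 7          # model a transient bit flip in one copy
    if main != shadow:
        raise RuntimeError("lockstep mismatch: signal the fault controller")
    return main

print(hex(lockstep_niu(0x4000, 0xDEADBEEF)))       # normal operation
try:
    lockstep_niu(0x4000, 0xDEADBEEF, inject_fault=True)
except RuntimeError as e:
    print("detected:", e)
```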
Based on years of lessons learned, here are three key things that integrators of IP into their ISO 26262-compliant SoCs need to keep in mind (yes, I’ve seen all of these over the years):
The Linley Processor Conference is a great venue to achieve a big-picture understanding of emerging issues and technologies in the semiconductor industry. If you would like to dive further into understanding how to implement AI/ML and functional safety capabilities into automotive SoCs, contact us!