ECUs that are designed today need to have the performance and flexibility to run the workloads of the next decade.
The pace of innovation in automotive is accelerating. Electrification, advanced driver assistance systems (ADAS) and vehicle connectivity are revolutionizing the in-car experience, which is now largely determined by the capabilities of the car’s software and electronic hardware.
When a vehicle can receive software upgrades while it is on the road, the electronic control units (ECUs) that are designed today need to have the performance and flexibility to run the workloads of the next decade. A well-designed control system therefore needs the adaptability to take advantage of the strengths of individual components and, depending on the workload mix, assign the right task to the right processor to optimize efficiency while delivering the required level of performance and safety.
A single automotive ECU typically includes specialized AI accelerators, arrays of application and real-time CPUs, as well as a GPU. The AI accelerators efficiently process the perception task of the ADAS – working out the car’s surroundings – in particular, object detection and semantic segmentation. The CPU addresses decision-making and sequential control tasks as well as running the main application. The GPU has been commonplace in cars for over a decade thanks to its ability to deliver smooth, responsive user interfaces to the cockpit and infotainment systems.
But, as has been discovered in other industries, such as the data center, GPUs are good for so much more than just graphics processing. Their general-purpose programmability and high performance for parallel dynamic compute tasks make them a valid processing option for a wide range of workloads that form the backbone of ADAS and autonomous vehicles.
So, when the GPU sits alongside CPUs and an AI accelerator that might deliver over 100 TOPS of compute, what compute workloads could, and should, the GPU be used for? This article sets out why, where, and how GPU compute can be deployed in a vehicle.
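To make the "right task to the right processor" idea concrete, here is a minimal Python sketch that routes workloads to a processor class using the roles described above. The `Workload` fields, the processor labels and the routing rules are illustrative assumptions, not a real scheduler or vendor API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    fixed_network: bool  # pre-determined AI model, e.g. object detection
    data_parallel: bool  # same operation over many independent elements


def assign_processor(w: Workload) -> str:
    """Pick a processor class for a workload (illustrative rules only)."""
    if w.fixed_network:
        return "ai_accelerator"  # perception on a pre-determined network
    if w.data_parallel:
        return "gpu"             # programmable, parallel compute
    return "cpu"                 # decision-making and sequential control


# Hypothetical workload mix for one ECU.
workloads = [
    Workload("object_detection", fixed_network=True, data_parallel=True),
    Workload("camera_dewarping", fixed_network=False, data_parallel=True),
    Workload("trajectory_decision", fixed_network=False, data_parallel=False),
]
plan = {w.name: assign_processor(w) for w in workloads}
print(plan)
```

A production system would weigh far more than two flags (latency budgets, safety level, current load), but the shape of the decision is the same: a fixed perception network goes to the accelerator, flexible parallel work goes to the GPU, and sequential control stays on the CPU.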
Industry-standard APIs such as OpenCL, Vulkan and OpenGL provide a stable programming interface for software developers looking to create high-performance applications on the GPU, while optimized software libraries provide the mechanism to achieve maximum efficiency and tight control of scheduling and memory management. These APIs reduce complexity and free the software developer from the details of the underlying hardware. They make it easy to take advantage of the inherent parallelism of GPUs and to port code to different platforms.
There is a growing ecosystem of frameworks and libraries with OpenCL back-ends that give a quick time to market as well as an opportunity for higher-level optimization and integration as part of a heterogeneous compute system. It includes AI deployment environments as well as computer vision and other general-purpose compute libraries.
A wide range of companies are supporting or developing applications for GPU compute because of the GPU’s high-density compute and flexibility, resulting in an ecosystem that is active and collaborative, seeking to enhance the utility of GPUs to the benefit of the automotive sector as a whole. Porting from digital signal processor code, CUDA, TensorFlow or PyTorch (among others) is becoming easier and better supported by the extended community of developers creating both open-source and in-house solutions.
The flexibility of the GPU is very different from the type of processing offered by an AI accelerator that sacrifices flexibility for extreme levels of performance for pre-determined workloads. In order to future-proof against the rapidly evolving AI landscape and the software-defined era, high-performance automotive ECUs will include a mix of both GPUs and AI accelerators so as to offer differentiated in-car experiences well into the future.
The GPU is a flexible and programmable parallel accelerator that can be used for a diverse array of tasks. The primary task of the GPU in the vehicle has historically been graphics and delivering a great in-car multimedia experience. But a GPU can also be used effectively in a vehicle’s ADAS to process general-purpose compute tasks.
A single GPU can be dedicated specifically to a human-machine interface (HMI) function with a second GPU supporting ADAS. Alternatively, with the right virtualization or multi-core technologies, the same GPU can be used to handle both graphics and compute-oriented applications. Depending on the performance required for the target use case, hardware architects can choose between GPU configurations with different numbers of processing units or different numbers of cores, but fundamentally selecting a single GPU architecture that can scale to fit both these applications can reduce complexity in hardware design and software development.
Putting this all together, as an example, in a single car an architect could choose to deploy a small, fill-rate-focused GPU running the multimedia display in the infotainment unit; a dual-core GPU in the cockpit that handles the safety-critical display on one core and supports the driver monitoring functionality with the other core; and then a powerful multi-core GPU in the ADAS controller to provide high-performance, programmable compute. OEMs can use the same principle of GPU scalability to offer different infotainment and ADAS options across the vehicle range without overly complicating the hardware design and software development and verification process.
GPUs are parallel computation engines with hundreds of individual data lanes that are capable of processing independent, complex computations at the same time. This design, originally intended to meet the computational demands of the latest graphics trends, makes them a prime target for application programmers looking to accelerate non-graphics but still highly parallel tasks.
There is a wide range of tasks for which the parallelism of a GPU is well suited. In the case of ADAS functions, this includes AI pre-processing for data gathered by a vehicle’s sensors (camera, radar, and lidar) which might include non-linear algebra as well as vector and data manipulation.
With the number of sensors in a vehicle only going up, other obvious workload choices for GPUs include the fusion of data from multiple sensors and AI post-processing tasks which demand high levels of compute and complex algorithm operations.
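As one illustration of an AI post-processing task, the Python sketch below implements non-maximum suppression (NMS), a step commonly applied to object-detection outputs to discard overlapping boxes. The article does not name NMS specifically, so treat the choice of algorithm, the box format `(x1, y1, x2, y2)` and the 0.5 threshold as assumptions. The pairwise overlap tests are independent of one another, which is the kind of structure a parallel processor can exploit.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of boxes kept after greedy non-maximum suppression."""
    # Visit boxes from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: the first two boxes overlap heavily.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

This is plain sequential Python for readability; it shows the logic of the step rather than how it would be scheduled on a GPU.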
In addition to the typical graphics use cases, flexible GPU solutions can be efficiently deployed in vehicles to process more compute-oriented workloads, such as the sensor processing and sensor fusion use cases described below.
Compute in vehicles is centralizing – but that’s not to say that processing is no longer done next to the sensors. To get an understanding of the vehicle’s surroundings, a vehicle on the road today could have a mixture of lidar, radar, and ultrasonic sensors, as well as multiple cameras that combine to create a 360-degree view around the vehicle.
GPUs are well suited to performing video manipulation tasks efficiently, for example taking a camera stream, applying fish-eye correction with dewarping and 360-degree image stitching. They are also capable of processing the latest lidar and radar algorithms as a complement to the work done on the Digital Signal Processors (DSPs) prior to perception tasks.
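The fish-eye correction step can be sketched as a per-pixel remap. The Python below assumes an equidistant fish-eye lens model (distorted radius = f · θ) and nearest-neighbour sampling; the lens model, the `dewarp` function and the focal length parameter `f` are illustrative assumptions, not a description of any particular camera pipeline. Each output pixel is computed independently of the others, which is why this kind of task maps naturally onto a GPU.

```python
import math


def dewarp(src, f):
    """Rectify an equidistant fish-eye image with nearest-neighbour sampling.

    src is a list of rows of pixel values; f is the focal length in pixels.
    """
    h, w = len(src), len(src[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    out = [[0] * w for _ in range(h)]
    for y in range(h):        # every output pixel is independent, so on a
        for x in range(w):    # GPU each one could be a separate work-item
            dx, dy = x - cx, y - cy
            r_u = math.hypot(dx, dy)          # radius in the rectified image
            if r_u == 0.0:
                sx, sy = cx, cy
            else:
                r_d = f * math.atan(r_u / f)  # radius in the fish-eye image
                scale = r_d / r_u
                sx, sy = cx + dx * scale, cy + dy * scale
            ix, iy = int(round(sx)), int(round(sy))
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = src[iy][ix]
    return out


# Tiny 5x5 test image whose pixel value encodes its position.
src = [[r * 5 + c for c in range(5)] for r in range(5)]
rectified = dewarp(src, f=2.0)
```

A real implementation would typically precompute the remap table once per camera calibration and use bilinear filtering; the sketch only shows the geometry of the lookup.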
A typical GPU that can handle this use case needs 1 TFLOPS of FP32 compute and 4 TOPS of INT8 compute.
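A rough way to sanity-check a workload against a throughput figure like this is simple arithmetic: pixels per second multiplied by operations per pixel. The Python sketch below does that for a hypothetical surround-view pipeline. The camera count, resolution, frame rate and per-pixel cost are all made-up assumptions for illustration; only the 1 TFLOPS FP32 figure comes from the text.

```python
def required_tflops(cameras, width, height, fps, flops_per_pixel):
    """FP32 throughput needed by a per-pixel pipeline, in TFLOPS."""
    flops_per_second = cameras * width * height * fps * flops_per_pixel
    return flops_per_second / 1e12


GPU_PEAK_TFLOPS = 1.0  # FP32 figure quoted in the article

# Hypothetical surround-view workload: every value below is an assumption.
need = required_tflops(cameras=4, width=1920, height=1080, fps=30,
                       flops_per_pixel=200)
utilisation = need / GPU_PEAK_TFLOPS
print(f"needs {need:.4f} TFLOPS, {utilisation:.1%} of peak")
```

Sustained throughput on real hardware is usually below the peak figure, and memory bandwidth often limits image pipelines before arithmetic does, so an estimate like this is a starting point for sizing rather than a guarantee.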
L4 ADAS, where the vehicle performs all driving tasks on pre-approved roads and in certain circumstances, requires a significant step-up: sensor fusion. This is where the data streams from all the sensors in the car are overlaid to build an accurate digital representation of the surrounding world, which the vehicle can then use for behavior prediction and trajectory optimization. Camera stitching tasks of this kind, along with the overlaying of metadata on source images, are workloads that can be accelerated with a GPU’s high parallelism.
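As a minimal sketch of what "overlaying" sensor data streams can look like, the Python below bins detections from several sensors into a shared, vehicle-centred grid and counts how many distinct sensors agree on each cell. The grid representation, the `fuse_to_grid` function, the sensor names and the sample coordinates are all illustrative assumptions; production fusion stacks are considerably more sophisticated.

```python
def fuse_to_grid(sensor_points, size, cell):
    """Count, per grid cell, how many distinct sensors report an object there.

    sensor_points maps a sensor name to (x, y) positions in metres, with the
    vehicle at the origin. The grid is size x size cells of side `cell`.
    """
    grid = [[set() for _ in range(size)] for _ in range(size)]
    half = size * cell / 2.0
    for sensor, points in sensor_points.items():
        for x, y in points:  # each point is binned independently
            col = int((x + half) // cell)
            row = int((y + half) // cell)
            if 0 <= row < size and 0 <= col < size:
                grid[row][col].add(sensor)
    return [[len(c) for c in r] for r in grid]


# Hypothetical detections from three sensors (metres, vehicle at origin).
detections = {
    "lidar": [(0.5, 0.5), (-1.5, 1.2)],
    "radar": [(0.7, 0.4)],
    "camera": [(0.6, 0.9), (5.0, 0.0)],  # last point lies outside the grid
}
agreement = fuse_to_grid(detections, size=4, cell=1.0)
```

Cells where several sensors agree could then be treated as higher-confidence obstacles by downstream prediction and planning code.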
This use case requires a high-performance solution: typically, over 10 TFLOPS of FP32 and 40 DOT8 TOPS of programmable, parallel GPGPU compute is recommended to give the system enough programmability to be considered future-proof.
The compute capabilities of the latest GPUs are an asset to the automotive systems designer. Imagination DXS, the latest ASIL-B GPU from Imagination, builds on the low-power, high-performance PowerVR GPU architecture that has been refined over the past thirty years and is deployed in billions of edge devices, from wearables to the cloud.
A block diagram of the Imagination DXS GPU.
Much of the silicon area of Imagination’s GPUs is taken up by highly dense, optimized compute elements. Traditionally these compute parts of the GPU have targeted complex 3D user interface and game rendering, but they also form a very efficient general-purpose parallel compute engine. The combination of a single instruction, multiple threads (SIMT) execution model with an ultra-wide ALU (128 scalar data lanes) results in a very powerful compute engine with shorter inference times, ideal for automotive ADAS applications. The design also packs in a groundbreaking functional safety solution, hardware-based virtualization, decentralized multicore and a carefully optimized software stack, including compute libraries for maximum GPU utilization.
From the cloud to the edge, GPUs provide a gold standard for artificial intelligence and compute tasks. Their highly parallel architecture and easy programmability make them a performant and flexible AI accelerator, well-suited to meeting the growing demand for compute in vehicles. To find out more about Imagination’s range of automotive-grade GPUs, visit the Imagination website.