The Ubiquitous GPU

GPUs are playing bigger roles in maximizing system efficiency. The key is figuring out what works best for a particular task.


By Ann Steffora Mutschler

No matter the application area, GPUs are likely playing a bigger role than ever before, even in accelerating EDA software algorithms. It’s no wonder, given the ability of GPUs to handle parallel processing much more effectively than CPUs. And when they coexist with CPUs in a heterogeneous system, GPUs allow the design team to maximize efficiency and performance by allocating each task to the processor best suited to handle it.

In fact, GPUs are now being used in places where only CPUs would have been considered in the past, largely because of changes in GPU architectures themselves.

“In general to do their job of creating more polygons per second more efficiently, GPUs started moving five or more years ago now to processor-based internal architectures,” said Drew Wingard, chief technology officer at Sonics. “So instead of doing hard-wired stuff, we started doing stuff with things that look more and more like a general-purpose processor inside them. And they’re putting together large numbers of them, and we’re starting off from a place where they’re already memory-bandwidth dominated. So they got very good at sharing memory bandwidth, because they had to do it to get high-performance graphics.”

In the SoC space, one of the biggest challenges is sharing resources such as memory, but in a GPU it is a more uniform problem. As a result, GPU developers have come up with specific techniques that are in many ways better than what other embedded processors have had to work with, which gives them an advantage, he explained.

Dipesh Patel, executive vice president and general manager of ARM’s physical IP division, agreed. “Some tasks are very parallel and it is more efficient to do them on a GPU than on a SIMD engine that is attached to the CPU. A GPU can be thought of as a massively parallel set of compute engines doing lots of very efficient mathematical calculations. Depending on the task, your GPU is better suited to run those tasks. Image processing is a great one. There are other similar things we can do on the GPU as well.”

He noted that a key part of this is the GPU programmability that OpenCL provides.

“Think of it this way: You have this massive compute capability in the GPU, which historically would have been used for a very specific set of tasks, so by having these libraries and OpenCL support we are bringing the software and the infrastructure that lets [designers] harness that compute capability for different applications,” added John Heinlein, vice president of marketing for ARM’s physical IP division.

Implementation challenges

Despite the obvious benefits of GPUs, there are challenges. Phil Dworsky, director of strategic alliances at Synopsys, explained that much of the learning about GPUs in a 20nm flow came out of a project involving Synopsys, TSMC and ARM.

Case in point: GPU designs tend to be very large, which translates to big memory requirements and potentially long runtimes, he said. Complicating matters is design congestion, on top of the double patterning required at 20nm. These design issues require early and accurate congestion analysis.

Dworsky said the increasing demand for visual computing is driving the need for more GPU resources, which means there will be specialized functions for doing graphics on a chip. “The fact is that’s already there. The question now is can you take advantage of that compute facility in other ways, as well? I don’t think there’s a tipping point where somebody suddenly says, ‘Hey, let’s use GPUs and stick them on our chips because we can do math.’ I think it’s much more about taking advantage of what is available.”

This is in line with ARM’s approach. Jim Wallace, vice president of product marketing for ARM’s media processing division, said, “When it comes to CPU versus GPU, I don’t think it’s one versus the other, or one replacing the other. It’s more about the right processor for the right task. If you look at the CPU, it’s normally focused on performance of single-thread operations including running general-purpose OSes. If I flip on the other side, on the graphics side, it’s very parallel. Graphics processing is highly threadable. Again, you look at each pixel on the screen. It’s got to address a lot of those, so it’s well suited to throughput computing. They are addressing two different areas. The CPU is very serially dominated and latency-sensitive. GPU is extremely parallel and bandwidth-sensitive. By having both of those together, you really are looking at a heterogeneous system. You can move tasks around between one and the other or you can split tasks up between one and the other.”
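
The distinction Wallace draws can be sketched in a few lines of ordinary Python. This is only an illustration, not GPU code: the function names (brighten, brighten_image, running_checksum), the gain value and the checksum constants are all hypothetical, and a thread pool merely stands in for the many compute engines of a GPU. The first task treats every pixel independently, so it can be split across any number of workers. The second carries state from one step to the next, so it has to run in order.

```python
from concurrent.futures import ThreadPoolExecutor


def brighten(pixel, gain=1.5):
    """Per-pixel operation (hypothetical): uses only its own input value."""
    return min(255, int(pixel * gain))


def brighten_image(pixels, workers=4):
    """Throughput-style task: each pixel is independent of every other,
    so the work can be divided among any number of workers. The thread
    pool is a stand-in for parallel compute engines, not a real GPU."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order regardless of scheduling.
        return list(pool.map(brighten, pixels))


def running_checksum(values):
    """Serial-style task: every step needs the previous step's result,
    so the loop cannot simply be spread across workers."""
    state = 0
    for v in values:
        state = (state * 31 + v) % 65521
    return state


if __name__ == "__main__":
    image = [0, 100, 200]
    print(brighten_image(image))
    print(running_checksum(image))
```

Changing the worker count changes only how the pixel work is divided, not the result, which is what makes this kind of task a candidate for a highly parallel processor. The checksum gains nothing from extra workers and is the sort of job that stays on the CPU.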

For example, in video there are certain functions that can leverage parallel execution within the video codec engine and other areas that are more serially oriented. Also, CPU resources can be freed up by offloading to the GPU, thereby improving efficiency and responsiveness of the complete system.

What will propel even more use of GPUs comes down to the software, he said. “It’s really about the APIs. Those APIs have made it easier for developers to program features of the GPU. As we go forward, the growth of heterogeneous computing is helping with the mapping of one to the other and being able to use different accelerators/processing units, whether it is a CPU or a GPU or a DSP.”

To this end, work is under way on several fronts, including the Open Computing Language (OpenCL) framework, the Heterogeneous System Architecture (HSA) Foundation and Google’s RenderScript, to name a few.