The Ubiquitous GPU

GPUs are playing bigger roles in maximizing system efficiency. The key is figuring out what works best for a particular task.


By Ann Steffora Mutschler

No matter the application area, GPUs are likely playing a role like never before—even to accelerate EDA software algorithms. It’s no wonder given the ability of GPUs to handle parallel processing much more effectively than CPUs. And when coexisting in a heterogeneous system, GPUs allow the design team to maximize efficiency and performance by allocating tasks to the best processor to handle them.

In fact, GPUs are now showing up in places where only CPUs would have been considered in the past, largely because of how GPU architectures themselves have evolved.

“In general to do their job of creating more polygons per second more efficiently, GPUs started moving five or more years ago now to processor-based internal architectures,” said Drew Wingard, chief technology officer at Sonics. “So instead of doing hard-wired stuff, we started doing stuff with things that look more and more like a general-purpose processor inside them. And they’re putting together large numbers of them, and we’re starting off from a place where they’re already memory-bandwidth dominated. So they got very good at sharing memory bandwidth, because they had to do it to get high-performance graphics.”

In the SoC space, one of the biggest challenges is sharing resources such as memory, but in a GPU it is a more uniform problem. As a result, GPU designers have devised specific techniques that are in many ways better than what other embedded processors have had to work with, giving them something of an advantage there, he explained.

Dipesh Patel, executive vice president and general manager of ARM’s physical IP division, agreed. “Some tasks are very parallel and it is more efficient to do them on a GPU than on a SIMD engine that is attached to the CPU. A GPU can be thought of as a massively parallel set of compute engines doing lots of very efficient mathematical calculations. Depending on the task, your GPU is better suited to run those tasks. Image processing is a great one. There are other similar things we can do on the GPU as well.”

He noted that a key part of this is the programmability that OpenCL brings to the GPU.

“Think of it this way: You have this massive compute capability in the GPU, which historically would have been used for a very specific set of tasks, so by having these libraries and OpenCL support we are bringing the software and the infrastructure that lets [designers] harness that compute capability for different applications,” added John Heinlein, vice president of marketing for ARM’s physical IP division.
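The programming model Heinlein describes can be illustrated with a minimal sketch. OpenCL expresses work as a small "kernel" function applied independently to every data element, one work-item per pixel. The plain-Python example below (standard library only, with a thread pool standing in for the GPU's compute engines) mimics that structure; the function and variable names are illustrative, not part of any real OpenCL binding.

```python
# Data-parallel "kernel" dispatch, OpenCL-style, sketched in plain Python.
# A thread pool stands in for the GPU's many compute engines.
from multiprocessing.dummy import Pool  # thread-backed Pool from the stdlib

def brighten(pixel, gain=1.5):
    """Per-pixel 'kernel': depends on no other pixel, so it can run in parallel."""
    return min(255, int(pixel * gain))

def run_kernel(pixels, kernel):
    """Apply the kernel to every element independently, like OpenCL work-items."""
    with Pool() as pool:
        return pool.map(kernel, pixels)

image = [10, 100, 200, 250]           # a tiny 1-D "image"
print(run_kernel(image, brighten))    # -> [15, 150, 255, 255]
```

The point of the structure is that `brighten` touches only its own element; because there are no cross-element dependencies, the same code maps naturally onto thousands of GPU lanes, which is exactly the property that makes image processing "a great one" for GPU offload.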

Implementation challenges

Despite the obvious benefits of GPUs, there are challenges. Phil Dworsky, director of strategic alliances at Synopsys, explained that much of the learning about GPUs in a 20nm flow came out of a project involving Synopsys, TSMC and ARM.

Case in point: GPU designs tend to be very large, which translates to big memory requirements and potentially long runtimes, he said. Complicating matters is design congestion on top of the double patterning required at 20nm. These design issues demand early and accurate congestion analysis.

Dworsky said the increasing demand for visual computing is driving the need for more GPU resources, which means there will be specialized functions for doing graphics on a chip. “The fact is that’s already there. The question now is can you take advantage of that compute facility in other ways, as well? I don’t think there’s a tipping point where somebody suddenly says, ‘Hey, let’s use GPUs and stick them on our chips because we can do math.’ I think it’s much more about taking advantage of what is available.”

This is in line with ARM’s approach. Jim Wallace, vice president of product marketing for ARM’s media processing division, said: “When it comes to CPU versus GPU, I don’t think it’s one versus the other, or one replacing the other. It’s more about the right processor for the right task. If you look at the CPU, it’s normally focused on performance of single-thread operations including running general purpose OSes. If I flip on the other side, on the graphics side, it’s very parallel. Graphics processing is highly threadable. Again, you look at each pixel on the screen—it’s got to address a lot of those, so it’s well suited to throughput computing. They are addressing two different areas. The CPU is very serially dominated and latency-sensitive. GPU is extremely parallel and bandwidth-sensitive. By having both of those together, you really are looking at a heterogeneous system. You can move tasks around between one and the other or you can split tasks up between one and the other.”

For example, in video there are certain functions that can leverage parallel execution within the video codec engine and other areas that are more serially oriented. Also, CPU resources can be freed up by offloading to the GPU, thereby improving efficiency and responsiveness of the complete system.
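The split Wallace describes can be sketched in a few lines. Below, two thread pools stand in for the two processor types: a single-worker pool models the latency-sensitive, order-dependent serial stages a CPU handles, and a wide pool models the throughput-oriented parallel stages a GPU handles. All stage names (`parse_header`, `filter_block`) are hypothetical examples for a video pipeline, not real codec APIs.

```python
# Heterogeneous task split, sketched with two stdlib thread pools.
from concurrent.futures import ThreadPoolExecutor

serial_unit = ThreadPoolExecutor(max_workers=1)    # CPU-like: one worker, strict order
parallel_unit = ThreadPoolExecutor(max_workers=8)  # GPU-like: many workers, throughput

def parse_header(frame):
    """Serial stage: each header must be handled in decode order."""
    return ("hdr", frame)

def filter_block(block):
    """Parallel stage: blocks are independent, like pixels on screen."""
    return block * 2

frames = [1, 2, 3]
# Serial, latency-sensitive work stays on the single-worker unit...
headers = [serial_unit.submit(parse_header, f).result() for f in frames]
# ...while the independent, data-parallel work is farmed out wide.
blocks = list(parallel_unit.map(filter_block, range(4)))

print(headers)  # -> [('hdr', 1), ('hdr', 2), ('hdr', 3)]
print(blocks)   # -> [0, 2, 4, 6]
```

Offloading the `filter_block`-style work also leaves the serial unit free between submissions, which is the efficiency and responsiveness gain the paragraph above describes.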

What will propel even more use of GPUs comes down to the software, he said. “It’s really about the APIs. Those APIs have made it easier for developers to program features of the GPU. As we go forward, the growth of heterogeneous computing is helping with the mapping of one to the other and being able to use different accelerators/processing units, whether it is a CPU or a GPU or a DSP.”

To this end, work is under way on several fronts, including the Open Computing Language (OpenCL) framework, the Heterogeneous System Architecture (HSA) Foundation and Google’s RenderScript, to name a few.