GPUs Power Ahead

AI, ADAS, deep learning, gaming, and high-performance computing continue to drive GPUs into new application areas.


GPUs, long a sideshow to CPUs, are suddenly the rising stars of the processor world.

They are now a first choice for everything from artificial intelligence systems to automotive ADAS applications and deep learning systems built on convolutional neural networks. They also remain the mainstays of high-performance computing, gaming, and scientific computation, to name just a few. Even the well-known challenges of programming GPUs have not stopped their trajectory, thanks to better programming languages.

The increase in GPU computing is very much tied to the explosion in parallel programming, said Roy Kim, Accelerated Computing Product Lead at Nvidia. “Going forward, and it has been true for a while now, the way you get performance out of processors is to go parallel for many, many reasons. It just happened to be that when Nvidia discovered that GPUs are great for general-purpose processing, it was around the time when a lot of researchers and developers were scrambling to figure out how to get more performance out of their code because they couldn’t rely on single cores to go any faster into the frequency.”

That was roughly the mid-2000s, and Nvidia had the foresight to make its GPUs fully programmable. More recently, artificial intelligence has carried GPU computing deeper into data centers. AI on GPUs is only at the beginning, but there have already been notable milestones, such as Google's match between Go champion Lee Sedol and its AI system, AlphaGo. AlphaGo won.

GPUs came out of visual computing and graphics as a way to process the millions of pixels on a screen, a massively parallel workload. That market has grown to include not just the companies that sell GPU chips, such as Nvidia and AMD, but also others that specialize in embedded GPU cores aimed primarily at the mobile space, including ARM's Mali family and Imagination Technologies' PowerVR graphics processors.

In the embedded GPU space, ARM has seen steady growth in chips shipped with its Mali GPUs. In 2014, worldwide shipments of chips containing an ARM Mali graphics processor were 550 million; in 2015 that number rose to 750 million, a 36% increase year-over-year. The company predicts even higher growth this year.

“We see an extension of the trend, whereby the silicon partners addressing the superphone/tablet market take the latest, greatest ARM Mali GPU and make chips on the bleeding-edge processes as quickly as they can,” said Jem Davies, ARM Fellow and Vice President of Technology for ARM’s Media Processing Group. “They are thermally limited on power, and need the advantages of the latest silicon process. They fit as many GPU cores as they can for maximum performance within their power budget. Other partners, addressing the mass-market, use a wider variety of processes and fit more modest numbers of cores, balancing performance against cost. Speed of execution on the design side and execution (silicon hardening) remains key to TTM.”

Technically, the GPU advantage is relatively straightforward. “It is a digital world – it is all ones and zeros – so everything can be done in computation,” Davies said. “Graphics is a computational problem whereby we specify the things to be printed, displayed, etc., in a form that says for every pixel on the screen, run a little code. And for every vertex in the scene, run a different piece of code. With a modern screen of 1080P it’s at least 2 megapixels, so I’ve got 2 million executions of this little snippet of code. I might have 300,000 vertices in the scene, and I’ve got this pool of things to be done repeatedly. We say that sounds a lot like thread-level parallelism, so what I can do is run all of those little snippets of programs in parallel rather than in sequence. And we design a processor to do that—regardless of the fact that it’s a graphics processor. Yes, there is some fixed function stuff, which is all about putting colors on the screens and things like that. But as a processor just executing these little snippets of code, we make it very good at running things in parallel.”
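Davies's "for every pixel, run a little code" model can be sketched in a few lines of Python. This is a minimal illustration, not ARM's implementation: `shade`, `render_sequential`, and `render_parallel` are hypothetical names, and the "shader" is a made-up grayscale gradient. The key property is that `shade` reads only its own pixel coordinates, so no invocation depends on any other. That independence is the thread-level parallelism a GPU exploits. The thread pool below only models the decomposition; CPython threads will not actually make it faster.

```python
from concurrent.futures import ThreadPoolExecutor


def shade(x: int, y: int, width: int, height: int) -> int:
    # Hypothetical per-pixel "little snippet of code": a diagonal grayscale
    # gradient from 0 (top-left) to 255 (bottom-right). It uses only its own
    # coordinates, so every call is independent of every other call.
    return (x * 255 // max(width - 1, 1) + y * 255 // max(height - 1, 1)) // 2


def render_sequential(width: int, height: int) -> list[int]:
    # CPU-style: a single thread walks every pixel in order.
    return [shade(x, y, width, height) for y in range(height) for x in range(width)]


def render_parallel(width: int, height: int, workers: int = 8) -> list[int]:
    # GPU-style decomposition: each row becomes an independent task. Real
    # speedup needs hardware that runs thousands of invocations at once.
    def row(y: int) -> list[int]:
        return [shade(x, y, width, height) for x in range(width)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so rows stay in place.
        return [value for r in pool.map(row, range(height)) for value in r]


# A 1080p frame means this snippet runs 1920 * 1080 = 2,073,600 times per frame.
print(1920 * 1080)
```

Because the invocations are independent, the two renderers produce identical frames, for example `render_parallel(64, 36) == render_sequential(64, 36)`. Only the order of work differs, and that order is exactly what parallel hardware is free to change.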

One of the main differences between CPUs and GPUs is that GPUs are not nearly as good at completing a single task in a very short period of time.

“A CPU will talk about how many clocks it takes to do this instruction, and that will be important, and how many clocks does it take to fetch this load instruction from memory, and again that will be important,” he explained. “On a GPU we would say how long one of them takes to execute is much less important than how long do thousands of them take to execute. GPUs are all about throughput and parallelism, whereas CPUs are all about speed of execution of one thread. If you come up with another problem space that is parallelizable — if I can extract a lot of threads — then running it on a GPU could be very effective. If you’ve got a problem that looks like running Linux and responding to interrupts, it will run very badly on the GPU, so you can’t have something for nothing.”
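The latency-versus-throughput trade-off Davies describes can be shown with a toy model. This is a deliberately simplified sketch, not from the article: it assumes every task takes a fixed latency and that a processor finishes `lanes` tasks per round. The lane counts and latencies are hypothetical figures chosen for illustration, not measurements of any real CPU or GPU, and the model ignores memory, scheduling, and divergence.

```python
def batch_time_ns(n_tasks: int, lanes: int, latency_ns: int) -> int:
    # Each round completes up to `lanes` tasks and takes `latency_ns`,
    # so a batch needs ceil(n_tasks / lanes) rounds (integer ceiling).
    rounds = (n_tasks + lanes - 1) // lanes
    return rounds * latency_ns


# Hypothetical parameters for illustration only:
# a "CPU-like" part with a few fast lanes, a "GPU-like" part with many slow ones.
CPU_LIKE = {"lanes": 8, "latency_ns": 1}
GPU_LIKE = {"lanes": 2048, "latency_ns": 10}

for n in (1, 1920 * 1080):  # one task vs. one task per 1080p pixel
    cpu = batch_time_ns(n, **CPU_LIKE)
    gpu = batch_time_ns(n, **GPU_LIKE)
    print(f"{n:>9} tasks: CPU-like {cpu:>7} ns, GPU-like {gpu:>7} ns")
```

Under these made-up numbers, a single task finishes in 1 ns on the CPU-like part versus 10 ns on the GPU-like part. A frame's worth of 2,073,600 tasks takes 259,200 ns versus 10,130 ns, roughly 25 times faster on the GPU-like part. How long one task takes matters less than how long thousands take.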

In effect, a CPU is pretty good at everything. A GPU, in contrast, is very good at one corner of the problem space, Davies said. “The inevitable consequence, which we don’t necessarily brag about but is nevertheless scientifically obvious, is that having picked one corner of the problem space to be really good at, there will be corners we are less good at.”

In the mobile market, GPUs are generally tuned for power and energy efficiency in their main task, graphics, rather than for general-purpose computing.

“ARM and Imagination will tell you about being able to use the GPU for other kinds of apps, and it’s always a matter of degree,” said Chris Rowen, Cadence Fellow and CTO of the IP Group. “It’s not black-and-white, so they can legitimately say you can use it for other things, that they’ve built the software tools and the programming model so that it can be used for other tasks. But the mobile phone is a domain with much greater specialization, and it’s hard to be the most efficient graphics rendering device and also be general-purpose to cover increasingly diverse and divergent sets of tasks.”

Rowen said that sometimes a processor needs to be designed for a very specific task. Case in point: in the server domain, Google recently rolled out its Tensor Processing Unit (TPU), a purpose-built replacement for the GPU in the server that Google said is about a factor of 10 more efficient.

"When you know what problem you are solving, you can often do so much better than if you are repurposing something that was really built for another domain," he said. "So yes, Google got an order of magnitude, but they know their problem, and I can find an order of magnitude on almost any problem just so long as it is significantly different from the problem that the competing hardware was designed to address. They knew they had a class of problem in neural network training and inference, and it has certain computational properties. There are some things it needs to do well, and other things that you find in a GPU that you don't need. Being able to craft it around the algorithms, being able to craft it around the data types, and being able to leave out the things in a GPU that are irrelevant inevitably gives you something that is going to be smaller, faster, and lower power than that repurposed sort of thing. If you are trying to do your daily commute and the only thing you have available is a Mack truck, you can get to work, but that big 18-wheeler with a big trailer is far more than you need to carry your laptop. So being an order of magnitude more energy efficient than a Mack truck is not hard. You know that you are not carrying 20 tons of goods."

Expanding opportunities for GPUs
That doesn't limit what GPUs can do or where they will end up, however. As Sundari Mitra, CEO of NetSpeed Systems, noted, the last five years have seen graphics emerge as something blazingly fast for that particular aspect of what the compute industry needs to do.

“We saw that evolve and embrace some of the neural or machine learning techniques,” she said. “The graphics chip companies are the first companies that went into it and realized they can be adapted to the graphics processors to do more special function and analysis and adaptive behaviors because it is conducive to that. While the CPU companies kept focusing on getting better specs and numbers, the graphics guys focused more on, ‘I’m this big, powerful thing. Let me see what more big, powerful things I can do because I already have that envelope. You already told me I could take liberties that separated me out from a lot of the low-power. Let me innovate in that space and see what I can do.’ [As such,] graphics processors have a lot of innovation [behind them,] and as a result, some of those companies are now moving forward more aggressively, and doing more because they were not held back.”

She noted that graphics companies still have to learn the disciplines that the rest of the industry has put up with. “But they have an edge, and it’s going to be interesting to see what happens going forward.”

One thing that is clear is that there are many more GPUs than in the past, both in the computing market and on a single piece of silicon.

“Take the Apple main application processor cores, as one example,” said Ken Brock, product marketing manager, Solutions Group at Synopsys. “You just see more and more of the area being taken up by GPUs. It used to be there was one, and now the Apple A9 has about a dozen of them because you can do in parallel what CPUs have to do in serial. There are more multithreaded CPUs, but the GPUs are built for parallelization so they run at a lower frequency —sometimes half the clock frequency. But at the same time they take up a lot of space so you can fill up the die with them.”

Brock noted that a GPU is about 80% logic and 20% memory. “When you have a dozen of them on a die, that can pretty soon come up to a real cost of chip area.”

Ironic twist to GPUs
It has been speculated for some time that some algorithms used by EDA tools to design GPUs could be sped up by running them on GPU-based servers, rather than CPU-based machines. There appears to be some truth to that, but not for every tool.

“Massive parallel computation is still the path to contain run times, and we are continuing to invest and expand our technology in that direction with very good results using many-core CPUs,” said Juan Rey, senior director of engineering for Calibre at Mentor Graphics. “Unfortunately, GPUs have not proven truly useful for a large number of critical EDA algorithms, although they proved valuable in some niche applications.”

So while GPUs are not appropriate for all applications (including, so far, many in EDA), the possibilities for GPUs in computationally intensive tasks are on the rise. One application close to the design arena is semiconductor mask optimization, where companies such as D2S use GPU acceleration in their mask data preparation software.

How many new applications GPUs ultimately find is a matter of speculation. But at least for now, there doesn't seem to be any shortage of possibilities.

Additional Resources on GPUs
A tour of Nvidia’s Failure Analysis Lab.
Research paper from Microsoft: Accelerator: Using Data Parallelism to Program GPUs for General-Purpose Uses
Article from Florida International University: A Case Study on Porting Scientific Applications to GPU/CUDA
