Why the old approach didn’t work, and the new thinking behind multi- and many-core processors.
It’s probably too harsh to say that multicore has been a failure, but it’s flat-out wrong to say it has been successful.
Multicore was an inevitable outgrowth of Moore’s Law. You simply can’t keep turning up the frequency for processors at advanced nodes without cooking the chip into oblivion. In theory, four cores running at a much cooler 1GHz should be better than one core running at 3GHz. By that thinking, 20 cores should be even better.
There are two main problems with that reasoning, however. First, most software isn’t suitable for parallel computing. Databases and embarrassingly parallel applications such as graphics and some scientific calculations work best. So do some of the EDA design tools. But the majority of applications that people use can’t be parceled out to more than a couple of cores (generally two, and no more than four), and they can’t be scaled as more cores are added at each new node.
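One way to put rough numbers on this is Amdahl’s law, a standard model that this column does not itself invoke: if only a fraction p of a program’s work can run in parallel, n cores speed it up by at most 1 / ((1 - p) + p / n). The sketch below applies that model to the 1GHz-versus-3GHz comparison. The function names are illustrative, and it assumes, simplistically, that single-core performance scales linearly with clock frequency.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup over one core of the same speed (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def relative_throughput(parallel_fraction: float, cores: int,
                        core_ghz: float, baseline_ghz: float) -> float:
    """Throughput of `cores` cores at `core_ghz` relative to ONE core at
    `baseline_ghz`.

    Assumption (not from the article): performance scales linearly with
    clock frequency, so a 1GHz core does one third the work of a 3GHz core.
    """
    return amdahl_speedup(parallel_fraction, cores) * core_ghz / baseline_ghz


if __name__ == "__main__":
    # Compare 4 and 20 slow cores against a single 3GHz core for
    # programs that are 50%, 90%, and 99% parallelizable.
    for p in (0.5, 0.9, 0.99):
        for n in (4, 20):
            ratio = relative_throughput(p, n, core_ghz=1.0, baseline_ghz=3.0)
            print(f"p={p:.2f}, {n:2d} cores @ 1GHz vs 1 core @ 3GHz: {ratio:.2f}x")
```

Under these assumptions, a program that is only half parallel gets a 1.6x speedup from four cores, which is not enough to make up for a clock that is three times slower. The four-core chip breaks even with the single fast core only once about 8/9 of the work can run in parallel.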
Second, memory becomes a bottleneck when it’s shared by cores. The problem with using approaches such as virtualization across a multicore chip is that the virtual machines all share a common memory. Even when memory is split up into multiple discrete segments, keeping everything straight is still a challenge, and one that hardly scales to many more cores.
There are other issues, as well. Being able to turn cores on and off quickly requires at least some power to keep them operating. In addition, making all the cores the same size to handle any available application is an inefficient approach. A simple executable file doesn’t require as much energy for processing as PowerPoint or Excel, but it may require more speed than e-mail.
The solution—and one that is gaining traction across the design world—is a different approach to using these cores. If the software applications cannot be written in parallel and written to scale, then why force the issue? The alternative is to design cores for specific applications or functions, each with its own block of memory—or at least with a wider I/O to reach that memory.
Wide I/O, whether it’s an interposer or a through-silicon via, is a major shift in thinking for how SoCs are designed. It’s not just about the I/O. It’s about the functioning of many parts of the chip, whether that’s a collection of processor cores in a single place or scattered around the SoC, and whether they’re the same or different. It’s also a recognition that the old approach of designing hardware as one-size-fits-all helps neither the application’s function nor its performance, and that it wastes power in delivering that performance.
These are subtle changes in thinking, but the results will create a profound shift across hardware design for years to come.
–Ed Sperling