The lines between MCUs, CPUs, GPUs and DSPs are blurring, but for very specific reasons.
By Ed Sperling
Choosing processors for an SoC, a system-in-package, or even a complete system is becoming much more difficult, and the challenge is growing as demands on performance, power, area and time to market continue to increase.
There are many reasons why this is becoming more difficult—and some designs will require more tradeoffs than others, depending upon IP re-use or a particular market segment—but there are four key areas that keep surfacing in discussions with a wide variety of hardware and software engineers, tools developers and academic researchers.
1. Software. Making generalizations about software is difficult, because there are so many different facets and types of software that can affect hardware. Moreover, it can affect hardware in multiple ways. But one thing is becoming increasingly clear: The more software that is developed by chipmakers, the more they understand how to improve it and to make it work better with the hardware.
Software engineers always have focused on performance. Software has to run quickly and be at least reasonably accurate from a functional perspective. But until recently, most companies never seriously considered how to do that more efficiently. The rule of thumb has been to let the operating system or the middleware deal with the connectivity and performance issues to get working code to market faster, as well as to power down blocks that are not in use.
That isn’t always the optimal solution, however, and not all code runs equally well on all types of processors. A general-purpose central processing unit (CPU) doesn’t handle graphics nearly as well as a graphics processing unit (GPU), and neither of them is as efficient at handling audio or video algorithms as a digital signal processor (DSP).
Likewise, a multi-core or many-core CPU may not perform as well for some applications as a single-core, single-threaded CPU, while in other cases—notably video and image editing, databases and heavy number crunching applications such as EDA tools—it can leave single-core, single-threaded performance in the dust.
The trick is understanding what’s really needed where, and that requires a much deeper understanding of the software, the hardware and the target markets the chips will serve.
“There are specific places you’ll need double-precision floating point,” said Barry Pangrle, a solutions architect for low-power design and verification at Mentor Graphics. “That’s not necessary in a game. If you miss a few pixels out of a million it doesn’t matter. But with a financial application you don’t want to lose anything.”
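To make the precision point concrete, here is a minimal Python sketch. It is an illustration, not anything taken from Pangrle's remarks. It adds one cent a million times, once with every intermediate result rounded to single precision and once in ordinary double precision. The helper names `to_float32` and `sum_repeated` are hypothetical, and single precision is emulated by round-tripping values through the standard `struct` module.

```python
import struct

_F32 = struct.Struct("f")  # IEEE 754 single-precision layout


def to_float32(x: float) -> float:
    """Round a Python float (double precision) to the nearest float32 value."""
    return _F32.unpack(_F32.pack(x))[0]


def sum_repeated(value: float, count: int, single_precision: bool) -> float:
    """Add `value` to a running total `count` times.

    With single_precision=True, the addend and every intermediate total are
    rounded to float32, which mimics accumulating in a 32-bit register.
    """
    if single_precision:
        value = to_float32(value)
    total = 0.0
    for _ in range(count):
        total += value
        if single_precision:
            total = to_float32(total)
    return total


if __name__ == "__main__":
    exact = 10_000.0  # one cent, a million times
    for label, single in (("float32", True), ("float64", False)):
        total = sum_repeated(0.01, 1_000_000, single)
        print(f"{label}: total={total!r} error={abs(total - exact)!r}")
```

In a rendered frame, a drift of this kind would be lost among a million pixels. In a ledger, it is money that has gone missing.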
He noted there are applications that map well to a lot of cores, and those that do not. If they don’t, replacing the CPU with an FPGA is a possibility, for example. An alternative approach is to add more application-specific processors throughout an SoC, or even in a package or on a PCB.
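One common way to reason about which applications map well to many cores is Amdahl's law, which the sources quoted here do not cite by name. The sketch below assumes a workload with a fixed parallelizable fraction. It also adds a hypothetical `relative_core_speed` factor, which is not part of the classic formula, to model a many-core part whose individual cores are slower than one fast single core.

```python
def multicore_speedup(parallel_fraction: float, cores: int,
                      relative_core_speed: float = 1.0) -> float:
    """Estimate speedup over a single baseline core using Amdahl's law.

    parallel_fraction   -- share of the work that can run in parallel (0..1)
    cores               -- number of cores in the multi-core part
    relative_core_speed -- assumed speed of each of those cores relative to
                           the single baseline core (1.0 = same speed)
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if relative_core_speed <= 0.0:
        raise ValueError("relative_core_speed must be positive")
    serial_fraction = 1.0 - parallel_fraction
    amdahl = 1.0 / (serial_fraction + parallel_fraction / cores)
    return relative_core_speed * amdahl


if __name__ == "__main__":
    # Eight cores, each assumed to run at half the speed of the fast core.
    for p in (0.50, 0.95):
        s = multicore_speedup(p, cores=8, relative_core_speed=0.5)
        winner = "multi-core" if s > 1.0 else "single fast core"
        print(f"parallel fraction {p:.2f}: speedup {s:.2f} -> {winner}")
```

Under these made-up numbers, a workload that is only half parallel runs slower on the eight slower cores than on the single fast one. A workload that is 95% parallel comes out well ahead.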
Lines also are blurring between different types of processors, though. A microcontroller unit (MCU) looks increasingly like a CPU when memory is added. Microchip, for example, has added a DSP to some of its MCUs, making them even more unrecognizable. And Texas Instruments has long had ARM cores on its DSPs.
“MCUs typically are designed with a very specific purpose in mind,” said Ian Anderton, marketing manager at MIPS. “If you look at MCU vendors, they have a product range that could include 200 products. Each of them targets different frequency and different performance. In the past, 8- and 16-bit versions were enough. Now we’re seeing them being used for performance-based systems, and in motor control we’re seeing them being used to replace software, which is too slow.”
That has set off an entirely new race to shrink the size of MCUs and thereby reduce their cost, making them potentially more attractive for controlling everything from individual cores in a many-core or multi-core configuration, including turning them on and off as needed, to communications and I/O.
2. Process technology. These tradeoffs between software and hardware, and within hardware lines themselves, have set off another set of choices that would have been unthinkable several process nodes ago. Moore’s Law has hit an economic wall for most companies. While it’s still technically possible to shrink features (at least as far as 6nm or 7nm, which is the farthest that researchers are talking about right now), it clearly will be too expensive for most chipmakers.
It already is unrealistic for many companies to push to 28nm, and the number of companies that have adopted high-k/metal gate technology at 28nm represents a small fraction of the companies turning out chips at that node. Cost is the big issue, and the unavailability of EUV until at least 14nm, a driving factor in Intel’s investment in ASML this week, has made double and potentially triple and quadruple patterning an issue.
That has created probably the biggest incentive—along with the enormous supply of pre-written and pre-verified analog IP—to mix and match chips in a 2.5D or 3D package, and it has shifted the focus away from area toward performance, due to shorter wires, and energy efficiency, primarily because of the inability to remove heat from devices.
It also has turned the discussion to subsystems, different cores, and from homogeneous processors to heterogeneous processors in multiple configurations.
“The big complex problems still have to be partitioned manually by the architect,” said Steve Roddy, vice president of marketing at Tensilica. “That means a lot of system modeling, whether you choose a network on chip or a bus on chip, and it leads to a discussion about which flavor of processor to choose. If you can get a single core with the same performance as two cores, you choose the single core because it’s simpler. And if you’re developing an audio subsystem with one DSP and someone else is offering two, you choose the subsystem with one.”
3. Integration. But adding more subsystems and more off-the-shelf IP blocks also creates problems, which circles back to the type of processor being used.
“The uncertainty of the interaction between subsystems is one of the biggest challenges today,” said Roddy. “It’s easy to take a DSP from Tensilica or MIPS and design a system from scratch around them. But when they’re in subsystems the discussion comes down to how these subsystems will behave when they’re fighting for memory or a bus. The more processors you have in these subsystems, the bigger the integration challenge.”
That challenge only grows as chips are stacked in 2.5D and 3D configurations. Even aside from the physical effects, the number of processors and their placement can have a serious impact on designs. This is particularly critical in SoCs, which are a unique mix of components compared with more regular processors from companies such as Intel and Nvidia.
“The choice of what kind of processor you use is still going to be application-dependent,” said Mentor’s Pangrle. “But when you look at the tradeoff of performance versus energy, scaling and integration are winning out. In many cases it comes down almost to data choreography. You want to keep data next to the compute elements, and that trend is continuing.”
But how to take advantage of that most efficiently is the issue. ARM has taken a unique approach with its big.LITTLE architecture, which pairs a high-performance, high-energy-consuming processor with a lower-performance, low-energy processor. FPGAs, with a centralized control of the architecture, provide a different option. Still, neither of them compares to the efficiency and performance of a GPU running graphics or an x86 processor running a spreadsheet.
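The tradeoff behind pairing a fast, power-hungry core with a slow, frugal one can be sketched with a toy model. This is not ARM's scheduling policy. The `Core` type, the `pick_core` helper, and every throughput and power figure below are hypothetical, and the model ignores idle power, leakage and the cost of migrating work between cores.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Core:
    name: str
    ops_per_sec: float  # hypothetical sustained throughput
    watts: float        # hypothetical active power draw

    def runtime_s(self, work_ops: float) -> float:
        return work_ops / self.ops_per_sec

    def energy_j(self, work_ops: float) -> float:
        # Energy = power x time, counting active power only.
        return self.watts * self.runtime_s(work_ops)


def pick_core(cores: Sequence[Core], work_ops: float,
              deadline_s: float) -> Optional[Core]:
    """Return the lowest-energy core that meets the deadline, or None."""
    feasible = [c for c in cores if c.runtime_s(work_ops) <= deadline_s]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c.energy_j(work_ops))


# Made-up numbers chosen only to illustrate the tradeoff.
BIG = Core("big", ops_per_sec=2.0e9, watts=2.0)
LITTLE = Core("little", ops_per_sec=0.5e9, watts=0.2)

if __name__ == "__main__":
    for deadline in (3.0, 1.0, 0.1):
        chosen = pick_core([BIG, LITTLE], work_ops=1.0e9, deadline_s=deadline)
        print(deadline, chosen.name if chosen else "no core meets the deadline")
```

With these figures, the small core wins whenever the deadline is loose enough, because it spends less energy per operation. The big core is chosen only when the small one cannot finish in time.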
How to integrate all of those together in the most efficient way, and still provide enough flexibility so those technologies can be applied for other purposes, remains a big challenge. But as more functions are added into devices, and more tradeoffs are made between software and hardware, between efficiency and speed, and between accuracy and power, performance and area, the challenge of choosing the right processors will continue to befuddle even the best engineering teams and systems architects. This is intellectually tough stuff, and it’s getting harder.
4. Ecosystem maturity. It may be hard for engineers to swallow, but one of the critical factors in choosing processors is the maturity and completeness of an ecosystem. For an integrated device manufacturer (IDM) this is less of an issue, but the rising cost of development and the skyrocketing complexity of devices make it difficult for even the biggest IDMs with the deepest pockets to do everything themselves.
Having a complete ecosystem means getting to market on time, with the necessary support, tools and track record.
“Ecosystem is the No. 1 factor for us,” said Philippe Magarshack, group vice president for technology research and development at STMicroelectronics. “It’s the reason we choose ARM in the mobile market. For us, there is no alternative. They have the most complete Linux and Android ecosystem. It’s also the reason we work with Freescale in automotive with a PowerPC core. There’s a lot of software legacy that we have to deal with, it’s what the customers have chosen, and it’s very credible.”
He said that’s followed closely by performance and power, but the real differentiation factor is the ecosystem. “That’s what determines time to market and adoption by customers,” he said.