First of Two Parts: Clock speeds can’t go higher, and most software still can’t take advantage of multiple cores. A look at what’s next.
In 2004, Intel introduced a new line of Pentium chips that ran at 3.6GHz. Fast forward to today, and the company’s i7 processors run at 3.5GHz with a Turbo Boost to 3.9GHz.
There have been many improvements in the meantime. There is more cache and dramatically faster access to data stored in that cache. And there are more cores with improved coherency between them. But the big problem is physics—it’s impossible to turn up the clock speed on a single core for very long without burning the chip. More cores can solve that problem, but most software applications still can’t take advantage of more cores. Even controlling current leakage, and subsequently heat, with finFET transistors provides only a one-time gain.
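The heat limit on clock speed can be sketched with the standard first-order model of CMOS switching power, P ≈ α·C·V²·f. The article gives neither this formula nor any figures, so the function name and every number below are illustrative assumptions, not measurements of any real chip. The sketch shows why a modest frequency bump costs disproportionately more power when it also needs a higher supply voltage.

```python
def dynamic_power(activity: float, capacitance_f: float,
                  voltage_v: float, frequency_hz: float) -> float:
    """First-order CMOS switching power in watts: alpha * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


# Illustrative (assumed) operating points, not data for any real processor.
baseline = dynamic_power(activity=0.2, capacitance_f=1e-9,
                         voltage_v=1.0, frequency_hz=3.0e9)
# 20% higher clock, assumed to need a 10% higher supply voltage.
boosted = dynamic_power(activity=0.2, capacitance_f=1e-9,
                        voltage_v=1.1, frequency_hz=3.6e9)

power_ratio = boosted / baseline

if __name__ == "__main__":
    print(f"baseline: {baseline:.3f} W, boosted: {boosted:.3f} W")
    print(f"power grows by a factor of {power_ratio:.3f} for a 1.2x clock")
```

Because voltage enters as a square, power rises faster than clock speed in this model, which is consistent with the article's point that frequency alone is no longer a practical lever.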
So what’s the next step? For Intel, the future, as trumpeted for a half-dozen years by CEOs at the annual Intel Developer Forum, is a bigger push into the SoC world and smaller, less power-hungry processors for some applications. While Intel remains the undisputed giant in PCs and inside of data centers, the bulk of computing is no longer being done by general-purpose processors inside of computers that are plugged into the wall. It’s being done by handheld devices, billions of them, with data served up in the cloud by farms that use chips from Intel, IBM, AMD and in some cases even ARM.
This is a shift that has been slowly creeping forward, but it is gaining momentum as the economics of Moore’s Law change. This is borne out in the balance sheets of the makers of general-purpose and specialized processors. In 2013, Intel’s revenue dropped 1%, the second year in a row its revenue was down. More telling, though, the PC group’s revenue shrank 4% in 2013 while the data center group’s revenue rose 7%. ARM’s revenue, meanwhile, grew 22.4%. And Synopsys with its ARC processor, Cadence with its Tensilica unit, CEVA and Andes Technology all reported solid growth in those businesses last year.
There are multiple ways to read this data and, put in perspective, Intel remains by far the largest processor maker in the world. It has many options for growth in multiple markets and huge potential in all of them. However, just cranking up clock speeds inside general-purpose processors across all devices is no longer one of its options, something that also may account for recent reports that IBM is considering selling off its semiconductor division.
“What we’re seeing is that the general-purpose processor is improving at a decelerating rate,” said Chris Rowen, a Cadence fellow. “There is continuing improvement in process technology at diminishing returns, and general-purpose architectures are suffering the same fate. There are not many magic bullets here. Moore’s Law density scaling goes on. High-end processors have more advanced system features, but it takes higher energy per operation. In reality, general-purpose processors are less efficient per operation than 10 years ago.”
Specialized processors, in contrast, are more efficient, which is why they are proliferating inside of SoCs these days. In some cases, there are dozens of them. In the future there may be hundreds of them. While the big processors are progressing slowly, smaller processors are really beginning to hit their stride in terms of market acceptance. And from all indications, and with some help from physics and the need for new materials and multi-patterning on masks over the next couple of process nodes, this trend is likely to accelerate even further.
“Heterogeneous system architectures will become more dominant,” said Rowen. “They’re already quite widespread, but they’re going to creep up into the cloud, too. We’re already seeing what is essentially a mini-cloud of processors for each applications processor. If you look at this by lines of code, it’s the 80/20 rule. The 80% of the code doesn’t run very much. The 20% is used more often, so to increase efficiency you have to do more computing with the same amount of energy. You get more performance and efficiency on a specialized processor.”
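Rowen's 80/20 argument can be put into rough numbers. The sketch below assumes a simple model that the article does not state: every operation costs the same energy on the general-purpose core, and the frequently executed code can be moved to a specialized processor that is some factor more efficient per operation. The function name, the 90% share of executed operations and the 5x efficiency gain are all hypothetical values chosen for illustration.

```python
def relative_energy(hot_share_of_ops: float, efficiency_gain: float) -> float:
    """Energy after offloading hot code, as a fraction of the original energy.

    hot_share_of_ops: fraction of executed operations in the hot code (0..1).
    efficiency_gain:  how many times less energy per operation the
                      specialized processor uses (must be > 0).
    """
    if not 0.0 <= hot_share_of_ops <= 1.0:
        raise ValueError("hot_share_of_ops must be between 0 and 1")
    if efficiency_gain <= 0.0:
        raise ValueError("efficiency_gain must be positive")
    cold_share = 1.0 - hot_share_of_ops  # stays on the general-purpose core
    return cold_share + hot_share_of_ops / efficiency_gain


if __name__ == "__main__":
    # Assumed values: hot code is 90% of executed operations, 5x efficiency.
    print(f"{relative_energy(0.9, 5.0):.2f} of the original energy")
```

Under these assumptions the savings are capped by whatever work stays on the general-purpose core, which is why the offloaded share matters as much as the efficiency of the specialized processor.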
ARM has taken note of this shift with its big.LITTLE architecture, combining heterogeneous processors working in tandem rather than homogeneous, same-sized cores that can be used for a distributed workload. While many applications are power-bound, and some are I/O-bound, those that are limited by single-threaded performance are rare. But that also doesn’t mean they need the most powerful processor core to get the job done. And whether they need the latest process node is still a subject of debate.
“There is some room to go by cranking frequencies or adding architectural features, but most of the gains will be as a result of process scaling,” said Rob Aitken, an ARM fellow. “Many of the micro-architectural tricks to enhance single-thread performance cost power, so they probably won’t show up until 10nm, when process scaling can move existing designs to a lower ‘power band.’”
Aitken noted that adding more cores eventually results in diminishing returns. “Pushing forward is a combination of software—operating system, compilers, etc.—and memory system design, which includes things like cache coherence. Some of the CPU-GPU approaches also show promise. And a couple of other physical tricks might change the dynamics, such as logic-on-logic 3D, super-eNVM (offering improvements in SRAM speed, DRAM leakage, flash density) or even monolithic 3D devices (with nanowire voltage).”
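The diminishing returns from adding cores are commonly illustrated with Amdahl's law. The article does not name it, so treating it as the model here is an assumption, and the 90% parallel fraction in the example is purely illustrative.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only part of a program runs in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Assumed workload: 90% of the run time can be parallelized.
    for cores in (1, 2, 4, 8, 16, 64):
        print(f"{cores:>3} cores -> {amdahl_speedup(0.9, cores):.2f}x")
```

In this model the serial part of the program sets a ceiling of 1 / (1 - parallel_fraction), so each doubling of cores buys less than the one before it. That is one way to read Aitken's emphasis on software and memory system design rather than core count alone.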
One of the biggest challenges in the processing world involves a mindset shift. It’s no longer just about performance. It’s about the ability to get the job done with the same or lower power. And while that may be obvious enough to engineers working with semiconductors, try selling that concept even to educated consumers. The progression of technology depends upon a market that is ready to accept it en masse. Eight cores may be more effective than four cores, but the real benefit is when those cores are rightsized to specific functions.
“The gigahertz problem is for the server,” said Charlie Su, chief technical officer at Andes Technology. “You need a different technology architecture to scale all of this. Memory is still the bottleneck, and that’s standing still or moving very slowly. With memory access, the issue is how to improve latency.”
He said this will become even more critical with wearable electronics because of the need to transition from sleep to wakeup and back to sleep. Those devices are always on, but usually not fully powered up. “If processors are performing only one or two functions, generally you can increase the efficiency,” said Su.
That seems to be the generally agreed-upon conclusion among the processor makers: Smaller, application-specific processors are better for the majority of computing tasks.
But there also are different architectural approaches that chipmakers are beginning to use, or that they are experimenting with now, that could have profound effects on the future of semiconductors. Those will be dealt with in the second part of this report.