How does an extra mid-size core affect power consumption?
Today, it seems to be all the rage for automotive manufacturers to continually one-up the competition by announcing a new transmission with more gears, or “speeds.” (Popular Mechanics did a nice article about why you would want more gears.) Basically, transmission designers want to keep the engine operating at or near its peak-efficiency point while extending the vehicle’s operating range. For example, Mercedes now offers a 9-speed automatic, the 9G-TRONIC. Continuously variable transmissions take this concept to its logical extreme: they hold the engine at its peak-efficiency point and change the vehicle’s speed by continuously varying the gear ratio.
Building on this concept, MediaTek presented its Helio X20 in the talk “Helio X20: The First Tri-Gear Mobile SoC with CorePilot 3.0 Technology” at the 28th Hot Chips conference this week in Cupertino, Calif. So you may be asking: what’s a tri-gear SoC? Glad you asked. Extending our automotive analogy, think of voltage as the accelerator and frequency as the RPM. Scaling the frequency with the voltage is like controlling the RPM with the gas pedal. In a car, how fast we go is determined by the gear ratio × the RPM; in a processor, the amount of computation we perform is determined by the frequency × the number of instructions per cycle (IPC) the core can sustain.
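To make the analogy concrete, here is a toy Python model. Throughput is frequency × IPC, as described above. For power, it assumes the textbook CMOS dynamic-power approximation P ≈ α·C·V²·f, which is general background rather than anything from MediaTek’s talk, and all capacitance, voltage, and frequency values are hypothetical.

```python
# Toy model of the "gearing" analogy. All parameter values are hypothetical.


def instructions_per_second(freq_hz: float, ipc: float) -> float:
    """Throughput: clock frequency (the 'RPM') times instructions per cycle (the 'gear')."""
    return freq_hz * ipc


def dynamic_power_w(cap_f: float, volt_v: float, freq_hz: float, activity: float = 1.0) -> float:
    """Textbook CMOS dynamic-power approximation: P = activity * C * V^2 * f."""
    return activity * cap_f * volt_v ** 2 * freq_hz


def energy_per_instruction_j(cap_f: float, volt_v: float, freq_hz: float, ipc: float) -> float:
    """Energy per instruction = power / throughput.

    At a fixed voltage, frequency cancels out, leaving C * V^2 / IPC.
    """
    return dynamic_power_w(cap_f, volt_v, freq_hz) / instructions_per_second(freq_hz, ipc)


if __name__ == "__main__":
    # Two hypothetical cores reaching the same throughput in different "gears":
    # a low-IPC core clocked high (needing more voltage) and a high-IPC core clocked low.
    low_gear = dict(cap_f=1e-9, volt_v=1.0, freq_hz=2.0e9, ipc=1.0)
    high_gear = dict(cap_f=3e-9, volt_v=0.8, freq_hz=1.0e9, ipc=2.0)
    for name, core in (("low gear", low_gear), ("high gear", high_gear)):
        ips = instructions_per_second(core["freq_hz"], core["ipc"])
        epi = energy_per_instruction_j(**core)
        print(f"{name}: {ips:.2e} instr/s, {epi:.2e} J/instr")
```

In this model, voltage (the accelerator) and the core’s IPC and switched capacitance (the gear) drive efficiency, while frequency alone does not. Real cores also burn leakage power, which this sketch ignores.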
Many SoC designers have long leveraged ARM’s big.LITTLE technology, which pairs two different cores: one with lower IPC, lower compute capability, and better energy efficiency, and a second with higher IPC, higher compute capability, and lower energy efficiency. Again, comparing to automobiles, if you buy a faster car you typically expect lower fuel economy. And for the Tesla drivers out there, the faster you drive, the shorter your overall range. So we can think of big.LITTLE as a two-gear system: the LITTLE (low-gear) core handles light task loads and the big (high-gear) core handles heavy ones.
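A two-gear policy can be sketched as a simple threshold: stay in low gear while the LITTLE core can keep up, otherwise shift to the big core. The capacity and energy figures below are hypothetical normalized values, and real big.LITTLE schedulers are considerably more involved than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    capacity: float         # max sustainable demand, normalized to the big core (hypothetical)
    energy_per_unit: float  # energy per unit of work, normalized (hypothetical)


LITTLE = Cluster("LITTLE", capacity=0.4, energy_per_unit=1.0)
BIG = Cluster("big", capacity=1.0, energy_per_unit=3.0)


def pick_gear(demand: float) -> Cluster:
    """Stay in low gear while the LITTLE core can keep up; otherwise shift up."""
    return LITTLE if demand <= LITTLE.capacity else BIG


def task_energy(demand: float) -> float:
    """Energy spent on a task of the given demand on whichever cluster is chosen."""
    return demand * pick_gear(demand).energy_per_unit


if __name__ == "__main__":
    for demand in (0.2, 0.5, 0.9):
        gear = pick_gear(demand)
        print(f"demand {demand}: {gear.name} core, energy {task_energy(demand):.2f}")
```

Note that any demand just above the LITTLE core’s capacity jumps straight to the most energy-hungry gear, which is the gap MediaTek’s workload study targets.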
Figure 1. Task Load Distribution
MediaTek studied the types of workloads run on consumer devices (Figure 1) and found a significant number of scenarios with a medium workload that warranted a “mid-performance” core. In other words, the typical LITTLE (low-gear) core didn’t have the compute capability to handle the workload, but the big (high-gear) core was overkill for it and wasted energy.
Figure 2. “Three Gears” CPU Cores
MediaTek’s solution was to add a third class of core to handle those medium-workload tasks. They chose a higher-clocked implementation of the A53 core to act as the “mid gear,” distinct from the lower-clocked “low-gear” A53. Figure 2 shows the performance/energy tradeoff for each core type. As compute demand rises, the workload shifts to a higher-performing core, with an associated increase in energy consumption.
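A tri-gear policy can be sketched as choosing the lowest-energy cluster whose capacity covers the demand. This is an assumed rule for illustration, not MediaTek’s CorePilot algorithm, and the Min/Mid/Max capacities and per-unit energies are hypothetical normalized numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    capacity: float         # max sustainable demand, normalized (hypothetical)
    energy_per_unit: float  # energy per unit of work, normalized (hypothetical)


MIN = Cluster("Min", capacity=0.4, energy_per_unit=1.0)  # low-clocked A53s
MID = Cluster("Mid", capacity=0.7, energy_per_unit=1.6)  # higher-clocked A53s
MAX = Cluster("Max", capacity=1.0, energy_per_unit=3.0)  # big cores


def pick_gear(demand: float, clusters: list[Cluster]) -> Cluster:
    """Return the lowest-energy cluster that can sustain `demand`."""
    capable = [c for c in clusters if c.capacity >= demand]
    if not capable:
        # Nothing can keep up: fall back to the most capable cluster.
        return max(clusters, key=lambda c: c.capacity)
    return min(capable, key=lambda c: c.energy_per_unit)


def task_energy(demand: float, clusters: list[Cluster]) -> float:
    """Energy spent on a task of the given demand on the chosen cluster."""
    return demand * pick_gear(demand, clusters).energy_per_unit


if __name__ == "__main__":
    for demand in (0.2, 0.6, 0.9):
        twin = task_energy(demand, [MIN, MAX])
        tri = task_energy(demand, [MIN, MID, MAX])
        print(f"demand {demand}: two-gear {twin:.2f}, tri-gear {tri:.2f}")
```

With these made-up numbers, light and heavy demands land on the same cluster either way; only the medium demand changes, where the Mid cluster cuts energy by nearly half compared with shifting up to the Max cluster.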
David Lee from MediaTek described several enhancements needed to make efficient use of the new 10-core SoC. The coherent interconnect between the cores and memory was expanded to three ACE ports, one for each processor cluster. Normally, the added logic would imply higher power, but MediaTek claims a ~50% power reduction through sub-module Fine-Grain Clock Gating (FGCG).
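Fine-grain clock gating saves dynamic power by stopping the clock to submodules that have no work, instead of clocking the whole block whenever any part of it is busy. The toy model below illustrates only that difference; the submodule names and relative capacitances are hypothetical, and it makes no attempt to reproduce MediaTek’s ~50% figure.

```python
# Toy model of coarse vs. fine-grain clock gating in an interconnect.
# Submodule names and relative switched capacitances are hypothetical.

SUBMODULES = {
    "ace_port_min": 1.0,
    "ace_port_mid": 1.0,
    "ace_port_max": 1.0,
    "shared_fabric": 2.0,
}


def dynamic_power(active: set[str], volt: float, freq: float, fine_grain: bool) -> float:
    """Relative dynamic power: sum of C over clocked submodules, times V^2 * f.

    Coarse gating: the whole block is clocked whenever any submodule is busy.
    Fine-grain gating: only busy submodules receive a clock.
    """
    if not active:
        return 0.0  # fully idle: both schemes gate the entire block
    clocked = active if fine_grain else SUBMODULES.keys()
    return sum(SUBMODULES[name] for name in clocked) * volt ** 2 * freq


if __name__ == "__main__":
    # Only the Mid cluster is issuing traffic through the interconnect.
    busy = {"ace_port_mid", "shared_fabric"}
    coarse = dynamic_power(busy, volt=0.9, freq=1.0, fine_grain=False)
    fine = dynamic_power(busy, volt=0.9, freq=1.0, fine_grain=True)
    print(f"fine-grain / coarse = {fine / coarse:.2f}")
```

The more ports sit idle at any moment, the larger the gap between the two schemes, which is why adding a third ACE port need not add proportional power.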
Through what MediaTek calls Intelligent Core Activation Technology (ICAT), CPU0 can be dynamically migrated to the Mid-cores and the Min-cores can be taken offline. On top of ICAT, MediaTek also uses Asymmetric Multi-Processing (AMP), which packs tasks onto the Mid-cores for sustainable performance and onto the Min-cores for low power. The Hybrid Scheduler then uses Heterogeneous Multi-Processing (HMP) for an instant boost, bringing in the Max-cores for urgent or heavy tasks as needed.
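The interplay of ICAT, AMP-style packing, and the HMP boost can be sketched as a simple placement policy. Everything here is an assumption for illustration: the `Task` fields, the utilization thresholds, and the rule that the Min-cores go offline (with CPU0’s duties moved to a Mid-core) when no task is packed onto them. It is not MediaTek’s Hybrid Scheduler.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    util: float          # normalized utilization demand, 0..1 (hypothetical metric)
    urgent: bool = False


LIGHT_MAX = 0.3  # hypothetical: at or below this, pack onto Min-cores
HEAVY_MIN = 0.7  # hypothetical: at or above this, boost to Max-cores


def place(task: Task) -> str:
    """Choose a cluster for one task."""
    if task.urgent or task.util >= HEAVY_MIN:
        return "Max"  # HMP-style boost for urgent or heavy work
    if task.util <= LIGHT_MAX:
        return "Min"  # AMP-style packing for low power
    return "Mid"      # AMP-style packing for sustainable performance


def schedule(tasks: list[Task]) -> tuple[dict[str, list[str]], bool, str]:
    """Place all tasks, then apply an ICAT-like rule to the Min cluster."""
    placement: dict[str, list[str]] = {"Min": [], "Mid": [], "Max": []}
    for task in tasks:
        placement[place(task)].append(task.name)
    # ICAT-like step (assumed): with nothing packed onto the Min cluster,
    # move CPU0 to a Mid-core and take the Min-cores offline.
    min_online = bool(placement["Min"])
    cpu0_home = "Min" if min_online else "Mid"
    return placement, min_online, cpu0_home


if __name__ == "__main__":
    print(schedule([Task("ui", 0.5), Task("music", 0.1), Task("launch", 0.4, urgent=True)]))
    print(schedule([Task("video", 0.5)]))
```

In the second call only a medium task is running, so the whole workload, including CPU0, fits on the Mid cluster and the Min-cores can be powered down.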
Figure 3. Tri-Gear Energy Savings
So, what does having a new “mid” gear buy the Helio X20? Figure 3 compares several workloads run on an X20 with and without the Mid-cores enabled. This at least gives an apples-to-apples comparison on the same technology node, although the “without” case probably still carries some overhead of the tri-gear implementation and lacks whatever modifications designers would have made to the Min- and Max-cores had they known there wasn’t going to be a Mid-core. Still, the results look interesting, and having the Mid-cores on the X20 definitely improves its energy efficiency over using the Min- and Max-cores alone.
Designers have often taken software tasks and used chip real estate to create hardware implementations that perform better and use less energy than running the same tasks on a CPU. As process nodes continue to scale and dark silicon looks increasingly like a reality, using some chip real estate to create a broader spectrum of CPUs in terms of performance and energy efficiency is an interesting concept. What may be more interesting, though, is whether this tradeoff proves compelling going forward.
8 cores between “low power” and “medium power” is excessive when you also have the A72s.
I don’t see when a user would benefit from 4 low-power cores being spun up at the same time as 4 cores of the same type tuned for higher clocks. If something requires low power, hand it to a low-power core. If something puts on a heavy load, spin up an A72, hurry up, and get idle.
The middle cluster is questionable. Or at most, maybe 2 low power tuned cores and 2 medium power cores.
Anyway, all of that still buys into big.LITTLE, too; Apple’s A7 through A9 show that two big, meaty, high-IPC cores can be power gated well enough to keep idle and background processing power low as well.