New techniques, architectures and approaches are making up for a reduction in scaling benefits.
The slowdown in Moore’s Law is having a big impact on chips designed for the mobile market, where battery-powered devices still need to improve performance while consuming less power.
This hasn’t slowed down performance or power improvements, but it has forced chipmakers and systems companies to approach designs differently. And while feature shrinks will continue for the foreseeable future, they are being augmented with different architectures, materials and radically different approaches to computing.
“One effect of the technology improvements that have come with every iteration of manufacturing capabilities is power consumption of semiconductor devices,” said Gordon Allan, product manager for Questa Simulation at Mentor, a Siemens Business. “With the slowing of Moore’s law, the low power contribution per MOSFET transistor is no longer a dominant factor driving innovation. Soon, complexity requirements of the application will demand more low power from other sources. This is already evident in the industry.”
Several areas of this evolution already have begun, are reaching maturity, and will stand for several more years as a plan of record, as well as a requirement for participating in mobile phone and tablet markets as they continue to grow and evolve, he said.
“A slowing in Moore’s Law does not inhibit Arm and its partners from continuing pushing the leading edge,” said Stefan Rosinger, director of product management for the Client Line of Business at Arm. “Additionally, continued product and system enhancements for managing thermals—including managing sustained and peak power, as well as in-rush currents—are improving on an ongoing basis.”
Arm has been pushing power efficiency since the dawn of the mobile phone market, first in single-core implementations and later with heterogeneous multi-core architectures such as big.LITTLE. Adoption of new process technologies has allowed systems companies to stay on a steep performance trajectory. But with classical scaling gone and each new node paying smaller dividends in terms of power and performance, the focus has shifted beyond just scaling to a combination of factors ranging from more efficient software to different architectures.
“The additional transistors provided by the progression of Moore’s Law have enabled specialized cores to be designed and integrated into these processors, helping to extend battery life,” said Steven Woo, distinguished inventor and vice president of systems and solutions, Rambus Labs. “Examples of some specialized cores include GPUs, vector engines, DSP cores, and cryptographic function accelerators. These specialized cores are not general-purpose. Instead, they implement one or a small set of functions in a highly optimized manner so as to be both area- and power-efficient for the task they are designed for. When they are not needed, they are essentially turned off, saving power. But when these functions are needed, the specialized cores are used, offering the best performance and power-efficiency due to their highly-specialized designs.”
Using the additional transistors provided by Moore’s Law to create specialized cores, and selectively turning them on and off as needed, requires less energy than executing the same functions on more general-purpose cores. That approach often improves performance, as well.
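As a rough illustration of that principle, the sketch below routes tasks to a dedicated accelerator when one exists and falls back to a general-purpose core otherwise. The core names and per-operation energy figures are hypothetical placeholders, not measurements from any real device.

```python
# Illustrative sketch: dispatch work to specialized cores when available,
# otherwise to a general-purpose CPU. All names and energy numbers are assumed.

ACCELERATORS = {
    # task -> (core name, energy per operation in picojoules)
    "matrix_multiply": ("npu", 5.0),
    "aes_encrypt":     ("crypto_engine", 2.0),
    "fft":             ("dsp", 8.0),
}
CPU_ENERGY_PJ_PER_OP = 40.0    # assumed baseline for a general-purpose core

def dispatch(task, ops):
    """Return (core, estimated energy in microjoules) for a task."""
    core, pj_per_op = ACCELERATORS.get(task, ("cpu", CPU_ENERGY_PJ_PER_OP))
    return core, ops * pj_per_op * 1e-6

for task in ["matrix_multiply", "aes_encrypt", "json_parse"]:
    core, energy_uj = dispatch(task, ops=1_000_000)
    print(f"{task}: run on {core}, ~{energy_uj:.1f} uJ")
```

The point of the sketch is simply that the same work costs far less energy on a purpose-built core, and cores that are not dispatched to can stay powered down.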
This doesn’t mean the end of process shrinks. Companies are still driving down the process node roadmap, but the benefits at each new node are diminishing.
“A lot of the large mobile guys are at 7nm, and they continue to look for the latest bleeding-edge processor,” said Ron Lowman, strategic marketing manager for IoT at Synopsys. “They’re still going with the latest and greatest LP-DDR interfaces. A few years ago people would say a mobile phone won’t need LP-DDR4, won’t need LP-DDR4x and clearly they need that. Next-generation interfaces are important.”
Another big challenge in mobile devices is the impact of power on heating within GPUs.
Say a game is played on a phone. The temperature rises. Eventually thermal mitigation kicks in and the processor automatically drops the clock speed. This is a common trick in processors: if the temperature approaches a dangerous level, the device automatically reduces the clock speed to cut power and therefore heating. Unfortunately, the game now runs slower. This clock-speed downgrade can go through several steps, so the longer the play, the slower the game runs (down to some limit). This is why thermal-constrained performance is becoming one of the most important key performance indicators (KPIs) in mobile design.
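A minimal sketch of that stepped throttling behavior, assuming an illustrative frequency ladder and temperature thresholds (not any vendor's actual policy), might look like this:

```python
# Illustrative thermal-mitigation sketch: as temperature approaches a limit,
# the clock steps down through slower frequencies; it steps back up once cool.

FREQ_STEPS_MHZ = [2800, 2400, 2000, 1600, 1200]   # allowed clocks, fastest first
THROTTLE_TEMP_C = 85.0                             # start stepping down here
RECOVER_TEMP_C = 75.0                              # step back up once cooled

def next_frequency_step(temp_c, step):
    """Return the new index into FREQ_STEPS_MHZ for the current temperature."""
    if temp_c >= THROTTLE_TEMP_C and step < len(FREQ_STEPS_MHZ) - 1:
        return step + 1        # too hot: drop to the next slower clock
    if temp_c <= RECOVER_TEMP_C and step > 0:
        return step - 1        # cooled off: recover some performance
    return step

# Example: a long game session that keeps heating the die
step = 0
for temp in [70, 80, 86, 88, 87, 90, 78, 74, 72]:
    step = next_frequency_step(temp, step)
    print(f"temp={temp}C -> clock={FREQ_STEPS_MHZ[step]} MHz")
```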
According to a recently published white paper by ANSYS, Qualcomm Technologies took an ingenious approach to finding performance/watt inefficiencies in one of its GPUs. Rather than looking directly at redundant switching, which is common after the design team has done all it can to minimize leakage (through process selection and power islands), the team compared the energy (power integrated over run-time) of the design with the equivalent measure for a slowed-down version of the same design. The slowdown was simulated by adding latencies (to mimic starvation or stalls, for example) through scripted insertion of NOOP operations in ALUs, modifications to the testbench, or other techniques.
One clever aspect of this analysis is that they are simply comparing the design with itself — no fixes have yet been made, so concerns over the accuracy of RTL mitigation are further reduced.
In a simplified view, this analysis yields two energy profiles. In the ideal case, the unmodified test runs more quickly at some average power level, while the slowed-down test runs for a longer period, thanks to the added latencies, at the same clock frequency. If the design is optimally clock-gated, average power in the slowed-down run will be lower because the same workload is spread over a longer time, so the total energy for that run should match the unmodified case.
However, if there are any gating inefficiencies in the design, the comparison looks different. Redundant toggles remain active over the longer period of the modified run, so the integrated energy in that run will be higher than in the original run. This is interesting both as a different way of looking at the impact of inefficiencies on the design and as an energy metric that is especially relevant in battery-powered applications like this one.
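As a rough sketch of the comparison, assuming per-cycle power traces are available from an RTL power analysis tool (the function names and tolerance below are illustrative, not taken from the white paper), the check amounts to integrating energy for both runs and flagging any excess in the slowed-down run:

```python
# Illustrative energy comparison between a baseline run and a slowed-down run.
# Assumes per-cycle power traces (watts) exported from a power analysis tool.

def integrated_energy(power_trace, clock_period_ns):
    """Integrate a per-cycle power trace into energy in joules."""
    return sum(power_trace) * clock_period_ns * 1e-9

def gating_efficiency_check(baseline_trace, slowed_trace, clock_period_ns,
                            tolerance=0.02):
    """In a perfectly clock-gated design both runs do the same work, so their
    energies should match; a significant excess in the slowed-down run points
    to redundant toggling that stays active during the added idle cycles."""
    e_base = integrated_energy(baseline_trace, clock_period_ns)
    e_slow = integrated_energy(slowed_trace, clock_period_ns)
    excess = (e_slow - e_base) / e_base
    if excess > tolerance:
        print(f"Possible gating inefficiency: slowed run uses {excess:.1%} more energy")
    else:
        print("Energy matches within tolerance; clock gating looks effective")
    return excess
```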
Further, as the descent through the nodes slows, more consideration is being given toward finer-grain power optimization schemes and closer management of dynamic on-chip conditions, noted Stephen Crosher, CEO of Moortec. “By increasing the control resolution and response speeds of DVFS schemes and increasing the accuracy of underlying monitoring circuits, battery life can be extended. We see an increasing focus in these areas within the design community with more algorithmic optimization schemes being adopted.”
To address the high volumes associated with consumer products, test engineers are seeking to maximize yield and the amount of useful die by normalizing the performance of fast, slow and typical silicon, enabled by increasingly sophisticated and distributed embedded process monitors within the core of the die, he said.
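A simplified sketch of the kind of monitor-driven DVFS loop Crosher describes might look like the following; the operating-point table, thresholds and sensor readings are hypothetical.

```python
# Illustrative fine-grained DVFS loop driven by on-chip monitor readings:
# the controller moves between voltage/frequency operating points based on
# temperature and utilization reported by embedded sensors.

OPERATING_POINTS = [        # (frequency in MHz, voltage in V), fastest first
    (2800, 0.95),
    (2200, 0.85),
    (1600, 0.75),
    (1000, 0.65),
]
TEMP_LIMIT_C = 85.0

def next_operating_point(idx, temp_c, utilization):
    """Step down when hot, step back up only when cool and heavily loaded."""
    if temp_c > TEMP_LIMIT_C and idx < len(OPERATING_POINTS) - 1:
        return idx + 1                      # reduce both frequency and voltage
    if temp_c < TEMP_LIMIT_C - 10 and utilization > 0.9 and idx > 0:
        return idx - 1                      # thermal headroom: raise performance
    return idx

# Simulated monitor readings: (temperature, utilization) pairs
idx = 0
for temp, util in [(70, 0.95), (82, 0.95), (88, 0.9), (90, 0.9), (73, 0.95)]:
    idx = next_operating_point(idx, temp, util)
    freq, volt = OPERATING_POINTS[idx]
    print(f"temp={temp}C util={util:.0%} -> {freq} MHz @ {volt} V")
```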
AI shakes up the mobile market
Part of what is keeping the mobile industry chasing the next node is how to adopt and accommodate artificial intelligence capabilities, Lowman said. “They realize that they have to add neural network accelerators or enhance their processing on-chip somehow. When they do that, there’s more processing, which impacts power consumption. But if you don’t have efficient processors or efficient ways to manage those algorithms via multiple processors, because it is a heterogeneous processing problem, finding the right balance is difficult.”
Over the past 18 months, mobile processors have added neural network processors or capabilities, but the configurations of those processors are by no means standard.
“They’re all over the map,” he said. “Sometimes you’ll see one neural network processor and multiple GPUs, or an ISP (image signal processor), and the number of those are different configurations and different ways to do that.”
In addition, the software compilers that handle AI features are incredibly complex because it is a heterogeneous problem, and mobile chip architects try to leverage the best processing capabilities available to them on-chip. The other piece is how those algorithms are compressed to work on the phone and in the app. That creates huge challenges, because the processing elements are trying to leverage more memory at the same time they are dealing with bottlenecks in those memories.
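One common form of that compression is weight quantization. The sketch below shows uniform 8-bit quantization, which cuts weight storage and memory bandwidth roughly 4X versus 32-bit floats; it is an illustrative example, not the specific scheme used by any mobile compiler.

```python
# Illustrative uniform int8 quantization of neural-network weights.

def quantize_int8(weights):
    """Map float weights to int8 values plus a single scale factor."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.42, -1.31, 0.07, 0.88, -0.55]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)
print("int8 weights:", q)
print("max error:", max(abs(a - b) for a, b in zip(weights, recovered)))
```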
“If you have more memory on-chip, or high memory access off-chip, you can outperform competitors with respect to benchmarking,” Lowman noted. “It’s no different with respect to AI, and there are some relatively unknown AI benchmarks that make it clear that more memory on-chip or more memory bandwidth off-chip can improve the benchmarking in conjunction with the processing configuration they may have. It’s not just one processor. That’s the biggest challenge. The requirements for lots of memory and high-density memories is still pushing mobile chipsets to the latest process nodes, be it 12nm or 7nm, and looking to the future.”
SoC architectures
Diving into the specifics of the SoC architecture revolution, multiple cores and cache-coherent architectures allow several variants of performance profiles for today’s applications.
“On-chip bus protocols and advanced cache algorithms have helped here,” said Mentor’s Allan. “They have been driven by complexity and performance/throughput demands but have considered power consumption as a primary requirement for some years now. There is less distinction between ‘mobile’ compute power and ‘server’ compute power in architectural terms now. Multi-cores are the norm in your pocket, just as they are in a server blade.”
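A toy sketch of the heterogeneous (big.LITTLE-style) scheduling idea follows; the core counts, load threshold and policy are illustrative assumptions rather than any vendor's actual scheduler.

```python
# Illustrative big.LITTLE-style task placement: latency-sensitive or heavy
# tasks go to the "big" cores, background work to the efficient "LITTLE" cores.

BIG_CORES = ["big0", "big1"]                                  # high performance
LITTLE_CORES = ["little0", "little1", "little2", "little3"]   # power-efficient

def assign_core(task_name, expected_load, interactive):
    """Pick a core cluster based on a task's expected load and interactivity."""
    if interactive or expected_load > 0.6:
        return BIG_CORES[hash(task_name) % len(BIG_CORES)]
    return LITTLE_CORES[hash(task_name) % len(LITTLE_CORES)]

for task, load, interactive in [("ui_render", 0.8, True),
                                ("email_sync", 0.1, False),
                                ("photo_filter", 0.7, False)]:
    print(f"{task} -> {assign_core(task, load, interactive)}")
```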
The von Neumann architecture remains the CPU architecture of choice within the SoC for most control, system and application software. Alternative approaches, such as neuromorphic architectures, offer power/performance improvements for predicting patterns in complex data while using relatively little energy to do so, said Allan.
“These applications are at the core of today’s value added mobile solutions, whether they involve speech recognition, face recognition, or assistive applications, in general,” he said. “They work for any app that requires the machine to adjust its behavior as it interacts with the world, and any app that is always on. This spans productivity, entertainment, health, or traveling from A to B with an assistant in your pocket or on your wrist. Emerging AI algorithms will be a mix of localized specialized processors in the device, and remote in the cloud or the edge of the cloud.”
The contribution of EDA
Alongside these new architectures and added features, the EDA industry appears to be keeping pace.
One area of focus for EDA has been capturing and codifying the low-power knowledge base of the experts of the 1980s and 1990s into IEEE Standard 1801 (UPF), which gives every mobile SoC designer the consistent, high-quality low-power constructs required for the multiple clock domains, power domains and power/voltage islands that are now commonplace on-chip.
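Conceptually, UPF captures power intent: which power domains exist, which supply states they may be in, and where isolation is required. The sketch below models that intent in Python purely for illustration; it is not UPF syntax, and the domain names and states are hypothetical.

```python
# Illustrative model of power intent: domains, allowed supply states, and a
# check that signals leaving a domain that can power down are isolated.

POWER_DOMAINS = {
    "PD_CPU":       {"states": ["ON", "OFF"],              "isolate_outputs": True},
    "PD_GPU":       {"states": ["ON", "RETENTION", "OFF"], "isolate_outputs": True},
    "PD_ALWAYS_ON": {"states": ["ON"],                     "isolate_outputs": False},
}

def check_domain(name):
    """Flag a domain that can power down but whose outputs are not isolated."""
    info = POWER_DOMAINS[name]
    can_power_down = any(state != "ON" for state in info["states"])
    if can_power_down and not info["isolate_outputs"]:
        return f"ERROR: {name} can power down but its outputs are not isolated"
    return f"OK: {name} (states: {', '.join(info['states'])})"

for domain in POWER_DOMAINS:
    print(check_domain(domain))
```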
The Wilson Research survey commissioned by Mentor found that in 2018, 71% of designs actively manage power, up from 59% a decade ago. That number is closer to 80% for large ASICs. This has turned out to be one of the more successful standards in terms of adoption.
“From our survey, we see that UPF2.x/3.x is now used by most designs,” Allan said, noting that EDA is providing a second wave of solutions and design flows for low power, not just verification flows. That includes analysis tools, which can suggest or even insert extra logic to reduce power on constructs that the tool identifies as ripe for optimization in the simulated application. The better the testbench activity, the more likely those tools are to make meaningful optimizations possible.
Future on-silicon monitoring can be expected to gather usage data on power-hungry design constructs so that designers of subsequent iterations have all the data they need to optimize. This includes all types of simulation, emulation and verification.
Another piece of the equation is the cloud, which is an increasingly important part of any system developer’s strategy for survival going forward. One shift in mobile electronics is client/server computing, an approach dating back to the 1980s in mainframes and minicomputers. Multiple generations of improvements in cellular and Wi-Fi networks have enabled client/server approaches, and that trend shows no sign of abating.
The 5G cellular network is an essential evolution here because it provides an energy-efficiency benefit of 1,000X in bits per joule. This continues to encourage applications on mobile phones and tablets to use the cloud to provide part of their solution. The device itself becomes a smart cache, with low-power storage, and a smart interactive device, with a low-power display, audio and local processing, while massive compute and storage sit at the other end of a fast, low-power 5G connection. Anyone who has turned off their mobile device just to preserve the last bit of battery life will have less reason to do so next year with 5G on board, Allan said.
Synopsys’ Lowman agreed. “5G offers a whole new world of complexity. It’s not just the higher bandwidth. You’re not just upping more data. You’re basically instantiating multiple modems inside. It’s rapidly expanding, and it gets to be a very complex problem. In fact, there’s even 5G self-optimization, which could use AI to solve the 5G optimization problem. So it’s an increase in the compute, and these types of techniques are being used to lower power. In the same token, it really makes a much more complex processor, so the systems engineers are really earning their pay. Companies that are doing well here are the ones that are really going to understand that from a system level.”
Another technology area that has already adapted for low power, on the back of high-density requirements, is memory storage. In recent years there has been a shift from ‘active’ storage technologies, such as DRAM and rotating magnetic storage, to ‘static’ technologies at scale, such as flash memory. NAND flash is used for the highest density, while NOR flash can offer a slight power-consumption benefit, especially if the flash data is read-mostly. Flash is one part of the illusion of an ‘always powered up’ mobile device.
In semiconductor technology, other options are waiting in the wings, including new materials such as indium gallium arsenide and graphene, and new approaches such as quantum wells, TFETs and silicon photonics. When the economics are right, the industry will pick the solution for the next challenges. Other technological advances include the miniaturization of discrete sensors, such as accelerometers and gyroscopic position/movement sensors.
Conclusion
The key question now is what the form factor of next-gen mobile devices will look like. Will they be the same?
“Probably not,” said Allan. “Low power is certainly a factor there, but not the only one. If the job of a mobile device (phone-shaped or tablet-shaped) is to receive input from us (voice and interaction) and present output to us (audio and visual), then that function can be achieved in future iterations of those products in different form factors from specialized components, including glass handheld or mounted displays, discrete body-worn or room-sensing microphones, smart glasses and other wearables, and connected audio field devices around us in the room or car, and on us or in our ears. The markets for those devices will continue to grow, not decline, as they become more ubiquitous, even though the traditional ‘mobile’ and ‘tablet’ form factors may well decline.”