The rollout of artificial intelligence has created a whole new set of challenges, along with a dizzying array of innovative options and tradeoffs.
The increasing adoption of AI in edge devices, coupled with a growing demand for new features, is forcing chipmakers to rethink when and where data gets processed, what kind of processors to use, and how to build enough flexibility into systems to span multiple markets.
Unlike in the cloud, where the solution generally involves nearly unlimited resources, computing at the edge faces sharp constraints on power, area, and cost. But the edge is also cheaper, faster, and safer than sending everything to the cloud, and with the rollout of AI it has opened the door to massive innovation and new opportunities around both inferencing and some limited training.
“AI is moving from the cloud into edge devices,” said Thomas Rosteck, president of Infineon’s Connected Secure Systems Division, in a recent presentation. “With this intelligence, we can save power because getting the data and then transporting it to the cloud, calculating it in the cloud, and getting it back consumes an awful lot of power. That is something we can solve with Edge AI, and with this we also contribute to decarbonization. And if I don’t transport it, if I have it local, it’s more secure.”
The edge encompasses a broad range of systems, from mobile devices tethered to a single battery to on-premises data centers. Whatever the architecture, these devices share a common need to process, store, and move an increasing amount of data at a rate consistent with the application. And they all need to perform these functions consistently and reliably, regardless of processor and memory utilization, and in the presence of physical effects such as noise, heat, or vibration.
Mass-market mobile devices such as smart phones have been dealing with these kinds of issues for years due to the phone’s form factor limitations, using a mix of processing elements and sophisticated thermal management to avoid burning a user’s hand or face. That can include everything from checker-boarding which transistors are active to reduce dynamic heat density, to adding heat sinks and thermal monitors in appropriate places, to utilizing different types of processors, including some developed at the most advanced nodes.
Smart phone vendors can accept those costs because they can be amortized across hundreds of millions of units. The business context is very different, however, for many other IoT/edge devices, which are price-sensitive and sold in much smaller quantities. For those products, vendors typically rely on commercial off-the-shelf components such as MCUs and DSPs, many of which are undergoing continual changes to deal with increasing compute demands. In some cases, device makers also are combining generic processors with more targeted, semi-custom accelerators, which can improve performance where it counts, and thereby limit the amount of energy required to perform specific calculations. And they are using all of these components in new ways and combinations as AI models begin showing up nearly everywhere.
“We’re in phase two of AI today, which is why you’re seeing things like AI hubs,” said John Weil, vice president and general manager of the Edge AI Processor Business at Synaptics. “For example, my old security camera can identify people walking on the street. But now, instead of a camera doing that, it’s being done in a centralized box with no cloud connection. In the past, we needed the cloud to do that. Now we can do it in a very low-cost product. The third phase will include new product definition, which will create new products that change our lives. The market is starting to define new categories of products that weren’t there before AI.”
This goes well beyond just more efficient processing. It affects how those processors are utilized within and between devices, and in many cases it involves rethinking system architectures and the sharing of compute resources. Even noise filtering on a set-top box, for example, can be done with a very different design approach than in the past.
“Let’s take the case of a far-field voice communication, where you have a device you’re talking to like an Alexa or Google Home or a set-top box,” said Prakash Madhvapathy, director of product marketing for Tensilica audio/voice DSPs at Cadence. “Some people will take the voice input and do filtering to reduce the noise first, and after they reduce the noise they will process it through a signal crossing line. That’s one way to do it. Other people will take in the noisy signal as is, train the device with a noisy signal and clear signal to make sure that the training AI interprets the noisy data just like the clean data. So with this, the noise is part of the actual training data, and it can then interpret noise and the signal separately and filter them out. If it can infer the same things that you can with clean data by appropriate training, you would achieve a very close result.”
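As a rough illustration of the two approaches Madhvapathy describes, the Python sketch below (using PyTorch, with model shapes and data names that are purely hypothetical) contrasts an explicit denoise-then-recognize pipeline with training the recognizer directly on noisy/clean pairs, so the model learns to separate noise from signal itself.

```python
# Minimal sketch (PyTorch; all shapes and names are illustrative assumptions)
# contrasting the two pipelines: (1) denoise first, then run the recognizer
# on the cleaned signal, vs. (2) fold noise into the training data so the
# recognizer learns noise robustness directly.

import torch
import torch.nn as nn

recognizer = nn.Sequential(              # stand-in for a keyword-spotting net
    nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 10))

# Approach 1: explicit front-end filtering before inference.
def denoise_then_infer(noisy_frame, denoiser):
    clean_est = denoiser(noisy_frame)    # classic noise-reduction stage
    return recognizer(clean_est)

# Approach 2: train on noisy versions of clean, labeled frames.
def train_noise_robust(clean_frames, labels, noise_level, epochs=10):
    opt = torch.optim.Adam(recognizer.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        noisy = clean_frames + noise_level * torch.randn_like(clean_frames)
        loss = loss_fn(recognizer(noisy), labels)  # learn on noisy input
        opt.zero_grad()
        loss.backward()
        opt.step()
```

With the second approach, there is no separate filtering stage at inference time; the noise statistics are baked into the trained weights, which is why, with appropriate training, the results can come very close to inference on clean data.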
It also potentially tightens the relationship between the EDA tools and the devices created with them, using AI as a bridge to open up new options. “We’ve had the benefit of more than three generations of von Neumann architecture driving EDA,” said Ravi Subramanian, general manager of the Systems Design Group at Synopsys. “That learning has really matured, and we have refined the tools to get to where we are. But now a whole new learning has to happen, and that learning can happen with AI. You can actually build models today, and that opens up a tremendous opportunity based on how our learning evolved the tools when AI wasn’t around. These are the things we are dealing with directly with our customers. And you have to think about where to apply this.”
Thermal issues
Data centers and smart phones have been wrestling with heat dissipation and energy efficiency for decades, and those challenges are growing as the amount of data continues to increase. But at the edge, in many internet of things applications, processing demands historically have been low and heat has been less of an issue.
Consider Bluetooth devices, for example. “We haven’t seen that being a major concern, mainly because these small IoT devices are low power by and large,” said Marc Swinnen, director of product marketing at Ansys. “They work off batteries, or sometimes energy harvesting, so the designs tend to be very low power, which means that thermal is typically not a huge issue. Also, the power produced by the chip is proportional to its surface area, not its volume, and as you make the chip smaller, the power scales down with the surface area, so the cooling is also proportional to the surface area. If you make the chip half as big, it only has half as much surface area, but also only has half as much power producing surface area. The two tend to scale in parallel, unlike volume effects.”
Those types of chips will continue to be used, but as AI is included in more devices and systems, more capable processors are being added alongside them.
“There are some chips that are burning so little power that they’re not going to really contribute to the heating of the system,” said Scott Best, technical director at Rambus. “But there’s usually something in the system that is going to cause things to overheat. Every chip suffers from this local heating problem. Something inside of the chip or in the system is generating heat, and it’s self-heating everything around it.”
Makers of medical devices and monitors, a market seeing rapid uptake, are hyper-aware of these issues. Depending on what a device is measuring or detecting, there may be a full spectrum of processing needs and thermal concerns. What’s changed is that there are more options for addressing those concerns today than in the past.
“Even if you stay with today’s technology in terms of battery capacity and so on, when you’re monitoring the health of a human being — whether it’s embedded in the kidney or some other organ inside the body, or whether you’re doing that from the surface — you don’t have to monitor it consistently with zero gaps in time,” said Cadence’s Madhvapathy. “If you’re able to do it once an hour or once every two hours, it will last longer.”
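A back-of-the-envelope calculation shows why duty cycling matters so much here. The sketch below uses assumed numbers, not figures from the article, but the shape of the result holds: stretching the measurement interval from once a minute to once an hour extends battery life by more than an order of magnitude.

```python
# Illustrative duty-cycling math for a battery-powered health monitor.
# All values (cell capacity, sleep/active currents) are assumptions.

def battery_life_days(capacity_mah, sleep_ua, active_ma, active_s, interval_s):
    """Average current = sleep current + duty-cycled active current."""
    duty = active_s / interval_s
    avg_ma = sleep_ua / 1000.0 + active_ma * duty
    return capacity_mah / avg_ma / 24.0

# 200 mAh cell, 5 uA sleep current, 10 mA draw for a 2 s measurement:
print(battery_life_days(200, 5, 10, 2, 60))    # every minute -> ~25 days
print(battery_life_days(200, 5, 10, 2, 3600))  # every hour   -> ~790 days
```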
Much of this is application-dependent, of course. But the overall trend is more data processing everywhere, and that has made thermal dissipation a concern not just for device health, but for user safety and comfort.
“Whether it’s editing yourself into a photo on your phone or your watch checking your heart rate, sleep quality, or blood oxygen levels, IoT devices play a prevalent role in our daily lives,” said John Parry, Simcenter Industry director for Electronics & Semiconductor at Siemens EDA. “They consume battery power and, as a byproduct, produce heat. Getting rid of that heat is important, but also a real challenge. With wearables, the main heat flow path is conduction into the skin. Skin surface temperature has to be kept below 45°C. Otherwise, there is a risk of a low-temperature burn. Phones are challenging because the conduction from the case can’t be guaranteed. Users often use a protective cover to guard against accidental damage, and that cover impedes heat loss.”
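A first-order version of that check can be written in a few lines. The sketch below assumes a simple conduction-dominated model with illustrative values for power and thermal resistance; a real wearable design would rely on detailed thermal simulation instead.

```python
# First-order contact-temperature check for a conduction-dominated wearable.
# Skin baseline, power, and thermal resistance values are assumptions.

def contact_temp_c(power_w, r_th_c_per_w, skin_temp_c=34.0):
    """Contact temperature rises with dissipated power times thermal resistance."""
    return skin_temp_c + power_w * r_th_c_per_w

# A 0.5 W wearable with an assumed 15 C/W resistance at the skin interface:
t = contact_temp_c(0.5, 15.0)   # 41.5 C -> inside the 45 C burn limit
print(f"contact temperature: {t:.1f} C, margin to 45 C: {45.0 - t:.1f} C")
```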
Solutions vary
There is a solution for nearly every problem in design, and if there isn’t one, smart engineers likely will develop one. The bigger challenge, though, is understanding how one or multiple chips will be used, and having enough flexibility in a design to be able to adapt it as needed.
“We have an MCU and we have a co-processor for running the AI portion of a chip,” said Infineon’s Rosteck. “We also have a GPU inside of this chip. So we are very flexible in what we can do there. And you don’t just run language models on the core itself. You have accelerators for that. The second step is helping our customers get to their models. This is the acquisition of Imagimob that we did last year, so now we have a tool chain, and this tool chain has a benefit. Models can be designed either by experts in the field, and you can combine it with AI experts to go as deep as you want. In the end, it will be translated into code that is then executable on our macros.”
Solutions can vary greatly, and this is particularly evident with thermal management. Heat, for example, can be dissipated across the surface of a device or chip, but that becomes harder when the chips are small and, in advanced packages, when the substrate is thinned out to shorten the distance that signals need to travel vertically.
“Designers need to maximize the effectiveness of the surface area they have available, so to optimize cooling, an effective strategy is to spread the heat,” Parry said. “That means using ultra-thin vapor chambers to spread the heat within the device away from the main heat-generating components to reduce hot spots on the device’s surface. A uniformly warm surface maximizes the effectiveness of the available surface while minimizing hot spots.”
Vapor chambers are not a new idea, but they were not successful in the past because the target devices were mobile, and movement sharply reduced their effectiveness, according to industry sources.
A number of other strategies also can be implemented to ensure excess heat doesn’t affect performance or put users at risk. This is particularly important for use cases where tight thermal limits are in play.
“One possible approach is to tally up the worst-case sustained power dissipation ratings of the various components and SoC subsystem IP blocks and verify that if every major system is on and working at full throttle, the sum of the parts doesn’t exceed the thermal rating of the full device,” said Steve Roddy, chief marketing officer at Quadric. “This can be done without extensive design analysis or simulation simply by inspection of the spec sheets of the various component manufacturers or ratings from the IP block suppliers, and applying some common-sense rules of thumb about which systems are likely to be ‘on’ at the same time. While this method may be quick and require low investment of engineering time, it has the drawback of likely overstating the actual active power scenarios, potentially sacrificing peak performance or reducing functionality in service of the thermal design goal.”
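In code form, that spec-sheet tally reduces to a few lines. The sketch below uses hypothetical block names, power ratings, and a thermal budget purely for illustration of the method Roddy describes.

```python
# Worst-case power tally from datasheet/IP ratings (all numbers illustrative).

WORST_CASE_W = {
    "cpu_cluster": 1.2,
    "npu": 0.9,
    "isp": 0.6,
    "ddr_phy": 0.5,
    "wifi": 0.4,
}

# Common-sense concurrency rule of thumb for this hypothetical product:
# ISP and NPU run together; Wi-Fi bursts don't overlap peak NPU activity.
concurrent = ["cpu_cluster", "npu", "isp", "ddr_phy"]

total_w = sum(WORST_CASE_W[block] for block in concurrent)
THERMAL_BUDGET_W = 3.5   # assumed device-level sustained dissipation limit
print(f"{total_w:.1f} W of {THERMAL_BUDGET_W} W budget, "
      f"margin {THERMAL_BUDGET_W - total_w:.1f} W")
```

As the quote notes, summing worst-case ratings this way is quick but pessimistic, since the blocks rarely all hit their rated maximums at once.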
A second, newer method relies on creating full-chip and full-system digital twins, featuring models in which the known or expected power dissipation in each mode is included.
“The various EDA companies have been touting their latest tools to help facilitate this shift left approach that allows designers to model virtual systems and run actual embedded software code long before committing to chip or board design masks,” Roddy said. “If you run actual code on an accurate system model, the real behavior can be modeled at a much finer level to determine answers to questions, such as, ‘What is the actual off-chip I/O traffic on this DDR interface, and thus how much actual power is being dissipated? And is System A really ‘on’ simultaneously with System B, or can I time multiplex them with tuned software to achieve lower peak power dissipation?’”
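One of those questions can be answered with simple arithmetic once the digital twin reports actual traffic. The sketch below converts observed DDR traffic into average I/O power using an assumed energy-per-bit figure; real numbers would come from the interface vendor.

```python
# Average DDR I/O power from traffic volume (energy-per-bit value assumed).

def ddr_io_power_w(bytes_per_s, pj_per_bit):
    """Power = bits per second times energy per bit."""
    return bytes_per_s * 8 * pj_per_bit * 1e-12

# 2 GB/s of observed traffic at an assumed 20 pJ/bit:
print(ddr_io_power_w(2e9, 20))   # -> 0.32 W of interface power
```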
This digital twin approach can be helpful in predicting system power and enabling maximum performance within a design constraint.
Controlling heat starts with controlling voltage, but Ansys’ Swinnen said more complicated strategies can be adopted, particularly through sophisticated forms of clock gating. Such methods require special care, however. They can be labor-intensive, and they need to be considered from the very beginning of the design process.
“It’s not just big blocks, but any small block that can be switched off,” he said. “They have very complicated clock structures to enable that. In the design phase, too, when you look at typical clock-gating tools like (Synopsys’) Power Compiler, they put a gate ahead of a handful of flip-flops, and then every handful of flip-flops that can be switched off gets its own gate. But in fact, all those gates could be unified into one single gate closer to the root of the tree, rather than switching off that group of five, and this group of five, and this group of six. Instead, you make one gate that switches them all off. The problem is you have to be careful in your design, because the enable signal for that clock gate to switch it ‘on’ and ‘off’ has to go higher and higher up the tree, and that makes the timing tighter and tighter.”
That, however, comes with its own downsides, as the delays between the clock switching and the signal itself can become increasingly large. That places limits on how far a designer can push the clock gating, necessitating tradeoffs between timing and power efficiency. Swinnen noted that while such solutions can be implemented to maximize power efficiency, these clock gates need to be configured manually, at either the gate or RTL level. This approach is a headache for designers, and it can complicate power analysis due to the additional combinations of nodes that require testing. “You can save power, but you need to have someone crack their knuckles and really tweak the design and make sure the timing is right, whereas with an automatic system, I just put 10 clock gates in. That will work, too, but it won’t be quite as efficient.”
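A toy model makes the tradeoff concrete. In the sketch below, every delay value is assumed for illustration: each level a clock gate is pulled up the tree buys broader gating with a single enable, but forces that enable to settle earlier, shrinking the available timing slack.

```python
# Toy timing model (all delays assumed) for consolidating clock gates
# toward the root of the clock tree.

CLOCK_PERIOD_PS = 1000

def enable_slack_ps(levels_above_leaf, buffer_delay_ps=60, setup_ps=50,
                    enable_path_ps=400):
    # Each level the gate moves up, the enable must arrive one more
    # tree-buffer delay ahead of the gated clock edge.
    required = enable_path_ps + setup_ps + levels_above_leaf * buffer_delay_ps
    return CLOCK_PERIOD_PS - required

for lvl in (0, 2, 4, 6):
    print(f"gate {lvl} levels above the leaves: slack {enable_slack_ps(lvl)} ps")
# Slack shrinks from 550 ps at the leaves to 190 ps six levels up.
```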
Quadric’s Roddy noted that some modern tools enable designers to calculate power profiles based on real code, rather than just measuring total cycle counts. “A legacy convolutional neural network heavy with large 5 x 5 and 3 x 3 convolutions will have more gates toggling for longer periods than a more modern transformer that leans more heavily on activations and normalizations and shape transformations,” he said.
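A rough operation count illustrates the point. The sketch below compares multiply-accumulate (MAC) counts for a single 5 x 5 convolution layer against the attention core of a transformer block (feed-forward layers omitted, and all shapes illustrative); tools that profile real code weight these workloads, and their toggling behavior, very differently than a raw cycle count would.

```python
# Rough MAC counts for one conv layer vs. one attention core
# (layer shapes are illustrative; feed-forward MACs are excluded).

def conv_macs(h, w, c_in, c_out, k):
    return h * w * c_in * c_out * k * k

def attention_macs(tokens, dim):
    # Q/K/V/output projections plus attention-score matmuls (simplified).
    return 4 * tokens * dim * dim + 2 * tokens * tokens * dim

print(conv_macs(56, 56, 128, 128, 5))   # ~1.28e9 MACs for one 5x5 conv layer
print(attention_macs(196, 768))         # ~0.52e9 MACs for one attention core
```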
Some solutions that might seem obvious come with downsides, too. Swinnen pointed to hearing aids as an example, where an overheated chip could cause major issues. In such a case, he said, safety measures need to be built in, such as embedding thermal sensors that can throttle back the clock in case of anomalous signatures. “It slows the performance, so you won’t meet nominal performance, but your temperature will stay within a certain limit. That’s a bit of a brute force approach to it. You don’t really solve the problem. You just address the symptom.”
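In its simplest form, that safeguard is a small control loop. The sketch below is illustrative only, with assumed trip points and frequencies, and adds hysteresis so the clock doesn’t oscillate around the limit.

```python
# Brute-force thermal throttle: slow the clock when a sensor trips a limit.
# Trip point, hysteresis band, and clock frequencies are assumptions.

NOMINAL_MHZ = 200
THROTTLED_MHZ = 100
LIMIT_C = 43.0        # trip point, with margin below the 45 C burn limit
HYSTERESIS_C = 2.0    # avoid oscillating around the trip point

def next_clock_mhz(temp_c, current_mhz):
    if temp_c >= LIMIT_C:
        return THROTTLED_MHZ               # symptom relief: slow down
    if temp_c <= LIMIT_C - HYSTERESIS_C:
        return NOMINAL_MHZ                 # cooled off: restore full speed
    return current_mhz                     # inside the hysteresis band

clk = NOMINAL_MHZ
for t in (40.0, 42.5, 43.5, 42.0, 40.5):
    clk = next_clock_mhz(t, clk)
    print(f"{t:.1f} C -> {clk} MHz")
```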
Swinnen noted that the problem of cooling IoT devices is likely to become more important in coming years, as some companies have announced a desire to move more AI computing from data centers to the edge. That could force designers to give it more consideration than it currently receives. “Power has always been a soft failure point, not hard,” he said. “If you don’t meet performance or DRC, if your shapes are too close together for foundry, those are hard errors. You cannot tape out with those, and they will hold the tape out and redesign until they fix those things. But power is seen more as a soft requirement.”
Fundamental choices
Underlying much of this effort are questions about the most efficient and cost-effective approach: minimizing data movement wherever possible, which is increasingly important as AI drives up the amount of data, and optimizing the processing and management of that data wherever it is done.
“It’s really about understanding the data movement requirements of the application,” said Steven Woo, fellow and distinguished inventor at Rambus. “The question is what’s the right way to do this, and for which applications. There’s no one application that’s going to see a benefit. There are lots of them, and the ways you might optimize them are different. Everybody understands this is a big problem.”
The key in all of this is understanding what works best where. “We spend a lot of time helping customers figure out how to not over-engineer the problem,” said Synaptics’ Weil. “When we say ‘AI native,’ you don’t need all of the extra horsepower because you’re only using it in research mode. So you certainly can use an NVIDIA Jetson. It’s beautiful, it’s two or three times the price, and it has lots of horsepower. But when you want to ship a million units, you’re not going to use that. We spend a lot of time helping them look at the more optimized scenario.”
Related Reading
Heat-Related Issues Impact Reliability In Advanced IC Designs
Retaining data in memories and processors becomes more difficult as temperatures rise in advanced packages and under heavy workloads.
Edge Devices Require New Security Approaches
More attack points and more valuable data are driving new approaches and regulations.