While AI can speed up chips, it’s not always obvious where and for how long.
AI/ML is creeping into everything these days. There are AI chips, and there are chips that include elements of AI, particularly for inferencing. The big question is how those additions will affect performance and power, and the answer isn't obvious.
There are two main phases of AI: training and inferencing. Almost all training is done in the cloud using extremely large data sets. In fact, the larger the data set the better, assuming the data is relevant. Training is of limited or no value with small data sets, because the model cannot draw relationships or identify patterns across the data.
Inferencing is another matter entirely. Once a model is trained, it can be reduced to something that will run on a much smaller device. But the answers it produces can be approximately correct rather than precise, and there is a constant balancing act to get good-enough results faster, which is why algorithms are manipulated and updated so heavily. Minor changes in training algorithms can produce significant changes in what gets inferred and how quickly.
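As a rough picture of what that reduction looks like, here is a minimal sketch of post-training quantization in Python with NumPy, assuming a simple symmetric int8 scheme; the function names and parameters are illustrative, not taken from any particular framework. The weights shrink to a quarter of their original size and the arithmetic gets cheaper, but a measurable rounding error is introduced.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float32 weights to int8 plus a per-tensor scale (symmetric quantization)."""
    scale = float(np.max(np.abs(weights))) / 127.0   # largest magnitude maps to +/-127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights; the rounding error is the accuracy cost."""
    return q.astype(np.float32) * scale

# The int8 copy is 4x smaller and cheaper to multiply, but no longer exact.
w = np.random.default_rng(0).standard_normal((4, 4)).astype(np.float32)
q, s = quantize_int8(w)
print("worst-case weight error:", np.max(np.abs(w - dequantize(q, s))))
```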
Computer scientists have become quite adept at that kind of algorithm modification over the past few years, but there are still some big unknowns in this technology. Choosing the right precision is still largely guesswork, and it has been for years. This isn't a new challenge. A decade ago, UC Berkeley's swarm lab developed semi-accurate devices modeled on the human brain, where the lower precision of many devices working together produced accurate-enough conclusions much faster than precise computing.
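The principle behind that work can be illustrated with a toy sketch in Python, which is not the Berkeley design but a simple stand-in: many independent low-precision "devices" each compute a coarsely rounded version of the same dot product, and averaging their answers recovers most of the lost accuracy. The bit width and rounding scheme are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(256)
w = rng.standard_normal(256)
exact = float(x @ w)

def imprecise_device(bits: int = 3) -> float:
    """One low-precision 'device': weights rounded stochastically to a coarse grid,
    so each device makes a different small error."""
    step = np.max(np.abs(w)) / (2 ** (bits - 1))
    wq = np.floor(w / step + rng.random(w.shape)) * step   # dithered rounding
    return float(x @ wq)

print("exact dot product:", round(exact, 3))
for n in (1, 16, 256):
    avg = np.mean([imprecise_device() for _ in range(n)])
    print(f"{n:4d} devices -> absolute error {abs(avg - exact):.3f}")
```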
How this applies to inference, particularly in safety-critical applications, remains to be seen. But precision remains a key part of this equation, and lower precision spread across more compute elements, using sparser algorithms, can greatly speed up results with far less energy. The downside is that it's difficult to tell what these algorithms are doing once they are deployed in these systems, and they are almost impossible to fix once they go wrong. These are, for the most part, black boxes.
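To make the sparsity point concrete, here is another hedged NumPy sketch, this time of magnitude pruning: small weights are zeroed out so a sparse kernel (or hardware) can skip them, trading a modest output error for a large reduction in multiply-accumulate operations. The threshold is arbitrary, and in practice a pruned network would be fine-tuned afterward to recover accuracy.

```python
import numpy as np

def prune(weights: np.ndarray, threshold: float) -> np.ndarray:
    """Zero out small weights so a sparse kernel (or hardware) can skip them."""
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

def sparse_matvec(weights: np.ndarray, x: np.ndarray):
    """Multiply-accumulate only where weights are nonzero, counting the work done."""
    y = np.zeros(weights.shape[0])
    macs = 0
    for i, row in enumerate(weights):
        nz = np.nonzero(row)[0]
        y[i] = row[nz] @ x[nz]
        macs += nz.size
    return y, macs

rng = np.random.default_rng(1)
w = rng.standard_normal((64, 64))
x = rng.standard_normal(64)

dense_out, dense_macs = sparse_matvec(w, x)
sparse_out, sparse_macs = sparse_matvec(prune(w, 0.5), x)

print(f"MACs performed: {sparse_macs} of {dense_macs}")
print("relative output error:",
      np.linalg.norm(sparse_out - dense_out) / np.linalg.norm(dense_out))
```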
On top of that, there are regular updates. Software patches slow devices over time, requiring more energy to achieve the same results. What is needed here is a way of cleaning out whatever a patch replaced, and that is difficult because these algorithms adapt to various use cases. The big challenge for AI/ML is determining whether these systems are adapting to the right use cases and optimizing for desirable behavior, rather than behavior that is less acceptable. So far, there are few definitive answers on this subject, although it does surface periodically as a topic of concern at conferences.
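One pragmatic guardrail, sketched below under assumed names (behavior_shift, plus synthetic stand-ins for the old and new model outputs), is to replay a frozen reference set through both the pre-update and post-update models and report not just the accuracy change but how many individual answers changed, since behavior can drift even when the headline metric looks stable.

```python
import numpy as np

def behavior_shift(old_outputs: np.ndarray, new_outputs: np.ndarray,
                   labels: np.ndarray) -> dict:
    """Compare an updated model against the prior version on a frozen reference set."""
    old_pred = old_outputs.argmax(axis=1)
    new_pred = new_outputs.argmax(axis=1)
    return {
        "old_accuracy": float((old_pred == labels).mean()),
        "new_accuracy": float((new_pred == labels).mean()),
        # Inputs whose answer changed, even if overall accuracy looks unchanged.
        "disagreement": float((old_pred != new_pred).mean()),
    }

# Synthetic stand-in for an update: the "new" model is a perturbed copy of the old one.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=100)
old_logits = rng.standard_normal((100, 3))
new_logits = old_logits + 0.3 * rng.standard_normal((100, 3))

# An update could be gated on an accuracy floor and a disagreement budget.
print(behavior_shift(old_logits, new_logits, labels))
```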
And finally, AI/ML are parts of systems, not just chips. Those chips have to work in conjunction with the rest of these systems, which means that rather than just optimizing a chip, the entire system needs to be optimized for whatever is critical to infer. The emphasis here is on what really needs to be inferred, because just placing AI into a system without understanding the impact doesn’t improve performance. In fact, it can have the opposite result.
AI has enormous potential. It can prioritize operations in a complex system, and it can help identify bottlenecks and figure out ways around them. But it also can produce undesired results with unintended consequences. All of this needs to be considered at the architectural or conceptual level, just like system power, and it needs to be simulated and tested for corner cases using digital twins and whatever other techniques are relevant. This is especially important because at this point there are few tools, no obvious right answers, and a lot of potential interactions that may not show up for months or years.
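As a minimal illustration of that kind of corner-case testing, the sketch below is a toy "digital twin" of a task queue rather than any real tool: it replays an arrival trace through two scheduling policies and counts deadline misses. A priority-driven policy that looks fine under normal traffic ends up starving a low-priority task during a sustained burst, exactly the sort of interaction that only shows up when the corner case is simulated.

```python
def simulate(policy, trace, deadline=10):
    """Replay an arrival trace through a scheduling policy (one task served per
    time step) and count how many tasks miss their deadline."""
    queue, misses = [], 0
    for now, batch in enumerate(trace):
        queue.extend((now, prio) for prio in batch)   # (arrival_time, priority)
        if queue:
            task = policy(queue)
            queue.remove(task)
            if now - task[0] > deadline:
                misses += 1
    return misses

# Two candidate schedulers: serve the oldest task vs. serve the highest priority.
oldest_first = lambda q: min(q, key=lambda t: t[0])
priority_first = lambda q: max(q, key=lambda t: t[1])

# Corner case: one low-priority task arrives just as a long burst of
# high-priority traffic begins, and the burst fully loads the queue.
trace = [[1, 8]] + [[8] for _ in range(50)] + [[] for _ in range(10)]
print("oldest-first misses:  ", simulate(oldest_first, trace))
print("priority-first misses:", simulate(priority_first, trace))
```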