How AI/ML systems will age is not well understood, and that’s a big problem.
The rush to deploy AI/ML nearly everywhere raises serious questions about how all of this technology will evolve, age, and perform over time.
AI is very good at certain tasks, notably finding patterns and relationships in broad data sets that are well beyond the capabilities of the human mind. That is valuable for adding efficiency to processes of all sorts, from autonomous driving to predictive analytics in the home, an industrial operation, or even a smart city. But all of this assumes that systems built today will continue to perform as expected for years to come, and that is far from obvious at this point.
There are several issues that need to be addressed. First, algorithms as they are written today are opaque. This affects both training, where weights are applied for certain behaviors, and inferencing, where systems apply what they learned in training within a distribution of acceptable responses. The problem is that when aberrations show up in those responses, it is very difficult to tell what went wrong. While it is possible to recreate the series of events that caused the error, no one can say exactly why something happened when it did.
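In practice, about the best that can be done today is to instrument the inference path so that aberrant responses are at least captured for later reconstruction. Here is a minimal sketch of that idea in Python, assuming a generic `model` object with a `predict` method and an acceptance band learned offline from validation data (all names and thresholds here are hypothetical):

```python
import json
import time

# Hypothetical acceptance band, derived offline from validation data:
# responses whose confidence falls outside this range are treated as aberrations.
CONF_LOW, CONF_HIGH = 0.15, 0.99

def logged_inference(model, features, log_path="inference_audit.jsonl"):
    """Run inference and record enough context to recreate the event later.

    This captures *what* happened (inputs, outputs, timing), which is
    recoverable; *why* the model responded that way is not.
    """
    started = time.time()
    label, confidence = model.predict(features)  # assumed interface
    record = {
        "timestamp": started,
        "features": features,
        "label": label,
        "confidence": confidence,
        "aberrant": not (CONF_LOW <= confidence <= CONF_HIGH),
    }
    with open(log_path, "a") as log:
        log.write(json.dumps(record) + "\n")
    return label, confidence
```

Even with such an audit log, the record shows only the series of events, not the internal reasoning that produced them, which is exactly the opacity problem.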
This becomes more complex as algorithms begin training other algorithms, and as those algorithms are in turn tuned for specific hardware interactions. Hardware itself can drift. This is as true in advanced digital designs as in analog or even photonics, and that can skew interactions with algorithms that have been tightly co-designed with that hardware. The results can be both unexpected and potentially unfixable.
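There is no standard way to model this interaction yet, but the basic effect is easy to demonstrate. Below is a hedged sketch, assuming a decision threshold tuned against a fresh analog front end whose gain and offset drift with age; the drift rates are purely illustrative, not measured from any real process:

```python
import numpy as np

def aged_frontend(signal, years, gain_drift=0.01, offset_drift=0.002):
    """Apply a toy aging model to an analog front end.

    gain_drift / offset_drift are illustrative per-year rates, not data
    from any real process. A model co-designed against the fresh hardware
    sees increasingly skewed inputs as the hardware drifts.
    """
    gain = 1.0 + gain_drift * years
    offset = offset_drift * years
    return gain * np.asarray(signal) + offset

# A threshold tuned tightly against the fresh hardware...
THRESHOLD = 0.50
fresh = aged_frontend([0.49, 0.51], years=0)
aged = aged_frontend([0.49, 0.51], years=10)
print([x > THRESHOLD for x in fresh])  # [False, True] -- as designed
print([x > THRESHOLD for x in aged])   # [True, True]  -- drift flips a decision
```

The tighter the co-design, the less margin there is for this kind of drift before behavior changes.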
Second, many of these algorithms and AI/ML systems are being developed in isolation. Divide and conquer has worked for years in complex chip design. But in a complex system of systems, such as a car, multiple systems will work both together and separately, and each will evolve uniquely over time. That would seem to be a perfect argument for regular system tune-ups, much as new cars come with maintenance schedules. Unfortunately, today no one can track the behavioral changes of all of those algorithms, not to mention the quality and accuracy of the sensors providing new data.
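One conceivable starting point, sketched below in Python, is for each module to publish simple statistics about its own outputs so that drift relative to a commissioning-time baseline is at least visible. The interface and thresholds here are assumptions, not an established practice:

```python
import numpy as np

class BehaviorMonitor:
    """Track one module's output statistics against a fixed baseline.

    The baseline is captured when the system is commissioned; later
    batches of outputs are compared against it with a simple z-score
    on the batch mean.
    """
    def __init__(self, baseline_outputs, z_limit=3.0):
        self.mu = float(np.mean(baseline_outputs))
        self.sigma = float(np.std(baseline_outputs)) or 1e-9
        self.z_limit = z_limit

    def check(self, recent_outputs):
        z = abs(np.mean(recent_outputs) - self.mu) / (
            self.sigma / np.sqrt(len(recent_outputs)))
        return {"z": z, "drifted": z > self.z_limit}

# One monitor per module; a true system of systems would also need
# monitors on the interactions between modules, which is much harder.
monitor = BehaviorMonitor(baseline_outputs=np.random.normal(0.0, 1.0, 10_000))
print(monitor.check(np.random.normal(0.3, 1.0, 500)))  # likely flags drift
```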
One way around this is regular changeouts of electronic modules to reduce problems in the field. But if modules are going to be replaced, how do you determine the frequency of those replacements? As these systems adapt to use models, no two systems evolve in exactly the same way. And if modules are replaced, how do you ensure that other modules have not optimized themselves over time around the old ones in ways that could affect performance and accuracy in the future?
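Absent a better answer, one crude approach is to derive a replacement interval from the drift rate measured on each unit and the maximum drift the system can tolerate. A back-of-the-envelope sketch, with purely illustrative numbers (and because no two units drift identically, the rate would have to be measured per unit):

```python
# Purely illustrative numbers: drift is measured in whatever unit the
# module's acceptance spec uses (e.g., percentage points of accuracy).
max_tolerable_drift = 2.0   # spec limit before behavior is unacceptable
observed_drift_rate = 0.3   # drift per year, measured on this unit in the field
safety_margin = 0.5         # replace well before the limit is reached

replacement_interval_years = (max_tolerable_drift * safety_margin) / observed_drift_rate
print(f"Replace module every {replacement_interval_years:.1f} years")  # 3.3 years
```

Note that this says nothing about the co-adaptation problem; a fresh module dropped into a system that has optimized itself around the old one is a separate risk.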
Third, software updates are more complex when it comes to AI algorithms, because those algorithms are used to define the hardware. If the algorithm changes, or layers of old code never get cleaned out, the hardware runs slower and may not be able to take advantage of all of the new features in the algorithm. We see this today with computers, which run slower over time due to software bloat from layers of updates. Imagine what happens with layers of AI/ML algorithm updates.
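A concrete version of the mismatch: an updated algorithm may introduce operators that the fixed-function hardware it was co-designed with cannot accelerate, silently pushing work onto a slow fallback path. A hedged sketch, with a made-up capability table:

```python
# Hypothetical capability table for a fixed-function accelerator that was
# co-designed against the v1 algorithm.
HW_SUPPORTED_OPS = {"conv2d", "relu", "maxpool", "dense"}

def plan_execution(model_ops):
    """Split a model's operators into accelerated and fallback sets."""
    fast = [op for op in model_ops if op in HW_SUPPORTED_OPS]
    slow = [op for op in model_ops if op not in HW_SUPPORTED_OPS]
    return fast, slow

# v1 ran entirely on the accelerator; the v3 update added new operator types.
v3_model_ops = ["conv2d", "relu", "attention", "layernorm", "dense"]
fast, slow = plan_execution(v3_model_ops)
print("accelerated:", fast)   # ['conv2d', 'relu', 'dense']
print("cpu fallback:", slow)  # ['attention', 'layernorm'] -- the slowdown
```

Each update layer can push more of the workload off the hardware it was originally designed for, which is the AI version of software bloat.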
What this means for safety- and mission-critical systems is anyone's guess. And while there is a big push for long-term reliability of these systems, that may be the wrong way of looking at the problem. Reliability is no longer a measurement that can be assigned to an individual component in an AI system. It is about the behavior of a system of systems, and when it comes to AI, no one has any idea how those systems will behave after a decade or more of use in a variety of different environments.
The problem has multiple stages. The first is an inflexible hardware design approach, which many have chosen in order to save silicon area and claim that their particular design is more efficient or performance-oriented. For training, given how fast algorithms are evolving, only FPGAs are really suitable, and fortunately they are getting better and faster. For execution, the right target is a semi-programmable DSP, not anything with hard-fixed logic. The second part of the problem is more philosophical and logical in nature. Even once algorithms are developed, fully trained, and categories formed, there will be so-called anomalies (things that simply don't comply with any category, yet are nevertheless present) that cannot be processed automatically, at least not until they are declared and defined as exceptions.
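That anomaly point can be made concrete. A classifier can only handle everything automatically if it is allowed to answer "none of the above"; anything it rejects waits for a human to declare and define a new exception. A minimal sketch of that reject-option idea in Python (the threshold and exception names are assumptions, not any particular system's design):

```python
REJECT_THRESHOLD = 0.80  # assumed confidence floor, tuned per application

# Exceptions a human has already declared and defined.
DECLARED_EXCEPTIONS = {"sensor_glare", "partial_occlusion"}

def classify_or_defer(scores, exception_tag=None):
    """Return a category only when confident; otherwise defer.

    `scores` maps category -> confidence. Anomalies (low confidence on
    every known category) are not processed automatically unless they
    match an exception that has been explicitly declared.
    """
    best = max(scores, key=scores.get)
    if scores[best] >= REJECT_THRESHOLD:
        return best
    if exception_tag in DECLARED_EXCEPTIONS:
        return f"exception:{exception_tag}"
    return "deferred_for_review"  # the anomaly queue

print(classify_or_defer({"car": 0.95, "truck": 0.04}))                  # car
print(classify_or_defer({"car": 0.40, "truck": 0.35}))                  # deferred_for_review
print(classify_or_defer({"car": 0.40, "truck": 0.35}, "sensor_glare"))  # exception:sensor_glare
```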