OPINION

What Do Feedback Loops For AI/ML Devices Really Show?

Optimization removes some of the baseline measurements for chips, making comparisons much more difficult.

February 9th, 2021 - By: Ed Sperling

AI/ML is being designed into an increasing number of chips and systems these days, but predicting how those chips and systems will behave once they’re in the field is, at best, an educated guess.

Typically, verification, validation, and testing of systems are done before devices reach the market, with an increasing amount of in-field data analysis for systems where reliability is potentially mission- or safety-critical. That can include cars, robots, military equipment, servers, and even smartphones and gaming systems. But the impact of intelligence on performance, power, and ultimately chip behavior is uncharted territory.

With semiconductors, predicting reliability is a combination of data analysis plus repeatability without failure, a confluence of math and science. The more often design-through-manufacturing processes are repeated with the same positive results, the higher the predicted reliability. It’s like comparing version 0.1 of a process to version 1.0.
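One way to picture the link between repetition and predicted reliability is the standard zero-failure confidence bound from statistics. The sketch below is an illustration, not a method described in this article. It assumes every trial is independent and identical, which is exactly the assumption that adaptive AI/ML systems tend to break, and the function name is hypothetical.

```python
def failure_rate_upper_bound(trials: int, confidence: float = 0.95) -> float:
    """Upper bound on the per-trial failure probability after `trials`
    consecutive successes, assuming independent, identical trials."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # If the true failure probability were p, the chance of seeing zero
    # failures in `trials` runs is (1 - p) ** trials. Solve
    # (1 - p) ** trials = 1 - confidence for p.
    return 1.0 - (1.0 - confidence) ** (1.0 / trials)


# More clean repetitions tighten the bound on the failure rate.
for n in (10, 100, 1000):
    print(n, failure_rate_upper_bound(n))
```

The bound shrinks roughly in proportion to the number of clean repetitions, which is why a mature process earns more confidence than an early one.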

The problem with AI/ML systems is that once they are released into the field, not everything is repeatable, greatly increasing the level of uncertainty in feedback loops. The whole point of these systems is use-case customization. They can adjust to changes in the environment or different user preferences. And unlike traditional chips, these devices are increasingly heterogeneous designs with unique architectures. Put simply, there is little history against which to measure reliability, and the data used in those measurements is suspect.

There are several possible solutions to this problem, none of which is perfect. The first is to spend more resources testing how software/algorithm updates will affect intelligent systems over time. Because many systems will have to be updated over extended lifetimes of a decade or more, OEMs need to understand how systems that already have adapted to their environment or to different use cases will be affected by these updates.

In the past, vendors would roll out one patch after another, sometimes even multiple times a week, in order to fix interactions they didn’t anticipate. But with AI systems, there is no single baseline for updates. That means either patches will have to be much better understood and more carefully rolled out, or systems will require a partial reset every time a patch is downloaded in order to make sure everything works as planned.

Second, systems need to be architected so that any piece that can optimize itself does so only within acceptable boundaries. That means systems need to be designed not just for maximum performance and efficiency, but with carefully constructed pre-set limits. Those limits need to be well defined, because systems of systems may have multiple additive behaviors that can cause anything from erratic performance to uneven aging. Within a heterogeneous system, those kinds of changes are nearly impossible to keep track of, let alone account for.
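As a rough illustration of pre-set limits, the sketch below clamps every self-tuning request to a design-time range and rejects knobs that have no defined limit. All knob names and numeric ranges here are hypothetical, chosen only for the example. A real design would also have to consider how limits on separate blocks interact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    low: float
    high: float

    def clamp(self, value: float) -> float:
        # Pull an out-of-range request back to the nearest allowed value.
        return max(self.low, min(self.high, value))


# Hypothetical design-time limits; the values are illustrative only.
LIMITS = {
    "clock_mhz": Limit(400.0, 1200.0),
    "core_voltage": Limit(0.70, 0.95),
}


def apply_tuning(current: dict, proposed: dict) -> dict:
    """Return a copy of `current` with each proposed change clamped to its
    limit. A knob with no defined limit is rejected rather than trusted."""
    updated = dict(current)
    for name, value in proposed.items():
        if name not in LIMITS:
            raise KeyError(f"no limit defined for {name!r}")
        updated[name] = LIMITS[name].clamp(value)
    return updated


settings = {"clock_mhz": 800.0, "core_voltage": 0.80}
# The optimizer asks for more than the design allows; the request is capped.
settings = apply_tuning(settings, {"clock_mhz": 1500.0})
print(settings)
```

Rejecting unknown knobs outright is a deliberate choice in this sketch: anything the designers did not bound cannot be adjusted in the field.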

Third, systems will need to run regular checks, whether that includes external monitors and data or a combination of internal and external sensors. Equally important, there need to be enough knobs to turn to make sure that when problems do arise, they can be identified quickly and fixed. You can’t do a hard reboot for the logic system in a car on a highway, but you certainly can add enough checks into a system to be able to isolate a potential problem and get off the road safely.
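The idea of running regular checks and isolating a problem without a hard reboot can be sketched as a small health monitor. This is a simplified illustration under stated assumptions: the check names, the three operating modes, and the rule that any critical failure forces a safe stop are all hypothetical, not taken from any real automotive stack.

```python
from enum import Enum
from typing import Callable, Dict, Set


class Mode(Enum):
    NORMAL = "normal"
    DEGRADED = "degraded"    # faulty function isolated, the rest keeps running
    SAFE_STOP = "safe_stop"  # e.g. get off the road safely


Check = Callable[[], bool]


def run_checks(checks: Dict[str, Check]) -> Dict[str, bool]:
    """Run every check and record pass/fail per name."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that crashes is treated as a failed check.
            results[name] = False
    return results


def choose_mode(results: Dict[str, bool], critical: Set[str]) -> Mode:
    """Pick an operating mode from the check results."""
    failed = {name for name, ok in results.items() if not ok}
    if not failed:
        return Mode.NORMAL
    if failed & critical:
        return Mode.SAFE_STOP
    return Mode.DEGRADED


# Hypothetical checks mixing an internal sensor and an external monitor.
checks = {
    "die_temperature_ok": lambda: 85.0 < 105.0,
    "inference_output_in_range": lambda: False,  # simulated fault
}
results = run_checks(checks)
mode = choose_mode(results, critical={"die_temperature_ok"})
print(results, mode)
```

Because each check is reported by name, a failure points at a specific function rather than at the system as a whole, which is what makes isolation possible.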

The tech industry has done an impressive job in developing systems that allow technology to take over important but tedious functions currently done by people. But it also needs to understand how to control those systems when something goes wrong, because as anyone with enough history in technology well knows, no electronic system lasts forever.

Ed Sperling

Ed Sperling is the editor in chief of Semiconductor Engineering.
