Connected Reliability Concerns

How a device behaves over time will be affected by other devices that were never considered.

Ever since the invention of the integrated circuit, the focus has been on improving technology—making it faster, smaller, cheaper, while also cutting the power budget. With the advent of the IoT and ubiquitous connectivity, the value proposition will change.

Rather than just improving the chip, the focus will shift to how that chip behaves in context. How does it work in a connected world? What else can be done with it, particularly in conjunction with other devices?

This is not a trivial change, and every change of this scale raises some serious issues. Security is an obvious one, and no matter how many dire warnings are issued or how many holes are plugged, there will be breaches in places no one thought about.

Far lower on the threat meter, but one that nonetheless will require serious attention, is connected reliability. This is almost like the known-good-die problem in multi-chip packages, scaled up by a few orders of magnitude. If a device is known to work on its own, how will it behave in a highly connected environment? And if another component in the connected chain fails, can the device recover, and how quickly?

This will quickly become more than just a theoretical problem. Everyone has experienced dropped calls when driving through areas where there is no signal. But for a device that needs to be continuously connected, signal loss can affect safety-critical equipment. While it may not cause a device failure, the definition of reliability will extend to such things as time to reboot and time to reconnect, update and respond.

Failovers are nothing new in medical devices. A pacemaker, for example, will continue to function beyond its battery life, and it has a built-in defibrillator that is good for at least one duty cycle. But that kind of reliability has never been thought about for a wide range of devices that will become part of the communication chain, from edge devices and edge-of-the-network servers all the way up to the cloud.

This is a different mindset, and it will require tools vendors to think outside the box, quite literally, as they work through a series of “what if” scenarios for next-generation devices. For example, what are the power and circuitry implications over time if another device cannot keep up with communication speeds or quality of service? Some of these devices will be required to last for 10 or 15 years, but they will be built without any understanding of the other devices to which they may have to connect.

This kind of verification, post-silicon validation and testing has been done in some markets in the past, but most of the work has been highly customized. In the future, it will have to be done on a mass scale, which implies at least some standardization. Who will drive this effort? And how quickly will the market adopt these kinds of technologies if they are productized?

This is a different way of thinking about semiconductor design, but it will become essential at some point in the not-too-distant future. It’s time to start the discussion about what needs to change and how those changes will impact the entire semiconductor ecosystem.
