Simply improving the failure rate is not enough for today’s electronics requirements.
Over the past few years, the reliability and safety of electronic systems have become increasingly important. Ongoing digitalization has made these systems an integral part of many items of daily use, so failures are becoming more and more critical and can cause considerable disruption, even when they occur in consumer electronics. Electronics are now also indispensable in safety-critical applications such as the automotive sector, where a failure can even lead to loss of human life. At the same time, the omnipresence of electronics is driving up cost pressure while the available space is shrinking. The traditional approach to reliability, relying on redundancy and designing for higher stress than is actually expected, will therefore remain feasible only for a small number of safety-critical applications.
Naturally, there is no one-size-fits-all solution for every use case. It therefore makes sense to take different approaches, particularly for two scenarios: Is the decisive factor in pursuing high reliability the cost of the electronic system itself? Or is it that the consequences of a failure, beyond the obvious monetary follow-up costs, can also include harm to people and the environment? The remainder of this article addresses the second scenario, as it arises far more often.
Looking at a failure in terms of its follow-up costs, it is obvious that considering the failure rate alone is not productive. Redundancy can ensure that a system continues to perform its function after a failure until it can be repaired, but, as mentioned above, this extra cost is generally considered justified only for safety-critical applications. If the priority is to minimize the follow-up costs of a failure, it is smarter to repair or replace the electronic system shortly before it fails. This process is called failure prediction, and its most direct method is damage detection. The concept of damage detection has been around for a very long time and is used routinely in manual inspections in all areas, including electronics. A simple example is a bridge: if cracks appear in the masonry, the bridge will generally continue to perform its function for a time, but repairs should be carried out before it collapses.
This process becomes more challenging in electronic systems. First, it must be clarified which type of growing mechanical damage leads to which type of functional failure (failure mode). This relationship is known as the failure mechanism. An unambiguous mapping is rarely possible, however: the same failure mechanism can trigger different failure modes, and one particular failure mode can be caused by a variety of failure mechanisms. In addition, it must be determined how fast a given failure mechanism progresses toward a given failure mode. Today, this is known for the majority of electronic technologies and for most failure modes and mechanisms. The priority now is to find methods of detecting damage. These must be automated and must require an absolute minimum of additional components and sensors; ideally, existing functional components can be made to serve as sensors. For example, a body diode can be used to measure thermal impedance and thus detect cracks in the IC packaging, and time domain reflectometry (TDR) on relay switching pulses can detect cracks in cables. Finally, reliable automated data analysis and user notifications must be in place.
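To make the two ideas above concrete, the following minimal Python sketch models the many-to-many relationship between failure mechanisms and failure modes, and a simple threshold check on a measured thermal impedance against a healthy baseline. All names, example entries, and the 20% threshold are illustrative assumptions, not values from the article or any real product catalog.

```python
# Hypothetical sketch: the mechanism-to-mode relationship is many-to-many,
# and damage detection watches a measurable indicator (here: thermal
# impedance Zth, e.g. obtained via a body diode) drift above its baseline.

# Illustrative entries only, not an exhaustive or authoritative catalog.
MECHANISM_TO_MODES = {
    "solder_joint_crack": {"open_circuit", "intermittent_contact"},
    "die_attach_delamination": {"overtemperature_shutdown", "open_circuit"},
    "bond_wire_liftoff": {"open_circuit"},
}

def modes_for(mechanism):
    """Failure modes a given mechanism can trigger."""
    return MECHANISM_TO_MODES.get(mechanism, set())

def mechanisms_for(mode):
    """Inverse lookup: mechanisms that can cause a given failure mode."""
    return {m for m, modes in MECHANISM_TO_MODES.items() if mode in modes}

def zth_damage_detected(zth_measured, zth_baseline, threshold=0.2):
    """Flag damage when thermal impedance rises more than `threshold`
    (relative) above the healthy baseline, e.g. from a package crack."""
    return (zth_measured - zth_baseline) / zth_baseline > threshold

# One failure mode maps back to several candidate mechanisms:
print(mechanisms_for("open_circuit"))
# A 25% rise in Zth over baseline exceeds the assumed 20% threshold:
print(zth_damage_detected(1.25, 1.0))  # True
```

The inverse lookup illustrates why diagnosis is hard in practice: detecting an open circuit alone does not tell you which mechanism caused it, which is why damage detection targets the mechanism's physical indicator directly rather than waiting for the functional failure.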
At first glance, these requirements make a widespread rollout of damage detection appear unlikely. But its demanding requirements are balanced by significant benefits. As indicated above, simply improving the failure rate does not meet today's requirements for electronics and does not counteract the consequences of a failure. Redundancy is expensive and frequently difficult to implement due to space and weight constraints. Model-based failure prediction is usually too imprecise. Damage detection thus closes a gap: it offers relatively simple and inexpensive failure prediction that tends to give only short advance warning but is very accurate. This explains both why the use of damage detection has grown steadily over the past few years and why this growth is accelerating. New standards currently under discussion at several standards organizations will also pave the way for even broader adoption in the years ahead.