Reliability At 5nm And Below

What’s really holding back improvements in design and manufacturing.


The best way to figure out how a chip or package will age is to bake it in an oven, heat it in a pressure cooker, and stick it in a freezer.

Those are all standard methods to accelerate physical effects and the effects of aging, but it's not clear they will continue working as chips shrink to 5nm and 3nm, or as they are included in multi-die packages. Extending any of those kitchen-like approaches to determine how chips will age over time can cause permanent damage in devices with increasingly thin wires and 50 atoms or fewer of dielectric insulation.
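Behind those oven-and-pressure-cooker methods is usually an Arrhenius-style acceleration model, which estimates how much faster a thermally driven failure mechanism progresses at the stress temperature than at the use temperature. The sketch below is illustrative only; the 0.7 eV activation energy and the two temperatures are assumed example values, not figures from this article, and real qualification work fits the activation energy to the specific failure mechanism.

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

def acceleration_factor(ea_ev: float, t_use_c: float, t_stress_c: float) -> float:
    """Arrhenius acceleration factor between a use temperature and a
    stress (burn-in) temperature, both given in degrees Celsius."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV) * (1.0 / t_use_k - 1.0 / t_stress_k))

# Assumed example: 0.7 eV activation energy, 55 C use, 125 C burn-in oven
af = acceleration_factor(0.7, 55.0, 125.0)
print(f"Acceleration factor: {af:.0f}x")
```

The trouble the article describes is that at 5nm and below, the stress temperatures needed for a useful acceleration factor start damaging the very structures being qualified, so the model's inputs can no longer be pushed hard enough to compress decades of aging into weeks of testing.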

So far it’s not entirely clear what will replace those approaches, particularly as advanced-node chips are incorporated into AI systems in vehicles. Automated test equipment certainly can do basic checks for process, voltage and temperature (PVT) and performance, and equipment does exist for deeper inspection and testing. The problem is that it takes more time to thoroughly inspect and test a complex chip developed at an advanced node, and particularly to find potential faults in a heterogeneous design. That, in turn, costs more money in terms of both fab time and equipment.

Ever since test costs spiraled out of control toward the end of the last millennium, the percentage of manufacturing costs reserved for test has been fixed. It’s not clear whether that formula will change, but the cost of test is both rising and spreading. Test is no longer confined to a single step in the manufacturing process.

In-circuit monitoring is now a given in large, complex chips, in part because they are so large and complex that standard testers can’t cover everything. But it’s also partly due to the fact that chips developed at advanced nodes are being used for longer periods of time — and more intensively over their lifetime — than most advanced-node chips in the past. And it doesn’t help that many of those designs are semi-customized for specific algorithms, so defects and latent defects may be very different from one design to the next.

In addition, the reliability requirements for some of those chips are increasing. Reliability always has been a risk equation, and that risk was low when the leading-edge process was in the double-digit nanometer range. Those chips could be subjected to rather crude testing methods using precise measurements of time, pressure and temperature. That’s no longer possible at 7/5/3nm. And as chips are designed at those process nodes for safety-critical and mission-critical applications, the liability associated with a failure rises accordingly. It’s one thing to have a smartphone fail. It’s quite another to have the guidance system on a large truck fail.

And this leads to one of the thorniest issues in the chip world. The best way to improve reliability is to share manufacturing data across the supply chain, from the field back to the fab and out to chip architects, EDA vendors, and test and analytics companies. Fabs are certainly more open with their partners and customers than in the past, but they need to go much further.

This is potentially competitive information, and no foundry wants to part with enough data to make this easy. But times are changing. Each new node adds new issues that are more difficult to detect and solve, and those issues are expanding into new markets. So while keeping data in-house may seem like a good business decision, it may have serious long-term repercussions for the entire industry.



1 comment

Joseph Fjelstad says:

Thanks for highlighting this topic. The continuing effort to shrink transistor size seems to be on autopilot with no one manning the cockpit and no one in the air traffic control tower.

Your questions need to be fully considered, discussed and debated publicly. Your comparison between smart phone failure and an autonomous vehicle failure is something that too few seem to be actively considering.

Thanks again
