While technology continues to advance, the supply chain is moving much more slowly.
For all the promise and subsequent anxiety about moving to the next process node or stacking die, the real problem isn’t technology. It isn’t even cost per transistor. It’s who will take responsibility when something goes wrong.
Notice the word “when” rather than “if.” Rising complexity means that chips can no longer be fully verified, so errors are a given. Some errors are worse than others, of course. In a missile guidance system, an error is to be avoided at all costs. In a smartphone camera, a malfunction is frequently forgotten when a service contract runs out.
But as the semiconductor world increasingly moves to systems on chip rather than a collection of independent chips, it will be harder to pinpoint the cause of an error. And at some point there will be critical errors that cannot be fixed with software, and there won’t be enough margin left in designs—margin costs money, takes away from performance and impacts energy efficiency at advanced nodes—to smooth over the problems.
So who takes responsibility for a chip that costs several hundred million dollars to create but no longer works? Is it the OEM? The systems integrator? The foundry? Or perhaps the IP vendor, whose IP seemed to work fine in some chips but which causes a problem because it isn’t fully characterized for a new configuration?
This has been one of the big issues raised by companies considering 3D-ICs, where stacking could create proximity effects that were never considered. But it’s also a problem in increasingly complex chips where those same kinds of effects could result in similar problems. A chip with several hundred million transistors and scores of IP blocks, multiple processors, I/O schemes and voltage islands is bound to fail somewhere.
While most engineers know not to put a noise-sensitive analog block next to a very loud SerDes block, they may have no insight into RC delay at 14nm from the interconnect or shrinking wires. It’s also difficult to predict where electromigration will occur, even with the best extraction tools, or when fill around a transistor isn’t completely consistent and causes a transistor to malfunction. And in a very complex chip, it’s hard to even effectively map the signal path and guarantee it will work properly.
Add in multipatterning, stress effects and process variability, and it’s imperative that all contributors to an SoC design, whether planar or stacked, begin a dialog about sharing responsibility and information that can prevent problems in the first place. Communication is the first step. Understanding issues on all sides is next, based on experience with test chips. And putting in place working agreements that will allow companies to jointly diagnose and solve problems is the third piece.
Some of these agreements already exist, such as those between foundries and their closest partners. But for the industry to advance, there will need to be many more agreements hammered out. It’s not the technology that will slow down progress in this market. It’s the lawyers and the bankers. And until they no longer need to get involved, everyone will move much more slowly and carefully than progress demands.