There’s no longer survey data on respins, but there are certainly a lot of new problems to contend with—and some new ways of looking at those problems.
The industry used to have survey data that showed the number of respins required for a broad swath of designs and the principal causes of those respins. That was a good indicator of where tools or processes needed to be improved. At the time, the data showed that the primary cause of respins was functional errors, and since then EDA vendors have been beefing up tools in that area.
Most of the available data is now 10 years old, and new data does not appear to be forthcoming. Perhaps the data is so valuable that it doesn’t get circulated, or perhaps nobody is willing to admit their chips are not perfect the first time around, even though the surveys are anonymous.
Despite the lack of data, there is a steady stream of new problems that did not exist in the past, each of which could cause a respin. Chips today are far more complex than those from just a decade ago, which suggests additional problems associated with concurrency and, at the other end of the scale, growing issues surrounding timing, signal integrity and power integrity, plus a host of other problems. That makes it tough to believe that chips are free of problems and that respins are not a fairly common event.
In addition, achieving economic yield appears to be taking longer at each new node. During this yield ramp period, additional design rules and restrictions are found that may require design changes. While it could be argued that these should not be called design errors, they are still a major cause of respins.
But if you ask people if their chips work first time, you will have a hard time finding someone who will admit to a respin or two. Is this just corporate culture or is something else going on?
“I am not sure a lot of designs are really working first pass,” says Bernard Murphy, chief technology officer at Atrenta. “A lot of failures at this stage are mixed-signal. These errors can and do lead to dead-on-arrival devices on first silicon. Between these issues and the realities of power and possibly performance tuning, I suspect that multiple silicon passes are more common than is advertised.”
There are even questions about whether first-pass silicon is a relevant term anymore.
“Working first pass is not the parameter which design groups are being judged on,” says Jon McDonald, technical marketing engineer for the design and creation business at Mentor Graphics. “Increasingly we are seeing the target being saleable, which defines a successful device. To be saleable the system must perform with acceptable performance and power consumption at a cost appropriate for the target market.”
Sometimes that comes in the form of caveats.
“Chips are often delivered with significant errata of ‘what not to do with them,’” says Frank Schirrmeister, group director for product marketing of the System Development Suite at Cadence. “Verification was, is and always will be an unbound problem. There is always more to verify. Schedule often drives the actual delivery of the chip and teams are thinking about confidence levels, i.e. when they can be ‘confident enough’ to release the chip for tape out.”
Some companies plan for a certain number of chip respins. They use shuttles to tape out the chip before they actually expect to ship a product. This enables them to do post-silicon verification, create development platforms for software bring-up, and find out where they need to concentrate further efforts. Because these respins are planned, they are not treated as failures.
Rethinking the flow
EDA companies hope they can replace these post-silicon efforts with pre-silicon virtual prototypes. “Some design issues are uncovered after the hardware has been implemented but can be worked around in software,” explains Angela Sutton, product marketing manager for FPGA Implementation at Synopsys. “Not all can be fixed without compromising the required system performance or operation. This is why it is critical to validate system software running on a prototype of the hardware prior to implementing an ASIC/SoC.”
“The high cost and product schedule impact of a chip turn mean that design teams will invest a huge amount of effort trying to find a software workaround for every hardware bug found post-silicon,” says Thomas Anderson, vice president of marketing at Breker Verification Systems.
“The ability to work around some of the issues with software is an important one,” adds Schirrmeister, “and in some application domains hardware abstraction layers (HALs) allow ‘shielding’ weaknesses in the hardware by not allowing certain low-level software calls to be made.”
Where does the burden lie for achieving good-enough performance? “Hardware development and verification teams are tasked with making sure the RTL functionality matches the spec, while software development and verification teams are tasked with validating whether the SoC (hardware plus software) is doing what it was intended to do,” says Michael Sanie, senior director of verification marketing at Synopsys. “These two tasks are slightly different.”
Mentor’s McDonald agrees: “A chip that is working but misses a critical market capability will not be a successful device. A successful device may have software modifications to deal with hardware issues that cannot be changed, but it is the resulting effectiveness of the system, hardware and software, which enable the device to be successful.”
With increasing amounts of important functionality going into the software, the role of the hardware is quickly becoming one of providing a suitable platform for executing that software. Each chip will have one or more areas of new hardware that provide new capabilities, add performance or lower power. Almost everyone claims that performance is still the most important design criterion, but one has to ask the question, “When was the last time that you bought a product where the decision was based on the clock speed of the chip?” A chip only has to deliver performance that is good enough compared with the other products in the market. This may be another change as we migrate away from the desktop computing model, and it may mean that an increasing number of ‘good enough’ chips are making it out without a respin.