The Price Of Fear

Risk and fear go hand in hand within the semiconductor industry. Finding ways to reduce them is a balance against time and cost.


In my last blog, I talked about how pain is important when making predictions in the semiconductor industry. Pain is related to time to market and risk, and the flip side of risk is fear. Fear is one of the main drivers for a large number of EDA tools, such as those related to verification.

The fear is taping out a chip, then waiting for what seems like an eternity to get the first chips back, only to find they don’t work. Precious time to market is lost whilst diagnosing the problem, fixing it, reverifying it, probably re-running many other tests, and then getting back in the queue to have another batch of chips made. The more advanced the node, the more potential issues you must be aware of, and many of them may still not be fully understood. If we add the issues associated with 2.5D or 3D packaging technology, the number of potential problems rises quickly.

Advanced nodes use more masks, which are expensive, and fab capacity for those nodes is at a premium. It is often reported that mask sets run into the millions of dollars, and that may be a very conservative number for many designs. The costs may be lower for more mature nodes, making this less of an issue. In fact, if we go back enough years, it was not uncommon for companies to report three or four re-spins. I know of one company that often needed seven re-spins to get it right.

Could a few more verification cycles have caught the problem? Should we have hired a specialist to cover an issue we thought wasn’t important, or had simply been lucky with in the past? Are we properly prepared for all the uncertainty related to a new manufacturing process? These are all real fears that every company has, but they must be balanced against cost.

While it is often said that the role of Silicon Valley is to make mistakes fast and improve quickly, this is not an area in which that applies.

EDA has two primary purposes: to reduce the costs associated with design and manufacture (which could come from optimization or automation), and to reduce the fear of failure. Each tool that automates, analyzes, or verifies reduces the risk that a failure will escape. In addition, most companies insert additional logic into devices to reduce the risk of total failure. They add chicken switches, and they place spare, unused circuits in the chip that can be deployed with a metal-only change, speeding up the turnaround when problems are found. Other chips rely on software to work around problems that are found, even if that means a slight reduction in functionality or performance.

How much of the fear is real? Some of the messaging put out by EDA companies is misleading. For example, verification complexity is said to grow with the square of the number of states in the design. This is blatantly wrong because it assumes that every state is somehow dependent on every other state. If there are sub-systems that operate independently, there is no such expansion of the problem space. If a sub-system is duplicated, there may be dependencies between nearest neighbors, but these can be collapsed into a set of problems that does not have to be re-verified for each separate copy. What is generally true is that as design size increases, the number of potential places where bugs can hide increases, and managing that requires a full understanding of how the system is meant to operate. Knowing the intended use cases can significantly reduce the total state space that needs to be verified.
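To make the arithmetic concrete, here is a minimal sketch in Python (the block sizes are invented purely for illustration) of why independence matters: two blocks that never interact can be verified against the sum of their state counts, rather than the product that the ‘square of the states’ messaging implies.

```python
# Back-of-the-envelope state counts. The block sizes are assumptions
# chosen only to show the scaling difference, not data from any design.

states_block_a = 1_000   # assumed reachable states in block A
states_block_b = 1_000   # assumed reachable states in block B

# Pessimistic view: every state of A is coupled to every state of B.
flat_product = states_block_a * states_block_b

# Independent blocks: verify A and B separately, so the work adds.
decomposed = states_block_a + states_block_b

print(f"coupled (product):   {flat_product:,}")   # 1,000,000
print(f"independent (sum):   {decomposed:,}")     # 2,000
```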

A corollary to this relates to coverage. Given the way coverage is defined today, it is never possible to say what 100% complete means. In theory it could be done in other ways, but for some reason the industry prefers the existing method, which is based on proxies for coverage. While this was a suitable approach when it was first defined for communications devices, it is awkward at best, and incomprehensible at worst, for many of today’s systems. It also has no concept of concurrency, sequence, or other important aspects that need to be verified. A new system proposed for Portable Stimulus was much better, but it is not clear how much of it made it into the standard.
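As a rough illustration of the proxy problem, consider the toy Python model below (the opcodes, bins, and stimulus are made up for this example and are not drawn from any standard or tool). A coverage model that only bins individual values reports 100%, while a sequence-aware view of the same stimulus shows that most ordering combinations were never exercised.

```python
# Toy comparison of value-only coverage versus sequence-aware coverage.
# Opcodes, bins, and stimulus are hypothetical, for illustration only.
from itertools import product

opcodes = ["read", "write", "flush"]

# Proxy coverage: one bin per opcode value, no notion of ordering.
value_bins = {op: False for op in opcodes}

# The stimulus actually applied during a hypothetical simulation run.
stimulus = ["write", "read", "flush", "write", "read"]

for op in stimulus:
    value_bins[op] = True

print("value coverage:", sum(value_bins.values()) / len(value_bins))  # 1.0

# Sequence-aware coverage: one bin per back-to-back opcode pair.
pair_bins = {pair: False for pair in product(opcodes, repeat=2)}
for a, b in zip(stimulus, stimulus[1:]):
    pair_bins[(a, b)] = True

hit = sum(pair_bins.values())
print(f"pair coverage: {hit}/{len(pair_bins)}")  # 3/9, far from complete
```

Value coverage says the job is done, while the pair view shows that two-thirds of the ordering space was never touched, and neither metric says anything about whether the design would survive those sequences.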

Risk management is a complex process, and it is where the best managers stand out from the rest. They have to define priorities and acceptance levels, and decide when it is worth buying a tool, hiring people, or adopting other risk-reduction strategies. It is possible that AI may help find the right balance, and in some cases it may improve the efficiency of existing tools.

However, that is not in the best interest of EDA companies. Unless the development of those AI tools generates enough income to offset the tool sales lost to that improved efficiency, EDA companies have no economic reason to go down this path. The business models do not align.

This could be an interesting development for the future. It is akin to the ‘work harder or work smarter’ maxim. Buying more tools may increase the number of verification cycles you can run and thus reduce fear, but using the tools you already have more effectively may be a better option. EDA has to find ways to sell fear reduction and not just sell tools. The first step along that path is to stop spreading unnecessary fear.


