Do We Have An IC Model Crisis?

Models enable automation and optimization, but today the need for new and more complex models is outstripping the ability to ratify and standardize those models.


Models are critical for IC design. Without them, it’s impossible to perform analysis, which in turn limits optimizations.

Those optimizations are especially important as semiconductors become more heterogeneous and more customized, and as they are integrated into larger systems, creating a need for higher-accuracy models that require massive compute power to develop. But those factors, and others, are making it difficult for industry-standard models — which have been fundamental to chip design — to keep pace.

Models are an encapsulation and abstraction of information that makes it possible to drive tools. The accuracy and fidelity of those models must be good enough for the results to be trustworthy, yet simple enough that analysis can be performed in a reasonable amount of time. Deciding what should and should not be incorporated into a model is at the heart of what the foundries, academia, and EDA companies do.

In the past, this was relatively straightforward. Moore’s Law and Dennard scaling guaranteed improvements in power and performance at each new process node, and standardized models were developed alongside that scaling. But the benefits of shrinking features are diminishing. While scaling continues, it increasingly is accompanied by a variety of other options, such as advanced packaging and domain-specific architectures.

In many of these designs, margins that once were applied to enable the use of simpler models are no longer acceptable. Companies no longer can afford to leave that level of safety margin on the table.

In addition, various physical effects have become inseparable at the smaller geometries, meaning that more factors must be considered at the same time. For example, timing is related to thermal, which is related to power, which is impacted by layout and activity. And all of these have to be considered in the context of manufacturing variability and device degradation over time.

An increase in analysis complexity has been enabled by a large increase in compute capability, which makes it possible to use models that are more accurate and to consider more physical interactions simultaneously. The big problem is that demand for these new models is exceeding the rate at which they can be developed and proven, and it vastly exceeds the rate at which those models can be turned into standards that are consistent across the industry.

Is it fair to call this a model crisis? “There’s always a model crisis,” says Simon Davidmann, CEO of Imperas Software. “This is because it is hard to build models. You’re building them because you want to explore something, normally before you’ve built it, or to look at things you can’t actually look at with the real thing.”

Building models takes lots of data. “We generally have less data access than we used to have,” says Thomas Andersen, vice president of the AI and Machine Learning Solutions Group at Synopsys. “With each new technology node, which happens every 18 months, the data is changing. Margins are everywhere, be it in timing or manufacturing. These allow you to account for some inaccuracies in the process, because every piece of the flow can only model things to a certain accuracy level. In theory, everything could be modeled through machine learning and be entirely data-driven. That would probably reduce margins and improve the overall process, but this would require that everybody works together, starting with the foundries.”
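As a rough illustration of what a data-driven alternative to fixed margins could look like, the sketch below fits a simple least-squares model that predicts a path-specific timing margin from a few features. The feature set, coefficients, and data are all invented for illustration and are not drawn from any foundry or tool flow.

```python
# Hypothetical sketch: fit a data-driven timing-margin model from paired
# sign-off vs. silicon-correlation data, instead of one fixed global margin.
# Features, coefficients, and data are invented for illustration only.
import numpy as np

rng = np.random.default_rng(0)

# Features per path: [toggle rate, local temperature (C), supply droop (mV)]
features = rng.uniform([0.1, 25.0, 5.0], [0.9, 105.0, 60.0], size=(500, 3))

# "Observed" extra delay (ps) that a fixed margin would otherwise have to cover.
true_coeffs = np.array([12.0, 0.35, 0.6])
extra_delay = features @ true_coeffs + rng.normal(0.0, 2.0, 500)

# Least-squares fit: a per-path margin predictor rather than one global number.
X = np.column_stack([features, np.ones(len(features))])
coeffs, *_ = np.linalg.lstsq(X, extra_delay, rcond=None)

# Predicted margin for a new path: high activity, hot, modest droop.
new_path = np.array([0.8, 95.0, 20.0, 1.0])
print(f"predicted path-specific margin: {new_path @ coeffs:.1f} ps")
```

In this toy setup the margin becomes a function of each path’s conditions rather than a single worst-case number applied everywhere, which is the kind of margin reduction a fully data-driven flow would aim for.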

That desire to reduce margins is adding to the pressure. “Customers are always trying to push the power performance area (PPA) and cost envelope,” says Brandon Bautz, senior group director for product management at Cadence Design Systems. “To do that, you need a better method of analysis, be it architectural, or RTL simulation for power measurement, down to final silicon sign-off. Fundamentally, it’s people trying to analyze areas they previously weren’t paying attention to. They are now trying to replace that with more accurate analysis to see if they can squeeze their design and get better power/performance/area or lower cost. Models are at the forefront of that. Models are critical, and faster compute and more compute resources are enabling designers to do analysis that previously wasn’t available. Models are a means of abstracting the analysis.”

New models have been enabled by new types of compute, and far more compute horsepower. “Multi-threading, distributed analysis, analysis on the cloud — all of these technologies enable more compute power to be thrown at the problem,” adds Bautz. “When you look at multi-physics analysis, you can now connect thermal effects at the packaging level down to die-level performance. These are things we couldn’t do 10 years ago because we didn’t have the CPUs to do them. Fundamentally, the algorithm itself has to be well designed and reasonable in terms of runtime versus accuracy, but at the same time, just the advent of distributable systems and more CPUs enables users to throw more and more compute at the problem. Compute power opens up the avenue for these types of analysis.”

More connected
At the center of the model crisis are increased dependencies. “Twenty years ago, timing was just an afterthought,” says Bautz. “Fifteen to ten years ago, timing became central to implementation. That required integrating the timer to the place and route system and having true timing-driven placement, timing-driven optimization. Along with that, signal integrity analysis had to be performed. Five years ago, variation formats started playing into the overall equation. Now we’re looking at things like the thermal impact on delay, the IR drop impact on delay, and most recently, the aging impact of the transistor — how the transistor performs over time and how that ultimately impacts timing.”

The industry typically has used tiered models, starting with SPICE and process models. From these, timing models and higher-level models are created, all the way up to software. But the more abstract a model, the less accurate it is, and useful analysis cannot be performed with an unsuitable model. Until recently, however, it wasn’t possible to retain enough detail for some types of analysis.

Each level is an abstraction of the previous levels. “Much of it can be baked into the timing model, but you have to have the transistor model to formulate certain aspects of the timing model,” says Bautz. “For example, how does my transistor vary as a function of temperature? How does my transistor vary as a function of voltage? How does my transistor vary as a function of time? This all goes back to the process model, the transistor model, the SPICE models that form the basis of the timing model. It is one step removed from timing, but the analysis itself and the ultimate impact on your design comes at the timing level.”
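As a toy illustration of how those transistor-level sensitivities get folded into a timing view, the sketch below derates a nominal cell delay for temperature, voltage, and aging. The functional forms and every coefficient are invented; real derates come from SPICE and process characterization, not from a formula like this.

```python
# Toy illustration: a cell delay derated for temperature, voltage, and aging.
# Nominal delay and all sensitivity coefficients are invented placeholders.
def derated_delay_ps(nominal_ps: float,
                     temp_c: float,
                     vdd_v: float,
                     years_aged: float) -> float:
    temp_derate = 1.0 + 0.0012 * (temp_c - 25.0)      # slower when hotter
    volt_derate = 1.0 + 1.8 * (0.75 - vdd_v)          # slower at lower VDD
    aging_derate = 1.0 + 0.02 * (years_aged ** 0.5)   # BTI-style sqrt(time) drift
    return nominal_ps * temp_derate * volt_derate * aging_derate

# Same cell at the nominal corner vs. a hot, droopy, end-of-life scenario.
print(derated_delay_ps(100.0, temp_c=25.0, vdd_v=0.75, years_aged=0.0))
print(derated_delay_ps(100.0, temp_c=110.0, vdd_v=0.70, years_aged=10.0))
```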

Whenever models have to be created that are not directly tied to lower-level models, problems can arise. “Today we have very well-defined interfaces that come from the foundries in terms of the Liberty format,” says Jay Roy, group director for SoC Power Continuum at Synopsys. “For any new process, they will create Liberty data for each of the cells, and those will be very well characterized for timing, for area, and for power. While models are available at the cell level and at the basic gate level, there are no models that are appropriate at the CPU level. If I want to compare two processors, the power models have to be created. They need to be sensitized to the clock frequency, to the underlying technology on which the processor will be implemented, and to the data rate that is flowing through it. There are multiple axes for which they need to be characterized, and the models need to be sensitized to those. There is no industry standard for such models. Internally, people may have their own ways of doing it, but nothing that a tool can use to automate or do proper analysis.”
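A minimal sketch of what such a CPU-level power model might look like appears below, parameterized by clock frequency, supply voltage, and data activity. The class, its parameters, and every number are hypothetical; as Roy notes, no industry standard defines this model.

```python
# Hypothetical CPU-level power model of the kind described above. No standard
# exists, so the parameterization and all values below are purely illustrative.
from dataclasses import dataclass

@dataclass
class CpuPowerModel:
    ceff_nf: float          # effective switched capacitance (nF), fit per design
    leakage_mw: float       # static power at nominal temperature (mW)
    act_sensitivity: float  # how strongly power tracks data activity

    def power_mw(self, freq_mhz: float, vdd_v: float, activity: float) -> float:
        # Dynamic power ~ C_eff * V^2 * f, scaled by observed data activity.
        # Units: nF * V^2 * MHz works out to mW.
        dynamic = self.ceff_nf * vdd_v ** 2 * freq_mhz * (
            1.0 + self.act_sensitivity * (activity - 0.5))
        return dynamic + self.leakage_mw

# Comparing two candidate cores at the same operating point (invented numbers).
core_a = CpuPowerModel(ceff_nf=0.9, leakage_mw=35.0, act_sensitivity=0.6)
core_b = CpuPowerModel(ceff_nf=1.2, leakage_mw=20.0, act_sensitivity=0.4)
for name, core in [("A", core_a), ("B", core_b)]:
    print(name, round(core.power_mw(freq_mhz=1500.0, vdd_v=0.75, activity=0.3), 1), "mW")
```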

What makes this even more complex is that interactions between different parts of a system need to be captured in models. “Before designing the product itself, we have to build models of the parts, and if those models fit together, and if we can see that the parts work together at the model level, then we can be sure that the system itself will work later on,” said Roland Jancke, department head for design methodology at Fraunhofer IIS’ Engineering of Adaptive Systems Division. “Our concentration for years has been on modeling individual parts of systems and on having generic interfaces for those models. There are existing standards for developing such models, such as the functional mock-up interface, so that we have mock-ups for the individual parts rather than the parts themselves.”

Jancke said this approach has been used in automotive chip design. “Generic interfaces might not reflect the best way to model a system, but once it’s possible to glue those different models together, you gain a lot.”

Stretching to software
These problems extend all the way up to software. “When you are doing timing analysis of software, it’s hard to get it right because you’ve got the microarchitecture to worry about,” says Imperas’ Davidmann. “You also have to consider caching, the memory hierarchy, and contentions. If you want to consider power, you need all of that and a lot more. Power adds another level of complexity. You can approximate timing with relatively simple simulations. When you compile it down to gates, the big EDA vendors will provide detailed timing. Power is similar, but requires that you go even deeper — and that complicates the analysis. The net result is that predicting the power implications of software is extremely difficult.”

For software, the models have yet to extend much beyond functionality and performance. “When people write software, there is a compiler in between,” says Synopsys’ Roy. “The compiler is reading your source code, and it is creating assembly-level instructions that execute on the hardware platform. The compilers, so far, have been tuned to optimize for speed of running the software. They can look at two possible sequences of assembly instructions and compare the total runtime for each. What the compiler does not understand is the power or energy profile for those sequences of instructions. That is information the compiler does not have. The hardware models that are available to the compiler have cycle information, or maybe even timing information, but no power information. Modeling of power needs to be done before the compiler can become smart enough to optimize for it. Once that happens, compilers can give some directives to the software engineers to write better code, but we are not there yet. For us to get there, we have to cross the bridge of creating the power models.”
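The sketch below illustrates the missing piece Roy describes, assuming a hypothetical per-instruction energy table the compiler could consult alongside its cycle counts. The opcodes, energies, and cycle numbers are invented; the point is only that two equivalent sequences with the same cycle count can differ in energy.

```python
# Sketch of an energy-annotated instruction cost table a compiler could use
# when choosing between equivalent code sequences. All numbers are invented.
ENERGY_PJ = {"mul": 18.0, "add": 4.0, "shl": 3.5}
CYCLES    = {"mul": 3,    "add": 1,   "shl": 1}

def cost(seq):
    return sum(ENERGY_PJ[i] for i in seq), sum(CYCLES[i] for i in seq)

# Two equivalent ways to compute x * 10: one multiply, or shift/add
# strength reduction: ((x << 2) + x) << 1 == x * 10.
seq_mul = ["mul"]
seq_shift = ["shl", "add", "shl"]
energy_m, cyc_m = cost(seq_mul)
energy_s, cyc_s = cost(seq_shift)
print(f"mul:   {energy_m:5.1f} pJ, {cyc_m} cycles")
print(f"shift: {energy_s:5.1f} pJ, {cyc_s} cycles")
```

With these invented numbers the two sequences take the same number of cycles, so a speed-only compiler is indifferent, while an energy-aware compiler would prefer the shift/add version.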

We are inching closer. “When people are designing a processor, they need a model as a reference for the specification of the ISA,” says Davidmann. “Typically, that’s proprietary. But when it comes to RISC-V, that didn’t really exist. We had a modeling technology and got drawn into the golden reference area, where people wanted a high-quality, configurable model of the specification that they could use as their reference when they tested their RTL. It allows them to use the same model for hardware verification as well as their software development environment. Our model can do timing analysis and power analysis if you annotate it. If you put the right data in there, you can get estimations for this, and that will help the architects a bit. Those models can be used in architectural frameworks to test the performance of the whole system. The needs for models are changing. The industry is moving to a space where lots more people are designing their own processors. That means they need new models to help them do that efficiently.”
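To make the annotation idea concrete, here is a toy interpreter, not Imperas’ model or API, that executes a few instructions functionally (the verification use) while accumulating invented per-instruction cycle counts (the timing-estimation use).

```python
# Toy sketch of the "one model, two uses" idea: a minimal instruction
# interpreter producing functional results for verification, with optional
# cycle annotations (invented numbers) yielding a rough timing estimate.
CYCLES = {"addi": 1, "add": 1, "mul": 4}

def run(program, regs, annotate=True):
    total_cycles = 0
    for op, rd, rs1, rs2_or_imm in program:
        if op == "addi":
            regs[rd] = regs[rs1] + rs2_or_imm          # immediate operand
        elif op == "add":
            regs[rd] = regs[rs1] + regs[rs2_or_imm]    # register operand
        elif op == "mul":
            regs[rd] = regs[rs1] * regs[rs2_or_imm]
        if annotate:
            total_cycles += CYCLES[op]
    return regs, total_cycles

program = [("addi", "x1", "x0", 6),
           ("addi", "x2", "x0", 7),
           ("mul",  "x3", "x1", "x2")]
regs, cycles = run(program, {"x0": 0})
print(regs["x3"], cycles)   # 42 (functional check), 6 cycles (timing estimate)
```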

Industry convergence
Power and thermal are relatively new physical effects that the industry is attempting to model. “A lot of the industry is still naive about thermal issues,” says Marc Swinnen, director of product marketing at Ansys. “The chip is often simply modeled as having a single temperature across the entire die, and that’s not the case. You need a more detailed model. For example, we have a chip thermal model where the die is divided into 10-micron-by-10-micron squares, and a table that relates power to temperature for each square. Power depends on the temperature, but the temperature depends on the power. At the system level, the table lets you determine what temperature each part of the chip is going to be and the power consumed by that square, which of course feeds back into what the temperature gradient will be. You can converge on a consistent solution, where the power output matches the temperature it is at.”
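A rough sketch of that power/temperature feedback loop follows, iterating per-tile power and temperature until they are self-consistent. The grid size, thermal resistance, leakage relation, and every number are invented, and lateral heat spreading is ignored; it only illustrates the convergence idea Swinnen describes.

```python
# Rough sketch of the power/temperature feedback loop: per-tile power depends
# on temperature (leakage rises when hot), and tile temperature depends on
# dissipated power. Iterate to a self-consistent solution. All values invented.
import numpy as np

N = 32                      # N x N grid of tiles (stand-in for 10um x 10um squares)
R_TH = 8.0                  # effective thermal resistance per tile (C per W), invented
T_AMBIENT = 45.0            # ambient/board temperature (C)

dyn_power = np.full((N, N), 0.002)          # dynamic power per tile (W)
dyn_power[10:16, 10:16] = 0.010             # a hotspot block

def leakage(temp_c):
    # Invented exponential leakage-vs-temperature relation (W per tile).
    return 0.001 * np.exp(0.03 * (temp_c - 25.0))

temp = np.full((N, N), T_AMBIENT)
for _ in range(50):
    power = dyn_power + leakage(temp)           # power at the current temperature
    new_temp = T_AMBIENT + R_TH * power         # temperature produced by that power
    if np.max(np.abs(new_temp - temp)) < 1e-3:  # converged: power matches temperature
        break
    temp = new_temp

print(f"peak tile temperature: {temp.max():.1f} C, total power: {power.sum():.2f} W")
```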

The industry has yet to converge on a model for this, and thus these types of solutions remain proprietary.

Standards have helped to align the industry. “Industry standardization is important, especially when it comes to input collateral within the tool,” says Bautz. “The Liberty Technical Advisory Board (LTAB) helps to govern and manage those models. But they’re not always keeping pace, and that process can be slow in terms of ratifying a major change to the timing model standard. That means individual EDA vendors have to create their own format to illustrate a particular capability. Over time it may become standardized across the industry.”

Standards struggle to keep up. “That necessitates innovation within the EDA vendors to formulate proprietary standards,” continues Bautz. “Over time, those proprietary standards may make their way back into the industry standard and that benefits everyone. A great example with timing libraries is something called Liberty Variation Format (LVF). Synopsys had their format, and at Cadence, we had our own format for a number of years. The key word here is ‘years.’ The industry changes faster than years, and the standards tend to change over years. They remained separated for years, and eventually they merged and became the industry standard LVF format.”

Conclusion
While the industry benefits from having standard models, they take time and effort to develop. Today, the demands being placed on models are changing faster than the standardization process can accommodate, and that necessitates the creation of proprietary models. To some extent this always has been the case, because EDA companies have to create and verify models before the industry is ready to adopt them.

What is different is that the standards are falling further behind in some areas, and in some cases there has not even been pressure to come up with a standard. As the industry looks to new areas where wasted performance or power can be reclaimed, the importance of models will grow, and new proprietary models will be created. If those models cannot be tied directly to the existing hierarchy of models, the cost of creating them may be prohibitive, which will slow their uptake.


