Modeling Errors

The higher the abstraction, the greater the chance you will find the problem—or create a complete disaster.


Raising the abstraction level in increasingly large and complex designs requires proxies. In the IC world, we think of them in terms of higher abstractions, but the basic premise is that you can’t focus on every detail without losing sight of the bigger picture, so we build models that can represent those details.

Done well, these models are incredibly useful. They save time, make it easier to spot errors, and speed time to market for very complex SoCs. Done badly, they can bring designs to their knees, provide wrong information across engineering teams, and affect everything from a chip’s functionality to its yield and manufacturability. The problem is that it’s hard to know when models are slightly off, and at higher levels of abstraction those slight errors are even more significant.

So where do these errors creep in? There are a few places that stand out. One is human error. The more code that is entered by hand, the more likely some of it will be wrong. At the assembly code level, a 5% error will have very localized repercussions. At RTL, it will affect more of the design. And at the electronic system level (ESL), errors theoretically are easier to catch, but if they make it into the final stages of design, the effect could be widespread.

A second place is the mixing of models. While models make a lot of sense to use, there is no single model for complex designs. There are ESL models, use models, power models, software models, and thermal models, to name a handful. The challenge is keeping all of these models in sync. It’s like a database that needs constant updating. Mechanisms have to be in place among the various design teams to diligently update all the models that are affected when changes are made. And with teams sometimes scattered around the globe, and sometimes even across different companies, this isn’t always as straightforward as it sounds.

A third area involves IP. While it always makes sense to trust IP vendors—they’re supposed to test and characterize their IP effectively, and they know it far better than you do—that IP isn’t always used in the way it was intended. Companies stretch IP across process nodes, add it into designs in places where proximity effects can create issues, and build it into designs where it may be in contention for resources. Modeling this becomes a challenge, to say the least, and it can affect all the other models in the design flow.

The industry needs models to go forward, and it has the TLM 2.0 standard in place to simplify communication across models. But for models to be really effective, design teams need education and discipline, along with a deep understanding of what can go wrong if they get any of this wrong. High-level errors at 60,000 feet compounded with other errors can cause a disaster on the ground, and in complex designs not all of it can be fixed with software, extra margin, or even a single respin.

—Ed Sperling