Raising IP Integration Up A Level

Integrating IP has never been easy, but it is about to take a big leap in complexity. Identifying the necessary models and abstractions is just beginning.


An increase in the number and complexity of IP blocks, coupled with changing architectures and design concerns, is driving up the need for new tools that can enable, automate, and optimize integration in advanced chips and packages.

Power, security, verification, and a host of other issues are cross-cutting concerns that make pure hierarchical approaches difficult. Adding to future complexity is the emergence of chiplets, which bring a new notion of IP and reuse into play and potentially will require a whole new class of models, analyses, and tools.

The concept of IP and reuse is hardly new. From a rather modest start in the 1990s, it has grown to include more than 90% of the content for most semiconductors. But things have certainly not stayed constant during that period, and today that integration is becoming much more difficult.

“We are rapidly approaching, or may have already passed, the point where the limiting factor is not the size of a device that we can manufacture, but the size of the device that we can comprehend,” says Matt Graham, product engineering group director at Cadence. “We have gotten so good at doing reusable IP, and chunking down and integrating all these reusable IPs, that no expert — or even small team of experts in an entire organization — has the breadth of knowledge to be able to fully comprehend everything in the system. That gap between the ability to comprehend and what is possible will limit productivity in terms of building and creating and designing.”

The problems do not remain constant. “The dramatic increase in complexity, or the size of the design, adds complexity that leads to new failure modes,” says Prakash Narain, president and CEO of Real Intent. “For example, clock domain crossing (CDC) is a failure mode that shows up at the logical level, but the genesis is in the physical space. CDC means we cannot meet the timing, and that leads to metastability, resulting in unpredictable behavior in the design. We need to protect against these failure modes using design methodologies.”
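The metastability risk Narain describes is commonly quantified with the classic mean-time-between-failures (MTBF) model for a synchronizing flip-flop. The sketch below is purely illustrative; the function name and every parameter value are assumptions, not characterization data from any real library.

```python
import math

def synchronizer_mtbf(f_clk_hz, f_data_hz, t_met_s, tau_s, t_w_s):
    """Classic metastability MTBF estimate for a flip-flop synchronizer.

    f_clk_hz  : receiving-domain clock frequency
    f_data_hz : rate of asynchronous data transitions
    t_met_s   : settling time available before the next capturing edge
    tau_s     : flip-flop metastability time constant (process-dependent)
    t_w_s     : metastability capture window of the flip-flop
    """
    return math.exp(t_met_s / tau_s) / (t_w_s * f_clk_hz * f_data_hz)

# Assumed numbers only -- real values come from library characterization.
single_ff = synchronizer_mtbf(1e9, 100e6, t_met_s=0.5e-9, tau_s=20e-12, t_w_s=10e-12)
double_ff = synchronizer_mtbf(1e9, 100e6, t_met_s=1.5e-9, tau_s=20e-12, t_w_s=10e-12)
print(f"1-FF MTBF ~ {single_ff:.2e} s, 2-FF MTBF ~ {double_ff:.2e} s")
# Adding a second flip-flop (more settling time) moves MTBF from hours to
# effectively never -- which is why the protection is methodological, not incidental.
```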

Complexity is being added at all levels. “The end of Moore’s Law has meant that designs no longer contain just one or two big processing engines, a big memory and then some other stuff that connects it,” says Nick Heaton, distinguished engineer and SoC verification architect at Cadence. “Today, we have many cores, quite often heterogeneous. There may be domain-specific engines for ML, for graphics, for whatever. Plus, you’re now connecting them together with a piece of IP that is unbelievably complex in scale. This also provides a massive opportunity for alternative architectures. Maybe the memory becomes central, and the processing is in the chiplets. There are different ways of doing that. What people are doing with chiplets today is fairly limited, but who knows where that will go.”

So there is already heterogeneous integration at the computational level, in the form of diverse processing domains, and now the industry is about to add heterogeneous integration at the physical level, with multiple dies connected within a package using either 2.5D or 3D integration techniques.

“Multi-die is going to be multi-node,” says Michael Posner, senior director for die-to-die connectivity at Synopsys. “The difficulty is not so much the analysis. It’s going to be the sign-off, specifically for either an organic substrate, or advanced interposer because of the corners. If you think about a single monolithic die, you can characterize your corners. When you go to two dies, you have to do cross corners for full margin analysis. Three dies, and you start to get into the realm of scary. The mixing of technologies is a well-understood problem that has been solved on an individual die basis, but the corner variance problem is new.”
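A rough way to see why cross-corner sign-off quickly “gets into the realm of scary” is to count the naive combinations. The corner sets and helper functions below are assumptions for illustration only; real foundry corner lists, and any pruning of dominated corners, will look different.

```python
from itertools import product

# Assumed, simplified per-die corner sets -- real sign-off lists are foundry-specific.
process = ["ss", "tt", "ff"]
voltage = ["vmin", "vnom", "vmax"]
temp    = ["-40C", "25C", "125C"]

def die_corners():
    """PVT corners for a single die."""
    return list(product(process, voltage, temp))

def cross_corners(num_dies):
    """Naive cross-product of per-die corners for a multi-die assembly."""
    return list(product(die_corners(), repeat=num_dies))

for n in (1, 2, 3):
    print(f"{n} die(s): {len(cross_corners(n))} corner combinations")
# 1 die(s): 27, 2 die(s): 729, 3 die(s): 19683 -- the blow-up Posner refers to,
# before any corner reduction or dominance analysis is applied.
```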

That also will add new physical issues, which in turn create other issues. “One new thing that we are seeing is related to stress that happens when you start stacking chips,” says Sathish Balasubramanian, head of product management and marketing for AMS at Siemens EDA. “It’s not uniform. With HBM memory, people started noticing performance issues and some dead-on-arrival chiplets because of stress. If we extend that stacking into the SoC, it is even worse, because the SoC is not a repetitive structure. It’s more random. So stress is another physical effect that is going to be a big factor. How do you model stress? If I get a stacked die or a stacked chiplet from you, how do I know that everything is satisfied, that I don’t have to worry about stress?”

All of that needs to be better understood. “Packaging requirements are going to introduce other architectural constraints,” says Real Intent’s Narain. “Boundaries are where some physical requirements are going to be abstracted out, and they need to meet some design rules and guidelines. If you don’t meet those rules and guidelines, they will lead to new failure modes. That needs to be verified, and so the failure modes would need to be modeled and analyzed to make sure that they don’t exist.”

Put simply, there are a lot of unknowns. “There are no general standards for a chiplet,” says Shekhar Kapoor, senior director of marketing at Synopsys. “The industry has to sort out things like the electrical, the physical, the thermal models for the chiplets, into a system analysis flow. The way of dealing with the problem is the same as our hierarchical approaches, but the models may be different between the tools that are being used. You decide, between the customer and the EDA tool provider, what models will adequately define the problem set that they’re dealing with. That is how it is happening, and this is an area that is ripe for standardization. We are not there yet.”

Interfaces
In the early days of IP reuse, organizations such as the Virtual Socket Interface Alliance (VSIA) attempted to create standard protocols that would enable IP from different vendors to be brought together. Over time, those virtual connections have increased in scope and complexity. “By having a more standardized component, it becomes more amenable to abstraction,” says Narain. “For simulation, we can get abstracted models for it. But these structured techniques come at some cost, and you have to do a cost/benefit analysis. Given the complexity and design times, there is more and more simplification toward getting a reliable design done, as opposed to going into third-order optimization principles.”

There is certainly a lot of work underway to create standard interfaces. “UCIe is an inflection point,” says Synopsys’ Posner. “It’s a new interface. While it is a die-to-die interface, it might as well be a chip-to-chip interface. Before, it was the Wild West — ‘Is it XSR, is it OpenHBI, proprietary NVLink, Ultra Fusion?’ UCIe has brought a great convergence around the interface. But that is not enough. Each time we work with a different foundry, we have to customize the flow to their technology. There is some standardization across the technologies, but they all have their spins, their differentiation. Fundamentally, you’ve got organic substrate, you’ve got RDL (redistribution layer) sitting on top of that, and the new buzz is silicon bridge, where they’re embedding an interposer into that RDL. It’s a hybrid of organic substrate, where you are limited to bump pitch, with a silicon interposer bridge embedded into it for very high-performance connectivity.”

While some consider UCIe to be a very heavy-weight interface, that may be necessary — at least initially. “Does the noise on the substrate matter?” asks Cadence’s Heaton. “That impacts bit-error-rates, and protocols like UCIe have mechanisms to support that. There is a ton of complexity just to get to the edge of the chip, across a gap, and back into the next chip. UCIe may feel a bit like a sledgehammer to crack a nut for some applications, but if you go with something simpler, like Bunch of Wires (BoW), you have to worry about that yourself. There’s an incredible amount of latency introduced when you serialize it, the training sequences and whatever, and then deserialize. So customers will have to decide upon this for themselves. For some applications where latency is critical, particularly for communications, or situations where the locality of the memory will be more important than the amount of memory, you may need to architect things in a different way.”
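Heaton’s latency point can be made concrete with a back-of-the-envelope comparison between a serialized die-to-die link and a wide on-die bus. Everything in the sketch below (function name, lane counts, rates, overhead figures) is a hypothetical assumption, not a number from the UCIe or BoW specifications.

```python
def serialized_link_latency_ns(payload_bits, lane_rate_gbps, lanes, serdes_overhead_ns):
    """Wire-level latency of moving one flit across a serialized die-to-die link.

    All inputs are assumptions for illustration -- real latency depends on the PHY,
    adapter depth, clock ratios, and retry/CRC settings of the chosen protocol.
    """
    serialization_ns = payload_bits / (lane_rate_gbps * lanes)  # bits / (Gb/s) = ns
    return serialization_ns + serdes_overhead_ns

# A 256-bit flit over 16 lanes at 16 GT/s with ~4 ns of serialize/deserialize and
# adapter overhead, versus the same flit on a wide on-die bus in one 1 GHz cycle.
flit = 256
print("die-to-die:", serialized_link_latency_ns(flit, 16, 16, 4.0), "ns")
print("on-die bus: 1.0 ns")
# For latency-critical or memory-locality-sensitive designs, this gap is what
# forces the architecture to change, not just the interface choice.
```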

Tool migration
Many of the tools in use today were designed for monolithic dies and potentially optimized for one technology. When multiple dies come into the picture, there are increasing integration challenges.

“If you look at the core engines, they define how you perform analysis, how the fundamental physics is captured,” says Malik Vasirikala, director and product specialist at Ansys. “We are able to reuse engines, but the way you have to productize them for heterogeneous integration is completely different. For thermal, when there is a single die, all you needed to model was the package and the temperature at the surface of the package. Once you have a multi-die system, you have multiple heat-emitting sources, and the package that is dissipating the heat. The way you create a solution that is usable by the designer becomes a little bit different.”
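Vasirikala’s point about multiple heat-emitting sources sharing one package can be illustrated with a minimal lumped thermal network. The topology, resistances, and power numbers below are assumed for illustration; they are not how Ansys constructs its thermal solvers.

```python
import numpy as np

# Toy lumped thermal network for two stacked dies in one package.
# All resistances (K/W) and powers (W) are assumed values; a real chip thermal
# model would come from extraction, not hand-written constants.
r_die_to_die = 0.5    # die 0 <-> die 1 coupling
r0_to_amb    = 2.0    # die 0 -> ambient through the package
r1_to_amb    = 4.0    # die 1 -> ambient (buried die, worse path)
p = np.array([15.0, 5.0])   # heat generated by each die (W)
t_amb = 25.0

# Nodal conductance matrix G, solving G @ dT = P for the rise above ambient.
g01, g0a, g1a = 1 / r_die_to_die, 1 / r0_to_amb, 1 / r1_to_amb
G = np.array([[g01 + g0a, -g01],
              [-g01,       g01 + g1a]])
dT = np.linalg.solve(G, p)
print("Die temperatures:", t_amb + dT)
# Each die now heats the other -- the coupling term that simply does not exist
# in the single-die, package-surface-only view.
```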

This adds a new modeling challenge. “You need to model the flow of electrons across these different technologies, and that is a big challenge,” says Siemens’ Balasubramanian. “There are different ways of approaching it, and people have different workarounds. Voltages are going to be different. You might have a bigger well. You need guard rings. With heterogeneous integration, there could be all sorts of new issues. If we start plugging 180nm, where the supply voltage is much higher, in next to 5nm, and the power lines are close enough, you might have problems.”

At the functional level, multiple abstractions are required for each block. “We typically try and develop the models using a co-design technique,” says Posner. “If it’s an RTL controller, the abstract models are made in parallel. It’s not completely parallel because you want to complete the RTL development or complete the model development first. And then you have something to test against. It’s similar for the emulation models, for the PHYs — all of the later generations, 7nm and below.”

One hope is that some of the newer models can be created automatically. “When you analyze a block or die, we have the capability to create an abstract model from that,” says Ansys’ Vasirikala. “If I analyze a chip, looking at the chip internals, I also know how it behaves at an interface point. I am able to create a model as if I’m seeing this whole part from the periphery, or at the boundary or the interface point of the chip to the external world. Once I know that behavior, I can create a model out of it. And when I’m analyzing the other chip, I don’t need the details of the chip. I just plug in that behavioral model into this analysis.”
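Conceptually, this kind of boundary model collapses internal detail onto the interface that a neighboring die actually sees. The sketch below is a loose illustration of that idea under assumed names and data; it is not the actual model-extraction flow Vasirikala describes.

```python
# Conceptual sketch of collapsing a detailed chip model into a boundary-only model
# that a neighboring die can consume. Names and structure are hypothetical.
from collections import defaultdict

detailed_model = {
    # internal node        (interface port it maps to, current demand in A)
    "core0/alu/reg_bank":  ("VDD_CORE", 0.012),
    "core0/fpu/pipe":      ("VDD_CORE", 0.020),
    "ddr_phy/tx_slice_3":  ("VDD_IO",   0.004),
    "ddr_phy/tx_slice_4":  ("VDD_IO",   0.004),
}

def extract_boundary_model(detailed):
    """Aggregate internal behavior up to the interface ports only."""
    boundary = defaultdict(float)
    for _node, (port, current) in detailed.items():
        boundary[port] += current
    return dict(boundary)

# The neighbor's analysis plugs in only the boundary view, never the internals.
print(extract_boundary_model(detailed_model))
# {'VDD_CORE': 0.032, 'VDD_IO': 0.008}
```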

Abstractions are necessary, but also limiting. “The level of detail that you need to have in the abstraction is governed by the specific technology, or the implementation details of the application at hand, and the accuracy that is required,” says Narain. “If you look at clock domain crossing approaches, they utilize a hierarchical methodology today. It starts with a low precision that basically creates an abstracted model in terms of coarse attributes on the I/O pins of the chiplet, or for that matter, whatever block of the design is being abstracted. And with that, you can get a certain level of accuracy in your analysis. But if you are concerned about specific failure modes, like convergence, you cannot do that with that kind of an abstract model. You can overcome that problem by going to a more precise approach, which retains a greater amount of information about the design, but now the abstracted model becomes larger.”
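One way to picture the precision trade-off Narain describes is as two levels of CDC abstraction for the same chiplet: one that keeps only coarse per-pin attributes, and one that also retains the internal paths needed to detect reconvergence. The class and attribute names below are hypothetical, not any tool’s actual model format.

```python
from dataclasses import dataclass, field

@dataclass
class CoarsePinAttr:
    """Low-precision abstraction: just per-pin clock domain and sync status."""
    clock_domain: str
    synchronized: bool

@dataclass
class PreciseCdcModel:
    """Higher-precision abstraction: also keeps the internal paths needed to
    detect convergence of independently synchronized signals."""
    pins: dict[str, CoarsePinAttr]
    internal_sync_paths: list[tuple[str, str]] = field(default_factory=list)

coarse = {"irq_out": CoarsePinAttr("clk_a", synchronized=True),
          "ack_out": CoarsePinAttr("clk_a", synchronized=True)}

# With only the coarse view, a top-level check cannot tell whether irq_out and
# ack_out were synchronized by separate cells and later reconverge -- the failure
# mode that, per Narain, requires the larger, more precise model.
precise = PreciseCdcModel(pins=coarse,
                          internal_sync_paths=[("sync_irq/ff2", "irq_out"),
                                               ("sync_ack/ff2", "ack_out")])
```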

It’s important to note that abstractions are being created because the unified model, which contains all of the information, is too large and detailed to use directly, and no single abstraction of it serves every purpose. An abstraction can only be made in the context of an application, which means multiple models and multiple abstractions.

“For chiplets, we have different models,” says Vasirikala. “For a power integrity model, we have what we call a chip-power model (CPM). For thermal, we have the chip thermal model, which also tells you about things like metal densities, how the heat dissipates, and how much heat is being generated in different regions of the chip. Same again for signal integrity. We have the chip signal models. For each and every kind of physics, we create some models of the chip, and those models are used for the appropriate system level analysis.”
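In practice that means each chiplet carries a small bundle of per-physics abstractions, and the system-level analysis pulls the one it needs. The sketch below uses hypothetical file names and a made-up registry structure to illustrate the bookkeeping, not any vendor’s actual format.

```python
from dataclasses import dataclass

@dataclass
class ChipletModels:
    power_integrity: str   # e.g. path to a chip power model (CPM)
    thermal: str           # chip thermal model: metal density, heat-generation map
    signal_integrity: str  # chip signal model for the die-to-die interfaces

# Hypothetical model library for a two-chiplet assembly.
library = {
    "cpu_die": ChipletModels("cpu.cpm", "cpu.ctm", "cpu.csm"),
    "io_die":  ChipletModels("io.cpm",  "io.ctm",  "io.csm"),
}

def models_for_analysis(kind: str):
    """Pull the matching abstraction for every chiplet in the assembly."""
    return {name: getattr(m, kind) for name, m in library.items()}

print(models_for_analysis("thermal"))   # {'cpu_die': 'cpu.ctm', 'io_die': 'io.ctm'}
```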

Conclusion
As increasing numbers of complex IP blocks are brought together, using increasingly advanced manufacturing methods, new failure modes are emerging. Today, the industry is just beginning to learn about some of them, and they potentially will cause problems until they are understood well enough to be correctly modeled and analyzed. The correct abstractions for those models have to be identified and, if possible, those models need to be synthesizable from more complete models so their fidelity is guaranteed.

Editor’s note: Next month, the role of the virtual prototype as an integration and analysis platform will be explored, including the new types of information that it will have to contain.


