SoC Integration Mistakes

Experts at the Table, part 1: What’s missing; common problems and how to avoid them; what’s behind these errors and re-spins; proximity effects; communication breakdowns.


Semiconductor Engineering sat down to discuss integration challenges with Ruggero Castagnetti, distinguished engineer at LSI; Rob Aitken, an ARM fellow; Robert Lefferts, director of engineering in Synopsys’ Solutions Group; Bernard Murphy, chief technology officer at Atrenta; and Luigi Capodieci, R&D fellow at GlobalFoundries. What follows are excerpts of that roundtable discussion.

SE: It’s becoming harder to put all the pieces together in an SoC as the amount of IP increases along with complexity involving power domains, multiple processors and I/O. What problems are we facing going forward?

Aitken: People have challenges integrating IP for a couple reasons. One of them is expectations. They think it’s going to do something and it does something else. But the main challenge is complexity. The IP blocks are so complicated, the designs are so complicated, the time scales are so short. There’s a lot of stuff to manage and not a lot of time to do it.

Lefferts: It is challenging because the time frames are getting so short. That’s one of the reasons there is so much IP. There is no one on the planet who could build an SoC in the time allotted if they had to build it all themselves. We’ve seen time scales dropping. We’re taping out against version 0.1 of a PDK knowing that it’s going to have to be re-spun and kept up to date. From an IP provider standpoint, we try to do everything we can to get the highest quality in the short time frame we have. We have something like 15,000 CPUs running 24×7 for the verification. Because integrators are working on a short time scale, they make assumptions about what something is going to do. We’ve seen a lot more cooperation with customers as part of the verification process. It’s a tough challenge. When I was acquired by Synopsys in 2004, you’d take a SerDes and turn it on and leave it and let it run in some data center forever. Now it’s turning on 100 times a second.

Murphy: The problem with quality is that quality has an objective and a subjective component. The objective component is what IP vendors work to get right. The subjective component nobody can get right because it depends on the user. You know generally it’s a CPU or a SerDes, but you also have certain expectations that are not necessarily correct. That’s what trips people up. From a user point of view, that’s still a quality point of view even if it’s well defined in the manuals because it doesn’t match their intuitive expectations. How does anybody fix that? It’s difficult for anyone to absorb all the data and it’s difficult to pick up the stuff that hasn’t been defined in the model.

Castagnetti: There is obviously the time scale at advanced nodes. IP is being co-developed sometimes as we develop our SoC, so there are more unknowns. The other piece is that IP teams view verification at their IP level. We want to make sure it works, but how do we make sure it also is verified in the context of use cases? There are a gazillion use cases. If you start integrating company A’s IP with company B’s IP, how do we make sure that stuff works together? And how do we make sure everyone has interpreted the specs in the same way?

Capodieci: One fundamental aspect of quality for us is manufacturability, or the systematic relationship with yield. This has become more of a quality issue. Certainly IP needs to do what it’s supposed to do and play well with others, but it also has to have been built well with others. It’s important for us to work together with IP developers from the very beginning to guarantee manufacturability and all the yield metrics. That’s one of the fundamentals for any node past 32nm all the way to 14nm and 10nm. From a practical standpoint, IP has to become physical IP as soon as possible and put into different levels of density. We can tolerate certain levels of manufacturability, but if you put it everywhere you’re going to get a big yield drop in your SoC. So we consider the IP itself, along with its use and its density across the entire chip.

SE: Isn’t that one of the big challenges in developing IP? You don’t necessarily know what it’s going to be next to and interacting with.

Lefferts: I have a team that will sit down with the customer and review their floor plan. They’ll look at each of the IPs, what’s next to it, how far away it is, where the supplies are connected, what the structure is for ESD. The floor-plan review is a great place to double-check assumptions. Where are your clocks coming from? Show me the route. How are you going to route that clock in? What’s connecting the pad ring? Where are the power supplies? A picture is no longer worth 1,000 words. It’s now more like 34 megabytes. Aligning that and seeing the floor plan has been valuable. You can at least get around some of the assumptions being made. I haven’t seen many cases where proximity caused the problem. There have been problems with supply noise. I remember one case where the whole chip was ringed with IP and there was no way to get the power to the core.

Aitken: The power supply issue is a good one, because that’s often the place where IP and its neighbors start fighting with each other. Now that we have all these power modes that are controlled by some combination of hardware, software, middleware, operating systems and so on, it’s hard to know what they’re doing in advance and where the noise is going to be. About half the problems we’ve seen lately are power-supply-related. Someone has connected the supplies wrong or not put in enough current or put in some massive noise generator on some massive analog power supply. All of these things are possible. And some of these things are in the specification where it says, ‘Don’t do this.’ Others are not. That goes into the subjective versus objective verification. So we do a fair amount of margining, as well. If you put this thing in a bad environment and it should work down to minus 10% Vdd, we’ll handle a 20% Vdd drop, as well.
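The margining Aitken describes comes down to simple arithmetic. The following is a minimal sketch, assuming a hypothetical 1.0 V nominal supply. The function names and the voltage values are illustrative and do not come from the discussion; only the 10% specified drop and the 20% designed-for drop are taken from his remarks.

```python
def supply_floor(nominal_v, drop_fraction):
    """Lowest supply voltage implied by a fractional drop from nominal."""
    return nominal_v * (1.0 - drop_fraction)


def classify(observed_v, nominal_v, spec_drop=0.10, margin_drop=0.20):
    """Place an observed supply voltage relative to the spec and the design margin.

    spec_drop is the drop the specification promises to tolerate (10% here);
    margin_drop is the larger drop the IP is actually designed to survive (20%).
    """
    if observed_v >= supply_floor(nominal_v, spec_drop):
        return "within spec"
    if observed_v >= supply_floor(nominal_v, margin_drop):
        return "covered by design margin"
    return "outside margin"


if __name__ == "__main__":
    nominal = 1.0  # hypothetical nominal Vdd in volts
    for observed in (0.95, 0.85, 0.75):
        print(observed, classify(observed, nominal))
```

The band between the two floors is the extra margin the IP provider absorbs when the integrator's environment is worse than the specification assumes.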

Castagnetti: You get the right views and floor plan analysis. Even though we may buy IP, we also develop IP. How much we are willing to share versus how much Cadence is willing to share becomes an issue when we want to review what is next to each other from a physical implementation standpoint. That’s not an easy problem to solve. How much do you show the end user so they can determine whether to put in an extra ESD structure? And with power integrity, not everyone has the same views. There are a few tools out there, but not all the IP providers support all the tools. So as an integrator, what do you develop to bring in that third-party IP?

SE: While we have too much data, it’s not necessarily the right data?

Castagnetti: That’s right.

Murphy: How does an IP vendor or an integrator ensure the use cases they’re expecting align with the use cases that have actually been tested? That’s a very hard problem. It’s extremely difficult to know whether the way you’re wiggling the knobs is within or outside what the IP developer anticipated.

SE: Is that worse with analog or digital?

Aitken: It’s different. With digital, the biggest problem involves software use cases that are not what the designer expected or a power virus algorithm that excites things in a different way than the verification thought it was going to be. For analog, there are an enormous number of electrical problems that can show up. They’re both bad problems, but they are different. We spend a lot of time on use cases. We’re a lot better at use cases than we were five or six years ago.

SE: Where are other problems cropping up?

Murphy: Configurable IP. If you have more than a few knobs on the IP, then the chances that you can dial them in a way that the supplier never thought possible are much higher. You get into an explosion of things you have to test. Then you’re in a gray zone, and what’s going to happen in that gray zone? It may be okay, it may not be.
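To put rough numbers on the explosion Murphy describes, here is a minimal sketch that counts the configurations of an illustrative IP block and lists the ones no test has covered. The knob names, their values and the tested set are all hypothetical assumptions made for the example, not details of any real IP.

```python
from itertools import product
from math import prod

# Hypothetical configuration knobs for an illustrative IP block.
KNOBS = {
    "data_width": [8, 16, 32, 64],
    "fifo_depth": [4, 8, 16, 32],
    "clock_ratio": [1, 2, 4],
    "power_mode": ["on", "retention", "off"],
    "ecc": [False, True],
    "endianness": ["little", "big"],
}


def config_space_size(knobs):
    """Number of distinct settings: the product of each knob's option count."""
    return prod(len(values) for values in knobs.values())


def untested_configs(knobs, tested):
    """Yield every configuration (as a tuple in knob order) not in `tested`."""
    for combo in product(*knobs.values()):
        if combo not in tested:
            yield combo


if __name__ == "__main__":
    # Assumed set of configurations the supplier verified.
    tested = {
        (32, 16, 1, "on", True, "little"),
        (64, 32, 2, "on", True, "little"),
    }
    total = config_space_size(KNOBS)
    gray_zone = sum(1 for _ in untested_configs(KNOBS, tested))
    print(f"{total} configurations, {gray_zone} never tested")
```

Six modest knobs already give 576 combinations, and each additional knob multiplies the total, which is why the untested gray zone grows so quickly.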

Lefferts: If you’re selling a small block, more of a chiplet, then you have a lot more variables than if you have a PCI Express PHY. With the PHY, the use modes and models are pretty much spec’d out, as is how often it’s going to be turned on and off. We don’t necessarily ask the question how much verification is enough for IP, but we do ask whether we’ve done everything we can. That’s the bar we set for ourselves. We’ll take test chips, run the corners, fully characterize them, build margin into the design. That’s going to be very important in advanced nodes. But we’ll take our IP through compliance testing, run it with the controller and verify it. If the customer is going to do something we aren’t already doing on our IP, they run a risk. That’s the bar we set.

Capodieci: The biggest problem is integration and usage. I appreciate the notion of floor planning ahead of time, but one of the biggest problems to this day is fill—and that applies to both analog and digital. You think everything is fine, and you find something is filled with a different type of algorithm or with a generic algorithm rather than what the final fill does. That creates problems. There are other sophisticated manufacturing problems, too, such as forbidden patterns and two-dimensional hotspots. They do not exist together in isolation, but when you add wires together they create these hotspots. That’s another big problem. The way to deal with this is to test, test and keep testing. Right now we’re able to deploy new ways of testing with the IP designer. There’s always the question of who’s going to pay for this, too.
