Addressing Pain Points In Chip Design

Partitioning, debug and first-pass working silicon lead the list of problems that need to be solved.

Semiconductor Engineering sat down to discuss the impact of multi-physics and new market applications on chip design with John Lee, general manager and vice president of ANSYS’ Semiconductor Business Unit; Simon Burke, distinguished engineer at Xilinx; Duane Boning, professor of electrical engineering and computer science at MIT; and Thomas Harms, director of the EDA/IP Alliance at Infineon. What follows are excerpts of that conversation. Part one of this discussion is here.

SE: It’s always good to be able to have a higher level of abstraction, but you also want to be able to drill all the way down, particularly in safety-critical types of applications such as a 7nm AI chip in a car. How do we balance those two worlds?


(L-R) Duane Boning, Simon Burke, John Lee, Thomas Harms

Burke: Abstraction initially was a means to an end. The design didn’t fit into whatever you were writing it on, so you abstracted something to make it fit. That was a temporary solution. The real problem is that it has to fit. Going to a scale-out model instead of a scale-up model allows you to not abstract the things you would have abstracted before. There are some limits for different models and different levels, but the trend toward having more visibility into a design rather than less is the right answer.

SE: Is that happening with the software as well as the hardware?

Burke: Yes. You see a lot more interactive system-level verification, from booting Linux on an Arm core and running an OS all the way down to the silicon. You need to make sure that works in an emulator so you don’t have any issues when the silicon comes back, and that’s becoming a much bigger issue for Xilinx. Doing that kind of system-level simulation to verify you don’t have any fundamental problems is a big deal. But there’s a level of abstraction you can’t go below, particularly for analog.

Harms: There is always a question about performance at this level. You can’t do a full-chip analog simulation, so you have to partition the design and look at a specific domain where it’s necessary. Then you abstract it up. Today, with increasing complexity, issues also move up into the system. If I have a product and it’s going to be used on a board, I need to provide something so that customers can do a board simulation with my product. If you’re using a digital twin, you have to abstract something from a lower level up to the entire car in order to have performance at that level, along with a level of security. Abstraction remains a necessity for system simulation.

Lee: If you look at it from a chip designer’s perspective, they want a chip-centered view of what the system around it looks like, from the package and board perspective. A system designer wants a system-centric view of what the chips in the system look like. Where we’re headed is toward platforms, which is like Google Earth. You can look at the world and zoom in on North America, then Las Vegas, and then a particular street. At the highest level, it’s a photo. It’s an abstraction. What the big data systems give us on the client is the ability to zoom down to the level of detail you need, and then zoom back out to see the whole picture. At some point you’re going to want to see the detail rather than the big picture, and to query that detail and the various relationships. Our job as a tools provider is to give you the ability to go from a high level of abstraction down to the lowest level. The challenge is how to handle obfuscation or protection of IP. That’s a whole other discussion.

Boning: I don’t think it’s just about zooming in and zooming out. That’s very important, of course. But what we’re also going to need is more overlapping abstractions. We’re not going to have fewer of them. You build an abstraction for a purpose. The abstraction captures the essential elements you need for that purpose, whether that’s optimization or verification. We’re going to need more ways to look at the same structures from multiple perspectives.

SE: One of the ways we’ve gotten chips out the door is to divide and conquer. As we move into 3D stacking and other advanced packaging, how can we partition these designs?

Burke: There are multiple partitions. You partition a design today as you build it. There are rules you have to follow, hierarchies you have to work within. Those are artifacts of how the design is constructed. One thing we’ve been doing more of is creating virtual subsystems of a chip. So for timing you have a timing subsystem. If you want to do verification, you have one for that. Doing the entire chip is difficult because it’s big. Having a subsystem is a solution, but a subsystem only works for that one purpose. What works for verification doesn’t work for timing. Creating a chip from a physical perspective, and then creating virtual subsystems to be able to do the analysis, is essential.

Lee: One of the areas we’re researching right now is power integrity. The power grid is pervasive throughout the whole chip, and if you look at 3D-IC, you have billions, or tens of billions, of devices connected to a single rail. Having said that, we think there are opportunities to put more structure into the power integrity flow. The timing flow has matured over the years. You can do the timing for a block, insert it into the chip, and do full-chip timing where you don’t have to look at everything at the same level. Power integrity is a lot more complicated than that. You need mathematics to pull it all together. There are implications for how you do power grid design. There are tradeoffs between fully flat and more balanced approaches. It’s a tough problem, but we will be able to partition it and do more constructive power grid design.

SE: Where do you see the biggest holes in design these days and how do we fix them?

Harms: It’s still debug and verification. You identify issues and you want to pinpoint them. So we have tools, flows, and methodologies we want to put in place. All the tools generate output, and you want to dig into that and figure out how it correlates from one domain to another. So you look at timing and it doesn’t work, and then you look at power and find something, but how do you combine those so you understand what caused the hotspot and why you have a timing issue? The whole debug process is problematic. What’s missing are databases that would allow you to get all of the information in one place, which would make it easy and pervasive to look across domains and across the boundaries of EDA vendors.

Burke: I agree. One of the challenges we have right now is that we do our best to get silicon to work the first time, but that’s an impossible task. The time it takes just to get silicon back to look at it, and then to debug these huge systems, means the cycle to fix issues is growing quickly. There are big opportunities for machine learning and AI in that space to shorten the path from getting back a chip that doesn’t work to understanding why it doesn’t work. Even if it doesn’t identify the exact cause, it should point you in the right direction so that we can optimize it. You can’t fix the manufacturing turnaround time, but you can fix the time it takes to debug a chip.

Lee: The idea that you load all of the data you have into a single machine doesn’t work. You have a lot of timing data, physical data, emulation data. You need an open platform to be able to deal with problems across a chip, because a lot of these problems are not confined to a single domain. The ability of Hadoop and other big data systems to scale out is exactly what we think has been missing from design.

Boning: Debug requires the integration of lots of information, perhaps even back to manufacturing information. But there’s also a thread here: we need talented people and access to that integrated information across companies, across EDA suppliers, across their users, in safe and reasonable ways, and connected up to the university community. That’s a way to unleash creativity to attack some of these problems. What would happen if we had a few AI problems, drawn from our community, that harnessed the tens of thousands of machine learning people across universities and companies? There’s a lot of hard work ahead. I’m particularly impressed with the efforts of Andrew Kahng (UC San Diego professor of computer science and engineering) to get the community to address the need for openness and to unleash that creativity.

Related Stories
Less Margin, More Respins, And New Markets
How physics is reshaping the leading edge of design.
Raising The Abstraction Level For Power
Finding the right abstraction for power analysis and optimization comes from tool integration.
3D Power Delivery
The design of the power delivery network just got a lot more complicated, and designers can no longer rely on margining when things become vertical.
Using Less Power At The Same Node
When going to a smaller node is no longer an option, how do you get better power performance? Several techniques are possible.


