What’s Missing For Designing Chips At The System Level

Experts at the Table: Customization and multi-chip packages make it far more difficult to create chips using traditional approaches.

Semiconductor Engineering sat down to talk about design challenges in advanced packages and nodes with John Lee, vice president and general manager for semiconductors at Ansys; Shankar Krishnamoorthy, general manager of Synopsys’ Design Group; Simon Burke, distinguished engineer at Xilinx; and Andrew Kahng, professor of CSE and ECE at UC San Diego. This discussion was held at the Ansys IDEAS conference. To view part one, click here.


Fig. 1: (L-R) John Lee, vice president and general manager for semiconductors at Ansys; Shankar Krishnamoorthy, general manager of Synopsys’ Design Group; Simon Burke, distinguished engineer at Xilinx; and Andrew Kahng, professor of CSE and ECE at UC San Diego.

SE: Unlike in the past, chip designs done in advanced nodes are increasingly utilizing unique architectures. So now you have entirely different problems for each new design. Can we still use a divide-and-conquer approach, or does everything now have to be dealt with on the system level?

Krishnamoorthy: We definitely see domain-specific architectures all over the industry. Everyone is looking at that as a way to get a big improvement in performance per watt, but each architecture brings its own set of challenges. Typically, what we are seeing is that a lot of the architectures are targeted toward AI training and inference, and it makes sense to look specifically at that vertical in terms of how those chips are getting built and what those challenges are, and then build very targeted solutions for that set of applications. The AI vertical relies on heavy replication of blocks. There are a lot of challenges with power integrity and with ultra-low voltage operation, because power is a big concern, and there are challenges dealing with access to memory and between compute tiles. We see ultra-low voltage operation as a big opportunity to drive power integrity very early in the flow, and to co-design with power integrity at every step of the flow. We also see a tremendous opportunity to bring in technologies for floor planning and for dealing with multiple replicated blocks in innovative ways, technologies for structured routing to connect compute tiles with each other, for compute-to-memory latency reduction, and for 3D-IC, where we see a lot of latency optimization happening. You need an integrated cockpit to bring together the compute, the memory, and all the connectivity in between, and to bring multi-physics analysis into that same cockpit. There are probably a few verticals where it makes a lot of sense to deeply understand what those design challenges are and then build custom solutions. And we are seeing very good ROI for building those custom solutions in those specific verticals.

Burke: If you look at the markets that FPGAs work in — especially data centers, wireless, automotive, and cell phones — five years ago you had generic products that you could deploy everywhere. In the data center we’ve seen the requirement for performance go shooting up, and specialization of hardware to address that. Previously, it used to be just about performance. Today, it’s about performance per watt. You can’t burn kilowatts of power to get the answer. So you end up specializing to optimize performance and power. You see that with data center CPUs today, which are more managers than workers. You offload the compute to something else, whether it’s an FPGA or an ASIC. Even in automotive, you’re seeing very specialized silicon. It’s supposed to have a long lifetime and not burn lots of power. You certainly can’t burn kilowatts of power in a car. In the cell-phone business, there used to be multiple different vendors’ solutions, with CPUs as well as FPGAs. We are seeing consolidation into more specialized products that contain all those functions, just for packaging, cost, and power reasons. Each of those markets is driving that for different reasons, but there is a shift away from generality and toward more specialized, unique hardware. That’s a challenge for FPGAs, because our business is based on a general product you reprogram to do anything, and now that specialization is starting to impact it. So we end up having a lot more IP on the chip to address those high-performance segments. The unification of our markets is growing over time.

Kahng: With 2.5D specialization, the industry really needs to keep an eye on the scalability of NRE — the verification and test burden when you have multiple die in a product instead of a single die. NRE needs to scale acceptably in order to support this Cambrian explosion of innovation in silicon, and that is something the industry will be challenged by in the near future.

Lee: There is a need for hierarchical approaches across design, as well as analysis. And as we look at analysis, the need for accurate bottom-up models, as well as accurate top-down models, is absolutely necessary. For example, if you’re looking at a particular replicated block, it’s important for you to look inside that block in situ, understanding its system-level environment and its neighbors and their behavior. But those models need to be extremely sophisticated, because what happens to this particular instance, whether that’s a functional block or a chiplet in a multi-die system, is very much dependent on the logical behavior of what’s actually happening adjacent to it. That implies a complicated set of hierarchical models that need to be multi-physics-aware, as well as behavioral-aware, and that’s an area where we see a lot of promise and a lot of active interest.
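
To make Lee’s point concrete, here is a minimal sketch, with hypothetical block names and made-up power numbers, of a hierarchical model whose in-situ result depends on what its neighbors are doing. It is an illustration of the idea, not any vendor’s modeling format.

```python
# Minimal sketch: a hierarchical block model whose power estimate depends on
# the activity of its neighbors (all names and numbers are hypothetical).

from dataclasses import dataclass, field


@dataclass
class BlockModel:
    name: str
    static_power_w: float           # leakage, roughly independent of activity
    dynamic_power_w_at_full: float  # dynamic power at 100% activity
    neighbors: list = field(default_factory=list)  # (BlockModel, activity) pairs

    def power(self, activity: float, coupling: float = 0.05) -> float:
        """Power of this instance in situ: its own activity plus a small term
        modeling coupling to whatever its neighbors are doing (shared rails,
        thermal coupling, and so on)."""
        own = self.static_power_w + activity * self.dynamic_power_w_at_full
        neighbor_load = sum(n.dynamic_power_w_at_full * n_activity
                            for n, n_activity in self.neighbors)
        return own + coupling * neighbor_load


# Two replicated compute tiles next to a memory chiplet.
tile_a = BlockModel("tile_a", 0.2, 1.5)
tile_b = BlockModel("tile_b", 0.2, 1.5)
mem = BlockModel("mem", 0.1, 0.8)

# The same tile instance looks different depending on its neighbors' behavior.
tile_a.neighbors = [(tile_b, 0.9), (mem, 0.6)]
print(f"tile_a at 80% activity, busy neighbors: {tile_a.power(0.8):.2f} W")

tile_a.neighbors = [(tile_b, 0.1), (mem, 0.1)]
print(f"tile_a at 80% activity, idle neighbors: {tile_a.power(0.8):.2f} W")
```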

SE: We’re seeing a number of changes across the industry, shifting both left and right, and ultimately breaking down traditional silos. Can existing tools adapt to this, or do we need different tools at different times?

Kahng: The tools are adapting in a much more agile way than they used to. We see process simulation embedded in sign-off and a lot of auto-tuning, which helps to hit schedules. We are seeing things take shape earlier, such as DTCO, pathfinding, and machine learning. And prediction helps reduce guard-banding. EDA vendors are working together to tie traditional types of technology closer together and reduce the latency of any given iteration.

Krishnamoorthy: The term we use is systemic complexity, where essentially all the traditional boundaries need to be revisited. And we need to explore fusion across these boundaries to get the best results. In the last five years, the biggest and fastest gains in performance per watt, or any of the other key metrics, have come from fusing across traditional boundaries to achieve a significantly better outcome. Combining sensors and analytics to create a continuum is a good example, where we have fused monitors and sensors with the whole big-data approach to silicon analytics and design sign-off robustness analysis. Similarly, we are using multi-physics analysis with all the design and sign-off technologies to enable concurrent analysis of real issues and their real impact on timing, signal integrity, and power. This is the next evolution of our industry, where all these boundaries get fused, but I don’t really see the end user fundamentally changing job descriptions. Sign-off engineers are still doing sign-off, and implementation engineers are still implementing, but their scope is growing. Traditionally, a timing sign-off engineer probably handed off something to the [power] rail engineer. But now the timing and rail engineers are working closely with each other to sign off on the chip. Similarly, the traditional front end and back end used to do hand-offs, but at the latest nodes you cannot have that kind of model anymore if you want to get the best PPA. So there’s a lot more coupling, and learning each other’s domains, in order to get better results. It’s fusing across technology areas, but also fusing across customer job functions, to get better outcomes.

Burke: One of the interesting trends we are seeing today, partly because Moore’s Law is slowing down a bit, is that we are moving toward solutions that use multiple silicon dies in order to get functionality, capacity, and scale. And those dies have to talk to each other. So now we’re using interposers and other novel technologies to enable them to talk to each other more quickly and with lower latency. We used to do it within a silicon chip. Now, it’s across the system level, with multiple technologies involved in that solution, which makes the whole problem much more difficult. One of the side effects is that we’re now pushing into an environment where not all those silicon dies are in the same process node or come from the same silicon manufacturer. You’re mixing manufacturers and nodes together, which further complicates the closure process. Corners don’t exactly align across the system. They have different definitions, different voltages, different specs. And this is not just STA (static timing analysis). This is STA, thermal, EMIR (electromigration and IR). Even LVS (layout vs. schematic) and DRC (design-rule checking) are impacted to some extent by this push into a more complex system-level problem. This impacts everyone in the back end.
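
As a toy illustration of the corner-alignment issue Burke describes, the sketch below pairs up per-die corners from two hypothetical vendors only when their assumed temperatures are close. The corner names, voltages, and the 25°C tolerance are invented for the example.

```python
# Minimal sketch of cross-die corner alignment: two dies from different
# nodes/vendors define their corners differently, and a system-level analysis
# needs consistent combinations. All names and values are hypothetical.

from itertools import product

die_a_corners = [  # (label, voltage_v, temp_c)
    ("ss_0p675v_125c", 0.675, 125),
    ("ff_0p825v_m40c", 0.825, -40),
]

die_b_corners = [  # different vendor, different definitions
    ("slow_0p72v_110c", 0.72, 110),
    ("fast_0p88v_m25c", 0.88, -25),
]

def compatible(ca, cb, max_temp_gap_c=25):
    """Treat two per-die corners as one system corner only if their assumed
    temperatures are close; otherwise the combination is unrealistic."""
    return abs(ca[2] - cb[2]) <= max_temp_gap_c

system_corners = [(ca[0], cb[0])
                  for ca, cb in product(die_a_corners, die_b_corners)
                  if compatible(ca, cb)]
print(system_corners)
# [('ss_0p675v_125c', 'slow_0p72v_110c'), ('ff_0p825v_m40c', 'fast_0p88v_m25c')]
```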

Lee: If you put these multiple die on an interposer, the speed of signaling — the communication between the die — can become much faster than if you had to go through package and board. One of the challenges we see is that there’s an increasing effect of electromagnetic interference or cross-talk that can occur on the interposers, or even on-die with high-speed SerDes. So a lot of the techniques that board designers have been doing for signal integrity now are being brought into the 3D-IC world.

Burke: From a system perspective, you can see that putting two dies together on an interposer can speed up the communication between them. If you come from the silicon side of the world, suddenly pushing half your silicon onto a separate die slows the whole thing down, because you’ve got to go across something else to get to the other die. It depends where you’re coming from as to whether it’s an improvement or a degradation. Either way, you see this convergence of silicon design and packaging teams coming together. But when you have four dies connected to an interposer, getting from Die 1 to Die 4 takes a while. It’s a long way away. You don’t beat physics. Yes, you’ve effectively got a much bigger die, but it still takes the same time to get there. There are a lot of new technologies that enable you to reduce that physical distance from one die to the next, and to get lower latency, along with improved bandwidth and communication. There are a lot of opportunities in that more mechanical physics — packaging solutions that will get us to high capacity and limit the slowdown in Moore’s Law that we have been seeing. But they bring their own complications. It really falls on back-end sign-off. How do you make sure the system is actually going to work, and continue working, once you get silicon back?
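
A rough back-of-envelope calculation shows the flavor of the physics argument. The signal speed, die spacing, per-hop interface latency, and the assumption that the path hops through intermediate dies are all illustrative choices, not figures for any real interposer or product.

```python
# Back-of-envelope sketch of the "you don't beat physics" point: crossing from
# Die 1 to Die 4 on an interposer costs wire flight time plus per-hop interface
# latency. Every number below is an assumed, illustrative value.

SIGNAL_SPEED_M_PER_S = 1.5e8   # roughly 0.5c in on-package dielectric (assumed)

def crossing_latency_ns(distance_mm, hops, per_hop_interface_ns=2.0):
    """Flight time over the interposer wiring plus interface latency for each
    die-to-die hop along the path."""
    flight_ns = (distance_mm * 1e-3) / SIGNAL_SPEED_M_PER_S * 1e9
    return flight_ns + hops * per_hop_interface_ns

# Neighboring dies vs. the far corner of a four-die interposer.
print(f"Die 1 -> Die 2 (~10 mm, 1 hop):  {crossing_latency_ns(10, 1):.2f} ns")
print(f"Die 1 -> Die 4 (~45 mm, 3 hops): {crossing_latency_ns(45, 3):.2f} ns")
```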

Kahng: As you go from the 2D world, where the handoffs between system and technology are pretty clear and well-managed in bump planning, partition planning, NoCs, and things like that, to much more dynamic handoffs in a multi-die chip context, co-design becomes much more demanding. Levels of abstraction across system and technology boundaries, to enable co-analysis efficiently and scalably, are still TBD.

Krishnamoorthy: The whole 3D workflow for designers is extremely fragmented. You explore in one environment, you construct in a different environment, you analyze in a third environment, and potentially sign off in a fourth environment. If you look at the kind of gains we made in SoC design with die-level design, we got those gains by fusing all those things together in a single environment so that we could significantly accelerate co-optimization across all these phases. 3D-IC design is ripe for disruption exactly for these reasons. There’s a lot there, including things like architecture exploration. When you have a monolithic RTL, how do you decide which part of the RTL goes on what die? There’s a cost element to it, and a top-level closure element to it. There are a lot of really interesting problems ahead of us. But we need the right design environment to enable that to happen in a multi-die design.
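
As a hypothetical sketch of the partitioning question Krishnamoorthy raises (which part of a monolithic RTL goes on which die), the snippet below scores two candidate assignments by how much traffic they push across die boundaries. The block names, net widths, and penalty factor are invented for illustration; a real flow would weigh cost, latency, and top-level closure together.

```python
# Minimal sketch: score candidate die assignments of RTL blocks by their
# cross-die connectivity, since cross-die nets pay a latency/cost penalty.
# Blocks, nets, and weights are hypothetical.

nets = [  # (block_u, block_v, bits)
    ("cpu", "l2_cache", 512),
    ("cpu", "dsp", 128),
    ("dsp", "hbm_ctrl", 256),
    ("l2_cache", "hbm_ctrl", 512),
]

def partition_cost(assignment, cross_die_penalty=8.0):
    """Total wiring cost: nets that cross a die boundary pay the penalty."""
    cost = 0.0
    for u, v, bits in nets:
        cost += bits * (cross_die_penalty if assignment[u] != assignment[v] else 1.0)
    return cost

candidates = {
    "compute_vs_memory": {"cpu": 0, "dsp": 0, "l2_cache": 1, "hbm_ctrl": 1},
    "split_cpu_dsp":     {"cpu": 0, "l2_cache": 0, "dsp": 1, "hbm_ctrl": 1},
}

for name, assignment in candidates.items():
    print(f"{name}: cost = {partition_cost(assignment):.0f}")
```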


