Designs Beyond The Reticle Limit

Chips are hitting technical and economic obstacles, but that is barely slowing the rate of advancement in design size and complexity.

Designs continue to grow in size and complexity, but they are now running into both physical and economic limits. Those limits are forcing a reversal of the integration trend that has provided much of the performance and power gains of the past couple of decades.

The industry, far from giving up, is exploring new ways to enable designs that go beyond the reticle limit, which is around 800mm². Some solutions are available only to large tier-1 semiconductor companies while the technology is refined, but they are likely to go mainstream soon, driven by several use cases.

At the same time, Moore’s Law is slowing and becoming less economical. “The thought about Moore’s Law slowing down or ending seriously started when we jumped from the first version of finFETs to the second version, when 7nm came about,” says Vinay Patwardhan, senior director of product management at Cadence. “The cost discussion and the reticle size limit discussion started happening. While there are people who focus on what is beyond 3nm, it has triggered a lot of discussions, and this has now translated into investment in multiple chiplets, or a de-aggregated SoC flow.”

These solutions generally are classified as 2.5D integrations and are a composition of macro functions. “We typically expect designs to be partitioned across 2.5D chiplet boundaries at a coarse level,” says Peter Greenhalgh, vice president of technology and fellow at Arm. “For example, this could be a chiplet containing a CPU connected via a CXL or CCIX interface to an accelerator or GPU, or multiple CPU chiplets connected together to maximize scale-out where there are thermal limits. Chiplets impose some additional design considerations, such as handling coherency across dies and being tolerant to increased latency when sharing data or performing maintenance operations.”

Chiplets are one way to exploit this type of design. Manmeet Walia, senior product manager for high-speed SerDes at Synopsys, classifies the use cases into four types, as shown in figure 1 (below), and notes that many designs fall into multiple categories.

Fig 1. Paths toward chip de-aggregation. Source: Synopsys.

  • Scale SoC. An example is AMD’s Ryzen family. Once you have built a CPU die, you can use one die for a laptop, two for a desktop, and eight for a high-end server, scaling the SoC to the end application. This also is important for AI chips, which need to scale performance by adding dies. Depending on the type of AI performance you need, you build your system accordingly. Nor is this limited to designs on the latest nodes. “There are some customers that may not be on 7nm today. They are on older nodes, but they want to add more functionality,” adds Cadence’s Patwardhan. “They are hitting the limit because even on the older nodes they may need more memory, but adding that much on an older technology is not feasible.”
  • Split SoC. This means splitting the SoC to get around a classic physics problem: yield falls as die area grows. Some chips, such as huge switches or FPGAs, are getting so big that they are simply not feasible in terms of yield, even before they approach the maximum reticle size. (A back-of-the-envelope yield sketch follows this list.)
  • Aggregate functions. This is DARPA’s objective: aggregating functions into a package. Consider a 5G base station where you have an RF chip and a digital baseband chip, and you put those functions together with a die-to-die (D2D) link. You can do RF better in 16nm or 28nm. You can do digital better in 7nm. This lets you optimize each function for its process node, reduce the power, and get a better form factor. Instead of doing four different chips, you now have four different dies on an organic substrate. DARPA has been pushing the industry in this direction through its Common Heterogeneous Integration and IP Reuse Strategies (CHIPS) program, arguing that the monolithic nature of state-of-the-art SoCs is not always acceptable for Department of Defense (DoD) or other low-volume applications due to factors such as high initial prototype costs and requirements for alternative material sets.
  • Disaggregate central and I/O. The idea here is that as you move to 5nm or 3nm, you cannot build a 100G SerDes very effectively, so you leave it in 7nm as a chiplet. You continue evolving the central die and connect these I/O dies to it. A classic example is Intel’s Northbridge/Southbridge concept, which pairs a central CPU die with an I/O die.
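To make the Split SoC yield argument concrete, here is a minimal sketch using the classic Poisson yield model, Y = exp(-A × D0). The die areas and the defect density are illustrative assumptions, not figures from any of the companies quoted here:

```python
import math

def poisson_yield(area_mm2, d0_per_cm2):
    # Poisson yield model: Y = exp(-A * D0), with the area converted to cm^2.
    return math.exp(-(area_mm2 / 100.0) * d0_per_cm2)

D0 = 0.2                          # assumed defect density, defects/cm^2
y_mono = poisson_yield(700, D0)   # one 700mm^2 monolithic die
y_chip = poisson_yield(175, D0)   # one of four 175mm^2 chiplets

print(f"monolithic die yield: {y_mono:.1%}")   # ~24.7%
print(f"per-chiplet yield:    {y_chip:.1%}")   # ~70.5%

# With known-good-die testing, a defective chiplet scraps only a quarter of
# the silicon, so silicon cost per good system drops by roughly this factor:
print(f"silicon cost advantage of splitting: ~{y_chip / y_mono:.1f}x")
```

Note that under this model, requiring all four chiplets to be good gives the same raw yield as the monolithic die. The economic win comes from known-good-die testing, because a defective chiplet scraps only a fraction of the silicon.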

Memory is another function that is best implemented in specialty processes. “A big architectural advantage comes from how much memory a piece of logic can access,” says Patwardhan. “This was limited on 2D chips by physical dimensions, the congestion, and other effects on the chip. But when you go 3D, you can have access to a lot more memory and so complete re-architecture is possible. From the designer point of view, that is a big advantage.”

This technology is no longer just theoretical. “In the last two years, and particularly starting this year, we saw a lot of customers trying to build test chips with advanced packaging,” says Patwardhan. “There have been more serious discussions with the foundries on the right advanced package options for them, particularly related to multi die development. This is also true for the foundries, who have started investing more into reference flow development for disaggregated SoCs.”

It should be noted that these are all 2.5D types of integration, even though some of them may utilize stacked dies. “CoWoS and InFO, and the equivalents at other foundries, are 2.5D technologies,” says Sooyong Kim, director and product specialist for 3D-IC chip package systems and multiphysics at Ansys. “Wafer-on-wafer is not really 3D-IC, either. True 3D-ICs present a lot more challenges, but there are people planning to do this, even though it remains a long way off.”

Communications
Wherever there is a divide, some form of communication is required. In the past, Rent’s rule established the empirical relationship between the amount of logic in a block and the number of external signal connections it requires, and this drove many architectural decisions.
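Rent’s rule usually is written T = t × G^p, where G is the gate count, t is the average number of terminals per gate, and p is the Rent exponent. A quick sketch shows why connection demand keeps climbing with block size; the coefficient and exponent below are typical textbook-range values, chosen purely for illustration:

```python
def rent_terminals(gates, t=2.5, p=0.6):
    # Rent's rule: T = t * G^p. Both t and p are empirical, fitted per
    # design style; the defaults here are illustrative values.
    return t * gates ** p

for g in (10_000, 1_000_000, 100_000_000):
    print(f"{g:>11,} gates -> ~{rent_terminals(g):,.0f} external terminals")
```

Terminal demand grows sublinearly, but it still outruns the pin and bump budget of a single die, which is part of what makes dense die-to-die interfaces attractive.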

“Bumps were on the order of one or two thousand on a substrate,” says Ansys’ Kim. “This increases a hundred-fold when you look at an interposer, and another hundred-fold when you look at 3D-ICs. We are talking about millions of connections at that point.”
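A rough pitch calculation shows where those orders of magnitude come from. The pitches below are typical published ranges for flip-chip bumps, microbumps, and hybrid bonding; treat them, and the 100mm² footprint, as assumptions for illustration:

```python
def max_connections(area_mm2, pitch_um):
    # Upper bound: one connection per pitch-by-pitch cell across the area.
    per_mm2 = (1000.0 / pitch_um) ** 2
    return area_mm2 * per_mm2

AREA = 100.0  # assumed 100mm^2 die footprint
for name, pitch_um in [("flip-chip bump", 150), ("microbump", 40), ("hybrid bond", 10)]:
    print(f"{name:>14} @ {pitch_um:>3}um pitch -> ~{max_connections(AREA, pitch_um):>9,.0f} connections")
```

Going from a 150µm substrate bump to a 10µm bond pitch takes the same footprint from thousands of connections to a million, in line with Kim’s estimate.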

Communications increasingly has relied on high-performance SerDes. “Currently, SerDes-based solutions can operate at a rate of 112G,” says Synopsys’ Walia. “But within a package we are talking about distances of 10mm or 20mm, and no more than 50mm. We don’t need a heavy-duty PHY. We need something very simple, very minimalistic, more like a clock-forwarded architecture that just needs to drive the signal a few tens of millimeters, versus driving a PCB and possibly a copper cable.”

We are seeing this evolve for in-package connections to memory. “The Hardware Platform Interface (HPI) is an open specification for a parallel interface, similar to the HBM interface for accessing DRAM across an interposer,” adds Walia. “These dies are literally sitting on top of each other, and instead of having big bumps, which are 130 or 140 microns wide, we have microbumps that are really tightly packed. Now, instead of having hundreds of bumps, we have thousands of microbumps. And in this case, they’re all talking at lower speeds.”
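The trade-off Walia describes is serial speed versus parallel width. A toy comparison, with the lane counts and per-pin rates below picked purely for illustration:

```python
# One heavy-duty serial interface vs. a wide clock-forwarded parallel bus.
# All counts and rates below are illustrative assumptions.
serdes_lanes, serdes_gbps_per_lane = 8, 112   # e.g., 8 lanes of 112G SerDes
parallel_pins, gbps_per_pin = 2000, 4         # e.g., 2,000 microbumps at 4Gb/s

print(f"SerDes:   {serdes_lanes * serdes_gbps_per_lane:>6,} Gb/s over {serdes_lanes} bumps")
print(f"Parallel: {parallel_pins * gbps_per_pin:>6,} Gb/s over {parallel_pins} microbumps")
```

Once microbumps make thousands of signals available, a simple parallel PHY can deliver more aggregate bandwidth than a long-reach SerDes, which is the argument for running each connection at a lower speed.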

Tools and flows
Adoption of a 2.5D design style has significant impacts on the tools and flows, many of which are still being developed to help with the transition. At the highest level, the flows may appear very similar. “When you talk about standard 2D ASIC designs, there is the floor-planning stage, where you are doing some feasibility studies,” says Patwardhan. “Then you decide on functionality and where the partitions will be. You implement that, do some sign-off, and feed your findings back to either the implementation stage or all the way back to the planning phase.”

IP blocks will see basically no change. “Since 2.5D chiplets do not fundamentally change the nature of most component IP, like CPUs, GPUs, or NPUs, there’s no change to design or verification methodologies for IP that is delivered as synthesizable RTL,” says Arm’s Greenhalgh. “For coherent interconnect design and verification, some additional steps are needed to ensure scalability to a chiplet environment, but it’s not significant.”

Perhaps the biggest general question is, “Is an interposer a chip or a PCB?” Today, it looks a lot like a PCB.

“The guys who are in charge of the interposer are not familiar with silicon-based tools,” says Kim. “This creates a learning barrier, so they have to rely on the chip-level teams for verification, but they still have to design it. Design and verification are done by different groups. It is somewhat chaotic right now, and the division of responsibility is not easy.”

The goal is to have an integrated IC development flow. “Design teams want something similar to the existing IC design flow, so it will be easier for them to understand, just with the added complexity of an extra dimension,” says Patwardhan. “If you’re partitioning a design into two different chips, there should be some way to do early floor planning, early trial and error, and system-level planning for which partition should have what. Should it be logic on logic, memory on logic, or memory with some logic on top of logic? They are looking for a quick way to do that. Most of the implementation is defined by the foundry. They define what the 3D structures look like, including through-silicon vias (TSVs), microbump locations, and things like that. Foundries are doing extensive work on this and considering things like the most efficient power delivery network.”
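Early partitioning really can be explored with quick trial and error. The brute-force sketch below enumerates two-tier assignments for a handful of hypothetical blocks and picks the one that minimizes the stack footprint; real tools also would weigh thermal, TSV, and power-delivery constraints:

```python
from itertools import product

# Hypothetical block areas in mm^2 for a two-tier logic/memory stack.
blocks = {"cpu": 40, "npu": 55, "sram": 30, "dram_ctrl": 15, "io": 20}

best_footprint, best_assignment = None, None
for assignment in product((0, 1), repeat=len(blocks)):
    tier_area = [0.0, 0.0]
    for (name, area), tier in zip(blocks.items(), assignment):
        tier_area[tier] += area
    footprint = max(tier_area)   # the larger tier sets the stack footprint
    if best_footprint is None or footprint < best_footprint:
        best_footprint = footprint
        best_assignment = {n: t for n, t in zip(blocks, assignment)}

print(f"min footprint ~{best_footprint:.0f}mm^2 with tiers {best_assignment}")
```

Five blocks give only 32 possible assignments, so exhaustive search is trivial here. Production floor-planners need smarter search, but the feasibility-first spirit is the same.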

There is an added layer of complication. “Previously, power integrity, signal integrity, and even mechanical integrity were separate topics handled by different groups,” says Kim. “Timing also was done separately. This is no longer possible, because margins are very tight and the chips may affect each other now that they are much closer. Previously, these problems were solvable separately, and we could separate memory from logic. Now they are separated only by microbumps, and we are talking about millions of connections between the dies. It has become a multi-physics problem, and it is becoming more important to do the co-simulations.”

It also creates additional sign-off issues. “In a typical 2D design, you would do some physical sign-off, where you do a basic LVS check and DRC check,” says Patwardhan. “Now these checks have to be expanded so you can verify inter-die connectivity, and do inter-die DRC checks with respect to package rules or interposer rules.”
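As a flavor of what an inter-die connectivity check does, here is a toy sketch that verifies every microbump on one die meets a bump at the same coordinates, carrying the same net, on the facing die. The coordinates and net names are invented for illustration, and real sign-off also would apply the package or interposer rule deck:

```python
# Microbump maps: (x, y) in mm -> net name. One mismatch is planted on purpose.
die_a = {(0.00, 0.0): "vdd", (0.04, 0.0): "gnd", (0.08, 0.0): "d2d_bus0"}
die_b = {(0.00, 0.0): "vdd", (0.04, 0.0): "gnd", (0.08, 0.0): "d2d_bus1"}

for xy, net in sorted(die_a.items()):
    peer = die_b.get(xy)
    if peer is None:
        print(f"OPEN at {xy}: {net} has no facing bump")
    elif peer != net:
        print(f"MISMATCH at {xy}: {net} on die A vs. {peer} on die B")
```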

To do all of this, you need physical models for every die at the right level of abstraction. “When we run IR drop analysis, we can include an effective package model,” says Patwardhan. “We could include an effective board model in the IR drop analysis and improve the performance of the chip. We could make the chip more reliable for automotive or defense applications based on system-level input, which may include EM or temperature variation. All of these cause the currents to change, and the analysis we do on current has to take into account the resistive and capacitive load of the package and the board. All that information has to be fed back and utilized to improve the performance, power, or area of the chip. Those considerations are the next level. Today it is more about just doing the integration and making the 3D system work functionally. How we get better PPA on the chips through system-level feedback is the next challenge.”
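The value of those effective package and board models is easy to see in a lumped IR-drop estimate. A minimal sketch, with every resistance and current assumed for illustration:

```python
# Lumped IR-drop estimate: board, package, and on-die grid in series.
# All values below are illustrative assumptions, not sign-off numbers.
VDD     = 0.75     # nominal supply (V)
I_LOAD  = 20.0     # die current draw (A)
R_BOARD = 0.2e-3   # effective board model (ohm)
R_PKG   = 0.5e-3   # effective package model (ohm)
R_GRID  = 1.0e-3   # on-die power grid (ohm)

drop = I_LOAD * (R_BOARD + R_PKG + R_GRID)
print(f"total IR drop: {drop * 1e3:.0f} mV -> {VDD - drop:.3f} V at the devices")
# Dropping the board and package models would hide 14 of the 34 mV of droop.
```

With sub-volt supplies, tens of millivolts matter, which is why the system-level models have to be folded into the on-die analysis rather than handled by a separate team.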

Those feedback loops will be very important. “At the beginning of the design cycle, it is more important than ever to prototype it,” says Kim. “It’s very important to estimate and find the correct partitions and create the floorplan in 3D. To do that, they will have to consider the various physics at the same time. Then from prototype estimation, as you include more details, you do the simulation of the model and confirm it. Verification now has to be done, starting from the very earliest prototyping stage and through until tape-out.”

Functional interfaces
One of the biggest problems for the industry is that much of the development on interfaces has been proprietary. “A lot of the big companies started with something proprietary because they started early,” says Walia. “Intel has its Embedded Multi-die Interconnect Bridge (EMIB) flow. AMD has its Infinity Fabric. Nvidia has developed NVLink. And Qualcomm has Qlink. Everyone is going down their own path. Standards were lagging, but now they are trying to catch up. The Optical Internetworking Forum (OIF) is driving a number of standards, such as extra-short-reach and very-short-reach die-to-die interfaces. There is also a lot of activity in the Open Compute Project (OCP), where the Open HBI specification is being spec’d out, with Xilinx taking the lead. There is a lot happening in the overall ecosystem. And, of course, TSMC and Samsung are working on their packaging flows.”

There still appear to be some holes, such as packaging standards for chiplets. “We are at a place where someone has to take the lead and figure out what an ideal piece of IP that’s getting integrated into packages should look like,” says Patwardhan. “It needs to be something that is convenient for the customer. The tools are there, the methodology is there, but not a lot of people know what should be done or what their IP is meant to do. The standardization is being developed by foundries and by IP vendors, so we might see it soon. I don’t think there are any financial or technical reasons for not doing it. It’s just that, at the stage where we are, it hasn’t been defined yet.”

Conclusion
Design sizes are not slowing down even though the technological path to get there is changing. Designers want to be able to utilize the right process technologies for their implementations and incorporate multiple dies into a package to provide the necessary scalability.

Today this space retains elements of the Wild West. Each foundry is developing its own technologies, IP companies are deciding how best to provide the necessary models for physical chiplet IP, the industry is still converging on interconnect and modeling standards, and the EDA companies are developing the extensions to IC design flows needed to handle the demands of an added dimension, as well as increasing size and complexity.

The good news is that everyone is excited about the possibilities that this will create. Disruption can be good.


