Waiting For Chiplet Interfaces

Plug-and-play approaches are gaining mindshare, even if some of the key pieces are missing.


There aren’t many success stories related to chiplets today for a very simple reason—there are few standard interfaces defined for how to connect them.

In fact, the only way to use them today is to control both sides of the connection with a proprietary interface and protocol. The one exception is HBM2, whose definition enables large quantities of third-party DRAM to be connected to a logic device with high bandwidth and significantly lower power than a board-level chip-to-chip connection.

Chiplets would enable an ASIC to be partitioned into multiple die, which are then interconnected within a package to make an integrated system. When a die is a prepackaged function designed and built by one company, such as a USB controller, memory block or compute cluster, and is sold as a physical die to multiple other companies, it is called a chiplet. This is very similar to the way in which printed circuit boards (PCBs) are designed and built today, except that everything happens within a single package. When all of the die are designed and built by the same company, and modifications can be made to any of them, that is referred to as 3D design.

Without standards, the market will fail to materialize. “You must have the necessary industry standard for the connection between the chiplet and the SoC,” says Hugh Durdan, vice president for strategy and products at eSilicon. “Unless there is a standard, you will never have the necessary interoperability between what is available as chiplets and what people want for the rest of the SoC. HBM is a good example of a chip-to-chip in-package interface and has been very successful.”

What will these interfaces look like? “They sit somewhere between board and on-die interfaces,” says Bapi Vinnakota, director for silicon architecture program management at Netronome, and Open Domain-Specific Architecture (ODSA) subproject lead for the Open Compute Project (OCP). “It has some characteristics of a board interface, such as the need for a mechanism to transfer large amounts of data, but it needs to have low latency like an on-die interface. The interface is a mix of what works at the board level and what works at the die level.”

Choosing interfaces
There is some precedent for the chiplet model. Marvell introduced its modular chip (MoChi) architecture in 2015, a chiplet model based on the Kandou Bus interface, and it has been using that approach internally for its own products ever since.

“The first problem we encountered was selecting the interface—what is the best IP to run inter-chip communication,” said Yaniv Kopelman, networking CTO at Marvell. “We wanted something running over an organic substrate rather than an interposer or an InFO (TSMC’s integrated fan-out) type of package because we didn’t want a high-cost package and we didn’t want to be tied to a single vendor. The second problem was the architecture. With chiplets, you have to divide IP in the middle. The question was where to cut and how to develop the architecture so that you could switch CPUs when you wanted. For that, you have to look at the latency of the components and take care of the logical implementation. The third challenge was getting this all into production. It’s easy to build IP working on a demo, but it’s a long way from there to something that is production-worthy.”

Today, existing interfaces, often defined for other purposes, are being used while dedicated new interfaces emerge. Rishi Chugh, senior product management group director in the Cadence IP Group, provides some examples. “There are initiatives in the Optical Internetworking Forum (OIF) for chiplets and in the JEDEC committee. Then there are organizations like Intel, which has the Advanced Interface Bus (AIB), and Intel is open to providing the specifications.”

The OIF project is intended to enable intra-package interconnects to optical engines, or between dies, with high throughput density, low normalized power, and a reach of up to 50mm. The new CEI-112G-XSR (extra short reach) project also aims to support a mix of technologies, specifically the CMOS-to-SiGe connections often used to build optical engines. System-in-package (SiP) designs require support for as much as 50mm of trace length between the multiple chips on an organic package substrate.

Intel’s AIB is a die-to-die PHY level standard that enables a modular approach to system design with a library of chiplet intellectual property (IP) blocks.

“AIB uses a clock forwarded parallel data transfer mechanism similar to DDR DRAM interfaces,” explains Chugh. “It is process and packaging technology agnostic and could utilize Intel’s Embedded Multi-Die Interconnect Bridge (EMIB) or TSMC’s CoWoS (chip on wafer on substrate) for example.”

Intel now provides the AIB interface license royalty-free to enable a broad ecosystem of chiplets, design methodologies, service providers, foundries, packaging, and system vendors.

Some of these standards may enable the market to emerge. “Intel has a lot of leverage, and because of that AIB is a clear initial winner,” says Mick Posner, director of product marketing for DesignWare IP Subsystems at Synopsys. “But the fight is not over. If you dive into AIB, or other proposed interfaces, they each have weaknesses in different areas, be it performance or capabilities. AIB, as specified today, has performance limitations that could easily be addressed in a future generation. You may have time-sensitive data that requires additional performance and low latency. There is no clear winner.”

Each has its own advantages. “OIF has derivatives of chiplets, which they call XSR—extra short reach,” adds Chugh. “That targets die to die or chip-to-chip interconnect within a package. So, the industry is progressing with standardized IP. I don’t think we have the best solutions today, because it is the first effort, but it is a move in the right direction. Standards are not always the best, but you have to take a first step.”

The upsides of getting that right are significant reductions in time to market and lower development costs. “Our customers may at times be looking to combine our ASIC solution into an SiP with other components in a single package, and then there is the option for the reliability qualification to be covered in one package versus having to qualify all components separately,” said Olivia Slater, operations and logistics manager at Adesto Technologies. “Depending on what SiP is being developed, this could make the qualification and final testing solution less complex.”

Performance
Several organizations and programs are trying to define these new interfaces, among them DARPA’s Common Heterogeneous Integration and IP Reuse Strategies (CHIPS) program. DARPA has defined the targeted performance space, shown in figure 1.

Fig 1. Standard interfaces. Source: DARPA

Performance requirements are constrained by key physical elements. “When sending data, the two criteria people are looking for are power efficiency and bandwidth,” explains Chugh. “From the edge of the die, what is the maximum data that you can send without wasting area? This is the beachfront, which is on the edge of the die—how much data can I transfer per mm. Efficiency is the power aspect, where people measure it as pJ/bit: how much power is consumed for sending each bit of data from one die to the other.”

“The number to keep track of is the figure of merit (FOM),” adds Vinnakota. “As soon as you take wires between die, you face a beachfront problem. You have to get to a chip edge to get these wires off the chip, and you will burn pads. FOM is the linear density on the edge (1TB/mm) and then how many picojoules it takes to move that data. So the density/energy gives the FOM. This is the magic number.”
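
As a rough illustration of how such a figure of merit could be compared across link styles, the sketch below divides beachfront bandwidth density by energy per bit, as Vinnakota describes. All of the numbers are hypothetical placeholders chosen only to show the shape of the tradeoff; they are not figures from AIB, XSR or any other published specification.

```python
# Back-of-envelope figure-of-merit (FOM) sketch for a die-to-die link.
# All numbers are illustrative assumptions, not taken from any spec.

def link_fom(wires_per_mm: float, gbps_per_wire: float, pj_per_bit: float) -> float:
    """FOM = beachfront bandwidth density / energy per bit.

    Returns (Gbps/mm) per (pJ/bit); higher is better.
    """
    density_gbps_per_mm = wires_per_mm * gbps_per_wire
    return density_gbps_per_mm / pj_per_bit

# Hypothetical parallel interface on a silicon interposer:
# many dense, slow wires at very low energy per bit.
parallel = link_fom(wires_per_mm=400, gbps_per_wire=2.0, pj_per_bit=0.5)

# Hypothetical short-reach SerDes on an organic substrate:
# few, fast wires at somewhat higher energy per bit.
serdes = link_fom(wires_per_mm=20, gbps_per_wire=56.0, pj_per_bit=2.0)

print(f"parallel FOM: {parallel:.0f} (Gbps/mm)/(pJ/bit)")
print(f"serdes   FOM: {serdes:.0f} (Gbps/mm)/(pJ/bit)")
```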

Multiple layers
Chiplet interfaces, like any other kind of interface, tend to be multi-layered, with physical, link, transport and other layers, all designed to ensure robust communications. The ODSA has published a diagram, shown in figure 2, illustrating some of the layers that may need to be considered.

Fig 2. Interface stack for chiplets. Source: Open Domain-Specific Architecture group of the Open Compute Project

Physical layer
The physical layer can fundamentally be either parallel or serial. “The advantage of serial is that you typically end up with fewer wires but the cost is greater design complexity,” explains Vinnakota. “Parallel can typically operate at lower speed.”

But the choice is more complex than that. “The advantage of a parallel interface, such as AIB, is that it has extremely low latency, very low power and area—so it checks all of the boxes from an architectural point of view,” says Durdan. “The main disadvantage is that it requires a silicon interposer, or some packaging technology like that, and that adds significant cost. The disadvantage of a serial interface is that for some applications, you cannot tolerate the latency associated with a SerDes.”

A SerDes, in this application, may well be simpler and faster than board-level chip-to-chip solutions. “I see people trying to use SerDes for the connection, but ones that are much smaller, lower-power implementations, taking advantage of the fact that you are only communicating over a very short channel,” says Durdan. “Those are multiple chips within the same package rather than across a board or backplane.”

AIB is a parallel interface, essentially a bunch of wires running at 1 or 2 GHz. “AIB has 2,000 wires and almost mandates the use of a silicon interposer or bridge,” adds Vinnakota. “If you are a small company, you may not be able to afford an interposer. Instead, you may want a product built on an organic substrate, which means you want a technology with fewer wires. The wire density possible with interposers is many times higher than the wire density of an organic substrate.”
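
A quick back-of-envelope calculation shows why that many wires can still be attractive. The sketch below multiplies wire count by per-wire transfer rate, using the 2,000 wires and 1 or 2 GHz clocks quoted above. Whether the link runs single or double data rate, and how many wires actually carry data rather than clock and control, are assumptions made purely for illustration.

```python
# Rough aggregate bandwidth for a wide parallel die-to-die interface such
# as AIB. Treating all 2,000 wires as data wires overstates real throughput,
# since some carry clocks and control; the SDR/DDR split is likewise an
# illustrative assumption, not taken from the AIB specification.

def aggregate_tbps(data_wires: int, clock_ghz: float, transfers_per_cycle: int) -> float:
    """Raw bandwidth in Tbps: wires x clock x transfers per clock cycle."""
    return data_wires * clock_ghz * transfers_per_cycle / 1000.0

WIRES = 2000                                  # "AIB has 2,000 wires"
for clock_ghz in (1.0, 2.0):                  # "running at 1 or 2 GHz"
    for rate, label in ((1, "SDR"), (2, "DDR")):
        print(f"{clock_ghz:.0f} GHz {label}: "
              f"{aggregate_tbps(WIRES, clock_ghz, rate):.1f} Tbps")
```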

Clocking is a primary difference between the two interface types. “With a parallel interface, you need to do things like clock forwarding,” says Chugh. “With a SerDes, the clock and data are merged together. The parallelism of the data is maintained through the two devices, and clock forwarding maintains the sanity of the clock drivers between the two devices. It makes it a modular design, where you can think hypothetically, ‘If you have a single die and there is a datapath on the die, you just cut the die into two pieces across the datapath.’ Now you have two chips and you are trying to stitch them back together in the same package, and the parallel datapath is joined by this IP.”
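
To make the clock-forwarding idea concrete, here is a deliberately simplified software model: the transmitter drives a clock wire alongside the data bus, and the receiver samples the bus on each rising edge of that forwarded clock rather than recovering timing from the data stream, as a serial link with an embedded clock must. It is a toy illustration of the concept, not a model of AIB or any real PHY.

```python
# Toy model of a clock-forwarded (source-synchronous) parallel transfer.
# The clock travels with the data, so the receiver samples on its edges
# instead of performing clock recovery. Purely illustrative.

def transmit(words):
    """Drive (clock, data_bus) samples: set up data low, sample on high."""
    signal = []
    for word in words:
        signal.append((0, word))  # data set up while forwarded clock is low
        signal.append((1, word))  # rising edge: receiver samples here
    return signal

def receive(signal):
    """Capture the data bus on each rising edge of the forwarded clock."""
    captured, prev_clock = [], 1
    for clock, data_bus in signal:
        if clock == 1 and prev_clock == 0:   # rising edge detected
            captured.append(data_bus)
        prev_clock = clock
    return captured

payload = [0xA5, 0x5A, 0xFF, 0x00]
assert receive(transmit(payload)) == payload
```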

There are other considerations beyond the datapath. “You need to think about things like integrated self-test, an integrated 1149 boundary scan mechanism to reach the die when it is buried inside a package, so it is not just about the data transfer across the interface,” warns Vinnakota.

Other issues also remain unclear. “There is some debate on the need for ESD protection in chiplets,” says John Ferguson, marketing director for Calibre DRC at Mentor, a Siemens Business. “Once you are at that point, they are going to be packaged and encapsulated, so there is no opportunity for a human body interaction. Some of them go away, but there are other electrical impacts that may become more problematic. It is hard to say. There have been consortiums to investigate them and most have come up with best practices.”

PCIe appears in the ODSA’s PHY list, shown in figure 2, because it is already supported by a large number of products. It is seen as a quick way to turn chips that have a PCIe interface into chiplets without modification.

“Most chips in servers and higher-end equipment already have a PCIe interface,” says Kurt Shuler, vice president of marketing for Arteris IP. “Others don’t but many would prefer something lighter weight long term. As you add more plug-and-play capabilities, which comes with PCIe, you add complexity to the stack. So you go from a low-level interface to a robust hardware-software standard.”

Beyond the PHY
The emergence of PHY standards is not enough, however, because a common PHY alone does not allow for a true separation of functionality. “There has been a lot of effort placed on the PHY layer used to bring chiplets together, but to make them work as a single product you need an architectural interface,” explains Vinnakota. “The ODSA wants to make that an open interface on top of an open PHY layer.”

If you have a collection of chiplets, you want them to work together as if they were a single chip. “The definition of working together should be to present some kind of semantic such that the software running on any one module thinks that all of the components are one logically integrated whole,” adds Vinnakota. “The interface between chiplets could follow I/O semantics or memory semantics. We think the right answer is memory semantics. At the top you have three types of memory movement. One is coherent data movement, where state is shared coherently across all processing elements. Second, you need non-coherent data movement, because coherence is expensive, especially as the area over which you want coherence grows larger. You either pay a price in terms of clock speed or a price in terms of latency to achieve coherence. Maybe you have a unified memory space, but it is left to the programmer to manage the non-coherent memory.”
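
The difference between those two semantics is easy to sketch in software. In the toy model below, a coherent space makes every write immediately visible to all chiplets, while a non-coherent space gives each chiplet a local copy and leaves the synchronization step to the programmer. Real protocols such as CCIX track sharing state per cache line and are far richer, so this is only a conceptual illustration.

```python
# Toy contrast between coherent and non-coherent memory semantics across
# chiplets. Conceptual only; real coherence protocols are far more involved.

class CoherentSpace:
    """Hardware keeps one view: every write is immediately visible to all."""
    def __init__(self):
        self._mem = {}
    def write(self, chiplet, addr, value):
        self._mem[addr] = value          # visible to every chiplet at once
    def read(self, chiplet, addr):
        return self._mem.get(addr)

class NonCoherentSpace:
    """Each chiplet sees a local copy; the programmer must sync explicitly."""
    def __init__(self, chiplets):
        self._local = {c: {} for c in chiplets}
    def write(self, chiplet, addr, value):
        self._local[chiplet][addr] = value           # local visibility only
    def sync(self, src, dst):
        self._local[dst].update(self._local[src])    # explicit data movement
    def read(self, chiplet, addr):
        return self._local[chiplet].get(addr)

coherent = CoherentSpace()
coherent.write("cpu", 0x100, 42)
assert coherent.read("accel", 0x100) == 42        # seen everywhere, no sync

noncoherent = NonCoherentSpace(["cpu", "accel"])
noncoherent.write("cpu", 0x100, 42)
assert noncoherent.read("accel", 0x100) is None   # stale until synced
noncoherent.sync("cpu", "accel")
assert noncoherent.read("accel", 0x100) == 42
```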

Coherence adds software simplicity. “The idea is to be able to design these, assuming a CCIX connection, and it is going to be able to connect to any other chip that has a CCIX interface,” says Shuler. “There are still issues. The context of the overall system architecture and memory hierarchy within those two chips still has to be contemplated at the beginning when they are being designed. In the future, maybe that won’t be needed. But if you look at the spec, it is still a pretty low-level interface. There are some transactions going across, but you still have to make a lot of assumptions about how the other chiplet is working. There are different levels of CCIX connectivity that can be used, and as you get to the higher levels more of that is taken care of. The dream is for a CCIX connection on two chips. Hook them up physically in a die or on a board and it just works. That is not true today.”

Still, to be successful, this requires a system-level solution. “You will never get away from someone having to look at the complete system,” Shuler noted. “What you are trying to do is take multiple processing elements that each require access to memory, and you want a common view amongst them. It is not just about the connection level, but in the overall architecture of the chips. It may be that architectural guidelines have to be created—what needs to be connected to what within the chip for it to be compliant with this plug-and-play standard. Even from the software side, there may need to be some standards that explain how you are expected to communicate for these types of chips.”

Transferring information
In the short term, some of that information may have to be transferred the old way. “With IP today, you get information that specifies the timing, the power, and you will start to have a lot more when you talk about dies because they are on different processes and different metal stacks and thicknesses,” says Ferguson. “Somewhere, all of those details need to be defined so they know how to put them together.”

New models also may be required. “One thing that would be different is they would want a model, perhaps at different levels of abstraction, for different parts of the other chiplets to combine with their own, to be able to meet the performance requirements and deal with power and other aspects of the system,” says Shuler. “There is the flipping-transistors standpoint, and then there are also the physical effects of the connections. There will be sharing of more than just a datasheet. Here is my pack of models, and you may even need some of those for pre-sales.”

And IP vendors may have to develop new skills. “The pure IP players have struggled a little with this because they do not have the skills to design chips or packages,” says Durdan. “The biggest difference between chip-to-chip and die-to-die interconnects versus other IP is that the packaging is such an integral piece of the solution.”

Conclusion
The chicken-and-egg problem is slowly being resolved. Standards respond to a market, but the market will not develop without the necessary standards.

Proprietary interfaces are working out the wrinkles of how to connect chiplets and they are slowly moving into more open forums. Board-level standards also are providing a quick path, even though they ultimately may be expensive solutions.

It is impossible to know exactly what level of plug-and-play is right for this market. But even with these issues being unresolved, some companies will never go back to monolithic solutions.
