What Comes After HBM For Chiplets

The standard for high-bandwidth memory limits design freedom at many levels, but those limits are what make interoperability possible. Which freedoms must other functions give up to make chiplets viable?


Experts At The Table: Semiconductor Engineering sat down to discuss what will trigger the creation of a commercial chiplet marketplace, and what those chiplet-based designs will look like, with Elad Alon, CEO of Blue Cheetah; Mark Kuemerle, vice president of technology at Marvell; Kevin Yee, senior director of IP and ecosystem marketing at Samsung; Sailesh Kumar, CEO of Baya Systems; and Tanuja Rao, executive director, business development at Synopsys. What follows are excerpts of that discussion, which was held at the Design Automation Conference. To view part one of this discussion, click here.

SE: HBM is the poster child that shows it is possible to define chiplet sockets. How can we leverage what we have learned to define other sockets?

Alon: HBM is a good example. JEDEC knows something about its application. They are incentivized to do it. When we talk about using HBM, nobody talks about the spec for the PHY. It’s a thing, it’s there, it’s necessary, but nobody talks about that. You talk about interoperability with HBM. Forget about the fact that there is an interconnect between chiplets. Just ask, ‘What is the chiplet? How does it interface into other things?’

Yee: We are being naive. The reason they don’t talk about it for HBM is that it’s defined, and they know exactly what to do. Right now, it hasn’t been defined for any other chiplet. And the problem, from the foundry perspective, is that I want standardization. I can’t afford to have 10 different versions of UCIe for every single customer. That’s not how you grow the industry, either. If you’re an Intel or an AMD, you’re going to customize, you’re going to build your own. They can afford to do it. The next startup that could be the next big thing probably can’t afford to do it. Part of my job is to enable companies. Unfortunately, 70% of the companies I have talked to that are trying to do chiplets have no idea how to go about it. Part of our job is to educate and enable them.

Rao: Yes, there is so much confusion right now.

Yee: Orientation, bump pitch, can I do this? Do I use 2D versus 2.5D versus something else?

Alon: The trick is that when we say customization, it doesn’t mean that you’re throwing out the baby with the bathwater.

Rao: It has to get there. It is like crawl, walk, run. Crawling has been done by Intel and AMD. Smaller companies have to do the walking before we can start running with this.

Alon: The end state that we need, largely for manufacturing cost reasons, is that there will be a set of parameters that will have to be customized. What is the package? What is the bump pitch? There won’t be an infinite number of those. That space will be fairly concrete and defined. It’s just that we will not have the problem we have today of saying, ‘Everybody’s doing something completely different, so I don’t even know what chiplet to design.’ This includes the substrates.
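
To make the idea of a bounded parameter space concrete, a socket definition could reduce to a short, enumerable menu of options rather than infinite per-design customization. The Python sketch below is purely illustrative; the field names, values, and catalog entries are assumptions, not drawn from any published standard.

```python
# Illustrative only: field names and values are assumptions,
# not drawn from any published chiplet standard.
from dataclasses import dataclass
from enum import Enum

class PackageClass(Enum):
    ORGANIC_2D = "2D organic substrate"
    SILICON_INTERPOSER_25D = "2.5D silicon interposer"
    HYBRID_BOND_3D = "3D hybrid bonding"

@dataclass(frozen=True)
class ChipletSocket:
    """One entry in a hypothetical, finite menu of socket options."""
    package: PackageClass
    bump_pitch_um: float     # e.g., ~130 um standard flip-chip, tens of um advanced
    d2d_phy: str             # e.g., "UCIe" or "BoW"
    lanes_per_module: int    # physical lanes per die-to-die module
    protocol: str            # e.g., "AXI", "CHI"

# A bounded catalog of sockets, instead of every design doing something different:
SOCKET_MENU = [
    ChipletSocket(PackageClass.ORGANIC_2D, 130.0, "UCIe", 16, "AXI"),
    ChipletSocket(PackageClass.SILICON_INTERPOSER_25D, 45.0, "UCIe", 64, "CHI"),
]
```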

Rao: In the 2D world, it was die design. When you come to 2.5D and 3D, it is die-plus-package design. Package design needs to be part of the discussion from Day 1. The challenges of die-to-die interfaces and package design are growing and changing rapidly. How is a customer supposed to know everything?

Yee: The first part of a customer’s education is teaching them that the days of doing your chip and tossing it over the fence to packaging are done. It’s destined to fail. When you’re doing chiplets, it is not just about the chip. It is the system level that you have to be concerned with. And once you understand that, all those other implications come into play.

Kumar: So many stars need to align: electrical, physical, supplier, packaging. That is very dangerous, if you think about it. For the IP ecosystem, only one star had to align, which is the protocol interface. Even then it took a while, and big companies have not been using the IP ecosystem.

Kuemerle: The point about HBM is very valid. We have something that’s defined. Multiple people actually can make a product, and make money off of it. It all fits in the same form factor, for the most part. We can define something, and you can interchangeably use components because of the standardization. That is the only way to go. It’s the only way you could pull chiplets together and make them work. If you enforce something, you don’t have to build customizations and shim layers everywhere. But the one question in my mind is, how do we get to the next point and pick what the next thing is? Can we actually make that thing useful when we put all the limitations of standardization on it?

Alon: One avenue is Google, or Meta, or Microsoft. You need somebody that has enough dollars behind a statement of the following nature: ‘I want chiplet X, go and build it.’

Kuemerle: And it must look like ‘this.’ What you’re saying is develop something like a JEDEC standard for HBM, but in a forum like OCP, or wherever.

Yee: Let me play devil’s advocate. If I am Company A, and I have enough money, why would I want to open it up to everyone else? Because today, we already have that. Intel has it, AMD has it, and they have not opened it up to everyone else.

Alon: To an extent, there’s a long-term industry benefit to having these types of things. But who’s going to be the first to say, ‘I need this thing, but I’m not going to build it on my own,’ and put the checkbook behind it? This is an outstanding question, for exactly the reason you said.

Yee: I would argue that yes, they’re all involved.

Rao: But the industry is going to evolve. Why should hyperscalers build a common chiplet when they can go to semiconductor design or IP companies to do this for them?

Yee: I agree with you completely, but someone has to be the first to put out that money. Why am I going to put out the money so that you don’t have to do it the second time? I don’t want to be first.

Alon: There is a second path, and which one will succeed is not clear, but there are many startups trying to push in this direction. What they have all converged on is that these sockets don’t exist. Each of them says, ‘I’m going to define my own sockets and build my stuff. Once I’m successful, people will go build to those sockets.’

Rao: Expecting that from a startup is very difficult.

Alon: Everything about this is extremely difficult.

Rao: A large company can pull it off.

Kumar: It is not one startup. It’s a combination of many startups. Nvidia, Google, Microsoft, they can have a closed ecosystem because they can afford it. But the industry has to come together. IP vendors, PHY vendors, fabric vendors — all of them have to come together. CAD vendors, as well.

Rao: What is in it for me? Why would I open up my stuff?

Yee: I understand, from the company’s point of view, why they wouldn’t want to do it. From a foundry perspective, we will want to enable it. I want standardization because I want to enable as many customers as possible to build as many wafers as possible. And part of that is working with all of you guys in the ecosystem. The burden is not all on us, but we work with all of you. If I enable more customers, it enables more customers for all of you.

Rao: It has to be foundry-driven, memory company-driven, and ASIC vendor-driven.

SE: There is a large community working on RISC-V these days. Would something like a RISC-V expansion interface for putting an accelerator on RISC-V be suitable for a chiplet socket?

Yee: For the socket you just described, there are already three or four companies that do exactly the same thing in chiplet form. They are RISC-V based, they want to connect to an accelerator, but they’re only building their part of it. They need an I/O chiplet, they need an interface, whatever. Those sockets already exist.

Kuemerle: If we all agreed with the form factor and power delivery, wow, wouldn’t that be a great thing?

Yee: That’s the problem. They always feel like they have to do something different to differentiate themselves.

Kumar: It’s in their interest to have a common interface. The more common interface they have, the more widely their product can be used. And compute is moving in that direction.

Alon: This is the tension. On the one hand you want to say, ‘I built the chiplet that is going to work for everybody and everything.’ That sounds great from a marketing and development perspective. But then you go to the engineering people and say, ‘Now, let’s talk about what I am actually building because I don’t know what that is.’

Yee: It’s not that hard. The guys doing it have already standardized. They don’t care whether they use BoW or UCIe. That part is already decided, because that’s what exists today, for most of them anyway. But then the question is the protocol. If I’m talking to you, who are building me an I/O die, and to you, who are building me a memory die, but you did it with Synopsys, you did it with Cadence, and you did it with Blue Cheetah, how do I make sure I have a protocol that’s going to talk, and not just electrically?

Kumar: The compute example is probably not the best one. If you look at computer architectures, we are getting to an intelligent compute world where people are building CPU compute, GPU compute, and neural compute. The way compute is scaling is not by making CPUs bigger, but by having more CPUs. People are building larger and larger compute dies with more and more CPUs. GPUs are exactly the same thing. They are just adding more and more threads. That’s how they’re scaling, and NPUs have the same trajectory. Compute is not the right example, because the compute socket is already fairly standard. When you look at the protocol aspects, most of them are using CHI. CHI is a standard for GPU, as well as CPU, as well as NPU. It’s only a matter of time before Intel and AMD can’t keep up and adopt CHI. The socket is there. The problem comes when you build the chassis socket, where you have PCIe flows and CXL flows, and you have multiple levels of caches. Defining the protocol at that boundary is not at all clear. For example, if you are building an architecture where your memory is segregated, you have a cache die, and you have an I/O die where PCIe and CXL merge, how do you define that interface? That is where the challenge is. That’s exactly the NoC opportunity.
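
A rough sketch of why that chassis boundary is harder to pin down than the compute socket: dissimilar traffic flows, coherent and non-coherent, have to share one die-to-die interface, each with its own ordering and flow-control rules. The names and numbers below are illustrative assumptions, not a real specification.

```python
# Illustrative sketch of the 'chassis' boundary problem: several dissimilar
# flows must be carved across one shared die-to-die interface.
# All names and numbers are assumptions for illustration, not a real spec.
from enum import Enum, auto

class FlowType(Enum):
    CHI_COHERENT = auto()   # cache-coherent compute traffic
    CXL_MEM = auto()        # CXL.mem accesses to segregated memory
    CXL_CACHE = auto()      # CXL.cache device coherency
    PCIE_IO = auto()        # non-coherent PCIe I/O traffic

# The boundary definition must fix virtual channels, ordering rules, and
# credits for every flow at once, which is what makes it hard to standardize.
BOUNDARY_VCS = {
    FlowType.CHI_COHERENT: {"vc": 0, "ordering": "unordered", "credits": 32},
    FlowType.CXL_MEM:      {"vc": 1, "ordering": "unordered", "credits": 16},
    FlowType.CXL_CACHE:    {"vc": 2, "ordering": "unordered", "credits": 16},
    FlowType.PCIE_IO:      {"vc": 3, "ordering": "posted/non-posted", "credits": 8},
}
```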

Alon: The other piece of this is that there are all these annoying physical details. It is CHI, but am I taking four of those streams? Am I taking two of them? How many of those go into one die-to-die link?

Kuemerle: And where are they?

Alon: Exactly.

Kuemerle: Right now, most people are defining chiplet-based systems in such a way that, unless you’re connected with some kind of more flexible interconnect, everything is aligned to the micron. Every piece that goes together in these systems, and there can be 12 different pieces, is aligned to exactly a micron. HBM is aligned to the micron. If we defined a standard CHI interface at a regular periodicity, that would be pretty interesting.

Kumar: The way to solve this problem is with a fabric that can bridge the gap. If you have a 64-bit CPU talking to 128-bit memory, how do you bridge the gap? The fabric takes care of that. This is exactly the right solution. The electrical problem is something you guys have to solve. That is black magic to us. You have to make sure that electrically, and at the link level, things are compatible. But beyond that, there has to be a configurable, flexible fabric that takes care of all the irregularities and incompatibilities.
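
As a minimal illustration of the width bridging Kumar describes, the sketch below packs two 64-bit beats from a narrow interface into one 128-bit beat for a wide one. It is a toy model under simplifying assumptions; a real fabric adapter would also handle clock crossing, credits, byte enables, and protocol conversion.

```python
# Toy model of a fabric width converter: two 64-bit beats from a narrow
# master are packed into one 128-bit beat for a wide target.
from typing import Iterable, Iterator

def upsize_64_to_128(beats: Iterable[int]) -> Iterator[int]:
    """Combine consecutive 64-bit data beats into 128-bit beats."""
    pending = None
    for beat in beats:
        assert 0 <= beat < (1 << 64), "each input beat must fit in 64 bits"
        if pending is None:
            pending = beat                   # hold the low half until its partner arrives
        else:
            yield (beat << 64) | pending     # pack: second beat becomes the high half
            pending = None
    if pending is not None:
        yield pending                        # odd trailing beat passes through as-is

# Example: four 64-bit beats become two 128-bit beats.
wide = list(upsize_64_to_128([0x1111, 0x2222, 0x3333, 0x4444]))
assert wide == [(0x2222 << 64) | 0x1111, (0x4444 << 64) | 0x3333]
```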

Yee: I would settle for a baseline standard that we know works. We’re trying to solve all the problems at once. Give me one standard that works, whether that’s AXI or CHI. It might not be the best for every application, but it’s going to work. Once we have that, people can start tweaking it.

Alon: We are there today. It’s not that we can’t build interfaces that work. It’s that the end customers are so confused about what to build that we spend X amount of our time, where X is a large fraction of the total, just working with them to figure out what the heck they are even doing.

Yee: I’m talking about a standardized baseline on the protocol side. What Synopsys is doing, what Cadence is doing, what Alphawave is doing — these are all different. But if you guys were all doing a protocol layer that was similar…

Rao: We don’t have to. Synopsys has UCIe IP customers. They license our IP and make their chiplets with it, so those chiplets work together. The next step would be for them to partner with a third-party company and say, ‘Can you build this chiplet for me with Synopsys UCIe IP?’ It will work.

Kuemerle: We are oversimplifying the term ‘work together.’ They have been in a closed ecosystem.

Rao: It is not a one-company ecosystem. It’s a multi-company ecosystem.

Kuemerle: With a defined protocol layer.

Alon: In a sense, that’s the point. We all know of things being done today that look exactly like this. What works today? We defined, very specifically, Synopsys UCIe on both sides. It was a specific, defined thing. It worked for a particular product family and product set, and that’s how we could actually engineer and execute something.

Kuemerle: Can you bring that definition to three different products in any given space? If you do that, then it becomes like HBM. I know that when I need an HBM, here’s how big it is. Here’s how much power I need to supply. They’re not exciting things, but they’re very practical. And when I know them, I can start putting the puzzle pieces together, but without them it is a free-for-all.

Rao: It will happen in the next 5 to 10 years.

Yee: We have to take away some of the flexibility that we’ve given everyone.

Kuemerle: Yes. Now you can do anything.

Yee: And if you let them do anything, they will do anything.


