What Future Processors Will Look Like

AMD CTO Mark Papermaster talks about why heterogeneous architectures will be needed to achieve improvements in PPA.


Mark Papermaster, CTO at AMD, sat down with Semiconductor Engineering to talk about architectural changes that are required as the benefits of scaling decrease, including chiplets, new standards for heterogeneous integration, and different types of memory. What follows are excerpts of that conversation.

SE: What does a processor look like in five years? Is it a bunch of chips in a package? Is it a CPU and FPGA and GPU?

Papermaster: There is no question that the future of processing is heterogeneous. It’s multiple compute engines working in tandem, because massive data and graphics processing is needed everywhere. It’s needed in data centers and in PCs, and the explosion of data from the Internet of Things requires analysis and visualization across that whole food chain. There clearly is a requirement for domain-specific architectures. The CPU is fantastic for general processing, and there are a gazillion applications that run on x86 and on Arm. For more specialized graphics and vector processing, FPGAs or ASICs can provide very specialized computing. We saw this future at AMD well over a decade ago, and we pointed our R&D efforts toward this future. We’ve been in mass production for years with an APU that combines the CPU and GPU for our client and embedded markets, and now we’re bringing that APU into the data center. We’ve also led with chiplets. We are a leader in CPU performance, and we couldn’t have done that without chiplets, which were needed to achieve that high a core count and I/O connectivity. We had mixed nodes, where we leveraged both 7nm CPUs and 12nm I/O and memory dies. The future is heterogeneous, applying the best process node to a specific chiplet function. And in the future, it will be mixed materials. In an era of a slowed Moore’s Law, we can still provide a jump for power, performance, or area. It may not be all three of those at once, but you can focus on where you need it, and where it would not be feasible to apply new technology over the entire circuitry that you need for a solution.

SE: We can control all these pieces, but you don’t want to go back and redesign everything all the time for every customer, right?

Papermaster: That’s right. You have to architect for the future. You have to think at the outset how you can partition functions to be put together in a heterogeneous way. That takes system engineering. It’s not just something that a chip architect can do on their own. They have to work all the way through the software and application side of a solution, and then through the silicon. It’s the process technology, the chip partitioning, and the packaging technology all together. The challenge is putting these solutions together in the future, and that really drives our deep, narrow skill domains to come together and work in design technology co-optimization more than they ever have in the past.

SE: You’re not going to get enough power and performance benefits just moving to the next node to warrant the investment. So the challenge is being able to put the pieces together in a way that is simple enough — basically using a platform approach — and then partitioning and prioritizing for different applications. Is that correct?

Papermaster: Yes, that’s right. You have to architect for that solution. A good example of that is V-Cache, the vertical cache we introduced where you have a cache right on top of the CPU chip with hybrid bonding. It’s just a pure metal connection, and it’s a huge advantage if you’re looking at a cache-bound application, like many of the analytics applications used in EDA. In RTL design verification, we’re seeing that when you run that application you get a significant jump. So you have to architect solutions. You have to know what problems you’re solving. And then, as you do that and you know the technique you have will really bring value, then you’ve really got to think through all the other moving pieces. What challenges is that going to entail? Do I need new EDA capabilities to model it? Do I need new simulation capabilities? What mechanical or thermal issues do we need to solve? You’ve got to back up in your design process to build test chips, and analyze all of that before you can dare bring that first product out. You can’t just hope it’s going to work. You have to know it’s going to work. It really takes a well-thought-out plan and strategy developed over a number of years before you bring a solution to market.

SE: Is there a market for commercial chiplets? Or will they still be developed by AMD and Intel and Marvell for their own internal consumption?

Papermaster: Today, it is internally developed chiplets. We’ve been shipping chiplets for years. And that’s always how new innovations like this start, because standards take time. We took the investment we had in the Infinity architecture, which allowed us to connect our CPUs or GPUs, and now FPGAs with the acquisition of Xilinx. First, you need a way to put these disparate IPs together so they work harmoniously. That can scale into an ecosystem over time, but it requires standards. If you look at what we’ve done the last few years, we’re making great progress as an industry on the standards that will indeed enable a chiplet industry. We started with an efficient protocol. In its initial implementation, it rides on top of the physical connectivity of PCIe. But the CXL protocol is really going to enable an ecosystem of solutions with accelerators and expanded memory options. Imagine that transported into a chiplet ecosystem. That’s the intent of the UCIe consortium.

SE: How do you see packaging transforming to enable this? We’ve got a lot of different options on the table — fan-outs, 2.5D and 3D-ICs, as well as hybrid bonding.

Papermaster: There are multiple packaging approaches because there are different problems that need to be solved. Each one is tailored for economic and physical constraints, such as the number of I/O connections. But there will be some consolidation over time. We’re starting to see the cost go down as some of these packaging approaches get to higher volume, which makes them economically viable. And then you can build off that. Wafer-level fan-out is a great example. As it gets to high volume in some of the consumer products, we can add additional connectivity layers and build upon that. We can bring it higher in the chain of complexity and solve more problems, while leveraging derivatives of that wafer-level fan-out approach. But it won’t be just a one-package solution. There is a continued need for elevated fan-out bridges as well as hybrid bonding. Each of these elements will continue into the future, and there will be more innovation. It’s going to be a dynamic space where we can get alignment and where we can get volume behind a given technological approach to bring the cost down. And when you bring the cost down, you bring in more and more products that can utilize that approach, and it becomes more of a mass market solution.

SE: There are a lot of developments in three dimensions. Bridges are one example, but there also is a lot happening in RDL, and on the underside of the chip with backside power delivery. Is all of that realistic?

Papermaster: It is realistic, but it adds challenges for the industry. If you want more redistribution layers on your substrate, you’re taking some of the skill sets that traditionally were inside the foundry and moving them out. Those skills need to become more broadly available to OSATs. We’re seeing more overlap across what has traditionally been a hard line between silicon manufacturing and the packaging technologies. There will be some crossover, and it’s going to give us yet more options for how we put these solutions together to meet the ever-escalating requirements that consumers have for computing.

SE: People have been talking about the decline or death of Moore’s Law since pre-1 micron, and we keep blowing past it. But the economic advantages are dwindling. How far do you see scaling continuing, and will it be the same kinds of chips, or will we see a 2nm logic chiplet inside a heterogeneous package?

Papermaster: It is indeed getting harder. That means the cost is really going up for you to create those highest-performing devices. Going forward, you have to be even more selective of where you’re applying the new technology. It used to be that we wanted PPA all at the same time — power, performance, and area benefits, and lower cost at every new node. Now, we have to be more selective. You may not get as much of a power benefit, or the same maximum frequency benefit from a transistor, that you used to get. But you have to look at the whole equation and where you get a significant economic benefit for your product. You still want to jump on those benefits even though it’s going to cost more, so you want to architect it so the circuitry you use gets the maximum benefit from that new technology. We will need more heterogeneity, more chiplets, and more advanced packaging to put all these pieces together.

SE: And also a little bit more right-sizing? A lot of the feature shrinks were done to create more real estate on a chip. Now you can pack them in different ways using approaches like hybrid bonding.

Papermaster: That’s right. If you give the architects a more facile way to solve a problem, they will find it. They’re incredibly bright. We are constantly re-evaluating how we can implement the approach that we’re taking to increase performance. A great example of that is AI, which is probably the fastest growing workload out there — not just in the data center, but at the edge and for endpoint client devices. We’re seeing enormous innovation going on in this area. There are a number of startups out there. We’ve got a number of efforts that we’ll talk about in the future to speed up AI applications on our devices, co-designing across hardware, software, and application space.

SE: How about memories? We’re hearing about a number of new memories in development. How do you see that playing out?

Papermaster: There are several innovative approaches for new memory device structures that are either standalone or changes to the process in any given monolithic chip. There’s a lot of very good work going on for processing in memory, and we’re very involved in that with partners in the industry. But it goes well beyond that. There are new memory structures being evaluated, as well. They take time. You can do architectural changes pretty quickly, but when you have new materials for advanced memories, that takes a few more years. It’s an area ripe for innovation, because all of these compute-hungry applications that we need are equally hungry for memory. And that memory needs to work at lower power, and it needs to be at a latency that is acceptable for the compute elements it’s supporting.

SE: AMD has made a couple of new acquisitions, Xilinx and Pensando. What’s behind those acquisitions? And what’s different, because AMD has done very few acquisitions historically.

Papermaster: We primarily have focused on organic growth, and it has served us extremely well. Xilinx was the right partnership at the right time. We’ve worked closely with Xilinx for years. You can see many examples where Xilinx FPGAs or adaptive compute is connected to an EPYC server and provides great benefit to end customers. But it is the very trends in the industry that we’re discussing that drove us to come together. We completed the Xilinx acquisition in February. The cultures of the companies are very much the same. They’re both focused on innovation, on solving problems holistically across the foundry technology and the packaging technology, along with architecture — all the way through the end application solution. The ability to further integrate the very complementary portfolio that Xilinx brings to the AMD CPU- and GPU-focused portfolio really will allow us to meet the customer needs of that tighter integration for the heterogeneous future that we have been so focused on at AMD. Pensando adds to that. We had the start of a smartNIC portfolio with Xilinx, but it was a focused portfolio. That gets complemented by Pensando, which has a highly programmable solution, leveraging P4 libraries that really enable customers to tailor it to micro-services. That combination of Pensando, which already has demonstrated a number of successful micro-services solutions, along with the hardware programmability of the Xilinx portfolio, is a tremendous combination.

Fig. 1: AMD’s 3D stacked die V-Cache for its EPYC platform. Source: AMD

SE: That also allows chips to stay in the market longer, right? You can update them, adapt to any changes in software or protocols or security — even noise and various physical effects.

Papermaster: Yes, but this is not new. This is the ‘ever’ in the ‘never-ending journey,’ refining the tools and approaches that we have. We have adaptive techniques for power management, including fine-grained, distributed in-die process, temperature, and voltage monitoring. Our controller for that is polling thousands of times per second, comparing those readings to what we had at manufacturing test. That will grow in the future, because as your margins shrink, as your voltage is reduced at every new node, the impact of defects gets higher and you get quite subtle defects. Those are very difficult to test for. You really have to adapt the operating point in real time to get at these dynamic variations, and to enable robust operation to the manufacturing specs, despite noise and variation. You have to margin it in, and you have to detect it in real time and adapt.
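The adaptive loop Papermaster describes — polling on-die sensors and comparing readings against baselines captured at manufacturing test — can be sketched conceptually as below. This is a minimal illustration only; the data structure, thresholds, and function names are all hypothetical assumptions, not AMD's implementation.

```python
# Conceptual sketch of an adaptive operating-point controller: poll live
# telemetry, compare against manufacturing-test baselines, and step the
# clock down when voltage or thermal margin is consumed. All names and
# numbers are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Baseline:
    voltage_mv: int    # expected supply voltage recorded at manufacturing test
    temp_c: float      # expected die temperature at this operating point
    margin_mv: int     # allowed voltage droop before the controller adapts

def adapt_operating_point(reading_mv: int, reading_temp_c: float,
                          base: Baseline, freq_mhz: int) -> int:
    """Return an adjusted clock frequency given live sensor readings."""
    droop = base.voltage_mv - reading_mv        # voltage margin consumed
    overheat = reading_temp_c - base.temp_c     # thermal excursion
    if droop > base.margin_mv or overheat > 10.0:
        # Margin exhausted: reduce frequency to keep operation within spec,
        # with a hypothetical 800 MHz floor.
        return max(800, freq_mhz - 100)
    return freq_mhz  # within margin: hold the current operating point
```

In a real controller this check would run thousands of times per second across many distributed sensors, but the core idea — detect dynamic variation against a per-part baseline and adapt rather than rely on static margin alone — is the same.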

SE: This used to be built into the manufacturing process, but it’s being kicked much further left in the flow, right?

Papermaster: Yes, and you have to architect and really innovate to ensure that quality goes up, not down. You have to improve the quality at every generation. It’s no longer just PPA. It’s power, performance, area, and quality. You have to architect for that at the very inception of a new design, along with security and cost and all the other pieces.

