The Zen Of Processor Design

AMD’s CTO talks about how to achieve more performance per watt and how chip architectures are changing.

Mark Papermaster, chief technology officer at Advanced Micro Devices, sat down with Semiconductor Engineering to discuss how to keep improving performance per watt, new packaging options, and the increasing focus on customization for specific tasks. What follows are excerpts of that conversation.

SE: As we get more into the IoT and we have to deal with more data, not to mention cars where data needs to be processed and moved very quickly, the focus seems to be shifting back to performance. Are people demanding power or performance, understanding they don’t want power to go up?

Papermaster: That’s one thing that is actually fundamental. It’s about performance per watt, performance at a given energy level. It affects everything from PCs and datacenters to IoT devices and phones. The faster you get a task done, the more performance you have. As soon as that task is done, you can return to a zero state of energy dissipation. The more efficient the processing you implement in your design, the more you improve your energy efficiency.
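
To make that “race to idle” arithmetic concrete, here is a minimal sketch comparing the energy a slower, lower-power core and a faster, higher-power core spend to finish the same task inside a fixed window. All of the power and time figures are invented for illustration; they are not AMD data.

```python
# Illustrative "race to idle" comparison. All numbers are made up for the
# example; they are not AMD figures.

def task_energy(active_power_w, task_seconds, idle_power_w, window_seconds):
    """Energy in joules to finish one task inside a fixed time window,
    spending the remainder of the window at idle power."""
    return active_power_w * task_seconds + idle_power_w * (window_seconds - task_seconds)

window = 1.0  # one-second window in which the task must complete

# Slower core: lower active power, but it stays busy for most of the window.
slow = task_energy(active_power_w=2.0, task_seconds=0.9,
                   idle_power_w=0.05, window_seconds=window)

# Faster core: higher active power, but it finishes early and drops to near-zero idle.
fast = task_energy(active_power_w=4.0, task_seconds=0.3,
                   idle_power_w=0.05, window_seconds=window)

print(f"slow core: {slow:.2f} J, fast core: {fast:.2f} J")
# -> roughly 1.8 J vs. 1.2 J: the faster core wins on total energy despite
#    its higher active power, because it spends most of the window idle.
```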

SE: But a big general-purpose processor is not necessarily what you need everywhere. How does that change things?

Papermaster: For sure you have to tailor processing for your task. If you look at AMD’s lineup, we have a range of computing capabilities. Just in processors we have a range of low-power, energy-optimized cores—so they are going to have less area and typically be less expensive—all the way up to adding more CPUs and leveraging parallelism and more efficiency in the cores to tackle the more demanding tasks. All of them are designed to be energy-efficient. The amount of ‘oomph’ you put into that CPU core changes, depending on the application you are targeting with that processor.

SE: How does your architecture change as you go forward, given it is getting harder to stay on Moore’s Law?

Papermaster: Moore’s Law hasn’t gone away. I call it ‘Moore’s Law Plus’. Moore’s Law was about doubling performance while keeping your cost and energy dissipation the same. It’s an economic statement. That economic demand from our customers remains. We are going about it two ways. First, it is in the design itself. You have to make the designs, from an architecture standpoint, more and more efficient going forward. We just designed a brand new CPU core, Zen, from the ground up. We actually started this effort in late 2012, so we’ve been working on it for four years. It takes four years to get a brand new high-performance x86 CPU done. We are right on track. It’s a very modern core, very efficient in terms of driving performance per watt, and very scalable. We also designed it to work very well with accelerators, like our GPUs. You can add more CPUs if you need to get more work done, and you can connect to GPUs, FPGAs, or other accelerators.

SE: What’s different in Zen versus what you were doing in the past?

Papermaster: When we looked at Zen, we decided to make a change. We had a power-optimized set of processors for the low end. We had a very high-performance set of processors for the mid- and high-end ranges. In Zen, we wanted a new and modern core in every respect, meaning it can handle a range of workloads. It has high throughput, energy efficiency and floating point efficiency. It can scale from low-end applications to high-performance applications. That is done with both design and process. Design is the microarchitecture: attacking every element of the execution units, of the cache subsystem, of the scheduling, every aspect, to ensure you are removing bottlenecks. On the process side, we’ve leveraged the new 14nm finFET technology. The scalability you have with finFETs covers quite a large range because they have very little leakage. When you turn off your clocks—when you are not doing active work—you can get very close to nil energy, and leakage is lower than in previous technologies. Yet as you turn on your clocks and accelerate your workloads, you get very fast performance per watt.

SE: Let’s look at throughput and how you achieve that. How do you move data internally and externally at a higher rate of speed than what you were doing in the past?

Papermaster: With any microprocessor, it’s about designing a balanced machine. You have to look at the internal demand of all of your execution units. You have to look at the amount of bandwidth you need and how you optimize bandwidth and latency. How big is your pipe feeding those engines? How fast can you move data in and out of those engines? That was the core principle behind the Zen CPU design. That extends outside as well, as you interconnect to the rest of the world. It’s the same thing for memory and I/O. You need enough bandwidth and pipes, and optimized latency, to ensure you don’t create bottlenecks.
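
A rough way to reason about that balance is a feeds-and-speeds check: estimate the memory traffic the execution units generate and compare it with what the memory pipe can deliver. The sketch below uses invented figures (core count, IPC, bytes of off-chip traffic per instruction, available bandwidth) purely to illustrate the bookkeeping; it is not a description of Zen’s actual numbers.

```python
# Rough "feeds and speeds" balance check: can the memory pipe keep the
# execution units fed? Illustrative numbers only, not Zen specifications.

cores = 8
freq_ghz = 3.0               # core clock
ipc = 2.0                    # sustained instructions per clock, per core
bytes_per_instruction = 0.5  # average off-chip traffic per instruction (cache misses)

instructions_per_sec = cores * freq_ghz * 1e9 * ipc
demand_gb_s = instructions_per_sec * bytes_per_instruction / 1e9

supply_gb_s = 40.0           # what the memory interface can actually deliver

print(f"demand: {demand_gb_s:.0f} GB/s vs. supply: {supply_gb_s:.0f} GB/s")
if demand_gb_s > supply_gb_s:
    print("memory-bound: add bandwidth, grow the caches, or cut traffic per instruction")
else:
    print("compute-bound: the pipe is wide enough for this workload")
```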

SE: What else did you have to do?

Papermaster: We looked at what we could do to speed up both, ensuring there are no bottlenecks in the execution flow. We’ve improved the micro-op cache, the efficiency of getting those instructions into the pipe. We’ve also gained a number of efficiencies by reducing the number of cycles it takes to execute through our execution units. In terms of memory and feeding it, we’ve optimized our cache subsystem. We stepped back and looked at where the workloads are.

SE: How do you reduce the number of cycles? Is that embedded software or external software?

Papermaster: No, it’s low-level. It’s our Zen designers rolling up their sleeves and bringing creativity to minimize the number of clock cycles it takes to complete an instruction. It’s the hardcore engineering of pipelining your microprocessor.

SE: What do you get for that in terms of performance?

Papermaster: We set a target of a 40% instructions-per-clock improvement over the previous generation. We are shipping Excavator today, which is the previous core we have in our AMD products. When Zen comes out in early 2017, it is going to have a 40% improvement. The only way you can get that is to use a combination of every aspect of the design: feeding the engine, optimizing the engine itself, and improving the throughput to the engine. Those are the three key elements in terms of how you get improvements. Anyone who has been around microprocessor design for a while will say it is not rocket science. They’re right, but those are the levers. It’s about breaking it down into dozens and dozens of specific changes you drive into a design.
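
As a worked example of what a 40% instructions-per-clock uplift buys at a fixed clock, the standard performance identity time = instructions / (IPC x frequency) says the same program finishes about 1.4x faster, or in roughly 29% less time. The numbers below are generic, not AMD’s measurement methodology.

```python
# What a 40% IPC uplift means at a fixed clock, using the generic
# CPU performance identity: time = instructions / (IPC * frequency).

instructions = 1e9        # work in the program (unchanged between generations)
freq_hz = 3.5e9           # same clock frequency for both cores in this comparison

ipc_old = 1.0             # normalized IPC of the previous core
ipc_new = ipc_old * 1.4   # 40% instructions-per-clock improvement

time_old = instructions / (ipc_old * freq_hz)
time_new = instructions / (ipc_new * freq_hz)

print(f"speedup: {time_old / time_new:.2f}x")               # 1.40x throughput
print(f"runtime reduction: {1 - time_new / time_old:.0%}")  # about 29% less time
```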

SE: So what’s changed on the software side?

Papermaster: We’re committed to open source software. If you look at our microprocessor, we have an open source LLVM compiler to optimize the performance you get out of the CPU. When you look at accelerators, at GPUs, we took our stack and put it in open source. If you go to www.gpuopen.com you’ll see the software and the tools it takes to accelerate using our Radeon technology.

SE: Particularly in the GPU space, you’ve started employing some 2.5D packaging. Where does that fit into everything?

Papermaster: We rolled out our R9 Fury, which is our Radeon product with 2.5D technology. With R9 Fury and Fury X, we’ve brought memory in closer. Leveraging that packaging technology, we take stacks of high-bandwidth memory and bring it right on the same silicon carrier that the GPU chip resides on. That drastically reduces the time it takes to get at memory, to suck data in from memory and put it back in memory, and it saves tremendous energy. As you move that data around, it’s a very short connection over silicon rather than driving off that dGPU (discrete graphics processing unit) into a separate memory unit.
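
The bandwidth side of that argument comes down to interface width times per-pin data rate. The sketch below uses the commonly cited first-generation HBM figures (1024-bit stacks at about 1 Gbps per pin, four stacks on the interposer) against a typical 384-bit GDDR5 configuration of the era; treat the numbers as approximate context rather than product specifications. The short, wide on-package links are also where much of the energy saving comes from, since each bit travels millimeters across silicon instead of centimeters across the board.

```python
# Why on-package HBM widens the pipe: bandwidth scales with interface width.
# Figures are the commonly cited first-generation HBM and GDDR5 numbers;
# treat them as approximate, not as a product specification.

def bandwidth_gb_s(bus_width_bits, data_rate_gbps_per_pin):
    return bus_width_bits * data_rate_gbps_per_pin / 8  # bits -> bytes

# One HBM stack: very wide (1024 bits) but relatively slow per pin.
hbm_stack = bandwidth_gb_s(1024, 1.0)   # ~128 GB/s per stack
hbm_total = 4 * hbm_stack               # four stacks on the interposer -> ~512 GB/s

# Typical 384-bit GDDR5 board of the era: narrower bus, much faster per pin.
gddr5 = bandwidth_gb_s(384, 7.0)        # ~336 GB/s

print(f"4x HBM: {hbm_total:.0f} GB/s, 384-bit GDDR5: {gddr5:.0f} GB/s")
```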

SE: Any plans to add that kind of architecture into the CPU side?

Papermaster: We see HBM having expanded applications in the future. The biggest driver of that is the HBM cost coming down. It works great today on the high end of our discrete graphics line, and as the cost comes down, you’ll see the range of applications grow.

SE: Is the cost in the memory or the interposer? Where are you seeing the problem?

Papermaster: Both. Costs come down generation by generation with any technology as it matures. As the volumes go up, HBM costs will go down. The same holds for the interposer. As the manufacturing volumes go up and the OSAT industry gains more expertise in the packaging techniques, costs will come down there, as well.

SE: What do you see as the evolution of advanced packaging?

Papermaster: Packaging and integration and how you put different solutions together will keep us on a Moore’s Law pace of performance increase, doubling every 18 to 24 months. It’s a fundamental enabler. It did start with 2.5D. Looking forward, you’ll see 3D integration, where you can stack more complex devices—active over active devices. You will see new types of organic packaging coming out, with very dense interconnects, allowing multi-chip connections at lower cost points. This will spur the ability to mix and match different CPUs, GPUs, accelerators, and different technology nodes. When you get that kind of heterogeneous implementation of engines with cost-effective integration, you’re going to make big gains in performance efficiency.

SE: You also potentially can get to market much quicker with a customized solution than what you can do now, right?

Papermaster: Sure. When you have monolithic integration on a single die, each element that goes into that monolithic silicon has to be created on that single development schedule, all optimized on that new piece of silicon. In a space like mobile, you have to do that to be able to hit the cost point and the massive scale. Think about smartphones, tablets, and low-end PCs with high volume. My sense is that those will stay monolithic. But as you move up the value chain and you need more tailoring, it creates a lot of options for customers to create very optimized and innovative solutions.

SE: It’s really the world turned upside down. It used to be that the high-value solutions were the ones with high volume. Now it seems like we are getting higher value as we move into customized or semi-customized solutions.

Papermaster: There are new trends driving a new era of computing. On the compute side, there is big data analytics, the raw number crunching you need in the mega data centers to provide businesses with the information they need. Then there’s the visualization side, with virtual and augmented reality requiring incredible rendering to create new environments, analyze data in different ways, or create new markets. You’ll see whole new areas of application for this immersive technology. Both the number-crunching side, the analytics to handle all this data, and the visualization side require high performance. They will require technology to be put together in different ways than it has been to date.

SE: That needs flexibility, correct?

Papermaster: It does. Think back to the mobile phone era. We started out with a number of mobile phones, but the market wasn’t quite taking off. You had the initial smartphones, and then an application world was set up. Apple innovated by putting applications on the iPhone, and then others followed. That drove the explosion of applications. The same thing will happen with these new areas. It will start off targeted, but as the software and the applications start to grow, you’ll see the hardware matching those use cases. Gaming and entertainment may be one form factor. Medical may be another set of targeted form factors for the technology.

SE: What about security? How do you approach this?

Papermaster: We look at security in a very straightforward way. It is fundamental to users adopting the technology. It is a must. Our technology has to have a very strong bedrock of security. We’ve approached this in a way that leverages our experience with game consoles. We’ve worked with the game console providers on semi-custom technology. Thinking about their business, you have to protect the titles. We’ve had to become very good at security. We have partnered with ARM and embedded TrustZone in every one of our microprocessors and graphics processors. It’s an ARM processor with a TrustZone implementation, with AMD elements around that architecture, our own cryptography and our own carefully designed access-control technology. From the very moment you boot that engine and allow access to elements of the chip design, it’s in a controlled, secured environment with controlled access.
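
To illustrate the chain-of-trust idea in the abstract (a root of trust verifies the first piece of code, which in turn endorses the next stage), here is a generic sketch. It uses plain hashes instead of signed images to stay short, and the stage names and digests are hypothetical; this is an illustration of the concept, not AMD’s or ARM’s implementation.

```python
# Generic sketch of a boot-time chain of trust: each stage is verified against
# a value endorsed by the stage before it, starting from an immutable root.
# Conceptual illustration only; not AMD's or ARM's actual implementation.

import hashlib

# Digest of the first-stage boot code, notionally burned into on-chip ROM or
# fuses at manufacture (hypothetical value for the example).
TRUSTED_ROOT_DIGEST = hashlib.sha256(b"first-stage bootloader image").hexdigest()

def verify(image: bytes, expected_digest: str) -> bool:
    """Only run a stage if it hashes to the value endorsed by the previous stage."""
    return hashlib.sha256(image).hexdigest() == expected_digest

def boot(stages):
    expected = TRUSTED_ROOT_DIGEST
    for name, image, next_digest in stages:
        if not verify(image, expected):
            raise RuntimeError(f"secure boot halted: {name} failed verification")
        print(f"{name}: verified, handing off")
        expected = next_digest  # each stage endorses the digest of the next one

stages = [
    ("bootloader", b"first-stage bootloader image",
     hashlib.sha256(b"os kernel image").hexdigest()),
    ("kernel", b"os kernel image", None),  # last stage endorses nothing further
]
boot(stages)
```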

SE: ARM has moved from TrustZone to the chain of trust concept.

Papermaster: That’s an ecosystem beyond devices, and AMD ties into any ecosystem. The trust and security that we build into our microprocessors is extensible from any consumer device, consumer or commercial PC, right up to server and network applications. We provide full security within any of our microprocessor elements and create an application interface that we can tie into the other ecosystems that are out there. Again, it’s our commitment to partnering and open standards.

SE: We’ve been following that for quite a while, but no matter how good it gets, hackers still break in.

Papermaster: You can’t compromise when you encrypt data, so our philosophy is based upon a secure and authenticated access, and our customers deciding when they want to be in a secure environment. When they do, their data can be encrypted.

SE: Going back to scalability, how do you get there? Is the architecture scalable? Is it more chips that you are adding? More cores?

Papermaster: Scalability is about how you architect your design, from the inception of the design. We designed Zen to be high performance and energy efficient. It’s in its test phases, nearing our ship date at the beginning of 2017. We design in scalability at the outset. We have a long history of being able to scale microprocessor cores, so we built on that history. We tuned up our HyperTransport even further and have outstanding scalability as you add cores. As you look at how we connect to the rest of the world through I/O, we have a very robust I/O history across our designs. Scalability and connectivity are key elements of the system design.
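
As a generic illustration of why the interconnect and I/O matter more as core counts grow, Amdahl’s law caps the achievable speedup by whatever fraction of the work remains serial or contended. The parallel fraction below is an arbitrary example, not a characterization of any AMD product.

```python
# Amdahl's law: speedup is limited by the serial (or contended) fraction of
# the work, which is why interconnect and memory scalability matter as much
# as raw core count. Illustrative numbers only.

def amdahl_speedup(cores, parallel_fraction):
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

for cores in (4, 8, 16, 32):
    print(f"{cores} cores -> {amdahl_speedup(cores, parallel_fraction=0.95):.1f}x")
# -> 3.5x, 5.9x, 9.1x, 12.5x: gains flatten unless the shared 5% shrinks too
```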

SE: Do you solely use bulk CMOS, or are you looking at some new materials?

Papermaster: These are bulk CMOS designs. We work closely with the foundries. When you look at their roadmaps in the long term, you’ll see bulk CMOS. You can see tinkering with the metallurgy and device structure, and you’ll see compound structures needed to continue device scalability. All that is coming. It is still largely a bulk CMOS approach, but you are going to see a lot of innovation.

SE: The slowdown of Moore’s Law seems to have triggered more creativity than ever. It’s not a question anymore of just shrinking. It’s now about turning all these knobs at once, right?

Papermaster: Yes. Gone are the days of mapping to the next technology node, knowing that you’ll stay ahead of the competitive curve. It is architecture and design, along with process technology, along with the innovative ability to connect heterogeneous solutions together.

SE: Does it get to the point where one architecture doesn’t fit all? As we move into many more vertical markets, like medical, the current architecture may not apply. Do you start doing multiple iterations that you didn’t have to do in the past?

Papermaster: There’s a limiter on innovative architecture, and that’s software. Time and time again, we’ve seen someone come up with a better way and a whole new architecture to go out and solve a problem. The problem is that when you have that widget, there’s no software to run on it. It’s an immense task to get that software ecosystem in place. So we’re leveraging the x86 ecosystem on our CPUs. It’s tried and true, with a massive installed base. Our view is that you really need to work closely with the software ecosystem to allow them to unlock the full potential of our architectures.

SE: That’s been one of the big problems: software has been one step too far removed. Hardware and software are developed separately, and then they try to bridge the two. If they are developed together, you can have immense improvements in performance and efficiency.

Papermaster: We looked right up front at the software ecosystem that we are targeting and we worked back from that to make sure we are bringing value to that ecosystem and partnering with that ecosystem.

SE: It’s partly hardware defining software, but also software defining hardware?

Papermaster: Absolutely. The days when you could be off in a corner and devise a better solution without integrating with the ecosystem are gone.

SE: Are you working on the Linaro side, as well?

Papermaster: We do. We are engaged with Linaro, and we introduced the A1100, an 8-core ARM processor, which we offer in our product mix today. Our view is that we welcome competition from an ISA standpoint. With Zen, we are focused on returning high performance with x86, our heritage. But the reason we put out the A1100 and are watching that space is that if ARM takes off, it is not hard for us to pivot and add that to our product portfolio as well. We are focused on x86, and we’re watching the ARM space, engaged with Linaro and other consortiums as well.

SE: What does the sale of ARM to SoftBank do to you and your relationship to ARM?

Papermaster: We don’t anticipate a change.

SE: When you look out, what worries you most about where you are going next and what is coming in the future?

Papermaster: I’m not worried about the future—I’m a battle-scarred veteran. I’ve been through many times where the pundits said the ‘end is near.’ The end of semiconductor scaling was foretold, yet we continue to see advancements in semiconductor capability. You hear about the end of innovation in terms of compute engines. I see no end in sight in terms of driving innovation into our CPU and GPU engines.
