One On One: ARM CTO Mike Muller

A candid interview about the future of Moore’s Law and the impact of power and leakage on all future designs.


LPE: How far does Moore’s Law extend forward and what are we likely to encounter along the way?
Muller: The good news is there is no known solution for 7nm. That implies that between now and then it’s okay. When I talk to people they seem fairly confident they’re going to get there. Exactly how, they don’t know. Will there be any miracles needed? Yes, probably one or two. But 14nm, and on down close to 7nm, will happen. The bad news is that frequency will be flat with constant leakage.

LPE: That’s an interesting perspective.
Muller: Life is full of tradeoffs. People have traditionally taken different tradeoffs in process development. But in the past a lot of that was about getting frequency uplift. You can trade that in lots of different ways, and there is still frequency uplift to be had. But that costs you in terms of leakage, and people worry much more than they used to about leakage and about where those tradeoffs are made.

LPE: Does that leakage continue even with the introduction of FinFETs and other techniques?
Muller: New process techniques like FinFETs help, but they’re one-off advances. You draw your curve, and there are times when you get ahead of the curve. Then you’re on the gradual slope back down again. So there are one-off things that really help with leakage. But once you’ve done that, you’ve still got three impossible things to do before breakfast to get you back down to 7nm. Those steps are part of the solution, but they don’t solve it to the point where leakage is going away and you don’t need to worry about it anymore.

LPE: How about dropping the voltage?
Muller: We’ve always done voltage scaling, and DVS (dynamic voltage scaling) continues. There will be different learnings about how much voltage scaling you can get. If you can do it, lowering the voltage is one of the best things you can do for saving both dynamic and static power. That will continue, but the margins are getting harder to find.
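Why voltage is "one of the best things you can do" follows from the first-order CMOS power equations: dynamic power scales with the square of the supply voltage, and static power scales with it linearly. The sketch below illustrates that arithmetic; the functions, names, and numbers are invented for illustration and are not real process data or an ARM model.

```python
# First-order CMOS power model (illustrative only; real leakage current
# also drops at lower Vdd, which this simplification ignores).

def dynamic_power(c_eff, v_dd, freq):
    """Dynamic (switching) power: P = C_eff * Vdd^2 * f."""
    return c_eff * v_dd**2 * freq

def static_power(v_dd, i_leak):
    """Static (leakage) power: P = Vdd * I_leak, with I_leak held fixed."""
    return v_dd * i_leak

# Example: scale Vdd from 1.0 V to 0.8 V at the same frequency.
c_eff, freq, i_leak = 1e-9, 1e9, 0.05   # hypothetical: 1 nF, 1 GHz, 50 mA
p_hi = dynamic_power(c_eff, 1.0, freq)
p_lo = dynamic_power(c_eff, 0.8, freq)
print(f"dynamic power drops to {p_lo / p_hi:.0%} of original")  # 64%
```

The quadratic term is why a modest 20% voltage reduction cuts dynamic power by roughly a third, and why the disappearing headroom Muller mentions hurts so much.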

LPE: ARM just introduced its big.LITTLE approach. What’s the thinking behind that?
Muller: The idea is that you can crank down the voltage and save power and scale it. There are times when you need performance, which is the ‘big’ part, and there are times when you don’t need that. You cannot build as efficient a microarchitecture for the big cores as you can for the little cores, because getting that single-thread performance involves a lot of microarchitecture complexity and speculation, which ultimately costs you power. If you don’t need all of that performance, and your voltage scaling has run out of anywhere to go, the right thing to do is to migrate the task onto an instruction-set-compatible but smaller core with a simpler microarchitecture. That works wherever you are and on whichever process. It will always be true. You will be able to build much more efficient little cores than big cores.

LPE: How does this affect the overall device architecture?
Muller: This is an OS-level task migration, which happens anyway. You determine how many SMP cores you need to light up. Then you do task migrations. It’s another step to migrate onto a smaller core. That’s something you just build into the OS. You don’t need to add any extra magic. It’s already happening.
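The OS-level decision Muller describes can be sketched as a simple load-driven core selector with hysteresis, so a task doesn't bounce between core types on every sample. This is a hypothetical toy, not ARM's scheduler: the thresholds, function names, and workload are all invented, and real implementations (e.g. Linux's energy-aware scheduling) are far more sophisticated.

```python
# Toy big.LITTLE placement policy: pick a core type from recent load.
# Hysteresis (separate up/down thresholds) avoids thrashing between cores.

def choose_core(load, on_big, up=0.85, down=0.30):
    """Return 'big' or 'little' for a task given its recent load (0..1)."""
    if on_big:
        # Stay on the big core until load falls well below the up threshold.
        return "big" if load > down else "little"
    return "big" if load > up else "little"

# A bursty workload: mostly light, with one demanding spike.
placement, on_big = [], False
for load in [0.1, 0.2, 0.95, 0.9, 0.5, 0.2, 0.1]:
    core = choose_core(load, on_big)
    on_big = (core == "big")
    placement.append(core)
print(placement)
# ['little', 'little', 'big', 'big', 'big', 'little', 'little']
```

The point of the sketch is Muller's: the migration itself is just another scheduling decision the OS already knows how to make.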

LPE: Is this going to apply in stacked die with rightsizing of functions?
Muller: Stacked die is almost an orthogonal issue. It’s happening today, with flash and SoCs put into the same package because of packaging constraints. It opens the door to completely different die-to-die memory interfaces, which allow you to build more efficient systems than going off-chip to a separately packaged die. It changes some of the memory bandwidth. But it’s just a computer at the end of the day, so main memory bandwidth is one of the fundamental determinants of performance. Stacking allows you to change that. Whether you’re stacking big cores, little cores, or big.LITTLE cores in combination, different applications will need different combinations. And you exploit that with main memory bandwidth.

LPE: It doesn’t sound like we’ve made much real progress in terms of true multiprocessing software for most jobs.
Muller: When I went to university, which admittedly was a few years ago, I was taught never to trust an MP solution from a hardware guy. That was one of the lectures from the man who invented the subroutine. I think he was right, but for low numbers of cores—eight or fewer—SMP is essentially a solved problem, because you have enough system complexity that you do have a browser and a background task. You don’t have to worry too much about how well you’ve taken an application and threaded it.

LPE: But you’ve split the functions rather than threaded the application, right?
Muller: That’s the first step. And for two, three or four cores you can do that without really having to redo anything. When you get into reprogramming applications like your browser to execute on multicore, there are a limited number of applications that drive that performance envelope. The small applet you’re running doesn’t touch it. You’re going to do browsers and virtual reality apps, where programmers are willing to go back and figure out how to reprogram and rewrite them. It’s true that the general software community is not set up for writing multicore applications. For most applications, you don’t need it. Beyond that, there is database lookup, which scales independently of any one application.

LPE: So populating an SoC with small processors is a way of splitting off functions?
Muller: Yes, and heterogeneous isn’t just about big.LITTLE. It’s about having entire subsystems for tasks, which may be a Cortex-A5 running a complicated audio subsystem, or may actually be custom hardware. If you open up an SoC for a mobile phone you’ll find all of those things in there. The challenge is the programming model for that heterogeneous system, let alone programming the multicore apps processor with lots of cores in it.

LPE: And you need coherence across all of that, right?
Muller: Some of it is about system-level coherency, and some of it is in the programming model. There are three or four emerging standards for that. What they address is which computation runs where. You still come back to manually placing different tasks on the different processing elements. That’s not a solved problem.

LPE: So as you look forward, is power and/or leakage the big issue?
Muller: If you go back to ARM in 1990, we always talked about power/performance/area and the tradeoffs among them. I don’t think that’s changed. If it’s all about power, you run at a kilohertz, sub-threshold, and you come up with completely different solutions. If it’s only the Internet of Things and tiny embedded microcontrollers, you still have to figure out what’s your budget, what’s your power and what’s your performance, and balance among them. In the future we won’t worry just about power.

LPE: But in the future will power become more important in the PPA equation?
Muller: It depends on who’s talking. We’ve always had power up there as a fundamental part of what we do. There is no sudden change of course. Power really matters in system-level integration, whether it’s megawatts in server farms or milliwatts of active power in a small SoC device. We’ve always worried about that. It’s just maturing for more systems, but it’s something we’ve always done.