Experts At The Table: Billion-Gate Design Challenges

Second of three parts: What scales and what doesn’t; powerful tools and applications, but most of them need to be turned off; a mind-boggling array of options.

By Ed Sperling
Low-Power Engineering sat down to discuss billion-gate design challenges with Charles Janac, CEO of Arteris; Jack Browne, senior vice president of sales and marketing at Sonics; Kalar Rajendiran, senior director of marketing at eSilicon; Mark Throndson, director of product marketing at MIPS; and Mark Baker, senior director of business development at Magma. What follows are excerpts of that discussion.

LPE: Will anyone be able to afford to create these complex chips in the future?
Janac: Sure, but it will be extremely expensive.
Browne: Apple is doing it. They’ve come at it with a systems approach. The user will have a great experience because they’re going to add a whole bunch of devices. But we’ve got to find ways to attach to the software at a higher level. We’re doing a full system design. We’re not hooking up a couple of widgets anymore.
Baker: Apple has moved up the stack. From an EDA standpoint we see all these challenges. We’re actively seeing designs at 28nm, planning for 20nm. We’ve yet to see designs at 14nm. But the complexity of validating one of these devices, whether it’s a single die or a multiple-die approach and in the future 3D, is increasing by orders of magnitude.
Browne: With 100 times the number of elements you can’t just extend the methodologies we use today. You have to define the interactions so you can abstract this. You can’t manage this many power domains when the use models are different for all the users. There may be 200 things you’re turning on and off to reduce leakage and increase battery life. To date, most people haven’t done that. In the rush to get to production people want to know if it runs Android or Angry Birds, not whether you’ve done all the power management stuff up front. We’re back to the speed of execution in getting it almost right and being early.
Rajendiran: That’s correct. Verizon, after years of rumors, finally launched the iPhone. But as they got near to release they said it cannot do multitasking. Who was asleep at the wheel? Then the next day they had a software fix to enable that. Why didn’t they think about it ahead of time? With all these complications we should really partition who does what.
Browne: Yes, it’s a system problem.
Rajendiran: But it’s something people could have easily thought out ahead of time. We need to define the components that need to be addressed and give it to the people who can address it. If you take a processor and optimize it for a set of libraries vs. another set of libraries, for the same performance level, one might take a third of the power of the other one. But who should tell you that? Should it be the company that makes the processor or the company that builds the SoC?

LPE: But increasingly you’re not building the chip. You’re integrating parts.
Throndson: You can see people racing ahead of each other, depending on the pieces you’re considering. Part of it is just a matter of getting to market early with a solution. But in terms of parallel hardware, it’s still way out in front of parallel software. Even with power, part of the answer is going back to better utilize the hardware that’s already there, whether it’s the processor itself or at the larger system level. It’s very difficult to optimize and deliver every component that goes into these systems today.

LPE: From the network-on-chip perspective, will these chips be running at the same node and power, or will there be an array of nodes, power and legacy technologies?
Janac: You’re going to be dealing with multiple processes and legacy applications. It doesn’t make sense to put analog IP on a 16nm design. You will have to use multiple die using a system-in-package approach where the digital part of the system is running at the latest nodes optimized for low power and cost and the analog stuff is running on trailing-edge processes where the IP is available.
Browne: We’re building a system using building blocks, and good enough wins if it’s early enough. The more you re-use, theoretically, the quicker you can get there. But the real challenge is how you better enable mix and match in the software area.

LPE: And that ‘good enough’ is also tested well enough?
Browne: Good enough has programmability. The fabric allows reprogramming. We think it’s important to be able to do things in parallel. If you can get enough of them done simultaneously, even if they’re running slower, then you don’t need buffers to manage those serial events and you have less logic and fewer wires and slower transistors in the linear area of design. That also means there is less leakage.

LPE: Will the tools be able to deal with this kind of structure?
Baker: Re-use has been around for about 15 years. So what’s preventing the re-use? A lot of that scaling and functionality is available today. It’s not a new challenge. The challenge we face is that re-use isn’t happening. We’re redesigning these components with each iteration.
Janac: Once you get past RTL the tools are horizontal. The chain of synthesis, place and route, verification and DFM are applicable to that entire system. Above RTL it’s like the silos of IP. Those tools are not addressing that. The MIPS and ARM processors each have their own tools. Arteris’ NoC has its own tools. You wind up with horizontal silos where the IPs are tied to the tools. Only when they reach RTL do they hit the Magma, Mentor, Synopsys and Cadence tools. There is no horizontal toolset that can handle all of these IPs at the architectural level.
Rajendiran: There’s no reason to keep up with Moore’s Law for things that have already been certified and verified. In the old days we were following it. When Moore came up with that law he wasn’t talking about cost. He was talking about transistors. At that time you could do a chip for $50,000. That’s not the case anymore. People are slowly coming to the realization that if you have a chip working, why bother re-doing all of it? You can put software on it, you can even re-do it on the latest process, and use an interposer to make it work. So 90% of the chip is already validated. You add new software and you get the chip out sooner.
Browne: You also cover more markets, which adds more complexity to the definition. The requirements are different for a smart phone and a tablet computer.

LPE: But some of the functionality may be the same between a smart phone and a set-top box, right?
Browne: Yes, and that’s why the big companies have more data points. They know which subsystems can be re-used. When you’re doing audio on these devices everything works. When you add more cores or video, it’s different. The guys with a bunch of technology in-house just need to add more things out of what they already have.

LPE: How many of these billion-gate designs will be on 2D structures vs. 2.5D or 3D?
Rajendiran: With 3D, the problem is more on the manufacturing side. When you drill a hole there are problems. It’s just a matter of time before full 3D works.
Browne: The fabless community is huge. There are $3 billion fabless companies that have very expensive product portfolios. There are also startups that build similar point devices to try to go after those markets. The difference is the big guys get to run more experiments. The little guy only has one.
Janac: The answer depends on what you’re trying to do. If you’re building a unified chip that fulfills a unique function, throwing it on a 16nm process makes sense. If you’re mixing functions that are mixed signal, analog, RF or legacy, it makes sense to put it on more die. But fundamentally the mixed-die approach is more expensive than trying to put it all on a single die in 2D, assuming you can use one process and the IP is all packaged correctly.

LPE: How many derivative chips do you need to get these days to make it economically feasible?
Browne: At 28nm the cost is about $80 million. How are you going to get that back?
Janac: People who make wireless chips are spinning them off into automotive and home gateways, so you wind up with seven to 10 derivatives for a successful platform.
Browne: In some cases a subsystem is re-used, in others it’s the same chip.