Experts At The Table: Billion-Gate Design Challenges

First of three parts: What scales and what doesn’t; powerful tools and applications, but most of them need to be turned off; a mind-boggling array of options.


By Ed Sperling
Low-Power Engineering sat down to discuss billion-gate design challenges with Charles Janac, CEO of Arteris; Jack Browne, senior vice president of sales and marketing at Sonics; Kalar Rajendiran, senior director of marketing at eSilicon; Mark Throndson, director of product marketing at MIPS; and Mark Baker, senior director of business development at Magma. What follows are excerpts of that discussion.

LPE: What are the big issues we need to contend with in billion-gate designs?
Rajendiran: Billion-gate designs are no longer a fantasy. We can do that at 28nm with a 20 x 20 mm chip. But just to put this in perspective, when we first sent a man to the moon they had three computers. The power and the memory those three had together was less than we have in a phone today. So the question you have to ask is are your really putting that to good use? And from a business perspective, will it work when it comes out and who can help across the business value chain?
Baker: We’re approaching billion-gate designs in the GPU or microprocessor area. In the SoC area, we’re approaching about 100 million gates. In the next generation, we’ll see SoCs with quad cores. Beyond that, there will need to be some very significant changes in what kinds of applications we can apply those to and how we’re going to deal with the power aspects. These will most likely be in the mobile market and we’re going to have to deal with system-level issues like verification, battery life, and power. From an EDA perspective we’re on track for capacity and for some of the turnaround time, but power will need some of the focus.
Throndson: Process migration hasn’t continued to scale forward. We hit a performance wall years ago. Power hasn’t scaled, either, as we reached some of the smaller geometries. Area is the one piece that is scaling better, which enables these large numbers of gates. The keys here are systems integration and multicore processing horsepower.
Browne: When you look at design costs for billion-gate designs you have to look at the markets that are going to drive them. The mobile market has enough volume to handle the cost of these types of designs. It also has a lot of parallelism and concurrency because there is a lot of functionality, and there are a lot of different use scenarios. Traditional EDA is scaling so it can take advantage of this—traditional designs partitioned at a chip boundary in a way that fits well with the system architecture. That’s probably where 80% of us will see business opportunities. The other 20% is where you take a design and partition it across two chips. Their bigger challenge is on the tool and the architecture side and the ability of semiconductor and system companies to manage that level of complexity. When you scale to four or eight cores, there’s a huge amount of parallelism and on-chip memory. The issue we see is how you get that right, and today the solution is a lot of subsystem design. LTE radios are a good example. We’re going to replace GSM radios with LTE radios. They’re going to be 15mm of area and have a half-dozen DSP cores, but it’s going to be a standalone system that allows you to do verification, have a known good block, and which is characterized with the others. But you can’t do this as a billion gates at the top level.
Janac: What I have in my house isn’t a personal computer. My phone is a personal computer, and it will have everything I need in terms of data, family photos, passwords and payment systems. It’s more like a supercomputer and it’s going to be the driver for the billion-gate design. You’ll need storage and the computing power to make this a true PC. There are four criteria for this. The first is processing power. We’re going to have to go to many cores, so you’ll need cache coherency to utilize those cores from a programming perspective. Another key is integration. How do you bring these cities of silicon together, which is where the communication system for the SoC becomes critical? You also need partitioning. As you build more and more functions, those functions have different dynamics. The modem has to go through SoC evaluation, so it’s on an 18-to-24 month cycle, whereas the efficient digital SoC people are going to be on an annual cycle. You have to decide whether you’re going to put it on one die or multiple dies, whether you can stack the functions, and whether you can mix processes in the same dies. The partitioning and the support for the partitioning are going to have to be there. The last part involves the cost of the hardware and software. The hardware cost has been increasing slowly but the software has been increasing rapidly. So how can you use the hardware and the parameters in the hardware to lower the cost of embedded software, if not the operating system?

LPE: Will an increase in granularity in designs, in terms of various core sizes, wider I/O and multiple cores and processors, affect how we build these devices?
Janac: We’re going to have tremendous power, but we’re not going to be able to afford to keep it all on. When you’re doing graphics the GPU will be on and the rest of it needs to be shut off. For audio it will be the same. You need to be able to manage turning on and off of this functionality. And in terms of 3D silicon, some of the high-power parts of the chip such as RF and some of the modems probably need to be on a different die and connected through wide I/O and TSVs (through-silicon vias). These things will need very intelligent and capable power architectures. While you have more transistors you’re still dealing with the same power budgets.

LPE: Won’t it be even tighter budgets? In 3D stacks, the dies are actually thinner?
Browne: The terminals are better in those packages, though. Even though the dies are thinner there is a lot better coefficient with the bonding. But it’s still a problem.
Throndson: But the power source is not scaling with the demands.
Browne: We’re seeing designs today with a dozen to 100 power domains. Those are at 40nm. We have customers starting 14nm designs now. You’re going to have to move to abstractions. There are 1,000 voltage domains. Somebody will have to have a product that generates the HAL (hardware abstraction layer) of software. We generate RTL. Generating RTL and C code are not that different. That’s where you’re going to see a lot of growth in the supply chain.
Rajendiran: If you look at 130nm, we used to have one type of transistor. Now we have multiple types of transistors and different process flavors, which add a level of complexity. You now have a whole bunch of different libraries, depending on which type of transistor you use. That’s an opportunity and a challenge. How are you going to pick and choose your implementation? Then you throw in a billion transistors, and you’re talking about putting it into a single SoC. It’s going to cost a lot of money and you don’t even know if you’re taking the right path to optimize power, performance and the market. And most of it is driven by consumer markets where each person will use a device differently. What you put on the chip affects battery, performance and even leakage. There are great opportunities, but it’s also more complex. It comes down to who can you partner with for the software, for planning the product, and for implementing the chip in hardware. And it really needs to be tied together so you hit the product introduction times.