As the industry speculates about 5nm and below, questions surrounding node shrinks remain.
Does shrinking devices still make sense from a cost and performance perspective? The answer isn’t so simple anymore.
Still, the discussion about whether semiconductors remain on track with Moore's Law comes up frequently enough to warrant analyzing at least some of the dynamics at play. There is much speculation about what happens after 7nm, as well as about ways to continue innovating despite the slowing of Moore's Law and the end of Dennard scaling. Today, at least, there are still options at the device and materials levels to allow for continued power, performance and area improvements.
The debate about Moore’s Law even stretches into the mainstream media, said Greg Yeric, an ARM fellow, at last week’s TechCon conference. Some believe 28nm was the best node ever and that everything after it is a dead end. Others take companies like Intel and the foundries at their word that everything is fine.
“In reality, 28nm wasn’t the best node ever,” Yeric said. “We’re continuing to scale costs. We have a good cost story. But there are some things going on under the hood.”
This is particularly true for standard cells. “If I have a cell ‘A,’ and I want to scale it to the next node, ‘B,’ I would scale it by 0.7X,” Yeric said. “That’s my Moore’s Law node. Unfortunately, we’re also in this era where we have to do increasing amounts of multiple patterning on the wafers to get to these pitches, and that’s expensive.”
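The arithmetic behind that 0.7X figure can be sketched briefly. In this minimal illustration, the 0.7X linear shrink is the classic Moore's Law target, not a number from any specific foundry:

```python
# Classic Moore's Law node scaling: shrink each linear dimension by ~0.7X,
# so cell area shrinks by ~0.7^2 ≈ 0.49X -- roughly half the area, which is
# roughly twice the transistor density per node.
LINEAR_SHRINK = 0.7

def scaled_area(area: float, nodes: int = 1) -> float:
    """Area of a standard cell after `nodes` generations of 0.7X linear scaling."""
    return area * (LINEAR_SHRINK ** 2) ** nodes

cell_a = 1.0                  # normalized area of cell 'A' at the current node
cell_b = scaled_area(cell_a)  # the same cell scaled to node 'B'
print(cell_b)                 # ~0.49: about half the area
```

This is why a "full node" is usually described as a 2X density improvement: two 0.7X linear shrinks compound into roughly 0.5X area.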
Pitches are a critical piece of this equation, particularly for the fins in a finFET. While scaling may make sense on paper, the manufacturability of these devices isn’t always so simple, particularly beyond 16/14nm. Backing off the pitch makes it easier to design and manufacture, but it also can impact performance and power.
“In the finFET regime, with the first, second, and third generation, that’s exactly what we’ve been doing,” Yeric said. “If you watch the pitches, they are not scaling like you think they would, but in the standard cells, we’ve gone from larger numbers of tracks in the cells to fewer and fewer. Commensurately, we’ve gone from 4 fins, to 3 fins, to 2 fins in these cells. We need the transistors themselves to kick it into overdrive to get there, but that’s your story for second- and third-generation finFETs. You don’t have to be an expert standard cell designer to realize that we can’t keep doing this for very long. One fin would make a lot of circuit designers’ heads hurt. Zero fins probably won’t work, except for power.”
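The squeeze Yeric describes can be put in rough numbers. This is a hypothetical illustration, not data from any real process: if a cell's total drive current must stay constant while the fin count drops, each remaining fin has to work proportionally harder.

```python
# Hypothetical fin-depopulation arithmetic: to hold a standard cell's total
# drive current constant as fins are removed, the required drive per fin
# grows as the inverse of the fin count.
def fin_drive_boost(baseline_fins: int, fins: int) -> float:
    """Per-fin drive multiplier needed to hold total cell drive constant."""
    return baseline_fins / fins

for fins in (4, 3, 2, 1):
    boost = fin_drive_boost(4, fins)
    print(f"{fins} fins: each fin needs {boost:.2f}x the drive of a 4-fin-cell fin")
```

Going from 4 fins to 2 already demands 2X the drive per fin, which is why the transistors have to "kick it into overdrive," and why one fin (4X) makes designers' heads hurt.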
These kinds of discussions happen on a regular basis, noted Mike Thompson, senior manager of product marketing for the DesignWare ARC Processors at Synopsys. “We have customers we are working with today that are building chips that, three or four years ago, would have been half a dozen to 10 different chips. They are integrating the whole thing now. That’s largely due to the movement in process technology. Everybody thought the process curve was going to end at 16nm a few years ago. Now of course we are starting to see implementation at 10, and there is work at 7nm. When you look at that, the size has gotten very small, but the power consumption also has dropped dramatically. If you can put all that stuff onto a chip, the levels of integration are breathtaking.”
But is all of that integration really necessary for all applications? He explained that a flash memory device used to contain one or maybe two processors. Today it’s not unusual to see 8, 12, or 16 processors.
“For the semiconductor industry, the natural tendency has always been integration, and that’s exactly what we are seeing, especially when you look at something like the enterprise end of the SSD space, which is SSDs on steroids,” Thompson said. “But you do make more money by not integrating, especially when you look at the cost of the more advanced process nodes, which was why people were talking about the process technology running out at 16nm.”
This is one of the reasons people say Moore’s Law has ended. It’s getting too difficult and too expensive for most chipmakers to push to 10nm and 7nm.
Working at older nodes
“There are people still running stuff on 90nm, 150nm, 180nm,” Thompson said. “The bulk of the processing is still done today at 28 and 40nm, so it’s interesting that the cost goes up dramatically. There wasn’t necessarily a need on all chips to move forward to the smaller process nodes. When you look at the level of integration that’s happening at those nodes, what’s really happening is that you are taking whole systems and crushing them down into a single chip. Then, when you start doing the verification work on that, it’s huge.”
Still, for as long as it makes sense economically, engineering managers would like to go down the road of increasing performance, increasing integration and increasing bandwidth, all of which may sound the same in some circles, noted Patrick Soheili, vice president of product management and corporate development at eSilicon.
However, there are some significant challenges that create stumbling blocks. One of these is the economics, he said. “The price they pay per transistor/per gate/per function is not dropping the way it used to. Some argue that it’s actually going up. There is a price to pay on the pure economics of it.”
Along with this, integration is an issue, but not at the cost of power. Performance is an issue, but not at the cost of power. “Right now, with all the deployments and the massive number of servers that have to get deployed in a typical data center, in a typical cloud application, it’s all about managing the power envelope. This is not an easy issue if you were to stay in the same node. If you stayed in 28nm, or 14/16nm, or the first finFETs, and you wanted to at least make slight improvements in the integration and/or the economics/footprint/performance/bandwidth, you will have to assume that your power is going up and you just can’t afford to do that,” Soheili pointed out.
As for how to accomplish this, Soheili set Moore's Law aside for a moment, with everything it implies about cost and about how scaling from one node to the next translates into performance, area and power management. Ignoring that for a second, he said, “all of our customers are asking, ‘How can I keep my power budget the same, and what kind of an increase at that limit can I get in integration or performance?’”
Stacking the die
While not ideal for every design, one possibility may be to leverage 2.5D, Soheili said. “2.5D allows a large ASIC that would otherwise have forced you to go to the next node to be broken into two of the existing nodes. If it stayed in an existing node, it might be a reticle size or even larger than reticle size, so it’s either impractical or very, very expensive when you apply today’s D0s (defect densities) to a very large die size or reticle size.”
“You break that into pieces — maybe two pieces or maybe more — depending on how naturally, or unnaturally you can break that chip up into two or three dies; and then assemble them through some sort of a smart interconnect so you don’t lose power or performance, and it minimizes the impact, economically, and/or leverages your yield curve on a smaller die against a large die. And it benefits you economically,” he explained.
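The yield argument behind splitting can be sketched with a simple Poisson defect model, Y = exp(-A·D0). The defect density and die area below are invented for illustration, and the sketch ignores interposer, assembly, and test costs:

```python
import math

def poisson_yield(area_cm2: float, d0_per_cm2: float) -> float:
    """Fraction of good die under a simple Poisson defect model: Y = exp(-A*D0)."""
    return math.exp(-area_cm2 * d0_per_cm2)

def silicon_cost_per_good_unit(area_cm2: float, d0: float) -> float:
    """Relative wafer-area cost of one *good* die (cost proportional to area/yield)."""
    return area_cm2 / poisson_yield(area_cm2, d0)

d0 = 0.25        # hypothetical defect density, defects/cm^2
big_die = 6.0    # hypothetical near-reticle-size die, cm^2

# Monolithic: one large die must be entirely defect-free.
mono = silicon_cost_per_good_unit(big_die, d0)

# 2.5D split: two half-size chiplets, each tested before assembly
# (known-good die), so a defect scraps only half the silicon.
split = 2 * silicon_cost_per_good_unit(big_die / 2, d0)

print(f"monolithic yield: {poisson_yield(big_die, d0):.1%}")
print(f"half-die yield:   {poisson_yield(big_die / 2, d0):.1%}")
print(f"silicon cost ratio (split / monolithic): {split / mono:.2f}")
```

With these made-up numbers the two chiplets consume roughly half the good-silicon cost of the monolithic die, which is the "leverages your yield curve on a smaller die" effect Soheili describes.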
In one recent example, a finer geometry chip was taken back one node, divided into two chiplets, which were then connected on top of an interposer inside of a package. The cost was 20% less than what it would have been at the more advanced node.
In another case, certain functions were pulled out of an ASIC that would have either grown the die too big to be economical or pushed it to the next process generation. Chiplets were built from those functions, then connected to the ASIC.
It does raise the question, though, of how many projects could have maximized profit by going a different route.
“Like everything else in our industry, one size does not fit all,” Soheili said. “FPGA companies will need to go to the next node no matter what, because the processing elements all connected together is basically a big wiring chip with processing elements underneath it. So they’re going to have to keep going. You’ve got the GPU guys that have to keep rendering at higher and higher resolution and speed. They have to keep going. There are application processors that have to go up in speed, and do more things for consumer applications. They’re going to have to keep pushing the envelope.”
Outside of that, questions remain. In automotive, for example, will advanced nodes be required? He feels that much depends on the specific application. “If you are thinking about a Pascal (Nvidia’s most advanced GPU) inside of a car, such as an autonomous vehicle where you have to do real time artificial intelligence and deep learning, chances are you might because there are a lot of things going on when you’re driving down the street.”
As to whether that warrants being at 10nm or 7nm, Soheili suspects it will eventually. But much depends on the environment and what is needed in the car as far as the sensors go.
From a complexity perspective, he said eSilicon has completed at least one deep learning chip. “It pushed every single limit. We used four HBM stacks, the most sophisticated stretched and stitched interposer underneath it, a maximum reticle size for the ASIC in the middle of it — it pushed every single boundary. If the next generation was available when we designed this, we would have absolutely gone to the next generation.”
But he added that the economics are so overwhelmingly against going to the next node that you absolutely have to need it. “Developing a first-generation finFET chip could be a $30 million proposition just from our perspective. I’m probably under-calling it a little bit. And this is just from the netlist on, not including the architecture, the RTL, the verification, the software pieces, or the system qualification. Just the piece that an ASIC supplier like us has to go through, that’s $30 million. For the next generation of chips, that number could easily be $50 million, or 2X. Even the deepest-pocketed player out there is going to have to think twice about spending that just for bragging rights, and will go there only if absolutely necessary.”
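Those NRE figures can be turned into a rough break-even sketch. The $30M and $50M are the article's quoted estimates; the per-unit silicon costs below are invented for illustration:

```python
# Hypothetical NRE amortization: how many units must ship before the next
# node's higher NRE pays for itself through a lower per-unit silicon cost?
# NRE figures are rough estimates quoted in the article; unit costs are assumed.
nre_current, cost_current = 30e6, 22.0  # current node: $30M NRE, $22/unit (assumed)
nre_next, cost_next = 50e6, 18.0        # next node:    $50M NRE, $18/unit (assumed)

# Break-even volume: extra NRE divided by the per-unit saving.
break_even = (nre_next - nre_current) / (cost_current - cost_next)
print(f"break-even volume: {break_even:,.0f} units")  # 5,000,000 units
```

At these assumed numbers, a design needs to ship five million units before the next node's extra $20M in NRE is recovered, which is why only high-volume or performance-bound products keep pushing forward.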
There also are a number of techniques that ARM’s Yeric refers to as ‘scaling boosters.’
“If it isn’t possible to get where we want to go with pitches, what can we do that helps designs get higher transistor density on the wafer?” he said. “If you look at wafer cost and pitch, you could see that Moore’s Law is over, or definitely slowing down. The trick is that the work in scaling boosters is the active area of the industry, with one asterisk: EUV. EUV didn’t quite make it for 7nm in the foundry technology nodes, but it is really needed for 5nm. If we don’t get a good EUV technology at 5nm, I might have to come back and be a little less optimistic. But researchers are keeping up pretty good progress in that area, as well.”
On the scaling booster side, one of the key issues is that a lot of these methods are one-time tricks. “We have had a long history of lithographically shrinking the pitches, the Dennard scaling of the pitches and what have you,” Yeric said. “Now it’s a matter of deciding on these one-time tricks, or multiple sets of one-time tricks. That’s the change management challenge that’s going to happen in the industry, and it’s not just the foundries or the equipment providers. This is a stack problem where we have to look at systems and circuits, and how they interact with the devices.”
When it comes to managing power and performance, circuit designers are well aware of Dennard scaling, but Yeric cautioned that, unfortunately, we aren’t in that regime any longer. “That period is gone. In any of the modern nodes, in fact, all things being equal, I will often be able to take a technology node — call it 7nm — make the gate pitch bigger, make the cells bigger, and end up with a smaller chip that runs faster. We are technically in the reverse-Dennard era now.”
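Dennard's original scaling rules can be summarized numerically. The sketch below is a textbook idealization of dynamic power, P = C·V²·f, not data from any real node: under ideal Dennard scaling every dimension and the voltage shrink by 1/k, frequency rises by k, and power density stays flat; once voltage stops scaling, power density climbs.

```python
# Dynamic power: P = C * V^2 * f. Under ideal Dennard scaling, capacitance,
# voltage, and area-per-transistor all shrink while frequency rises, leaving
# power per unit area constant. Post-Dennard, voltage is stuck near its
# leakage/threshold floor, and power density rises roughly k^2 per node.
def power_density(c: float, v: float, f: float, area: float) -> float:
    return (c * v ** 2 * f) / area

k = 1 / 0.7  # one generation of ~0.7X linear scaling

baseline = power_density(c=1.0, v=1.0, f=1.0, area=1.0)

# Ideal Dennard era: C/k, V/k, f*k, area/k^2 -> density unchanged.
dennard = power_density(1.0 / k, 1.0 / k, k, 1.0 / k ** 2)

# Post-Dennard: voltage no longer scales; everything else does.
post = power_density(1.0 / k, 1.0, k, 1.0 / k ** 2)

print(dennard / baseline)  # 1.0 -> constant power density
print(post / baseline)     # k^2, about 2 -> density roughly doubles per node
```

That roughly-doubling power density per node is the pressure behind the "reverse-Dennard" behavior Yeric describes, where relaxing pitches can yield a faster, effectively cheaper chip.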
As that pressure gets more and more intense, there will be increasing pressure to bifurcate the technology nodes: high performance/higher cost; lower cost/higher efficiency. And that pressure is only going to increase, which translates to more complexity to manage.
On the wafer side, there also are options that could help with scaling, including finding a better memory, Yeric asserted. “The SRAM scaling story is a little more pessimistic than that of standard cells. If we did find a material that could supplant SRAM — maybe higher density, maybe non-volatile — that would be a big story. Nothing is quite there yet, but there is a lot of activity on the materials research front. Going forward, if you look at increasing density and interconnects, now you’re talking about blocks on a chip, and this looks really attractive on a PowerPoint foil. When you look at the needs of a big core wanting to go fast, a small core wanting to be efficient, and memory wanting to be low power and low cost, then analog and RF is an obvious story. Many of the analog functions don’t want to scale down. They get worse.”
Still, at the block level there is a lot of opportunity to optimize power, performance and cost with the right technology and the right ecosystem improvements. And in the near term, Yeric insists, chip-level research is producing machines that can bond wafer to wafer at the accuracy that would allow standard-cell partitioning.
Yeric sees a lot of really good device and materials research being done by people who don’t really understand what designers need. In that kind of situation, economics always will win in the long term.