With lithography scaling on hold and the silicon MOSFET losing its 40-odd-year grip on scalability, how can the industry continue to squeeze out more scaling?
Gordon Moore penned his famous observation in an era when the people developing the process were also the people designing the circuits. Over time, things got more complicated and work specialization set in, but all was well in the world for many years as the fabs kept delivering on Moore’s Law. Yes, designers had to come up with lots of tricks to advance power and performance scaling, but that’s Dennard’s problem, not Moore’s. As a designer in the present day, if you view Moore’s Law from that original, unilateral perspective, then Moore’s Law is indeed long gone.
Or at least on an extended vacation.
Lithography scaling is on hold and the silicon MOSFET is losing its 40-odd-year grip on scalability. I’d like to use this forum to provide a few observations along these lines, which fall into two categories: (1) squeezing existing technology harder and (2) path-finding to the best solution in an increasingly complicated set of technology scaling options.
Time to do some stairs
While area scaling from 16nm to 10nm is advertised at 1.9x, the devil's in the details once you hop off the lithography-enabled scaling ride. And no matter what area scaling you arrive at, it's going to be offset by a hefty increase in process cost, not to mention an increasingly ornery set of second-order effects such as variation, interconnect parasitics, and reliability. For instance, to surmount the lack of direct scaling in contacted transistors, the industry has already absorbed the cost of five new middle-of-line masks, and at 10nm these will need additional multiple-patterning steps in order to keep shrinking; on top of that, multiple patterning will now have to extend into the routing layers. Any time you add a multiple-patterning mask, you can rest assured that you will back off from the pitch-enabled scaling entitlement (and of course pay more). What's worse, some of the critical layers may very well need to extend beyond double patterning into triple patterning at 10nm and likely quadruple at 7nm. And if double patterning can make a designer cry in his beer, then quadruple patterning will make him switch to scotch.
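To make that trade-off concrete, here is a minimal back-of-the-envelope sketch in Python. The area and wafer-cost multipliers are entirely made up for illustration, not foundry data; the point is only how quickly the cost-per-transistor win erodes once realized density backs off and multi-patterning adders pile up.

```python
# Back-of-the-envelope: does the area shrink outrun the wafer cost adder?
# All numbers below are placeholders for illustration, not foundry data.

def cost_per_transistor(area_scale, wafer_cost_scale):
    """Relative cost per transistor vs. the previous node.

    area_scale      : die area multiplier for the same design (e.g. 0.55)
    wafer_cost_scale: processed-wafer cost multiplier (e.g. 1.3)
    """
    return area_scale * wafer_cost_scale

# Advertised 1.9x density (area_scale ~ 0.53) with a hypothetical 30% wafer
# cost adder still wins...
print(cost_per_transistor(0.53, 1.30))   # ~0.69: transistors get cheaper

# ...but back off to ~1.6x realized density and add quad patterning on a few
# critical layers (hypothetical 45% cost adder) and the win largely evaporates.
print(cost_per_transistor(0.63, 1.45))   # ~0.91: most of the benefit is gone
```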
The first thing we can do to attempt to offset the process scaling difficulties is to work harder with what we have. One angle is to dig into the second-order issues and weed out areas of over-margin. In analogy to peak oil, we have to continue to invest in our industry’s version of deep water drilling and fracking. The three issues I mentioned above are all candidates for this concept. With respect to variation, most consumer product designs are still corner-based in their approach, and this is clearly a leading contender for turnip-squeezing. It is slightly amusing to consider that the actual chip that is simulated and signed off is statistically guaranteed to never actually exist. There are potential baby steps between SS/FF and SSTA, which we can call xOCV. An encouraging development here is the Liberty TAB’s openness toward added slew/load dimensions in derate tables. We can throw into this discussion a better understanding of tracking between various VT and L device flavors.
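As a toy illustration of the over-margin hiding in corner-based signoff, here is a minimal sketch assuming a path of independent, identically distributed Gaussian gate delays with hypothetical numbers (real silicon is less kind, but the direction of the gap is the point):

```python
import math

# Toy path: N gates, each with nominal delay MU and random variation SIGMA.
# Assumes independent, Gaussian per-gate variation purely for illustration.
N, MU, SIGMA = 40, 10.0, 1.0   # picoseconds, hypothetical numbers

corner_delay = N * (MU + 3 * SIGMA)               # every gate at its 3-sigma corner
stat_delay   = N * MU + 3 * SIGMA * math.sqrt(N)  # 3-sigma of the summed path

print(f"corner-based: {corner_delay:.0f} ps")     # 520 ps
print(f"statistical : {stat_delay:.0f} ps")       # ~419 ps
print(f"margin left on the table: {100 * (corner_delay / stat_delay - 1):.0f}%")
```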
As multi-patterning extends into the routing layers, a very similar opportunity exists with the margining of wire parasitics. In the case of Litho-Etch-Litho-Etch (LELE) processing, coupling capacitance now varies as a function of mask alignment. The simple way to deal with this is to multiply by additional LELE worst-case corners. But in the real world there is not a uniform shift across the die, and thus there is the smell of an opportunity analogous to xOCV. Consider thermal distortion effects, which are partly layout-dependent; layout-dependent effects (LDE) are always a good place to start when turning over margining rocks.
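Here is a crude sketch of that opportunity, using a 1/gap stand-in for coupling capacitance and an assumed local overlay distribution (the spacing, overlay numbers, and segment count are all hypothetical). The point is only that an all-segments-at-worst-case corner sits well above a statistical view of the same route:

```python
import random

# Toy LELE coupling model: adjacent lines come from two masks, so a local
# misalignment d makes one gap S-d and the other S+d. Coupling is modeled
# as proportional to 1/gap (crude parallel-plate stand-in). All numbers
# are illustrative, not process data.
S       = 22.0   # nominal spacing, nm
D_WC    = 6.0    # worst-case overlay used for the signoff corner, nm
D_SIGMA = 2.0    # assumed local overlay sigma, nm
M       = 20     # segments on the victim route

def coupling(d):
    return 1.0 / (S - d) + 1.0 / (S + d)

nominal = M * coupling(0.0)
corner  = M * coupling(D_WC)            # every segment at worst-case overlay
trials  = sorted(sum(coupling(random.gauss(0.0, D_SIGMA)) for _ in range(M))
                 for _ in range(20_000))
p3sigma = trials[int(0.99865 * len(trials))]

print(f"corner     : {corner / nominal:.3f}x nominal")   # ~1.08x
print(f"3-sigma MC : {p3sigma / nominal:.3f}x nominal")  # much closer to 1.0x
```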
Self-aligned double patterning (SADP) is one proposed option that could alleviate some of the LELE problems. For instance, if the spacer is the dielectric, you end up with much less variation in coupling capacitance, and for the same reason, whoever is in charge of interconnect dielectric integrity has also got to love SADP. You also get less LER with a spacer-defined technology. But SADP doesn't come for free either. First, it would re-introduce us to forbidden pitches, so its overall scaling will be less than desired. Second, it introduces extra constraints around adjacent line ends, as illustrated in the figure below.
[Figure: SADP constraints around adjacent line ends and the hot-spot fix (to be published at ISPD 2014)]
As shown above, sometimes the solution to an SADP hot spot is the counter-intuitive extension of a line end. You clearly need to route some full designs in order to understand what SADP (or any new lithography technique) is going to be able to provide for you.
Directed Self-Assembly (DSA) has been gaining amazing momentum and is even being mentioned in the same breath as 10nm. With my limited, um, exposure to DSA, I see defectivity (notably some of it hidden below the surface) plus a restricted set of directing templates, making the DSA value proposition less than the advertised pitch scaling.
From the fab perspective, a simpler and thus justifiably attractive option is to eliminate the 2D patterning headache, go with fully uni-directional metal patterns, and then cut line ends into the pattern. This is one embodiment of complementary lithography. There are many downsides to this approach which will need to be part of an assessment of actual cost. First, topologically you almost always end up needing an extra M1 track for the gate contacts, which usurps area that was previously available for active transistors, and reducing your transistor drive is never a good idea, even if your standard cell heights remain unchanged. Depending on the application, you will see varying amounts of area increase along with weakened transistors. Now, the fabs could figure out how to support gate contacts located over the active regions. That would radically simplify physical design and render this problem obsolete. I always bring this idea up in meetings with fab people whenever I want to hear someone laugh. Second, for everything you were able to do with a bend in M1, you now need to add a via up to M2 and over to where you want to go, and then another via down. Until very recently, via resistance was almost free, but at 10nm and beyond it is definitely not. Again, performance and/or area will pay the price. And what about the traditional cost of a via: yield? And by "a via" I really mean "boatloads of vias." Yes, you can reduce the via count penalty by adjusting your layout style, but you will be adjusting it from something that was denser to begin with. Also, let's not forget that all this extra M2 isn't free. M2 wires that occupied that space previously now need to go up or around, and in either case that will cost you. But (and here comes the broken record part) you won't see that until you implement a test chip.
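To put a rough number on the via penalty, here is a toy comparison of an M1 bend versus the up-and-over M2 detour that unidirectional metal forces. The via and sheet resistance values are placeholders for illustration only, not process data:

```python
# Toy estimate of the cost of replacing a single M1 "bend" with an
# up-over-down detour on M2, as unidirectional metal forces.
# All resistance values are hypothetical placeholders.
R_VIA      = 40.0   # ohms per via at an advanced node (hypothetical)
R_SHEET_M1 = 20.0   # ohms/square (hypothetical)
R_SHEET_M2 = 15.0   # ohms/square (hypothetical)

def bend_on_m1(squares):
    """Original 2D route: just more M1 wire through the bend."""
    return R_SHEET_M1 * squares

def detour_on_m2(squares):
    """Unidirectional M1: via up, jog on M2, via back down."""
    return 2 * R_VIA + R_SHEET_M2 * squares

for sq in (2, 5, 10):
    print(f"{sq:>2} squares: M1 bend = {bend_on_m1(sq):5.0f} ohm, "
          f"M2 detour = {detour_on_m2(sq):5.0f} ohm")
```

Short jogs, which are exactly what bends used to handle, are the ones most dominated by the two added vias.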
And as long as we are talking interconnect, electromigration has quickly risen from minor irritation to PPA-limiter, and the necessary trends in transistor current density, wire cross section, and parasitic capacitance make it very clear which direction this is heading (and it's not north). EM rules have for some time included comprehension of the actual design use case (e.g., AC effects and the Blech effect), but the reality of EM rule validation is that it is a horribly bandwidth-limited process (sitting chips in ovens for weeks at a time) attempting to model an inherently statistical problem (poly-crystalline materials mean variation due to random grain structure). Therefore, it is highly unlikely that statistically relevant information has actually been measured in permutation with real-world physical design use cases, which include short lines with corners and tees. In fact, if you look at the EM literature, most of the data measures down to 1% failure rates, and then a clearly multi-modal distribution is extrapolated down to ppb. I smell margin.
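For reference, the two use-case knobs mentioned above look roughly like this. This is a minimal sketch using Black's equation and the Blech product; the activation energy, current-density exponent, and critical jL product are assumed stand-ins for what would really come from (oven-limited) test data:

```python
import math

# Minimal EM sanity-check sketch using Black's equation for mean time to
# failure and the Blech (j*L) product for short-line immortality.
# The constants are hypothetical; real rules come from qualification data.
K_B     = 8.617e-5   # Boltzmann constant, eV/K
E_A     = 0.9        # activation energy, eV (typical-ish for Cu, assumed)
N_EXP   = 1.8        # current-density exponent (assumed)
JL_CRIT = 3000.0     # Blech product threshold, A/cm (assumed)

def mttf_scale(j, t_celsius, j_ref, t_ref_celsius):
    """Black's equation: MTTF ratio vs. a reference stress condition."""
    t, t_ref = t_celsius + 273.15, t_ref_celsius + 273.15
    return (j_ref / j) ** N_EXP * math.exp(E_A / K_B * (1 / t - 1 / t_ref))

def blech_immune(j, length_cm):
    """Short lines below the critical j*L product do not fail by EM."""
    return j * length_cm < JL_CRIT

# A line at 1.5 MA/cm^2 and 105C vs. a 2.0 MA/cm^2, 125C qualification stress:
print(f"MTTF vs. qual: {mttf_scale(1.5e6, 105, 2.0e6, 125):.1f}x")
# A 5 um line at 2 MA/cm^2: j*L = 1000 A/cm, below the assumed threshold.
print(f"immortal?     {blech_immune(2.0e6, 5e-4)}")
```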
What’s after CMOS?
That link provides a nice, timely summary of the increasingly not-straightforward world of transistor scaling. The first point I make, or at least agree with, is that you need to apply rigorous benchmarks when looking at all of these potential transistor technologies. This means getting a candidate transistor into realistic (below the 10nm node) horizontal and vertical dimensions, with realistic parasitics, constructing a realistic circuit out of them, and then, and only then, measuring their relative effectiveness. Case in point: high-mobility channels. They receive a ton of attention, but bulk mobility doesn't translate directly to microprocessor performance. Take germanium, which seems to be battling it out with RRAM for total domination of the IEDM conference. Our group at ARM recently performed some detailed predictive modeling of germanium for PFETs, and in our study, which we have submitted for publication, we found that with realistic gate lengths, gate oxides, etc., most of the mobility gain was lost. Then, adding in the effects of the smaller band gap, which include increased leakage and variability, we didn't come up with a very encouraging conclusion. Many of the same issues apply to translating the large-dimension benefits of compound semiconductors to the nanometer regime. TFETs are another hot topic, but they not only need to find a lot more drive current, they also need to be realistically de-rated for what will almost certainly be increased variation. A possible interesting scenario for TFETs would be if they could be integrated in a low-cost manner alongside other, higher-performance FETs. That doesn't seem entirely out of the question.
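The bulk-mobility argument collapsing at short channel length can be illustrated with a toy calculation. This is not our predictive modeling, just the textbook square-law drive versus a velocity-saturation-limited estimate, with illustrative numbers and the minimum of the two taken as a crude stand-in for a real short-channel model:

```python
# Toy illustration of why a bulk-mobility advantage shrinks at realistic
# gate lengths: long-channel drive scales with mobility, but short-channel
# drive is increasingly limited by velocity saturation. Numbers are
# illustrative only, not the modeling referenced in the text.
COX  = 2.5e-6    # gate capacitance per area, F/cm^2 (assumed)
VOV  = 0.4       # overdrive voltage, V (assumed)
VSAT = 8.0e6     # saturation velocity, cm/s (assumed similar for both)

def long_channel_idsat(mu, l_cm, w_cm=1.0):
    """Square-law saturation current: proportional to mobility."""
    return 0.5 * mu * COX * (w_cm / l_cm) * VOV ** 2

def vsat_limited_idsat(w_cm=1.0):
    """Velocity-saturation-limited current: mobility drops out."""
    return w_cm * COX * VSAT * VOV

mu_si, mu_ge = 100.0, 400.0       # channel hole mobilities, cm^2/Vs (illustrative)
for L in (1e-4, 20e-7):           # 1 um vs. 20 nm gate length, in cm
    i_si = min(long_channel_idsat(mu_si, L), vsat_limited_idsat())
    i_ge = min(long_channel_idsat(mu_ge, L), vsat_limited_idsat())
    print(f"L = {L * 1e7:6.0f} nm: Ge/Si drive ratio ~ {i_ge / i_si:.1f}x")
```

A real short-channel device would retain some benefit through injection velocity, but the headline bulk-mobility ratio clearly doesn't survive the trip to realistic gate lengths.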
Maybe what goes around actually goes all around. Gate-all-around nanowires would seem to be a simpler extension of FinFETs than some of the more disruptive options. But only if they're horizontal. Vertical GAA nanowires would (and there's a pun in here somewhere) turn physical design on its head, and thus would seem to face a snowball's chance in a fab (draw your own inference). They are, however, likely to gain momentum in real products over in the NAND world, and being able to leverage someone else's integration battles is always a good way to climb the feasibility ladder. While IBM recently helped graphene stock by sending a text message using a graphene-based chip, there's a steep hill to climb to overcome graphene's bandgap limitations and make it to low-power logic devices. As an aside, when you look at the likely candidates, you see that fab marketing departments would do well to back off on all the glorious "3D is so much better than 2D" language surrounding FinFETs, because the likely path forward is to drop to real 2D, and then to 1D, in order to keep a handle on carrier confinement. Device-wise, there are few simple scaling options left, and most of the remainder are complicated propositions in terms of understanding exactly how much benefit they can realistically offer.
For all the wonder that will be the device to succeed the venerable silicon MOSFET, it's going to be a minority voter in the performance equation, thanks to contact resistance scaling. If we want to keep our industry on the performance roadmap, we might wish to re-think the fact that 90% of our R&D budgets are going toward transistor development while the majority of the resistance in our switches is going to be somewhere else. Because contact resistance has not historically been a majority voter, we don't really know it in the statistical sense that we should. Contact resistance isn't normally distributed. What amount of auto-correlation should we model? There's opportunity here in the turnip-squeezing sense that I discussed above. Or, maybe there will be some success with out-of-the-box solutions, such as adding thin insulating layers to reduce contact resistance, or carbon nanotubes. We are going to need them. But the best end result, if this is not getting horribly redundant (and, no, there is no longer any room for redundant contacts; those left us many technology generations ago), is going to come from a holistic assessment of the interactions between the various options for device, lithography, and interconnect scaling in the full chip.
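Here is a small Monte Carlo sketch of why the distribution shape matters, comparing a Gaussian against a lognormal with the same (hypothetical) mean and sigma. The far tail, which is what actually sets your slow corner, is quite different:

```python
import math
import random

# Toy illustration of why the distribution shape of contact resistance
# matters: a Gaussian and a lognormal with the same mean and sigma have very
# different far tails. Numbers are illustrative, not measured data.
MEAN, SIGMA = 200.0, 40.0    # ohms, hypothetical single-contact resistance

# Lognormal parameters moment-matched to the same mean/sigma.
s2 = math.log(1.0 + (SIGMA / MEAN) ** 2)
mu = math.log(MEAN) - 0.5 * s2

N = 1_000_000
normal    = sorted(random.gauss(MEAN, SIGMA) for _ in range(N))
lognormal = sorted(random.lognormvariate(mu, math.sqrt(s2)) for _ in range(N))

q = int(0.99997 * N)    # roughly a 4-sigma upper quantile
print(f"normal tail   : {normal[q]:.0f} ohm")
print(f"lognormal tail: {lognormal[q]:.0f} ohm")
```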
We have recently seen this principle in action regarding 10nm scaling, where transistor performance is being utilized as a proxy for actual pitch scaling. While it may be slightly suspect to bring a transistor performance knife to a pitch scaling gunfight, it is a technically accurate point in the holistic view. Performance, if delivered with power scaling, can be directly traded for area scaling (and often at better than 1-to-1). As a side effect, however, products that really need to push for maximum performance end up with a unique view of cost. As you push chip implementation toward maximum frequency, you find yourself up in the knee of the SP&R area-versus-frequency curve, and you become more and more divorced from the base wafer cost: any improvement in performance ends up looking like area scaling to you. So an extra mask here, an extra mask there, can all appear to be win-win. This argument may even extend down to the base wafer cost, as in the example of SOI FinFETs, because of their inherent performance benefits.
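A toy area-versus-frequency curve shows the effect. The hyperbolic A(f) model and the 10% device speedup below are assumptions purely for illustration:

```python
# Toy model of the SP&R "knee": assume synthesized area grows roughly as
# A(f) = A0 / (1 - f/fmax) as the frequency target approaches the fastest
# achievable fmax. A crude stand-in for real area-vs-frequency curves.
A0 = 1.0

def area(f_target, f_max):
    if f_target >= f_max:
        raise ValueError("target beyond what the process can deliver")
    return A0 / (1.0 - f_target / f_max)

F_MAX_OLD = 2.0     # GHz, hypothetical
F_MAX_NEW = 2.2     # GHz: a 10% transistor performance gain

for f in (1.0, 1.6, 1.9):
    shrink = area(f, F_MAX_NEW) / area(f, F_MAX_OLD)
    print(f"target {f:.1f} GHz: relative area with the faster device = {shrink:.2f}")
```

Below the knee the faster device saves a modest slice of area; up in the knee it looks like a major shrink, which is exactly the funhouse-mirror view of cost described next.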
The problem is that unless you can afford your own dedicated fab line, the same process will need to provide value to less performance-hungry "below the knee" applications, in which this area/cost funhouse mirror doesn't exist. And, of course, these two types of applications sit next to each other on the same piece of SoC silicon, but in different proportions per design, leading to a plethora of opinions about the best overall solution. In this case, we might find a stronger benefit in low-cost 3DIC: the performance-hungry circuits can get to their best global cost minimum with a 172-mask process and then be married to lower-performance logic with a cost minimum enabled by a simpler process. Maybe the same node, just without some bells and whistles. Then, the right answer can be different for different designs. That's on top of all the power and performance benefits that would appear to be achievable with 3D. 3DIC is still in the standards-forming stage, but with mature yield (read: cost) and EDA infrastructure it is looking like a key added weapon in the Moore's Law arsenal.
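A toy cost split makes the argument tangible. The per-mm² costs, the stacking adder, and the 30/70 area split below are all hypothetical:

```python
# Toy cost comparison for the 3DIC split described above. Cost-per-mm^2
# figures, the stacking adder, and the area split are all hypothetical.
AREA_TOTAL    = 100.0   # mm^2 of logic in the SoC
PERF_FRACTION = 0.3     # fraction that actually needs the expensive process

COST_FULL   = 1.00      # $/mm^2, "all the bells and whistles" process (assumed)
COST_SIMPLE = 0.65      # $/mm^2, stripped-down sibling process (assumed)
COST_STACK  = 5.0       # $ per stacked pair for bonding/TSVs/test (assumed)

monolithic = AREA_TOTAL * COST_FULL
split_3d   = (AREA_TOTAL * PERF_FRACTION * COST_FULL
              + AREA_TOTAL * (1 - PERF_FRACTION) * COST_SIMPLE
              + COST_STACK)

print(f"monolithic: ${monolithic:.0f}")
print(f"3D split  : ${split_3d:.0f}")
```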
In summary, there are many options that can help in their small way toward further Moore’s Law scaling, but there’s no way around it, the escalator is broken and we are going to climb the stairs. We are going to need to extend our margining focus and we are going to have to deal with an increasingly heterogeneous and complicated system “on chip”. Maybe that will be good for us. The incredible, decades-long, exponential progress of traditional process scaling has perhaps made the rest of the ecosystem a bit soft in the middle (not you, you look great. We both know I’m talking about software engineers). And, while 2014 marks the 49th anniversary of Moore’s original paper, this February marks the 50th anniversary of American Heart Month.
Epilogue
That crying designer (is that redundant?) finally passed out with all the scotch, by the way. Then he had a dream. Directed Self-Assembly had been harnessed to seed uniform semiconducting carbon nanotubes and place them in dense, exact alignment. And then that was repeated in monolithic layers of logic devices. Then even more CNTs were crammed into low-resistance vias and were married to multi-layer graphene wires with dramatically reduced interconnect RC, improved thermal conductivity, and no significant EM constraints. And someone had finally figured out how to place gate contacts over the active area of a device, allowing for a dramatic simplification in physical design. Above that sat a 4F2 crossbar universal memory. All on 450mm wafers with 120 WPH EUV.
Then he woke up.
With a hangover.
At the bottom of the stairs.
And lots of baggage to carry, full of customers’ expectations.