Changing Direction In Chip Design

This year’s Kaufman Award winner digs into scaling issues, industry fragmentation, and what semiconductor designs will look like over the next decade.


Andrzej Strojwas, chief technologist at PDF Solutions and professor of electrical and computer engineering at Carnegie Mellon University—and the winner of this year’s Phil Kaufman Award for distinguished contributions to EDA—sat down with Semiconductor Engineering to talk about device scaling, why the semiconductor industry will begin to fragment around new architectures and packaging, and where the holes are that need to be filled.

SE: What do you see as the most significant changes in design?

Strojwas: The biggest change right now is that it is very difficult to do the new technologies. EUV is more real than it was, but it’s still not there. And the nature of scaling has changed. It used to be purely geometrical scaling. Now it’s much more electrical scaling. From a process standpoint you really have to make sure you’re performing close to what you promised your designers. It’s very challenging and getting progressively harder. And because of the 3D nature of devices and embedded defects, it’s extremely difficult to see and fix the problems.

SE: So you can still model everything, but it’s harder?

Strojwas: It’s very important to do some early modeling—maybe the devices and the optical effects—but beyond that it’s virtually impossible to model processes and equipment. You really have to do silicon verification. If you look at what it means to verify things in silicon, you have eight or nine critical masks, and then there are a bunch of masks that are less critical. So it takes four to six months to get your silicon data back from the full process. The learning cycle is horrendous. As a result, it’s tougher and tougher to ramp up those new processes. And you have to ramp them up at a certain pace. That means there is no way to resolve all the issues related to the technology or design process. You have to do them as you’re bringing new products to the technology.

SE: What’s the solution? Is the difference advanced packaging or something else?

Strojwas: In the main SoC type of approach, you have to react to things that are true yield killers, or which are causing huge performance distribution spreads. But you can’t afford to wait months, so let’s partition the problem and do the short-flow vehicles early, during the development of the layout design rules, where you are guaranteeing your devices perform to spec.

SE: You are talking divide and conquer here, correct?

Strojwas: Yes.

SE: Does that work in a complex SoC, particularly as you start having more interdependencies and physical effects?

Strojwas: It works to a certain limit. Whenever you can isolate things to front-end problems or just back-end problems, you can gain a lot of information. Eventually, you’re right, there are a lot of interdependencies and you’ll have to address those interactions. Then the problem is how early in the process you can actually see these issues and how quickly you can react to them. The challenging designs are going to require that you stick pretty closely to what you promised to the designers in the PDK. That’s not happening. As a matter of fact, you may have the main population doing okay, but you have the outliers that are killing you, either from a speed viewpoint or from an excessive leakage viewpoint. So how are you going to find out what the culprit is? We actually have a pretty creative solution to it. Let’s intercept the product, or maybe a minute version of it, on the shuttle or MPW (multi-project wafer), and let’s do a custom mask where you are going to characterize the devices that are representative of the product in the real environment—the way they are laid out. Basically, you need a couple of masks. You have to modify the contact and the metal to be able to access these devices. In an SoC flow, that’s going to be about one-third of the overall cycle. You can characterize tens of thousands of those devices very quickly. In a matter of a few hours, a custom-made massively parallel tester gives you the full distribution of the SPICE parameters, looks at the differences between the silicon and the model that was used, and then identifies whether there are problems with the technology.
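
To make that last step concrete, here is a minimal sketch in Python of comparing measured device parameters against PDK targets and flagging outlier devices. It is not PDF Solutions’ actual tooling, and every parameter name, target value, and tolerance in it is hypothetical.

```python
# Minimal sketch (not the actual PDF Solutions flow): compare measured device
# parameters from a massively parallel characterization run against PDK targets
# and flag outlier devices. All names, values, and tolerances are hypothetical.
import statistics

# Hypothetical measurements extracted for product-representative devices.
measurements = {
    "vth_nmos_V": [0.342, 0.351, 0.348, 0.520, 0.345, 0.349],        # one slow outlier
    "ioff_nmos_A": [1.1e-9, 1.3e-9, 1.2e-9, 9.8e-8, 1.0e-9, 1.2e-9],  # one leaky outlier
}

# Hypothetical PDK targets: (target mean, allowed deviation).
pdk_targets = {
    "vth_nmos_V": (0.350, 0.050),
    "ioff_nmos_A": (1.2e-9, 5.0e-9),
}

def flag_outliers(samples, target, tol):
    """Return indices of devices whose measured value falls outside target +/- tol."""
    return [i for i, x in enumerate(samples) if abs(x - target) > tol]

for name, samples in measurements.items():
    target, tol = pdk_targets[name]
    shift = statistics.mean(samples) - target      # silicon-vs-model mean shift
    outliers = flag_outliers(samples, target, tol)
    print(f"{name}: mean shift vs. PDK = {shift:+.3g}, outlier device indices = {outliers}")
```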

SE: Do you find that people are doing that with the critical path or non-critical paths—or both?

Strojwas: Everything. The critical path is for speed. You are looking for the devices that are not performing, but there could be a bunch of pieces of the critical path that are leaking and causing physical problems. We look at the layout. We have layout analysis tools. We analyze the full product layout and pick those tens of thousands of devices that are going to represent the full population of the unique devices and neighborhoods. Very quickly we are told what the problem is: Is it a more generic problem, or is there a problem with the particular layout of standard cells that you’re putting together? If you do it on the final product, that’s an expensive re-spin. If you do it early on the shuttles with MPW wafers, you have a chance to fix those issues before you go to the product.

SE: Is it power or performance problems that you’re finding?

Strojwas: Both, but it does depend on the type of product. Is this a server chip or is it a cell phone?

SE: How does advanced packaging impact this—does it help or make it worse?

Strojwas: Advanced packaging is going to help in terms of alleviating some of the problems that are related to delays or excessive power that’s dissipated in the chips. But if you’re designing for particular performance or leakage goals, you still want to make sure the technology is giving you something close to what was in your PDK.

SE: A lot of people don’t realize that after 16/14, which is the first version of finFETs, leakage starts going up again. We’re back to square one. Will horizontal and vertical nanowires help?

Strojwas: We’ll probably go through the lateral nanowires first. Vertical provides a significant area shrink, but it brings new problems. You have to do interconnects in two layers, so it’s going to be challenging. But bulk finFET has problems, too. It’s not truly fully depleted. It’s not true that you have no dopants, because you actually have to do the junction-isolated fins. Moreover, the cheaper solution to multi-Vts was to help yourself with ion implantation, so you still have some RDF effects (random dopant fluctuation).

SE: What’s next in materials? Is it GaN or silicon carbide?

Strojwas: Clearly GaN has its role in terms of a moderate range of voltages and terahertz-type operations. Silicon carbide is going to be the extreme range of both voltages and temperatures. But they’re not the candidates to replace our basic silicon technology.

SE: So what comes next? Where are you putting your bets?

Strojwas: The next two technologies are going to be lateral nanowires and then vertical FETs. I’m not a strong believer in tunnel FETs because the variability is going to kill it. You really rely on very fine band-to-band structure, and I don’t know how you are going to control it in manufacturing. The vertical FETs or vertical nanowires will take us to 5nm. Beyond that, for our mainstream products, I don’t think it’s going to be GaN or silicon carbide. We have to look at new things. I’m skeptical about carbon nanotubes.

SE: Do you think we are going to get beyond 5nm or will it be too pricey?

Strojwas: It will be awfully expensive. There’s enough ego in the industry that somebody will try to make it happen. Is it going to be economical? I have my doubts.

SE: It’s the same with EUV, right? It’s possible to hit a drop of molten tin with a laser to create a plasma, but uptime is variable, parabolic mirrors need to be replaced, and it’s not cheap.

Strojwas: I’ve spent quite a bit of my career in modeling litho and overlay. People didn’t realize that the mirrors have to be so perfect, because even with perfect optics you are going to get flare. As a result, it is a really tough problem. But that’s not the number one issue right now for EUV. The source power is making good progress, but for volume manufacturing you need the pellicle. You need to inspect the mask blanks actinically. Progress is being made, but there are some ambitious statements out there, and some of them were taken back recently. We’ll do 7nm with EUV.

SE: Then you’re down to double/quad patterning with EUV and suddenly this gets even slower and uptime has to be perfect all the way across.

Strojwas: Absolutely. Forget about the single-exposure dream. What you need now is crazy overlay alignment specs. This is where we’re really breaking the laws of physics. The other thing is you’re really working with a small number of photons. The stochastic effects, the shot noise, are going to be brutal. Suddenly we’ll have variability in devices where you claim that RDF (random dopant fluctuation) doesn’t exist. We have line-edge roughness to deal with. Things are not going to be pretty. The other question is whether the pellicle and blank inspection are viable for volume manufacturing. And then there’s uptime, and even peak power versus effective power are two different things.
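
As a rough, back-of-the-envelope illustration of how few photons are involved (the resist dose and pixel size below are assumed values, not figures from the interview): a 13.5nm photon carries about 92 eV, so a 30 mJ/cm² exposure delivers only on the order of 2,000 photons into a 10nm x 10nm pixel, and the Poisson shot noise on that count is already a couple of percent.

```python
# Back-of-the-envelope shot-noise estimate for EUV exposure.
# The dose and pixel size are assumed, illustrative values.
h, c = 6.626e-34, 2.998e8           # Planck constant (J*s), speed of light (m/s)
wavelength = 13.5e-9                # EUV wavelength (m)
photon_energy = h * c / wavelength  # ~1.47e-17 J, about 92 eV per photon

dose = 30e-3 * 1e4                  # assumed resist dose: 30 mJ/cm^2 -> J/m^2
pixel = (10e-9) ** 2                # a 10 nm x 10 nm pixel (m^2)

photons = dose * pixel / photon_energy
relative_noise = photons ** -0.5    # Poisson statistics: sigma/N = 1/sqrt(N)

print(f"photons per pixel: {photons:.0f}")           # roughly 2,000
print(f"relative dose noise: {relative_noise:.1%}")  # roughly 2%
```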

SE: Let’s back up a few notches. We used to have one problem at each node. As you’ve alluded to, we now have many problems at each node—and they’re all major. Where do we go next?

Strojwas: Some really huge changes need to happen. Products like the mobile platforms are going to become more powerful, and they are still going to push for miniaturization. But it’s a myth that we are reducing the cost per transistor. From 20nm on, that is not true. Many folks are showing that the cost per function is actually increasing. You can put in lots of standard cells, but then you have to access the pins. That’s really the limiter. The cost per function is really what matters. The interconnect is another thing. Having 15 layers of metal doesn’t help. The congestion is at the lower levels, and this is really where you’re very limited.
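
A toy calculation makes the cost-per-transistor point visible. The relative wafer costs and density gains below are made-up placeholders, not industry figures; the only point is that when wafer cost grows faster than usable density, cost per transistor stops falling.

```python
# Toy illustration of the cost-per-transistor argument. The wafer costs and
# density gains below are made-up placeholders, not industry figures.
nodes = {
    # node: (relative wafer cost, relative usable transistor density)
    "28nm": (1.0, 1.0),
    "20nm": (1.5, 1.4),   # wafer cost rises faster than usable density...
    "10nm": (2.6, 2.4),   # ...so cost per transistor stops improving
}

base_cost, base_density = nodes["28nm"]
for node, (cost, density) in nodes.items():
    # Cost per transistor relative to 28nm: (wafer cost) / (transistors per wafer)
    rel_cost_per_transistor = (cost / base_cost) / (density / base_density)
    print(f"{node}: relative cost per transistor = {rel_cost_per_transistor:.2f}")
```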

SE: So how does this play out?

Strojwas: Mobile will get smaller and smaller. And people have ambitious plans for servers. But how long are we going to deal with servers that are sitting in data centers and basically doing a bunch of searches using the current architecture? That’s going to be a big shift. People are talking about in-memory computing. You really need to change the architecture of the processor. If you look at all this neuromorphic stuff, this is for real. This is all relying on very ambitious 3D that has to be put in place. That’s not trivial. The main barrier for true 3D is the cost of TSVs. The other issue is heat, which is a more universal issue. Memory on top of logic works to some extent. How exotic will the solution need to be for the complexity of a server? Do you put micro-fluidic channels in? That adds cost. That’s really the big issue. Is 3D going to alleviate the need to scale? I don’t think that’s really clear.

SE: But it does affect scaling, right? We have to think in terms of integration of a system.

Strojwas: It should. If you look at a population of servers or architectures in a data center, very little of the power and delay actually involves computing. Most of it, in terms of power and delay, is memory access and the distance [from the processor] to memory and storage. Clearly, this is a humongous opportunity for 3D. But it has to be affordable and reliable.
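
A rough sense of the imbalance he describes (the energy figures below are ballpark estimates used purely as assumptions, not measured data): fetching data from off-chip memory can cost roughly two orders of magnitude more energy than the arithmetic it feeds, which is the gap that tighter 3D integration is meant to narrow.

```python
# Rough, order-of-magnitude energy comparison (assumed ballpark figures, not
# measured data) showing why data movement, not arithmetic, dominates.
ENERGY_PJ = {
    "64-bit floating-point op": 20,         # assumed: on the order of tens of pJ
    "large on-chip SRAM access (64b)": 50,  # assumed ballpark
    "off-chip DRAM access (64b)": 2000,     # assumed: ~two orders of magnitude higher
}

compute = ENERGY_PJ["64-bit floating-point op"]
for op, pj in ENERGY_PJ.items():
    print(f"{op}: ~{pj} pJ ({pj / compute:.0f}x the arithmetic op)")
```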

SE: In that context, we’ve been hearing a lot of discussion about 2.1D with organic interposers, and 2.5D with silicon interposers. 

Strojwas: Yes, the interposers definitely play a role, but it’s not as aggressive a scaling approach as you might need for a data center architecture. It would work okay for some, but there may be different roadmaps for different types of products.

SE: Full 3D makes a lot of sense on paper. But in terms of widespread deployment, it doesn’t look as if that will happen within the next 5 years.

Strojwas: No. I was hoping 2020, but right now it looks like we’ll have to push it. That’s unfortunate, and the problem is that people have to make decisions relatively early. They have to make some bets.

SE: That was the case with EUV, right?

Strojwas: Yes, but 7nm without EUV is a mess. The biggest mess is the overlay and the line-edge roughness. People may say edge placement rather than just overlay alignment. These two are definitely challenges, along with the manufacturing challenges.

SE: And the interconnects, as well? Cobalt is now a potential replacement for copper.

Strojwas: Yes. Contact resistance is a big issue. In principle, finFETs with higher fins look good, but then you have to contact these devices, and the series resistance is basically killing you. Then what is beyond copper?

SE: Where do you see alternatives like quantum computing or neural networking fitting in? They don’t necessarily need the latest process nodes.

Strojwas: They all look pretty promising, but not yet. Neural networking is more of an architecture issue than just an implementation issue. With neural networks and neuromorphic computing, you’ll see a shift there. But there will be a lot of lip service until true 3D kicks in.

SE: So that is the big changeover for our industry?

Strojwas: Yes—it’s humongous. It could create a divergence. Right now the mobile platforms and servers are walking the same path. The next step is a big one and the bets are costly. EUV is a good example of how costly the bet is.

SE: Can we roll back some of the technology or does it all have to be at the latest node? If you go 3D, does it have to be 5nm or can you get the same benefits at 28nm?

Strojwas: It’s going to be application-dependent, but 28nm is going to be a very long-lasting node. There’s a lot of stuff that we need to figure out for the whole IoT/IoE, where the breakdown is, ‘How much do you have to do locally versus sending something to the cloud?’ 28nm may be a really good node, very economical. That does not change the need for the high end, though. The only thing that will change that will be true 3D, and for that we have to wait.

SE: One of the attractions for the die stacking was that you could do all of this at different nodes. The only thing we’ve seen so far is homogeneous. Are we heading toward integration of different dies created at different nodes, where a 7nm platform may include a 130nm analog chip or chiplet?

Strojwas: That is definitely doable. You see the interposers getting smarter. Even if you look at the current iPhone or a phone by Samsung, you see the components are really coming from different technology nodes. There’s potential for integrating heterogeneous chips coming from different technologies at those different nodes, beyond the application processor. The transceiver doesn’t have to be at the same node.

SE: So how do you see all of this playing out?

Strojwas: There will be divergence depending on the product. And there will be different groups of companies leading the attack in those different dimensions.

SE: Is this a splitting of the market that is here now, or is it incremental? Automotive, for example, is almost entirely new and still evolving.

Strojwas: It’s going to be a more drastic splitting. Right now the servers and the mobile do the same type of scaling. They have the same requirements for the foundries in terms of leading-edge nodes. That may actually change, and there will be different solutions chosen. For mobile platforms you can go quite a distance with more incremental changes in packaging. Going heterogeneous and so on to make the next quantum leap for servers requires more drastic changes in the whole supply chain.

SE: This is no longer just von Neumann type of architecture, right?

Strojwas: No, we’ll have to abandon pure von Neumann. It doesn’t mean that the server business won’t want to do 5nm. They’ll still do 5nm, but with a different kind of chip and integration. Eventually they’ll have to bet on true 3D. They have no choice.

SE: What happens on the foundry side? Are they able to keep up with this?

Strojwas: It’s going to be difficult to fully integrate and create a new type of IDM. Plus, the foundries have ambitions as well to go beyond where they are in terms of wafers—they want to capture what used to be the OSATs’ business. The foundries are ambitious. Their business is good for the leading guys, and they’ll be trying to satisfy their needs. I don’t see a huge fabless company emerging that will say, ‘Forget about TSMC.’ It will have to be the foundries responding to the challenges, but probably there will have to be money thrown in from the fabless companies or leaders to establish this new infrastructure.

SE: Metrology is way behind at this point. How big of a problem is that?

Strojwas: That’s really where we are at an absolute crisis. Doing the in-line metrology is the biggest issue. You build, build, and build, and you pray that it works. For those on the leading edge of CMOS, the problem is not any easier. If you find out four to five months afterwards that there’s a significant problem, then it’s really a big issue. The in-line metrology is absolutely crucial. So far, there’s been a lot of optical inspection and also measuring overlay and profiles of the vias. Those are very spotty types of checks. On top of that, a number of defects are 3D in nature. With 3D finFETs, you’re building products where you have tens of billions of vias, and the problem could be contact interface types of issues. There’s no way of inspecting them optically.
