Smaller Nodes, Much Bigger Problems

Ansys’ chief technologist digs into looming issues with device scaling, advanced packaging and AI everywhere.


João Geada, chief technologist at Ansys, sat down with Semiconductor Engineering to talk about device scaling, advanced packaging, increasing complexity and the growing role of AI. What follows are excerpts of that conversation.

SE: We’ve been pushing along Moore’s Law for roughly a half-century. What sorts of problems are you seeing now that you didn’t see a couple nodes ago?

Geada: The first transistor was in the late 1940s, and the first integrated circuits from Intel were around 1970. So yes, we’ve been doubling every 18 months for a really long time. The easy gains are over, and now we’re coming up with tricks to push things further. We haven’t run out of tricks yet, and hopefully we won’t by the time I retire, but progress is going to come from different directions. Most foundries seem to have some sort of plan to get to 3nm, give or take a little bit. The numbers don’t actually mean anything these days. They’re just a proxy for the density they’re able to achieve. But they’re all talking about the same effective transistor density on chip.

SE: As we head toward those nodes, are you seeing huge numbers of the same kind of processing elements, or are you seeing more heterogeneity in these advanced-node chips?

Geada: It depends on the domain. But in general, heterogeneity is coming back in multiple dimensions. As you go to smaller geometries it gets more and more expensive. One of the things that was driving Moore’s Law is that the cost per transistor was dropping. It hasn’t been dropping noticeably recently, and in some cases it’s going flat. So yes, you can get more transistors, but the cost per die is going up significantly, so those two things balance out. We are going to see more heterogeneity because some things you don’t need to develop at smaller nodes. Sometimes you don’t even necessarily want to be in silicon. For example, if you want to deal with extremely high-frequency parts, maybe the right answer is some gallium arsenide variant or even silicon germanium. For ultra-high-frequency stuff, such as millimeter-wave 5G, silicon can’t get to those frequencies. When you talk about systems, which is what people really care about, heterogeneity is being forced. Yes, certain things are still going to be homogeneous arrays of cores, such as a GPU, but systems are going to be heterogeneous. There’s going to be specialized silicon, or other fabrication processes, for dealing with specialized needs, whether it’s very high-frequency radio waves with silicon germanium or gallium arsenide, or specific processes to deal with photonics. Sure, you can integrate some of it with conventional CMOS, but it’s easier when it isn’t. And you don’t need the very high cost of an extreme node when the features you put on it are there to manipulate light, because those features are actually much larger than the smallest transistors you can make.

SE: So what goes on 3nm? Does that become more homogeneous while the rest of the package becomes more heterogeneous?

Geada: Yes. Basically, each subsystem specializes in the things that it does well. So 3nm is great for compute density and relatively low leakage power. It’s very well behaved. You can put in a lot of localized compute power using a lot of transistors. But it’s not good for everything. And so the things that it’s not good for migrate out of non-optimal silicon to whatever approach best suits the design. We’ve seen this in multiple dimensions, and this is one of the reasons why all the foundries and manufacturing houses are pushing so heavily on all the 3D-IC flavors and FPGAs.
They’re coming up with systems that are no longer just a single die. They’re stacked up for particular purposes. There’s a whole ecosystem now around high-bandwidth memory, which is being adopted because you no longer can afford to go off your package to talk to memory. It’s too far away. Speed-of-light distances have always been a concern. Performance, in many respects, is limited by how large the system is physically. It used to be system-level interfaces, where you went off-package to talk to your memory, or you went off-package to talk to your laser inputs for high-bandwidth, high-performance serial interfaces. Or you went off-package to talk to your antennas. That is becoming impossible with the power and performance budgets that people need these days. Whenever possible, all that functionality is going to migrate inward, as close to the processor as possible. So you’re starting to see much more complicated packages, where the antenna, the lasers and optical communications, the SerDes and the memory have migrated into the package itself through some sort of 3D-IC technique.

SE: So basically we’ve shrunk the board?

Geada: Yes, and the next big integration/migration is going to be where the PCB becomes silicon. The thing that is easy to skim over is that when you double the number of transistors every 18 months, you’re more than doubling system complexity every 18 months. So your entire tool set, your entire reasoning and methodologies, have to perpetually be able to scale to deal with a doubling of demand every 18 months. Everything gets extraordinarily more complex. And this is one of the reasons we have invested so heavily in scalable platforms. You can’t just stand still, which is why we built an architecture from the ground up for scalability.

SE: Does it make it simpler on the digital side if we head toward a chiplet approach?

Geada: The short answer is that I haven’t seen it yet. In theory, once you have tested components that you can actually use, it will get simpler. We’re not in that world yet. And in practice, just cramming more transistors into a smaller physical area has brought along the law of unintended consequences. It used to be that the drop in voltage on any cell instance at any time largely had only two components. One of them is the drop caused by ‘this’ instance switching — having some activity. Because there’s some finite resistance on the path to the power supply, the instance’s own switching causes a localized voltage drop purely from switching effects. And then there is the simultaneous demand from all of these instances, which lowers the ability of the power supply to actually supply power to everybody. So there’s a system-level ripple and a local drop. And largely, that perspective has been good enough for a decade or so. Most people’s voltage analysis is based entirely on that concept. As soon as we migrated to finFETs, that story stopped making technical sense, because those two components started to be drowned out by a third component, which is the effect of other parts of the system on local-area switching. And suddenly the problem became very large and very hard to analyze.
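The three components Geada describes can be sketched as a toy calculation. All resistances, currents, and coupling factors below are hypothetical illustrations, not values from any real power-delivery-network model:

```python
# Toy model of the three voltage-drop components described above.
# Numbers are illustrative only.

def local_ir_drop(i_switch, r_grid):
    """Drop from this instance's own switching current through the grid resistance."""
    return i_switch * r_grid

def supply_droop(total_demand, r_shared):
    """System-level droop when many instances draw current simultaneously."""
    return total_demand * r_shared

def neighbor_coupling(neighbor_currents, coupling_r):
    """The post-finFET third component: nearby activity raising the local drop."""
    return sum(i * coupling_r for i in neighbor_currents)

v_nominal = 0.75  # volts, hypothetical supply
drop = (local_ir_drop(0.002, 5.0)                          # 10 mV local
        + supply_droop(1.5, 0.01)                          # 15 mV shared droop
        + neighbor_coupling([0.003, 0.004, 0.002], 3.0))   # 27 mV from neighbors
print(f"effective supply: {v_nominal - drop:.4f} V")
```

In this sketch the neighbor term is the largest of the three, which is the point: an analysis that only keeps the first two terms can substantially underestimate the drop.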


Fig. 1: Calculating thermally aware electromigration from wire temperature. Source: Ansys
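Thermally aware electromigration analysis of the kind shown in Fig. 1 rests on the strong temperature dependence of wire lifetime, commonly modeled with Black’s equation. A minimal sketch, with illustrative activation energy and exponent values rather than any foundry’s calibrated parameters:

```python
import math

def black_mttf(j, temp_k, a=1.0, n=2.0, ea=0.7, k_b=8.617e-5):
    """Black's equation: MTTF = A * J^-n * exp(Ea / (k_B * T)).
    j is a normalized current density, temp_k is wire temperature in kelvin,
    ea is the activation energy in eV. Parameter values are illustrative."""
    return a * j ** (-n) * math.exp(ea / (k_b * temp_k))

# Same current density, but a wire running 20 C hotter:
base = black_mttf(j=1.0, temp_k=358.0)  # 85 C
hot = black_mttf(j=1.0, temp_k=378.0)   # 105 C
print(f"lifetime ratio (85C vs 105C wire): {base / hot:.1f}x")
```

With these parameters, a 20 °C rise in wire temperature costs roughly a 3x reduction in expected lifetime, which is why the wire temperature map feeds directly into the EM check.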

SE: We also started getting things like wires that were too thin to move current very easily. We’ve got capacitance issues that we’ve never had to handle on this scale before, right?

Geada: And inductance, which is starting to get interesting. You could ignore inductance at lower frequencies, but at newer nodes, with faster edges and more complex metal stacks, inductance starts to be something that you need to pay a lot more attention to. It’s not just for things like resonators and antennas, either. Suddenly your clock, which by design has very fast edges, starts acting like an EM emitter and couples to any nearby wire that has the correct orientation. Inductance is not an effect that can be ignored on-chip anymore. It is critical for any high-speed line or any device with high-frequency components. So any signal that switches very fast has high-frequency components that you need to analyze, to make sure they are not affecting, or being affected by, resonances from elsewhere in your design.
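A quick way to see why fast edges matter is the standard signal-integrity rule of thumb that a digital edge has significant spectral content up to roughly 0.35 divided by its rise time. The edge times below are hypothetical:

```python
# Rule-of-thumb "knee" frequency of a digital edge: f_knee ~= 0.35 / t_rise.
# Above this frequency the edge's spectral energy rolls off rapidly.

def knee_frequency_ghz(rise_time_ps):
    """Approximate bandwidth of an edge with the given rise time (picoseconds)."""
    rise_time_s = rise_time_ps * 1e-12
    return (0.35 / rise_time_s) / 1e9

for t_rise in (100, 50, 20):  # hypothetical edge rates in picoseconds
    print(f"{t_rise} ps edge -> ~{knee_frequency_ghz(t_rise):.1f} GHz of spectral content")
```

A 20 ps edge carries energy well into the tens of gigahertz, squarely in the range where on-chip inductive coupling and resonances become significant.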

SE: Does that spill out across the entire package on a multi-chip package?

Geada: Potentially it does. The same principles apply to thermal, as well. When you are doing a planar chip and that is your entire universe, your thermal gradient is predictable from your own activity. Now you’re stacking up other stuff around you, and you need to take into account that above you might be a memory, which typically runs cool, or it might be a SerDes, which typically runs hot. Or somebody may have decided to put some AI chiplets on top, and those are largely unpredictable. It depends on how they’ve been trained and what inferencing they’re doing.

SE: There’s been a big push to move the memory closer to the processing, and potentially move the processing inside of memory. Memory is very sensitive to heat. What impact is that having on the design?

Geada: You end up having to simulate all the physics at once across the entire system, and you need techniques with enough performance to analyze sufficient modes of operation and interactions to determine whether the system will actually work as expected. There is definitely a thermal effect, and you have to analyze that thermal effect. But it’s not just from the heat you generate. When you are operating inside an ADAS environment, for example, you have a heat generator of 200 horsepower or more very close to you. That’s going to add thermal stress. Alternatively, you might be on the back of a wind turbine somewhere, where it’s a much gentler thermal environment in many regards. There’s still a thermal gradient in the design, though, and it needs to operate in very different thermal regimes.
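The effect of the environment can be sketched with a minimal lumped thermal model: steady-state junction temperature is ambient plus power times thermal resistance. The power, thermal resistance, and ambient figures below are hypothetical, chosen only to contrast the two deployments Geada mentions:

```python
# Minimal lumped thermal sketch: the same chip dissipating the same power
# lands at very different junction temperatures depending on its environment.
# All numbers are illustrative.

def junction_temp_c(ambient_c, power_w, r_th_c_per_w):
    """Steady-state junction temperature = ambient + self-heating (P * R_th)."""
    return ambient_c + power_w * r_th_c_per_w

chip_power = 5.0  # watts, hypothetical
r_th = 8.0        # C/W, hypothetical package thermal resistance

engine_bay = junction_temp_c(ambient_c=105.0, power_w=chip_power, r_th_c_per_w=r_th)
wind_turbine = junction_temp_c(ambient_c=25.0, power_w=chip_power, r_th_c_per_w=r_th)
print(f"ADAS near the engine: {engine_bay:.0f} C, wind turbine: {wind_turbine:.0f} C")
```

The self-heating term is identical in both cases; the 80 °C gap comes entirely from the environment, which is why the same silicon must be verified across very different thermal regimes.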

SE: And that affects how it functions, how it ages, how the signals are moved around, and whether they are compatible with the filters, right?

Geada: Yes, and you always have to analyze as much of the entire system at once, for as many scenarios as possible. Ansys has invested a lot, not just in large scalable platforms, but also in how to automatically construct abstractions of one level of behavior to feed a different level of behavior. We can model entire systems, whether it’s an entire car or an entire windmill, from the full physical functional system all the way down to individual chips. Information flows up and down the stack, automatically creating abstractions. So when you’re simulating the behavior around the sensors in the car, there’s still some physical awareness of what the chip itself is doing and what the thermal environment is going to be while receiving an image. You still need to detect a pedestrian crossing, regardless of whether you’re in the heat of the desert or in Greenland.

SE: This is incredibly complicated and there are a lot of tradeoffs. Where are people going wrong with designs?

Geada: We’ve seen two main failure modes. The first, which is very common, is assuming this new generation is the same as the old generation — that the methodologies and techniques you could apply to design systems in the previous era are still good enough to design these new systems with higher integration, more heterogeneity, and more complex environments. It is a more challenging problem. Questions come up that weren’t questions in previous methodologies, like persistent thermal stresses and thermal gradients. You may have a stack where one part of it is running at 65nm and other parts are running at 28nm, and all of this is one system. Most traditional methodologies don’t deal well with these complexities, and people frequently forget to take all of these complex interactions into account. The second mistake is to get so caught up in the details of these interactions that you don’t make any progress.

SE: One of the hidden costs is retraining your staff, right? While they can learn this stuff, it’s extra time out of a design cycle. So now, instead of just doing a chip, you’re doing a chip plus the extra time it takes for these people to learn all of this.

Geada: For companies that really value their employees, they’re probably okay. But there is major competition for the talent that can do this. Companies that don’t take care of their workers will see them walking out. There’s a reason why certain companies do this a lot better than others, and it largely traces back to being able to acquire a more talented, motivated workforce, with enough flexibility to learn and adapt to these changing circumstances. There’s a perpetual pendulum cycle where you go from integration, to disintegration and specialization, and back to integration. We’re back in the re-integration phase.

SE: How do you see all of these different components playing together? There are a lot of startup technologies out there, such as RISC-V, which are challenging the status quo.

Geada: RISC-V is interesting, but it doesn’t have the reach, development, and history behind it that Arm has. There are not that many choices. If you’re looking at LiDAR, there are not that many choices. If you’re looking at radar, it’s the same. Most large companies are actually specialized within a domain.

SE: So if you have more specialized players, but fewer of them, does that help in terms of the integration of all these different pieces?

Geada: We don’t know yet. But it does give them a lot more leverage.

SE: If something is already fully characterized to the point where you know how it’s going to behave, does that give them an advantage over starting from scratch with a more customized solution?

Geada: It’s convenient in one direction and not convenient in the other. It tends to be more predictable when you’re dealing with one of these large players. They have their own trends and trajectories, and once you figure out what exactly drives them, it’s easier to stay on top of their needs and their actions. The catch is that because they’re all hyper-focused on their own domain, you can’t necessarily take the lessons you’ve learned from one of them and apply those to a different domain, because they’re focused on different markets with different requirements. So you can’t simply take the knowledge of the 5G domain and translate it into something that works for radar. The physics are the same, and the basic capabilities of the tools are the same, but the environments in which they’re applied and used are different enough that you end up with large-scale customizations. EDA has always been custom off-the-shelf. It has become more so recently. Every customer is slightly different from every other customer.

SE: AI is customization on customization, because these systems optimize over time. How do we actually figure out what happened with the algorithms, given that we can’t read into them once they’ve been trained, that machines are training machines, and that we’re patching these things, as well?

Geada: This is one of the things that EDA, as a rule, has done pretty well. Most of the engineering algorithms in EDA are designed to work within provable, somewhat pessimistic bounds. It’s not about showing the system is ideal for one particular execution path. Most of our proofs and simulations are geared to all possible use cases. Given the constraints, we can prove the system will operate within the power, temperature, and electrical budget you gave us. Our algorithms are, as far as possible, narrowing those pessimistic bounds down to as close to reality as we can get for the nominal case.

SE: How do you see AI filtering into all of this, and how successful do you think it will ultimately be?

Geada: AI is a very large umbrella for a whole bunch of unrelated techniques and domains. It covers a wide spectrum of functionality. You have the part where the system is learning. That requires a level of complexity and analysis that is completely different from the inferencing engine, which is just applying the inputs and the weights and propagating forward, and not doing anything else. It’s repeating the same operations time and time again. It has been given a learned set of weights, and it has a much simpler behavior than the machine where you needed to do the large-scale learning. Then there are domains like edge AI. How does my phone know when I say, “Siri,” to wake up? That’s done locally. Object recognition is done locally. Edge AI is incredibly important because even with 5G we’re going to have limited bandwidth and latency to get to the cloud to do deeper analysis. So a lot of the analysis has to be done at the edge, and only exceptions, flags, and statistical data can afford the migration to the cloud and back. Something that is running in an accessible location has different constraints than something that is running in my house.

SE: How does this play out in other markets?

Geada: We’re starting to see more and more of this applied everywhere, even for things like prosthetics. There’s some really cool work where we can now read the nerve impulses at the surface of the skin, interpret intent, and make a prosthetic move as if somebody had the real arm there. But these techniques all have different domains of applicability. So I expect AI is still going to be a wide domain. Typically 9 out of 10 of these ideas will fail, but there are going to be successes. And by definition, there’s going to be more than one winning architecture, because there’s more than one type of problem out there.


