Dealing with physical and electrical effects in advanced nodes and stacked die.
Experts at the Table: Semiconductor Engineering sat down to discuss power integrity challenges and best practices in designs at 7nm and below, and in 2.5D and 3D-IC packages, with Chip Stratakos, partner, physical design at Microsoft; Mohit Jain, principal engineer at Qualcomm; Thomas Quan, director at TSMC; and Murat Becer, vice president at Ansys. What follows are excerpts of that conversation, which was held at the recent Ansys IDEAS Digital Forum.
Above (left to right): Chip Stratakos, partner, physical design, Microsoft; Thomas Quan, director, OIP Marketing, TSMC; Murat Becer, VP product, semiconductor R&D, Ansys; and Mohit Jain, principal engineer, Qualcomm.
SE: Where are the biggest challenges with power integrity, particularly at the most advanced nodes?
Quan: Power integrity in advanced-node designs includes issues like how to ensure stable and reliable power delivery throughout a chip. This encompasses managing voltage drop, EMI noise, and current fluctuation. At advanced nodes — 7nm, 5nm, 3nm — the voltage drop challenges are rising with increased power density and higher frequency. Shrinking transistor sizes lead to higher current density and cause voltage drop across a chip. Each time a standard cell switches, it leads to transient voltage drop. The more cells that switch at the same time, the higher the current draw and the need to plan for voltage drop. It can result in degraded performance, timing violations, or even functional failure. The designers have to mitigate these issues through optimizing voltage regulator modules, employing advanced packaging techniques, and ensuring that power is supplied to each component.
Becer: If you look at the number of nodes per square micron that must be extracted to simulate power at high fidelity, that number continues to increase at advanced nodes. That increases the runtime and capacity challenges that need to be addressed. 2.5D and 3D-IC are adding to the complexity. On top of this, at 7nm and below, a new phenomenon has emerged, which has further increased the complexity of high-coverage power integrity analysis by at least a few orders of magnitude. At older nodes, the main component of the root cause of voltage drop is in-cell switching. However, at advanced nodes, the root cause shifted to the neighboring instances. In fact, about 90% of the voltage drop on a given instance is now due to switching of these aggressor neighbor instances. The number of these aggressors may be in the thousands. This has resulted in a major coverage issue in power integrity analysis. Even for a single instance, to be able to analyze worst-case noise with high confidence you have a huge search space with thousands of potential aggressors, and with logic and timing correlations among these instances. You have hundreds of millions, if not billions, of instances in large-scale designs. This is where we’re seeing a major challenge for power integrity analysis at advanced nodes.
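To make the aggressor search space concrete, here is a minimal Python sketch. It is not how any commercial tool works. It assumes each aggressor is described by a hypothetical tuple of (voltage-drop contribution in mV, earliest switching time, latest switching time) and that contributions simply add when switching windows overlap. All names and numbers are illustrative.

```python
from typing import List, Tuple

# (contribution_mv, window_start_ps, window_end_ps); hypothetical data model.
Aggressor = Tuple[float, float, float]


def naive_worst_case(aggressors: List[Aggressor]) -> float:
    """Pessimistic bound: assume every aggressor switches at the same instant."""
    return sum(mv for mv, _, _ in aggressors)


def window_aware_worst_case(aggressors: List[Aggressor]) -> float:
    """Largest summed contribution at any single instant, honoring timing windows."""
    events = []
    for mv, start, end in aggressors:
        events.append((start, 0, mv))   # window opens: contribution becomes possible
        events.append((end, 1, -mv))    # window closes: contribution is removed
    # Sort by time; at equal times, process opens (0) before closes (1) so that
    # windows that merely touch are treated as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    running = 0.0
    worst = 0.0
    for _, _, delta in events:
        running += delta
        worst = max(worst, running)
    return worst


if __name__ == "__main__":
    neighbors = [(10.0, 0.0, 100.0), (20.0, 50.0, 150.0), (15.0, 200.0, 300.0)]
    print("naive bound (mV):", naive_worst_case(neighbors))
    print("window-aware bound (mV):", window_aware_worst_case(neighbors))
```

The gap between the two numbers is the pessimism removed by one kind of timing correlation. Real analysis also has to account for logic correlations and a non-additive grid response, which this sketch ignores.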
Stratakos: The problem is that the current densities keep increasing, and impedance reduction is not keeping pace. It’s creating a perfect storm where IR drop is becoming a bigger concern in the design process. If you go back many years, we didn’t care much about it. Timing is still king, but if you don’t deal with IR drop the right way, it can compete with timing in terms of the amount of energy it requires to satisfy it. We definitely need to bring IR drop back into an acceptable zone.
Jain: Increased density at advanced nodes is increasing resistance, which increases the challenge of mitigating IR drop. Density is also increasing due to more functionality being added in smaller areas. But there also is a constant push to operate at lower Vmin. At lower Vmin, which is critical to the threshold voltage of the design, even a slight increase of the IR drop beyond the budget can have a drastic impact on the performance of a circuit. Closing IR drop at low voltages is becoming an additional challenge on top of closing it at higher voltages. On top of that, our SoCs are comprised of many heterogeneous components (we are working with GPUs, graphics, modems, multimedia), and all of these have separate cores with their own dominant performance modes. We need to close all of these separate cores for their respective power corners or modes under which they are operating. To close IR drop or power integrity for all these heterogeneous components at the same time is a huge cost, and we need to address this challenge going forward.
SE: What’s the real-world impact?
Jain: Recently we have had multiple issues reported by silicon during testing. We are seeing huge fallout, especially for ATPG and MBiST. We also are seeing some Vmin excursions during functional testing. The real challenge is the unknown vectors. In these new advanced nodes, we have so many applications running for which we are not able to pre-empt these vectors pre-silicon. That means we are not necessarily covering the full extent in our sign-off methodology. In addition, the engineering cost of reproducing and debugging these silicon problems is very high. The silicon problems are an amalgamation of various components working together. There could be process defects or defects on the chip. To understand where the problem stems from is a big engineering cost. We need to safeguard our designs from these unknown vectors or conditions so we don’t run into these post-silicon issues.
Stratakos: At advanced nodes, everything is connected. You can’t close timing, and then when you’re happy with the timing, start looking at the IR drop problem. You’re going to enter into a vicious cycle where every change will break the other thing. They need to be solved simultaneously throughout the implementation cycle. In addition, the IR drop has three distinct phases. One is the prevention phase. The second is the analysis phase. And the third is the fixing stage. All three need improvement. We definitely need to invest in prevention methods. For analysis, it’s essential that we’re able to identify and fix the right problem. Equally important, we don’t want to be fixing the wrong problem. This includes both optimism and pessimism reduction. We’re expending a lot of effort on this. We’d better be spending it in the right places. Last is the fixing. IR ECO capabilities pale in comparison to timing ECO capabilities. We need to fix that. But it also puts additional pressure on getting the prevention and the analysis right. I can’t be left with thousands of violations and then attack them with rudimentary ECO capabilities. We need to get to the point where the number of violations that need to be fixed is small enough that our weak capabilities can actually deal with it.
Becer: In every decade or so for the last 30 years, there has been major innovation in the industry for power integrity analysis. We started with static analysis in the 1990s. We introduced dynamic analysis in the 2000s. We brought massive parallelization, elasticity, cloud nativity, and data access analytics in the 2010s. In the last couple of years, we introduced SigmaDVD. So for every instance in the design, we identify all the potential aggressors, their voltage-drop contributions to the victim instance, consider the logic and timing correlations among these thousands of potential aggressors, and compute a practically possible worst-case voltage drop noise on every instance. This is a very complex and compute-intensive operation. We have been partnering with foundries and other EDA companies to prove and showcase these improvements. It has enabled high-coverage visibility of power-grid noise with all the root causes clearly identified early in the design cycle, guiding place-and-route toward better PPA, and avoiding power integrity issues as early as possible.
Jain: The need today is for a comprehensive analysis approach that can take care of multiple scenarios that can happen post-silicon. We cannot have millions or billions of vectors that we can exercise pre-silicon and then do sign-off based on that. We really want to visualize voltage variation in the same way we visualize process variation. For process variation we have a really good way to model the local versus global variation, and we can find the bounding three-sigma impact of process variation on timing. SigmaDVD fits into that zone. We can find the bounding voltage drop on each instance in the design, and we can then safeguard our voltage drop and do timing by accounting for those local variations. The additional challenge is global noise, and we are working on that.
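The analogy to process variation can be illustrated with a small Python sketch. This is not the SigmaDVD algorithm, whose internals are not described in this conversation. It assumes, purely for illustration, that each instance already has a handful of voltage-drop samples (in mV) from different scenarios, and it uses mean plus three standard deviations as the bounding value. Instance names and the budget are hypothetical.

```python
import statistics
from typing import Dict, List


def three_sigma_bound(samples_mv: List[float]) -> float:
    """Return mean + 3 * population standard deviation of the samples."""
    mean = statistics.fmean(samples_mv)
    sigma = statistics.pstdev(samples_mv)
    return mean + 3.0 * sigma


def flag_instances(drops_mv: Dict[str, List[float]], budget_mv: float) -> List[str]:
    """List instances whose bounding drop exceeds the budget, sorted by name."""
    return sorted(
        name for name, samples in drops_mv.items()
        if three_sigma_bound(samples) > budget_mv
    )


if __name__ == "__main__":
    drops = {
        "u_core/alu_reg_7": [40.0, 50.0, 60.0],   # varies a lot between scenarios
        "u_core/dec_buf_2": [30.0, 30.0, 30.0],   # stable across scenarios
    }
    for name, samples in drops.items():
        print(name, round(three_sigma_bound(samples), 2))
    print("over 60 mV budget:", flag_instances(drops, budget_mv=60.0))
```

An instance with a modest average drop can still be flagged if its drop swings widely between scenarios, which is the kind of local variation a vector-by-vector check can miss.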
SE: With advanced node designs, a lot more has to be done concurrently. What’s the impact of that at the fab?
Quan: One of the key directions for us is multi-physics sign-off — thermal, EMIR, STA, stress, signal integrity, power integrity. Those things cannot be done independently. They have to be done together and optimized together.
SE: How about when you get into 3D with things like 3Dblox? How does that affect power issues?
Quan: At the die level, when we go to 3D-ICs, power integrity challenges arise with vertical stacking of multiple dies. That requires managing the power distribution, thermal effects, and also how to ensure uniform power delivery across all the stacked layers. Increased power density and proximity of components in 3D structures can lead to elevated temperatures, which can impact device reliability. So the designer has to address these by making sure they optimize the power distribution network, implementing thermal management solutions, and considering all the electrical and thermal interactions between these stacked die to maintain the power integrity. It’s a multi-physics challenge. On 3Dblox, TSMC and five EDA partners are working on the standard, aiming to modularize and streamline 3D package solutions that are available in the semiconductor industry today. 3Dblox is a new, open industry standard to promote interoperability and unleash innovation in 3D-IC design. The group itself is comprised of 10 subcommittees. We have one for thermal and one for power integrity to address the challenges we’re talking about.
SE: Going back a few years, everyone thought advanced packaging with chiplets would be as simple as snapping LEGOs together. It hasn’t worked out that way. How do we speed up and simplify this process?
Becer: 2.5D, 3D, chiplets, and advanced packaging are among the key innovations in the semiconductor industry that are necessitated by the never-ending demand from multiple vertical industries. These systems bring advantages: high bandwidth, low latency, potentially better yield if you’re not working on a reticle-sized die, and potentially lower cost because you can mix and match older and mature technologies. But they do come with increased challenges in terms of multi-physics and multi-scale simulation. So you need to consider power, thermal, structural, and signal integrity. And ESD in 2.5D and 3D is not the same as doing ESD for chiplets or interposers independently. You need to model and simulate these multi-physics effects at multi-scale, from transistor level to blocks to SoC to multi-die package, board, and system. It’s a multi-physics, multi-scale challenge. And there’s coupling among the chiplets and coupling among the physics. For example, power affects temperature, and temperature affects electromigration. It impacts structural integrity and reliability. Temperature affects mechanical stress, which affects transistor mobility changes, which impact signal integrity. One way to address this huge complexity is to realize the difference between the high frequency and low frequency components of power grid noise.
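The coupling between physics can be seen even in a toy model. The Python sketch below is an assumption-laden illustration, not a sign-off method: it lumps the whole package into a single thermal resistance, and it assumes leakage power grows exponentially with temperature using a made-up coefficient. Because power sets temperature and temperature sets power, the two must be solved together, here by fixed-point iteration.

```python
import math
from typing import Tuple


def solve_electrothermal(
    p_dynamic_w: float,
    p_leak_ref_w: float,
    t_ambient_c: float,
    theta_c_per_w: float,
    leak_coeff_per_c: float = 0.02,   # assumed leakage sensitivity, illustrative only
    t_ref_c: float = 25.0,
    tol_c: float = 1e-6,
    max_iter: int = 200,
) -> Tuple[float, float]:
    """Return a self-consistent (temperature in C, total power in W)."""
    t = t_ambient_c
    for _ in range(max_iter):
        try:
            p_leak = p_leak_ref_w * math.exp(leak_coeff_per_c * (t - t_ref_c))
        except OverflowError:
            raise RuntimeError("temperature diverged (runaway in this toy model)")
        p_total = p_dynamic_w + p_leak
        # Lumped thermal model: temperature rise is thermal resistance times power.
        t_next = t_ambient_c + theta_c_per_w * p_total
        if abs(t_next - t) < tol_c:
            return t_next, p_total
        t = t_next
    raise RuntimeError("no convergence within max_iter")


if __name__ == "__main__":
    temp_c, power_w = solve_electrothermal(
        p_dynamic_w=5.0, p_leak_ref_w=1.0, t_ambient_c=25.0, theta_c_per_w=2.0
    )
    print(f"steady state: {temp_c:.2f} C at {power_w:.3f} W")
```

A real flow would replace the single thermal resistance with a spatial thermal solve and feed the resulting temperature map back into electromigration, stress, and timing analysis, which is where the multi-scale cost comes from.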
Jain: 3D-IC is still an evolving area. We are exploring integration from the packaging side and the die implementation side for how we want to stack up these devices in order to minimize the power integrity challenges. The real problem is how we route the power to the top-level die. The bottom die is exposed directly to the C4 bumps. To vertically connect these power lines to the top level we need to go through the interposer TSVs. We need to ensure we have good power connections and that we don’t exceed the IR drop for the top-level die. The pitch of the TSVs and the microbumps is important, and that’s a new dimension we have to add to what we’ve been doing for a single chiplet. Similarly, there are floor-plan challenges, especially if there are cores on the bottom die that share the same power rails as the top-level die. All of these are new aspects that we have to consider before we start implementing the 3D-IC stack. There also is interposer and microbump modeling, which is new.
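Why pitch matters for the top die can be shown with Ohm's law. The Python sketch below is a rough estimate under stated assumptions: the vertical path is treated as C4 bumps, TSVs, and microbumps in series, identical TSVs (and identical microbumps) under a block are treated as ideal parallel resistors, and the count that fits is approximated as area divided by pitch squared. All resistance and current values are invented for illustration.

```python
def parallel_count(area_um2: float, pitch_um: float) -> int:
    """Approximate how many vias or bumps fit under a block on a square grid."""
    return max(1, int(area_um2 // (pitch_um ** 2)))


def top_die_ir_drop_mv(
    current_a: float,
    r_c4_total_ohm: float,   # lumped resistance of the C4 bump array (assumed)
    r_tsv_ohm: float,        # resistance of a single TSV (assumed)
    r_ubump_ohm: float,      # resistance of a single microbump (assumed)
    area_um2: float,
    tsv_pitch_um: float,
    ubump_pitch_um: float,
) -> float:
    """Estimate the vertical-path IR drop seen by the top die, in millivolts."""
    n_tsv = parallel_count(area_um2, tsv_pitch_um)
    n_ubump = parallel_count(area_um2, ubump_pitch_um)
    # Series path: each stage is N identical resistors in parallel.
    r_path = r_c4_total_ohm + r_tsv_ohm / n_tsv + r_ubump_ohm / n_ubump
    return current_a * r_path * 1000.0


if __name__ == "__main__":
    common = dict(current_a=2.0, r_c4_total_ohm=0.002, r_tsv_ohm=0.04,
                  r_ubump_ohm=0.016, area_um2=10_000.0, ubump_pitch_um=25.0)
    for pitch in (50.0, 25.0):
        drop = top_die_ir_drop_mv(tsv_pitch_um=pitch, **common)
        print(f"TSV pitch {pitch:.0f} um -> {drop:.1f} mV")
```

Halving the TSV pitch quadruples the number of parallel TSVs in this model, which lowers the drop but also consumes floor-plan area on the bottom die, the tradeoff described above.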
SE: What’s the relationship between thermal and voltage drop, and how is that addressed?
Becer: Thermal is at the center of the multi-coupled, multi-physics challenges. So it’s not just about how thermal is impacting voltage drop. It’s how it’s impacting the overall design flow, from early feasibility studies to prototyping to sign-off, especially for 2.5D/3D-IC structures, as well as advanced gate-all-around nodes and backside power.