Pushing Performance Limits

Are optimizations really a zero-sum game? Where to gamble and when to play it safe.


Trying to squeeze the last bit of performance out of a chip sounds like a good idea, but it increases risk and cost, extends development time, reduces yield, and may even limit the environments in which the chip can operate.

And yet, given the amount of margin added at every step of the development process, it seems obvious that plenty of improvements could be made. “Every design can be optimized given enough time, but time is a luxury,” says Kirvy Teo, vice president of business development at Plunify. “Tradeoffs are by definition a zero-sum game. Overcoming this paradigm is the challenge.”

What appears straightforward in theory is far less so in practice. “Design teams should always target realistic performance metrics and define them in the early phases of the design,” advises Sureshbabu Kosuru, director of silicon engineering at Open-Silicon. “Cutting corners to meet tight schedules ups the risk of encountering performance issues on the silicon later on. Design teams should develop and practice best methods by adhering to foundry and IP vendor recommendations for building margins into the chip, and include these overheads in the project schedules up front.”

The key here is that the desired level of performance has to be part of the development process. “Pushing performance limits at the implementation level is going to be highly limited and highly risky,” says Neil Hand, director of marketing for the Design Verification Technology Division of Mentor, a Siemens Business. “In addition, if performance does not take aging into consideration, then much like Icarus, you are flying too close to the sun.”

Fig 1: Product tradeoffs. Source: Copenhagen Business School.

Part of the problem is that newer technologies may have issues that are not fully understood or characterized by the designers, the developers of the process technology or the EDA industry creating the tools. “Design teams should be aware of the latest technology shifts and their ill effects,” says Kosuru. “They may require extra analysis or improved methodologies in order to counter the changes. Foundries should regularly update their customers about the process maturity and margins so that design-aggressive margins don’t kill the designs by prolonging the cycles.”

There are many sources of margin. Joao Geada, chief technologist for the Semiconductor Business Unit of ANSYS, has a running list:

  • Margin to compensate for effects that happen from outside a particular view of the design. This could include PLL jitter and package/board RLC resonances, for example.
  • Margin to allow the process as manufactured to be different than the process modeled. “Designs take time to create and the foundry process is constantly evolving. It is typically too expensive to adopt the very latest foundry models late in the design cycle, so margin is added to give room for some differences between what was modeled versus what will be manufactured. Basically you’re trading off some PPA for more predictable manufacturability and yield.”
  • Padding in conventional power/ground (pg) grid design. “The traditional approach assumes pg needs to be evenly distributed without taking into account that power needs depend on local properties of the design. For example, some areas are more power-hungry than others. But traditional flows have pg and timing decoupled, so neither can take advantage or influence the other. In traditional flows, the entire pg grid is designed to a specific max Vdd drop target, and timing has to assume that drop is present everywhere.”
  • Margin to avoid previous ‘escapes’. “Most large companies have made bad silicon and debugged the cause. In some cases the escape cannot be fixed with the existing solutions and so pessimism is added to the flow to avoid future escapes. As tools get better, some of these margins become historical padding and are no longer justifiable.”
  • Padding to account for inter-dependencies of effects. “Most existing solutions address single physics problems and do not deal with interactions between effects.”

Geada also provides examples of some of the interactions that are often not considered, including:

  • Constraints that depend on waveform effects. “Static Timing Analysis (STA) does not model those effects. Existing standard cell design characterizes constraints without knowledge of any design effects. (At low voltages, this effect can be as large as ±50%.)”
  • Variance. This affects both slews and delays, and slew variance in turn affects delays and delay variance, but STA does not model those correlations.
  • IR drop and timing interactions. These depend on each other, but conventional flows model them as non-interacting effects.
  • Temperature variation. This isn’t a uniform value within a large SoC, but that is how it is modeled in conventional corner-based solutions.
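
The pessimism in assuming the maximum Vdd drop everywhere, as described in the pg-grid item above, can be illustrated with a rough sketch. It assumes the alpha-power-law delay approximation, delay proportional to V / (V - Vt)^alpha, which is a textbook simplification and not the model used by any tool mentioned here. The function names, threshold voltage, alpha value, and per-stage drops are all hypothetical.

```python
def gate_delay(v_supply, v_threshold=0.35, alpha=1.3, k=1.0):
    """Alpha-power-law delay estimate in arbitrary units (assumed model)."""
    if v_supply <= v_threshold:
        raise ValueError("supply must exceed the threshold voltage")
    return k * v_supply / (v_supply - v_threshold) ** alpha


def path_delay(stage_drops, v_nominal):
    """Sum the stage delays, each stage seeing its own local IR drop."""
    return sum(gate_delay(v_nominal - drop) for drop in stage_drops)


# Hypothetical four-stage path: only the first stage sits in a power-hungry region.
v_nominal = 0.8                        # volts
max_drop = 0.08                        # grid-wide worst-case drop target, volts
local_drops = [0.08, 0.02, 0.01, 0.03]

# Conventional flow: timing assumes the maximum drop at every stage.
uniform = path_delay([max_drop] * len(local_drops), v_nominal)
# Coupled analysis: each stage is timed at its own local drop.
local = path_delay(local_drops, v_nominal)

margin_pct = 100.0 * (uniform - local) / local
print(f"uniform-drop delay: {uniform:.3f}")
print(f"local-drop delay:   {local:.3f}")
print(f"pessimism:          {margin_pct:.1f}%")
```

The gap between the two numbers is the margin that a decoupled flow carries for this hypothetical path. Its size depends entirely on the assumed values.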

An important first step is to understand the market and environments in which the design is meant to operate. “It is important for an STA engineer to understand the chip’s functionality and the environmental conditions under which the design needs to operate,” says Open-Silicon’s Kosuru. “For example, if the design is not meant to work below 0°C, it makes no sense to close the design for the standard -40°C temperature. In lower technology nodes (28nm and below), this would mean that we can improve the performance of the design by at least 7% to 10% due to the temperature inversion phenomenon. The tradeoff here is that the IP vendors, including the standard-cell IP vendors, are willing to provide the libraries characterized for our desired temperature only for a substantial investment. This often becomes detrimental to design closure.”

Separation of concerns
Many tools within an EDA flow evolved as particular problems were encountered and solutions found. This led to tools often treating each problem separately. The same has been true within the design teams as well. “Following a traditional silo based design approach, chip, package, board and system designers use pre-determined margins to design their specific component,” says Youngsoo Lee, senior product manager for CPS Solutions within ANSYS. “These designs are done by separate teams or even separate companies with very little communication. In addition, existing tools limit each design team to analyze and resolve single physics (timing, power, temperature, etc.) at a time without any visibility into interactions between multiple physics. It is no longer sufficient to design and analyze each component separately, nor is it acceptable to consider only single physics at a time.”

Many tools also simplify the process or make assumptions that may be pessimistic. “Traditional approaches of uniformly over-designing the power grid, which worked well in older process technologies, are not going to work at advanced technology nodes due to severe routing constraints,” says Annapoorna Krishnaswamy, product marketing manager for the Semiconductor business unit at ANSYS. “This potentially can lead to timing convergence issues down the road. For advanced finFET technology processes, node count of the power grid is very high, and any reduction in node count will result in accuracy loss. With very small design margins, power signoff solutions cannot afford to be inaccurate as it can result in product failure. It is important to analyze the entire power grid flat and not resort to partitioning the design with a ‘divide-and-conquer’ approach, as it will lead to inaccuracies. The analysis will completely miss the full chip context that the power grid spans.”

Changing models
Over time, the models used by the industry have changed. Consider that in the 1980s, all of the significant delays were associated with the gates. The wires were free. Today, it is almost the reverse. Then wires were modeled as RC, and today we are finding this may not be good enough. Companies with the best-selling tools for an old paradigm are reluctant to change, and users are equally reluctant to change until something breaks.

The importance of interconnects cannot be ignored. “Global interconnect with all the buffer repeaters inserted often consumes 60% of the total chip power,” points out Magdy Abadir, vice president of corporate marketing for Helic. “Major interconnect networks, like power and clock distribution networks and wide buses, are a source of failure mechanisms, including jitter, electromigration, droops in power distribution, and coupling noise. Hence, both performance and risk aversion of a cutting-edge chip are very strong functions of the interconnect modeling and design.”

Inductance is the ugly child when it comes to chip design. “Historically, interconnect has been overwhelmingly modeled as an RC network. Magnetic effects (inductance and inductive coupling) were largely ignored or suppressed so that existing tools can be used,” explains Yehea Ismail, director of the Nanoelectronics and Devices Center at The American University in Cairo. “Ignoring magnetic effects is mainly an ease-of-mind and time-to-market decision, and is usually justified by extreme margins and design methodologies that suppress inductive effects. However, this design methodology is becoming very hard to justify, or even sustain, with much higher frequencies on the horizon and increasingly complex SoCs.”

A logical question is how much does ignoring inductive effects cost in terms of power and delay? “We have seen designs that pay a huge power and performance penalty just to fit the existing RC-based tools,” says Abadir. “For example, look at the practice of using differential switching on buses and clocking with wires carrying oppositely switching signals routed exactly adjacent to each other. This methodology effectively reduces inductive coupling in terms of range and magnitude. This is because opposite currents close to each other produce opposite magnetic fields that cancel each other out. However, oppositely switching wires adjacent to each other consume four times the power on the coupling capacitance between them as compared to a single wire switching or the average switching case for two active coupled wires. In addition, the delay is double that of a single wire switching case.”
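
The four-times power and two-times delay figures follow from how far the voltage across the coupling capacitance swings. The sketch below is a first-order illustration only. The function names and the 1 fF and 0.8 V values are assumptions for illustration, and the delay factor applies to the coupling-capacitance contribution to RC delay, not to the whole wire.

```python
# Swing across the coupling capacitor, as a multiple of Vdd, for each neighbor behavior.
SWING_FACTOR = {
    "quiet": 1.0,     # neighbor holds its value: the capacitor sees a swing of Vdd
    "opposite": 2.0,  # neighbor switches the other way: the swing is 2 * Vdd
    "same": 0.0,      # neighbor switches the same way: no change across the capacitor
}


def coupling_energy(c_couple, v_dd, neighbor):
    """Energy scale 0.5 * C * (delta V)^2 for one transition, in joules."""
    swing = SWING_FACTOR[neighbor] * v_dd
    return 0.5 * c_couple * swing ** 2


def effective_coupling_cap(c_couple, neighbor):
    """Miller-style effective capacitance that the driver has to charge."""
    return SWING_FACTOR[neighbor] * c_couple


c_couple = 1e-15  # 1 fF, hypothetical
v_dd = 0.8        # volts, hypothetical

energy_ratio = (coupling_energy(c_couple, v_dd, "opposite")
                / coupling_energy(c_couple, v_dd, "quiet"))
cap_ratio = (effective_coupling_cap(c_couple, "opposite")
             / effective_coupling_cap(c_couple, "quiet"))

print(f"energy ratio, opposite vs. quiet neighbor: {energy_ratio:.1f}")
print(f"effective capacitance ratio:               {cap_ratio:.1f}")
```

Because energy scales with the square of the swing while the effective capacitance scales linearly with it, doubling the swing quadruples the energy and doubles the coupling-related delay.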

Ismail puts this in stark terms. “Inductance is actually a useful element and resistance is harmful to performance. Inductance is a reactive element that does not consume power in itself while resistance is an active element that consumes power. These two elements always appear in series in interconnect networks, and suppressing inductance will always result in more losses because of boosting resistive effects.”

New approaches
Many performance limitations can be overcome by relaxing other constraints. “We are talking about jitter for a given power – it is always a tradeoff,” points out Muhammad Faisal, CEO for Movellus. “If you want better jitter, then you burn more power. Customers want to optimize their SoCs and are willing to give up a little jitter if it provides them with freedom somewhere else, potentially in terms of power. In an SoC you may have a PLL in the corner, and you route the high-frequencies across the chip and as a result you end up accumulating a lot of jitter in the clock tree. All of the noise related to switching is added to the clock. If you could synthesize PLLs, you could put them right next to the block that will use the clock and you eliminate the jitter budget.”
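
Faisal's point about jitter accumulating along a long clock route can be sketched numerically. The example assumes each buffer stage adds independent, uncorrelated random jitter, so contributions combine as a root-sum-square. Supply-noise-induced jitter is often correlated and can add up faster than this, so treat the result as a simplified illustration. The function name and all picosecond values are hypothetical.

```python
import math


def accumulated_jitter_ps(source_jitter_ps, stage_jitter_ps, n_stages):
    """RMS jitter at the end of a buffered route, assuming uncorrelated stage noise."""
    return math.sqrt(source_jitter_ps ** 2 + n_stages * stage_jitter_ps ** 2)


# One PLL in the corner of the die, clock routed through 16 buffer stages.
corner_pll = accumulated_jitter_ps(source_jitter_ps=3.0, stage_jitter_ps=1.0, n_stages=16)

# A PLL placed next to the consuming block, with only one buffer stage.
local_pll = accumulated_jitter_ps(source_jitter_ps=3.0, stage_jitter_ps=1.0, n_stages=1)

print(f"corner PLL, long route: {corner_pll:.2f} ps RMS")
print(f"local PLL, short route: {local_pll:.2f} ps RMS")
```

Under these assumptions the distribution network, not the PLL itself, dominates the jitter seen by the block on the long route.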

The more you know the better a design can react. “One way to push the limits of performance is by minimizing risk through the embedded monitoring of the dynamic variables that affect actual device performance such as process, voltage and temperature (PVT),” says Stephen Crosher, CEO for Moortec. “With the ability to monitor the parameters that influence chip performance and the degradation of circuitry under thermal and voltage stress, IC and SoC designers can build much more efficient, high-performance and cost-effective products.”

Crosher explains how this can improve performance. “The monitors provide updates to an in-chip controller, which facilitates the recording and interpretation of the results. Many applications are possible using this two-stage architecture, from a one-off profiling of in-chip parameters at product test through to real-time active management of processor cores and memory to avoid localized aging effects and to maximize performance at a given voltage and temperature point.”
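
The two-stage arrangement Crosher describes, with monitors feeding an in-chip controller, might be sketched as follows. This is a minimal illustration, not Moortec's implementation. The class and function names, the limit table, and every voltage, temperature, and frequency value are hypothetical. A real table would come from silicon characterization and may not be monotonic in temperature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One sample from an embedded monitor (hypothetical structure)."""
    site: str
    voltage_v: float
    temp_c: float


# Hypothetical limits, most permissive first: (min voltage, max temperature, MHz).
LIMITS = [
    (0.78, 85.0, 2000),
    (0.75, 105.0, 1800),
    (0.72, 125.0, 1500),
]
FALLBACK_MHZ = 1000  # used when no table row is satisfied


def site_limit_mhz(reading):
    """Highest table frequency whose voltage and temperature conditions are met."""
    for min_v, max_t, mhz in LIMITS:
        if reading.voltage_v >= min_v and reading.temp_c <= max_t:
            return mhz
    return FALLBACK_MHZ


def chip_frequency_mhz(readings):
    """The controller runs the chip at the limit of its worst-placed monitor."""
    if not readings:
        return FALLBACK_MHZ
    return min(site_limit_mhz(r) for r in readings)


readings = [
    Reading("core0", voltage_v=0.80, temp_c=70.0),
    Reading("core1", voltage_v=0.76, temp_c=95.0),
]
print(chip_frequency_mhz(readings))
```

Taking the minimum across sites is the conservative choice. A per-core controller could instead apply each site's limit to its own clock domain.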

The tools can change, as well. “One solution is machine learning,” says Plunify’s Teo. “Using machine learning is actually akin to using brute force where you need to process loads of data. However, the key difference for machine learning is that you can train the model earlier to improve its accuracy, save it, and use it later when you need it. The fundamental axes are beginning to evolve from ‘runtime versus QoR’ to ‘data versus QoR.’ The more data you can accumulate and analyze, the better the prediction models. Tools that potentially can anticipate problems or predict performance will save precious iterations on poor design choices. Eventually, this will be the difference between a superior design and mediocre one.”

New circuit techniques also can change some long-held beliefs. “Digital designers have always designed their block with fixed frequency and voltage because frequency comes from the PLL designer and analog is considered black magic,” says Faisal. “You have to assume that frequency, and then you do timing closure with that frequency. When you can implement the PLL in digital using a digital methodology, you can start co-optimizing the factors that constrain the digital block, namely frequency and margins, and they can be co-optimized together. That opens up additional design space which enables more optimization.”

Sometimes the hardest lessons are the ones we are reluctant to hear. “As an industry we will always be pushing close to margins – that is the human condition,” says Mentor’s Hand. “However, that is not where the large gains are to be made. It is, as they say, ‘fiddling in the margins.’ True performance is made and validated at the system level.”

Hand says that “system-level performance analysis, together with a unified verification and validation platform that spans simulation, emulation, and prototyping, allows performance tradeoffs to be considered early in the design process, and then constantly refined as more of the design implementation is realized in both software and hardware.”

The one thing they all have in common is that the more knowledge you have, the better the decisions that are likely to be made. When information is purposely ignored, the implication of that must be clear. Otherwise, you had better be adding enough margin to compensate for it.
