Problems Lurk In SoC Boundaries

Interfaces have always been a trouble spot, but recent changes in the design flow are making them even more problematic.


Interfaces have always been a problem because only rarely does anyone have responsibility for them. Responsibilities generally are tied to functional blocks, with the prevailing notion that if all blocks do the right thing, they will also behave correctly when brought together. Design teams that believe this eventually discover that the assumption is false. To make matters worse, these problems are often found late in the verification process, as more blocks are integrated and the tapeout deadline approaches.

In an SoC, most of the intellectual property (IP) blocks come from third parties, so these problems can be amplified. “Designs are configurable and IP is configurable,” points out Pippa Slayton, marketing and business development manager at Oski Technology. “Third-party IP supports different customers with different requirements, use models and performance requirements, each with different configuration inputs for designs to do various things with different configuration parameters, timing and so on. These configuration inputs multiply when combined with each other, giving rise to billions of different configurations.”
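To get a feel for how quickly independent configuration inputs multiply, the sketch below counts the combinations for a handful of blocks. The block names and option counts are hypothetical, chosen only for illustration; they are not drawn from any real IP.

```python
from math import prod

# Hypothetical IP blocks, each listed with the number of choices per
# configuration input. These figures are illustrative assumptions only.
ip_config_options = {
    "ddr_controller": [4, 3, 8, 2],
    "pcie_endpoint": [3, 4, 2, 2, 2],
    "usb_phy": [2, 2, 3],
    "interconnect": [5, 4, 4, 2],
}


def configurations_per_block(options):
    """Distinct settings for one block: the product of its option counts."""
    return prod(options)


def total_configurations(blocks):
    """Distinct SoC-level settings if every block is configured independently."""
    return prod(configurations_per_block(opts) for opts in blocks.values())


per_block = {
    name: configurations_per_block(opts)
    for name, opts in ip_config_options.items()
}
total = total_configurations(ip_config_options)

print(per_block)
print(f"total combinations: {total:,}")
```

Even with these modest numbers, four blocks already yield tens of millions of combinations, and one more block with a hundred settings of its own would push the total into the billions.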

Many systems houses have attempted to implement flows that minimize interface errors, but as system complexity grows, new issues keep surfacing. As these are discovered, understood and characterized, the EDA industry attempts to put in place tools and flows that detect them before they become a problem. This is one of the areas that has prompted the rapid rise of formal verification, whose tools can quickly and exhaustively look for specific problems at the interfaces.

Clock domain crossing (CDC) is one area that can now be considered to be under control, most of the time, thanks to formal verification. “Failures caused by asynchronous interactions are hard to verify by simulation,” says Pranav Ashar, chief technology officer at Real Intent. “This is because the failures are hard to reproduce and a cause-effect relationship is hard to establish. Detecting such failures gets harder with each design-refinement step and such failures that slip through the cracks into tapeout can be disastrous.”

The solutions today are far from complete. “Formal gives you somewhat better coverage of local switching behaviors but is still unusable for anything approaching cross-SoC verification,” points out Bernard Murphy, chief technology officer for Atrenta. “Unless you black-box most of the SoC, which is just making unproven assumptions about what you can ignore, it just isn’t possible, and proofs are still often bounded (incomplete). Still, formal is better than the alternative, just woefully incomplete on coverage.”

More subtle problems can also remain hidden. “Synchronous clocks in different voltage domains can seem to be asynchronous when viewed on silicon,” notes Kurt Takara, verification technologist at Mentor Graphics. “Customers are only seeing these issues after they get silicon back.”

Another beast is rising in the form of power management and its associated circuitry. Functionality contained in a leaf-level IP block can influence or break things anywhere in the design. “Shutting down power to a domain can be problematic when you have outstanding transactions in a system,” says Drew Wingard, chief technology officer at Sonics. “This means that you must have knowledge about the global state of the system to know when a safe operating point has been reached in order to perform the requested action.”
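Wingard's point can be illustrated with a toy model: a power domain that refuses to shut down while transactions are still in flight. The class and method names are hypothetical and the model ignores timing entirely; it is a sketch of the idea, not a description of any vendor's implementation.

```python
class PowerDomain:
    """Toy model of a power domain that tracks in-flight transactions."""

    def __init__(self, name):
        self.name = name
        self.powered = True
        self.accepting = True      # whether new transactions may be issued
        self.outstanding = set()   # IDs of transactions not yet completed

    def issue(self, txn_id):
        if not (self.powered and self.accepting):
            raise RuntimeError(f"{self.name}: cannot accept transaction {txn_id}")
        self.outstanding.add(txn_id)

    def complete(self, txn_id):
        self.outstanding.discard(txn_id)

    def request_power_down(self):
        """Stop accepting new traffic; power off only once nothing is in flight.

        Returns True if the domain was powered down, False if it must wait.
        """
        self.accepting = False
        if self.outstanding:
            return False
        self.powered = False
        return True


# Usage: the first request is deferred because two transactions are pending.
domain = PowerDomain("gpu")
domain.issue(1)
domain.issue(2)
first_attempt = domain.request_power_down()

domain.complete(1)
domain.complete(2)
second_attempt = domain.request_power_down()

print(first_attempt, second_attempt, domain.powered)
```

In a real SoC the set of outstanding transactions is spread across initiators, targets and the interconnect, which is why the shutdown decision needs the global view Wingard describes.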

Interfaces are no longer just functional interfaces, but have become global through the desire to control power. “By definition there will be a problem on the IP boundary if you are putting that IP under power management,” says Rick Koster, low power specialist at Mentor Graphics. “You are doing global activities on that IP that could create errors that propagate beyond the IP.”

“These types of multi-power domain SoCs are complex and present new integration challenges,” says Hem Hingarh, vice president of engineering for Synapse Design. “This is because many blocks have different operating modes at different voltages, different clock period and duty cycles of each block being awake, asleep or in shutdown mode. In particular, we need to make sure that all power domains are completely powered up before issuing reset, or a controller may need to wait until the rest of the chip is powered up before booting.”
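The ordering rule Hingarh describes, that every power domain must be fully up before reset is issued, can be checked against a recorded event sequence. The event names and log format below are assumptions made for illustration.

```python
def reset_ordering_violations(events, domains):
    """For each reset issued too early, list the domains still unpowered.

    `events` is an ordered list of (kind, domain) tuples, where kind is
    "power_up", "power_down" or "reset" (domain is None for a reset).
    An empty result means every reset saw all domains powered up.
    """
    powered = set()
    violations = []
    for kind, domain in events:
        if kind == "power_up":
            powered.add(domain)
        elif kind == "power_down":
            powered.discard(domain)
        elif kind == "reset":
            missing = sorted(set(domains) - powered)
            if missing:
                violations.append(missing)
    return violations


domains = ["cpu", "ddr", "io"]

good_sequence = [
    ("power_up", "cpu"),
    ("power_up", "ddr"),
    ("power_up", "io"),
    ("reset", None),
]
bad_sequence = [
    ("power_up", "cpu"),
    ("reset", None),          # issued before ddr and io are up
    ("power_up", "ddr"),
    ("power_up", "io"),
]

good_result = reset_ordering_violations(good_sequence, domains)
bad_result = reset_ordering_violations(bad_sequence, domains)
print(good_result)
print(bad_result)
```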

CDC analysis is made more difficult when power issues are added. “When you take CDC issues and then add a layer of power awareness, you can get into trouble,” explains Joe Hupcey, verification product marketing at Mentor Graphics. “It means that almost everything has to be power-aware. It has become an inescapable requirement given some of these issues. When power circuitry is added, tools must be aware of how that will impact the circuit.”

“Reset is also becoming an increasing concern for a lot of SoC teams,” adds Mentor’s Takara. “Plus there are more complications when you start combining power domains and reset domains and some overlap sections between the domains.”

Real Intent’s Ashar explains that asynchronous reset-related interactions are newer. “Problems are created primarily by warm resets, which are caused by power optimization, interaction between blocks that are initialized by different reset signals, staged initialization to reduce layout overhead, and electrical effects like current spikes and reset-clock interaction during reset release. Failure to properly implement this can lead to metastability, corrupted configuration registers, incorrect initialization and data loss or corruption.”

But there also are some limitations to keep in mind. “Formal analysis is very effective at proving that a power controller meets its specs and makes all transitions properly,” says Tom Anderson, vice president of marketing for Breker Verification Systems. “But only running realistic use cases while power is changing can verify that the chip’s functionality is unaffected.”

The addition of power domains creates its own set of interface issues, such as “ensuring the presence of ESD protection circuit between all possible power-domain pairs,” says Jai Pollayil, director of applications engineering, Ansys-Apache. “Similarly, the introduction of power gating greatly reduces the off-state leakage, but introduces additional risks associated with high rush current and noise coupling during wake-up operation. Those peak currents and di/dt introduced during operations like wake-up and scan-shift can lead to voltage drop and package resonance causing chip failures.” Pollayil suggests that chip designers need to plan turn-on sequencing or implement scan chain staggering so that activity is spread over time, reducing the chance of heavy simultaneous switching.
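Pollayil's suggestion to spread activity over time can be illustrated with a toy calculation. Each switched region is modelled as drawing a triangular rush-current pulse; the pulse shape, amplitudes and time steps are invented for illustration and are not characterization data.

```python
def rush_pulse(peak, width):
    """A symmetric triangular current pulse sampled at integer time steps.

    `width` is assumed to be an even number of steps, at least 2.
    """
    half = width // 2
    return [peak * (1 - abs(t - half) / half) for t in range(width + 1)]


def total_current(num_regions, peak, width, stagger):
    """Sum the pulses of `num_regions` regions, each starting `stagger`
    steps after the previous one (stagger=0 means all switch together)."""
    pulse = rush_pulse(peak, width)
    length = stagger * (num_regions - 1) + len(pulse)
    total = [0.0] * length
    for region in range(num_regions):
        start = region * stagger
        for offset, current in enumerate(pulse):
            total[start + offset] += current
    return total


simultaneous_profile = total_current(num_regions=4, peak=100.0, width=4, stagger=0)
staggered_profile = total_current(num_regions=4, peak=100.0, width=4, stagger=4)

simultaneous_peak = max(simultaneous_profile)
staggered_peak = max(staggered_profile)

print(simultaneous_peak, len(simultaneous_profile))
print(staggered_peak, len(staggered_profile))
```

In this model the peak falls from 400 to 100 units when the four regions are staggered, at the cost of a wake-up sequence that takes more time steps to complete.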

DVFS can lead to headaches
One low-power technique, used by about a third of all designs according to the Synopsys Global User Survey conducted in 2011, is dynamic voltage and frequency scaling (DVFS). This is, according to the industry, a highly problematic technique for which there is inadequate tooling. “The mention of dynamic voltages means that you are characterizing a block at a number of operating points so the total number of corners explodes,” explains Sonics’ Wingard. “It is difficult to get a gate-level design that is placed and routed that is somewhat close to optimum for two different operating points. Typically a designer will use the highest operating frequency that they are trying to get to – optimize for that and then try to characterize what frequency it would run at for a different voltage point. There tends to be a lot of rounding down in order to be conservative.”

Even without the implementation issues, design issues abound. “The sequence of states that a block needs to move through to transition from one state to another, particularly in something like DVFS, can be quite complex and can be even more challenging when a transition starts, then has to unwind because of some unexpected event,” points out Atrenta’s Murphy. “Validating across all possibilities is very difficult, even with formal and static techniques, and many designers build in ‘bail out’ options to disable gating if a field use-model starts running into too many problems.”

Cross interface power optimization
Where some see problems, others see opportunity. Calypto has been working on ways in which the logic that spans boundaries can be optimized, but with many restrictions. “For power gating, you want to make sure that any signal taken from the power gated region to provide an enable condition for a clock, needs to obey all of the power gating rules,” explains Anand Iyer, director of product marketing at Calypto. “This means that particular regions should not switch more times than the region where the clock is.”

But many of these optimizations do not merge well with DVFS. “If they are doing any DVFS then don’t take any signals from that block,” warns Iyer. “That becomes a don’t touch hierarchy. It is too dangerous and designers have not gone through the verification process or know what will happen to that signal during a change of voltage or frequency.”

Tool Support and UPF
Even as the industry pushes to optimize power consumption in a variety of ways, not all tools are fully power-aware. This can create gaps in verification strategies. Privately, Semiconductor Engineering was informed about many aspects of the tool flow that are not power-aware, and concerns were expressed that some groups are dragging their feet in making the necessary changes.

While the Unified Power Format (UPF) is quickly adding concepts to enable a top-down power management methodology, “most designs are in fact bottom up when it comes to power management,” says Mentor’s Koster. “The most used methodology today is that the RTL team is given UPF by the back-end team to verify. The back-end guys are defining the power infrastructure and then passing it up to the architectural guys rather than the other way around.”

“UPF is a whirlwind standard,” points out Calypto’s Iyer. “We cannot always match all of the specifications within RTL and UPF. There are times when we have to go back to the designer and get clarity on what they wanted to do.”

In addition to extending the existing tools, some additional capabilities are required. “We do not have a good grasp of how to do gate-level simulation in an environment where we are changing the voltage and so by definition, the timing,” admits Koster. “Conceptually, as the voltage changes, we should load a new SDF associated with the new voltage and continue the simulation.”

Synapse Design’s Hingarh adds, “with multiple voltages, libraries may not be characterized at the exact voltage we are using and timing analysis becomes much more complex.”

We have seen many operations that used to be back-end tasks move up to higher levels in the flow. “Restructuring RTL logic hierarchy has become more common,” points out Atrenta’s Murphy. “This is in support of power and also in support of partitioning for connection by abutment (shorter inter-block connections for timing closure and reduced area), which may have an impact inside the IP blocks. This kind of restructuring used to happen in layout, but is pushed increasingly to RTL because of the impact on power rails and power switches, level-shifting and isolation between domains, and there are implications for re-verification in large-scale restructuring.”

There appears to be a long-term problem with all power-related methodologies, and it has to do with accuracy. “The biggest gap in our methodology is that we don’t have accurate early power consumption estimation at the system partitioning stage,” complains Hingarh. “RTL power estimation tools are just starting to be used, but there is little data available on correlation of RTL power with measured silicon results.”

The growing importance of power-aware designs is forcing the EDA industry to retrofit existing tools and add new tools necessary for the implementation of complete flows. These and other design techniques are placing more strain on the interfaces and the boundaries are becoming a lot fuzzier than they have been in the past. Mentor’s Hupcey remains somewhat optimistic: “Once the industry becomes aware of a challenge, the solutions tend to come fairly quickly. We see new challenges coming every quarter these days.”

But will the industry be able to innovate quickly enough to solve the larger problems, especially now that there is less help from startups to pave the way?