Respecting Reset

Reset is one of the most important signals in a design and yet perhaps one of the least respected. What can go wrong and how to correct it.


Resets are a necessary part of all synchronous designs because they bring the design into a known state. However, this seemingly simple process can lead to many problems within an SoC.

Reset can no longer be considered a simple operation performed when power is first applied to a circuit. Instead, the design of reset has many implications for cost, area and routability, and it is complicated by multiple clock domains and power islands.

Recently, the reset signal has begun to get some respect within the industry. Tools are emerging to help with the design and verification of reset, along with some new optimization strategies. Still, some standards are causing bad habits not only to persist but to get worse. Hopefully, that is only a temporary problem rooted in misunderstandings.

Resetting the Basics
Why does a design need to be reset? “When digital logic powers up, it is not always obvious what state it will power up into,” explains Drew Wingard, chief technology officer at Sonics. “The purpose of the reset signal is to bring those into a known state. For some parts of a design, it doesn’t matter, such as registers that just hold data values. These are probably initialized through some control step that is part of the boot process. The ones that control the fundamental behavior of state machines tend to need to be reset.”


Fig. 1: Clock, power and reset domains. Source: Ping Yeung and Eugene Mandel, Hardware and Software: Verification and Testing, 2015

But that is not the only reason why a reset is needed. “You need to think of reset in terms of cold resets and warm resets,” points out Adam Sherer, product management group director for the System & Verification Group at Cadence. “There are a lot of designs that go through warm resets when they run into error or questionable conditions. This is especially true of safety critical applications.”

One fundamental design decision is whether to use a synchronous or asynchronous reset. “Synchronous resets affect the state of the register only at the clock edges,” explains Mohit Kumar, senior engineering manager in CSD at Mentor, a Siemens Business. “Asynchronous resets, on the other hand, affect the register output independent of clock edges. Several design issues need to be considered before deciding a reset strategy for the design, such as which reset style to apply, whether every register needs to be reset, how the reset net will be routed, and how to verify timing of the reset network. The most important concern relates to the verification of design functionality, especially when resets span across multiple clocked partitions. Testing resets as part of the test process also poses interesting challenges.”
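
The difference between the two styles can be sketched with a small behavioral model in Python. This is an illustration rather than RTL: the `Register` class, its `style` field and the reset value of 0 are assumptions made for the example, and `None` stands in for an unknown (X) power-up value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Register:
    """Behavioral sketch of one flip-flop with an active-high reset.

    Assumptions: the reset value is 0, and `style` is "sync" or "async".
    `q` starts as None to model an unknown (X) power-up state.
    """
    style: str
    q: Optional[int] = None

    def on_reset_change(self, rst: int) -> None:
        # An asynchronous reset changes the output as soon as it is asserted,
        # without waiting for a clock edge. A synchronous reset does nothing here.
        if self.style == "async" and rst:
            self.q = 0

    def on_clock_edge(self, d: int, rst: int) -> None:
        # At a clock edge both styles load the reset value while reset is
        # asserted; otherwise they capture the data input.
        self.q = 0 if rst else d


sync_reg, async_reg = Register("sync"), Register("async")
for reg in (sync_reg, async_reg):
    reg.on_reset_change(1)  # reset asserted between clock edges
print(sync_reg.q, async_reg.q)  # None 0

for reg in (sync_reg, async_reg):
    reg.on_clock_edge(d=1, rst=1)  # first clock edge while still in reset
print(sync_reg.q, async_reg.q)  # 0 0
```

Until a clock edge arrives, the synchronous version is still unknown, which is why a synchronous reset has to be held while the clock is running.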

Ashish Darbari, director of product management for OneSpin Solutions, puts it into simple terms. “Many people do not even realize what can go wrong with reset. They have become more aware of clock domain crossing (CDC) issues, but the interaction of clocks with resets makes the problem even more interesting. Lots of things can go wrong from a circuit point of view.”

One critical design decision is which registers the reset should be applied to. “Uninitialized registers can result in erroneous design behavior, so designers tend to apply resets on significantly more registers than is actually required,” says Kumar. “RTL coding guidelines may also require designers to add resets to all registers in the design, such as those required by the ISO 26262 standard for automotive safety. However, this indiscriminate addition of resets on registers has its own penalty. Registers with resets require bigger technology cells and hence increase the area and power of the design.”

Timing closure

Timing can make every task more complex. “Electrically the power-on reset is an asynchronous input,” says Wingard. “In the most common design style today, while the assertion of reset is asynchronous, the deassertion is typically managed in a synchronous fashion. In the vast majority of flows, the customer needs to achieve timing closure on the rising edge of reset (deassertion). If you let reset go at the wrong time, you can end up with ambiguous output. This happens if the primary input and clock were changing close to the deassert time. Then you could end up creating exactly what you are trying to avoid.”
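
The “asynchronous assert, synchronous deassert” style Wingard describes is commonly built as a short chain of flip-flops whose asynchronous reset is the raw reset and whose first data input is tied to the inactive level. The Python sketch below models only the cycle-level timing of such a chain. It does not model metastability, and the class name, the active-low polarity and the default of two stages are assumptions for illustration.

```python
from typing import List


class ResetSynchronizer:
    """Cycle-level sketch of an 'async assert, sync deassert' reset synchronizer.

    Assumptions: active-low reset (0 = in reset), a chain of `stages` flops
    that are all cleared asynchronously by the raw reset, with a constant 1
    shifted in at the head once the raw reset is released.
    """

    def __init__(self, stages: int = 2) -> None:
        self.raw_rst_n = 0
        self.flops: List[int] = [0] * stages

    def set_raw_reset_n(self, rst_n: int) -> None:
        self.raw_rst_n = rst_n
        if rst_n == 0:
            # Assertion is asynchronous: every stage clears immediately.
            self.flops = [0] * len(self.flops)

    def clock_edge(self) -> None:
        if self.raw_rst_n == 0:
            return  # held in reset, nothing shifts
        # Deassertion is synchronous: the 1 moves down the chain one stage per edge.
        self.flops = [1] + self.flops[:-1]

    @property
    def sync_rst_n(self) -> int:
        # The reset seen by the rest of the clock domain.
        return self.flops[-1]


sync = ResetSynchronizer(stages=2)
sync.set_raw_reset_n(1)  # raw reset released at an arbitrary time
released = []
for _ in range(3):
    sync.clock_edge()
    released.append(sync.sync_rst_n)
print(released)  # [0, 1, 1]: deassertion lands on the second clock edge

sync.set_raw_reset_n(0)
print(sync.sync_rst_n)  # 0: assertion takes effect without a clock edge
```

Because the released reset changes only on a clock edge, its deassertion can be timed like any other synchronous path, which is what makes closing timing on the deassertion edge practical.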

And with multiple clocks and resets, care needs to be taken to ensure that the right edges are synchronized. “There is a companion to CDC, which is reset domain crossing (RDC),” says Pete Hardee, product management director, System & Verification Group of Cadence. “Some of the checks we do in the CDC app are for resets. When you start having synchronous resets associated with different clocks, or you have asynchronous resets, you can get metastability issues caused by the incorrect timing of those. These are the same metastability problems you get with CDC. Where you need synchronous resets that cross a domain boundary, they need to be synchronized to the new clock.”
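
A very reduced form of the check Hardee describes is a structural rule: every register’s reset should be synchronized to that register’s own clock. The Python sketch below applies that single rule to a hypothetical netlist description. The `Reset` and `Flop` records and the `find_unsynchronized_resets` function are invented for illustration; commercial CDC/RDC tools extract this information from the design itself and check considerably more conditions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Reset:
    name: str
    # Clock domain whose synchronizer produces this reset; None for a raw
    # asynchronous reset such as an external power-on reset.
    synced_to: Optional[str]


@dataclass(frozen=True)
class Flop:
    name: str
    clock: str  # clock domain that captures this register
    reset: Reset


def find_unsynchronized_resets(flops: List[Flop]) -> List[str]:
    """Return registers whose reset is not synchronized to their own clock.

    Simplified, hypothetical rule: a reset crossing into a new clock domain
    must pass through a synchronizer for that domain first.
    """
    return [f.name for f in flops if f.reset.synced_to != f.clock]


rst_a = Reset("rst_a_sync", synced_to="clk_a")
por_n = Reset("por_n", synced_to=None)
design = [
    Flop("ctrl_a", "clk_a", rst_a),  # reset synchronized to its own clock
    Flop("ctrl_b", "clk_b", rst_a),  # clk_a reset used in clk_b: flagged
    Flop("fsm_c", "clk_c", por_n),   # raw reset with no synchronizer: flagged
]
print(find_unsynchronized_resets(design))  # ['ctrl_b', 'fsm_c']
```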

Wingard explains some of the issues this creates within the communications infrastructure of a chip: “The amount of distance that a network-on-chip (NoC) spans is large. The reset signals tend to be long. Rather than treating reset as a clock and over-buffering it so that it has minimum propagation delay, fastest edges, etc., all of which costs power, we provide options that enable the insertion of retiming stages. While that may sound silly for an asynchronous signal, the deassertion needs to happen in a clean fashion, so by pipelining it you can make timing closure easier.”

Power domains
Two types of reset have already been identified: the power-on reset and the soft, or warm, reset, which is generally used when an error is detected or a system has gotten into an unknown state. A third type of reset is specific to power domains. “In the last seven or eight years, since became mainstream, we now have multiple power domains,” says Darbari. “The concept of power islands is well known, but the notion of reset domains and the fact that there are multiple resets in an SoC, makes it a bigger problem. When you have FSMs that are powered down, you need a little more careful planning to make sure that reset behaviors are implemented correctly.”

This is complicated by retention. “If you power gate some logic, then you need to reapply power-on reset when the power is brought back,” adds Wingard. “But what if there was retention? If you have retention registers, then you don’t want to apply reset because you want to remember the state. Now you have two kinds of power-on resets. So there are times when certain registers do need to be reset and other times when they don’t.”
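
The distinction Wingard draws can be captured in a few lines of Python: on power restore, retention registers get their saved value back while every other register takes its reset value. The `GatedBlock` class and the register names are hypothetical, and the model ignores the save and restore control signals a real retention scheme needs.

```python
from typing import Dict, Optional, Set


class GatedBlock:
    """Hypothetical power-gated block with a mix of retention and plain registers."""

    def __init__(self, reset_values: Dict[str, int], retained: Set[str]) -> None:
        self.reset_values = dict(reset_values)
        self.retained = set(retained)
        self.state: Dict[str, Optional[int]] = dict(reset_values)
        self._saved: Dict[str, Optional[int]] = {}

    def power_off(self) -> None:
        # Retention registers keep a copy of their value; everything else is lost.
        self._saved = {r: self.state[r] for r in self.retained}
        self.state = {r: None for r in self.state}  # None models an unknown (X) value

    def power_on(self) -> None:
        for r in self.state:
            if r in self.retained:
                self.state[r] = self._saved[r]  # restore: do not apply reset
            else:
                self.state[r] = self.reset_values[r]  # apply power-on reset


blk = GatedBlock({"mode": 0, "count": 0, "fsm": 0}, retained={"mode"})
blk.state.update(mode=3, count=7, fsm=2)
blk.power_off()
blk.power_on()
print(blk.state)  # {'mode': 3, 'count': 0, 'fsm': 0}
```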

Again, there are important design decisions to be made here. “You are verifying the reset sequence by checking how quickly Xs fade away,” explains Hardee. “The same is true for retention registers. You need to check that you get a return to normal operation within the required number of cycles. Designers are optimizing the number of resettable registers and also optimizing, within a power domain, how many retention registers are required so that the system can be returned to full operation in the desired time.”

“Adding retention affects the performance and size of the design, so determining how much is needed and where is a tradeoff between resetting powered-off blocks and restoring their state, and these are really system-level decisions,” adds Sherer. “Once you have that, then you need to do the verification. This can be done formally or through system-level simulation.”

There are also some tricks that are learned over time. “When a circuit is power gated, there is a sequence of steps that you go through,” explains Wingard. “Typically, you stop the clocks, you isolate the outputs so that things connected on their inputs do not see an intermediate value, and then you turn off the power. When you turn it back on you know you will need to apply the reset. We originally assumed we would assert reset after reapplying power, in the same way as initially powering on the chip. But it turns out that it is lower power to apply the reset as the last step before you power off. As you power back on, reset is already asserted. By doing that you stop it from coming up in some potentially unknown state. Intermediate states can turn on both the pull-up and pull-down networks, which increases current draw, but by having the reset applied when power is restored you can prevent this from happening and save a bunch of transitions.”
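
The ordering Wingard describes can be turned into a simple sequence checker. In the Python sketch below, the step names are hypothetical and the rules encode only what is stated above: clocks stopped and outputs isolated before power is removed, and, in the lower-power variant, reset asserted before power-off so that it is already asserted when power returns. The power-up order after `power_on` in the example is illustrative only and is not checked.

```python
from typing import List

# Hypothetical step names for a power-gating controller.
STEPS = {"stop_clocks", "start_clocks", "isolate_outputs", "remove_isolation",
         "assert_reset", "deassert_reset", "power_off", "power_on"}


def check_power_gating(steps: List[str]) -> List[str]:
    """Return rule violations for a hypothetical power-gating step sequence."""
    clocks_on, isolated, in_reset = True, False, False
    problems = []
    for i, step in enumerate(steps):
        if step not in STEPS:
            raise ValueError(f"unknown step: {step}")
        if step == "stop_clocks":
            clocks_on = False
        elif step == "start_clocks":
            clocks_on = True
        elif step == "isolate_outputs":
            isolated = True
        elif step == "remove_isolation":
            isolated = False
        elif step == "assert_reset":
            in_reset = True
        elif step == "deassert_reset":
            in_reset = False
        elif step == "power_off":
            if clocks_on:
                problems.append(f"step {i}: power removed with clocks running")
            if not isolated:
                problems.append(f"step {i}: power removed without output isolation")
            if not in_reset:
                problems.append(f"step {i}: reset not asserted before power-off")
        elif step == "power_on":
            if not in_reset:
                problems.append(f"step {i}: power restored without reset asserted")
    return problems


lower_power = ["stop_clocks", "isolate_outputs", "assert_reset", "power_off",
               "power_on", "start_clocks", "deassert_reset", "remove_isolation"]
print(check_power_gating(lower_power))  # []

reset_after_power_on = ["stop_clocks", "isolate_outputs", "power_off",
                        "power_on", "assert_reset"]
print(check_power_gating(reset_after_power_on))
```

The second sequence is the originally assumed ordering. It still works functionally, but the checker flags it because it gives up the power benefit of having reset asserted when power returns.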

Power draw
The power-on reset has the ability to modify the state of a large part of the design. That means lots of simultaneous transitions. “If a large number of flops in the design have resets, then in the reset phase a huge rush current would be drawn from the power source,” points out Kumar. “This is because the flops with resets cannot be gated in the reset phase. This huge rush current can cause serious peak power problems in the chip.”

However, this may not be as serious as if the same thing were to happen during normal operation. “The power-on reset signal is typically applied before power is fully stabilized, so the nodes are being pulled as they are coming up, so the peak current should be fairly low,” explains Wingard. “This is a good reason why you don’t wait until the entire design is up before you hit the reset. Then you would get a larger current spike. The other saving grace is that reset is applied for a number of clock cycles so that whatever voltage integrity problems may happen would have a chance to dissipate before you end the reset. So you are not asking the design to do anything while you are in the reset state.”

Sherer sees it more as a risk than an issue. “If you improperly sequence the system during reset, you could end up with an over-power condition. In a mobile device you have to carefully sequence, and you need to manage the power-on for the various components of the system so that you don’t end up with too high an energy draw from the battery. That can cause damage to the circuit.”

Reset verification
The verification of reset has gone through a lot of changes recently, and additional automation is being added. Many past problems stemmed from the simulators themselves. “RTL simulation is known to do optimistic interpretation of Xs, and designers tend to rely more on time-consuming gate-level simulations to catch the unexpected consequences of having uninitialized registers in the design,” points out Kumar. “On the other hand, gate-level simulations are known to be X-pessimistic.”

Formal verification methods have come to the rescue and can now track how Xs would propagate through a design. “X propagation is the technique that people are using to check how thoroughly and how many cycles it takes to reset a block completely,” says Hardee. “You are looking for the propagation of Xs through the design until they stop propagating anymore, at which point you are reset.”
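
The idea behind X-propagation analysis can be illustrated with a three-valued (0, 1, X) simulation. The Python sketch below holds reset, steps a small hypothetical block clock by clock, and reports how many cycles it takes for every register to become known, or which registers never do. Formal tools reason about all input values exhaustively rather than simulating a single scenario, so this only illustrates the principle; the register names and logic are invented.

```python
from typing import Callable, Dict, List, Optional, Tuple

X = None  # unknown value
State = Dict[str, Optional[int]]


def and3(a: Optional[int], b: Optional[int]) -> Optional[int]:
    # A known 0 on either input forces the output to 0, even if the other is X.
    if a == 0 or b == 0:
        return 0
    return X if a is X or b is X else 1


def or3(a: Optional[int], b: Optional[int]) -> Optional[int]:
    # A known 1 on either input forces the output to 1, even if the other is X.
    if a == 1 or b == 1:
        return 1
    return X if a is X or b is X else 0


def cycles_to_clear_x(next_state: Callable[[State], State], init: State,
                      max_cycles: int = 32) -> Tuple[Optional[int], List[str]]:
    """Return (cycles until no X remains, []) or (None, registers still at X)."""
    state = dict(init)
    for cycle in range(max_cycles + 1):
        unknown = sorted(k for k, v in state.items() if v is X)
        if not unknown:
            return cycle, []
        state = next_state(state)
    return None, unknown


def block_in_reset(s: State) -> State:
    # Next-state logic while reset is held: only r0 has a reset.
    return {
        "r0": 0,                       # resettable register
        "r1": s["r0"],                 # plain pipeline register
        "r2": and3(s["r1"], s["r2"]),  # feedback that a known 0 clears
        "r3": or3(s["r1"], s["r3"]),   # feedback that a known 0 cannot clear
    }


print(cycles_to_clear_x(block_in_reset, dict.fromkeys(["r0", "r1", "r2", "r3"], X)))
# (None, ['r3'])
```

Here `r3` holds its X for as long as reset is applied, which points to a register that needs its own reset. Without `r3`, the remaining registers clear after three cycles.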

But that is only one aspect of the problem. Darbari lays out a verification strategy. “When people started looking at reset trees, part of the problem is finding out if it is connected to the IP properly. This starts as a connectivity checking problem and that is now a solved problem. You can run formal and find these issues. What you cannot prove through static connectivity checking is the interplay of synchronous and asynchronous resets across domains. For that you need a combination of static analysis techniques.”

“I would run static analysis tools first to check the rules on the resets,” continues Darbari. “This would catch most of the bad design practices. Then you have X checking to ensure that the right flops are initialized. If the IP has other combinations of behavior, such as complex reset domains, then you will need to do some additional checking and that will require some functional checking using a combination of simulation and formal assertions. You have to split the verification focus depending upon where you are in the design.”

And at the SoC level, that may require verification of the reset sequencing. “Reset and low-power verification is a big and growing problem,” says , CEO of Breker. “Verification intent and/or specification of this type of functionality should be thought of as a state machine. Each state transition must be driven by the test and each state must be checked by the test. Mapping a state machine onto a graph-based model is a simple, mechanical process, and a graph-based Portable Stimulus tool can efficiently cover the available verification space.”
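
As a generic illustration of mapping a state machine onto a graph, the Python sketch below takes a hypothetical reset and power state machine and generates one test path per transition by prefixing each transition with the shortest path that reaches its source state. This is a plain breadth-first-search sketch, not Breker’s tool or the Accellera Portable Stimulus language; the state names are invented, and a real flow would also attach a check to every state visited.

```python
from collections import deque
from typing import Dict, List, Optional

Graph = Dict[str, List[str]]

# Hypothetical reset and power state machine.
RESET_FSM: Graph = {
    "OFF": ["POR"],
    "POR": ["RUN"],
    "RUN": ["WARM_RESET", "RETENTION", "OFF"],
    "WARM_RESET": ["RUN"],
    "RETENTION": ["RUN"],
}


def shortest_path(graph: Graph, start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search; returns the state sequence from start to goal."""
    prev: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            cur: Optional[str] = node
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


def transition_tests(graph: Graph, start: str) -> List[List[str]]:
    """One test per transition: reach the source state, then take the transition."""
    tests = []
    for src, dsts in graph.items():
        prefix = shortest_path(graph, start, src)
        if prefix is None:
            continue  # unreachable state; worth reporting in a real flow
        tests.extend(prefix + [dst] for dst in dsts)
    return tests


for test in transition_tests(RESET_FSM, "OFF"):
    print(" -> ".join(test))
```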

There could also be concerns at the other end of the abstraction chain. “Timing is important, but most reset verification is done at RT level,” says Hardee. “The advantage of doing it in formal is that while we don’t take die-wide timing into consideration, it is effectively a worst-case verification in that if there is a problem related to the order of arrival of signals, then formal will find that.”

Reset optimization
Even when the functionality has been verified, the solution may be overkill. “Resettable flops are costlier than flops that are not resettable, but does every flop need to be resettable?” asks Hardee. “Can we space them correctly and get the Xs cleared from the block in a satisfactory time period?”
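
Hardee’s spacing question can be explored for the simplest case: a chain of registers with no logic between them. Assuming asynchronously reset flops with a reset value of 0, reset held for a fixed number of clock cycles, and an unknown value on the chain’s input during reset, a register only needs its own reset if the nearest resettable register upstream is more than that many stages away. The Python sketch below computes that minimal placement and checks it with a simple X simulation. The function names are hypothetical, and real designs with feedback and branching logic need more general sequential analysis.

```python
from typing import List, Optional, Set


def minimal_resets_for_chain(n_flops: int, reset_cycles: int) -> List[int]:
    """Greedy placement: reset a flop only when no resettable flop upstream is
    close enough for its 0 to arrive within `reset_cycles` clock edges."""
    keep: List[int] = []
    for i in range(n_flops):
        if not keep or i - keep[-1] > reset_cycles:
            keep.append(i)
    return keep


def all_known_after_reset(n_flops: int, resettable: Set[int], reset_cycles: int) -> bool:
    """X-simulate the chain: resettable flops go to 0 as soon as the
    (asynchronous) reset asserts; each clock edge shifts values one stage."""
    state: List[Optional[int]] = [0 if i in resettable else None for i in range(n_flops)]
    for _ in range(reset_cycles):
        state = [
            0 if i in resettable else (state[i - 1] if i > 0 else None)  # chain input is X
            for i in range(n_flops)
        ]
    return all(v is not None for v in state)


placement = minimal_resets_for_chain(n_flops=8, reset_cycles=2)
print(placement)  # [0, 3, 6]
print(all_known_after_reset(8, set(placement), 2))  # True
print(all_known_after_reset(8, {0, 4}, 2))          # False
```

Holding reset for more cycles lets resettable flops be spaced further apart, which is the trade between reset duration and flop count that the question above raises.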

Some new techniques are emerging in this space. “Sequential analysis has been deployed to identify redundant resets in the design,” says Kumar. “This is a more formal way of analyzing the design to identify those redundancies. These techniques have been used to obtain up to 22% reduction in sequential power, up to 3% area savings, and reset network load reduction of up to 70%.”

With verification tools now in place and early optimization tools emerging, perhaps reset is starting to get the respect that it deserves. The figures also show that there are significant savings to be made, and that may be enough to offer some product differentiation.
