Early Detection Of Reset Domain Crossing Errors

Common design errors involving resets and how to address them.


Many aspects of system-on-chip (SoC) designs are growing, including the numbers of gates, memories, clock domains, reset domains, power domains, on-chip buses, and external interfaces. A recent blog post focused on reset domain crossings (RDCs) and the requirements for effective pre-silicon verification of these trouble-prone structures. If properly applied, a solution meeting these requirements can find RDC errors early in the development process, when they are easier to diagnose and fix. This is an important component of the overall “shift left” in chip verification to save project resources and reduce time to market (TTM).

To demonstrate the value of RDC checks, it is helpful to look at some common design errors involving resets. Many of these cannot be easily fixed in silicon, which is another key reason to find them during verification. An RDC occurs whenever a signal traverses from one reset domain to another. Handling these signals properly is challenging and mistakes are easy to make. An RDC error occurs whenever the application of an asynchronous reset can result in metastability and consequent non-deterministic values in the chip. Although RDCs have some similarities with clock domain crossings (CDCs), the scenarios in which they occur may be different.

For example, an RDC error can happen even within a single clock domain. In the following figure, a signal from a flip-flop in one reset domain drives a flip-flop in another domain whose reset operates independently, although both flip-flops share a common clock. If the flip-flop that is reset (FF_src) changes value close to the active clock edge of the capturing flip-flop (FF_dst), the capturing flip-flop may go into a metastable state, with its output indeterminate. This is because paths going from an asynchronous pin (such as the red path in the figure) typically are not timed. The “rst1” signal may be a software reset or may be due to a power-on reset (PoR).
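The structural condition described above can be sketched in a few lines of Python. This is only an illustrative model, not how any particular tool works: the `Flop` record, the `find_rdc_paths` helper, the reset name "rst2", and the extra flop "FF_other" are assumptions made for the example. It flags every flop-to-flop path whose endpoints use different asynchronous resets, regardless of whether they share a clock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flop:
    name: str
    clock: str
    async_reset: str  # name of the asynchronous reset driving this flop


def find_rdc_paths(flops, paths):
    """Return (src, dst) name pairs whose endpoints sit in different reset domains."""
    by_name = {f.name: f for f in flops}
    crossings = []
    for src_name, dst_name in paths:
        src, dst = by_name[src_name], by_name[dst_name]
        # The clock is deliberately ignored: a shared clock does not prevent an RDC.
        if src.async_reset != dst.async_reset:
            crossings.append((src_name, dst_name))
    return crossings


# Hypothetical netlist: all flops share "clk"; only FF_src is reset by rst1.
flops = [
    Flop("FF_src", clock="clk", async_reset="rst1"),
    Flop("FF_dst", clock="clk", async_reset="rst2"),
    Flop("FF_other", clock="clk", async_reset="rst2"),
]
paths = [("FF_src", "FF_dst"), ("FF_dst", "FF_other")]
rdc_paths = find_rdc_paths(flops, paths)
```

Here only the FF_src to FF_dst path is reported. A real analysis would also have to decide which crossings are actually harmful, which this sketch does not attempt.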

Exhaustive static analysis is the fastest and most reliable way to uncover RDC errors. A few scenarios of bugs found in actual SoC designs demonstrate the power and flexibility of this analysis. In the first scenario, data corruption can occur when an SoC memory controller is reset but the associated memory is not because its contents must be preserved. For example, configuration memory contains important operational information that must not be cleared by reset. The RDC design error is that the asynchronous memory controller reset (“soft_rst”) controlling flip-flop f1 may cause flip-flop f2 to become metastable, potentially generating spurious writes to the memory and corrupting the critical configuration data. The simplest fix is to gate the write (and read) enables during controller reset to ensure that the memory will not be changed.
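The gating fix can be illustrated with a small behavioral model. This is a sketch under stated assumptions rather than the actual design: the functions `gated_write_enable` and `apply_cycle`, the dictionary-based memory, and the data values are hypothetical. A possibly spurious write enable is simply masked while "soft_rst" is active.

```python
def gated_write_enable(raw_write_enable: bool, soft_rst_active: bool) -> bool:
    """Mask the write enable for as long as the controller reset is asserted."""
    return raw_write_enable and not soft_rst_active


def apply_cycle(memory, addr, data, raw_write_enable, soft_rst_active):
    """Return the memory contents after one cycle, using the gated enable."""
    updated = dict(memory)
    if gated_write_enable(raw_write_enable, soft_rst_active):
        updated[addr] = data
    return updated


config_memory = {0: 0xA5}  # hypothetical critical configuration word

# A spurious write arrives while soft_rst is asserted: the gate blocks it.
during_reset = apply_cycle(config_memory, 0, 0x00,
                           raw_write_enable=True, soft_rst_active=True)

# After reset is released, a legitimate write goes through as usual.
after_reset = apply_cycle(during_reset, 1, 0x3C,
                          raw_write_enable=True, soft_rst_active=False)
```

The model shows the intent only. In hardware, the gating signal itself must be generated so that it is stable before the controller reset can disturb the enable.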

Integrating design IP blocks from previous projects, internal libraries, or commercial sources often adds reset domains to the SoC, and with them the potential for RDC errors. In the figure below, the bus logic in the “Generated Reset” domain transfers data to an IP block in the “PoR” domain. The design is set up so that a PoR automatically asserts the generated reset, but not vice versa. Thus, if the generated reset is asserted without PoR, there is an RDC error, and data going into the IP block may be corrupted. The solution is to connect the IP to a lower-level reset that is active whenever the generated reset is asserted.
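The "PoR asserts the generated reset, but not vice versa" relationship can be modeled as a directed graph of resets. In the simplified sketch below, a crossing is treated as safe only if asserting the source reset also asserts the destination reset. The `implies` map, the function names, and the reset identifiers are assumptions for illustration; real designs need a more careful analysis of assertion ordering and timing.

```python
def asserted_with(reset, implies):
    """All resets that are asserted whenever `reset` is asserted (transitively)."""
    seen = {reset}
    stack = [reset]
    while stack:
        current = stack.pop()
        for nxt in implies.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def crossing_is_safe(src_reset, dst_reset, implies):
    """Safe (in this model) if the destination is reset whenever the source is."""
    return dst_reset in asserted_with(src_reset, implies)


# PoR asserts the generated reset, but not vice versa.
implies = {"PoR": ["generated_rst"]}

# Bus logic (generated reset) drives the IP block (PoR): flagged as unsafe.
bus_to_ip_safe = crossing_is_safe("generated_rst", "PoR", implies)

# The opposite direction is covered because PoR implies the generated reset.
ip_to_bus_safe = crossing_is_safe("PoR", "generated_rst", implies)

# The fix from the text: move the IP onto a reset that the generated reset
# also asserts (the name "ip_rst" is hypothetical).
fixed_implies = {"PoR": ["generated_rst"], "generated_rst": ["ip_rst"]}
fixed_safe = crossing_is_safe("generated_rst", "ip_rst", fixed_implies)
```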

A related scenario occurs when a slave IP block using “User Reset” sends requested data to a master IP block in the “System Reset” domain. If the slave is reset while the master is not, there is the potential for metastability and corruption of the data. One possible fix is to disable the data path from the slave IP to the master IP whenever “User Reset” is asserted so that any dubious values are ignored.
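A minimal sketch of the data-path disable, assuming a master register that simply holds its last value while “User Reset” is asserted. `None` stands in for an unknown (possibly metastable) value, and the function name and data values are hypothetical.

```python
from typing import Optional


def master_next_value(current: Optional[int],
                      slave_data: Optional[int],
                      user_rst_active: bool,
                      isolate: bool = True) -> Optional[int]:
    """Value the master register captures on the next clock edge."""
    if isolate and user_rst_active:
        return current  # data path disabled: keep the last good value
    return slave_data


last_good = 0x42
unknown = None  # slave output is not trustworthy while it is being reset

# Without isolation, the master captures the unknown value.
unprotected = master_next_value(last_good, unknown,
                                user_rst_active=True, isolate=False)

# With isolation, the dubious value is ignored.
protected = master_next_value(last_good, unknown, user_rst_active=True)

# Once User Reset is released, valid slave data is captured normally.
resumed = master_next_value(protected, 0x17, user_rst_active=False)
```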

Detecting these types of RDC errors, and many others, requires robust and powerful static analysis technology. Synopsys VC SpyGlass RDC is the leading industry solution to provide these capabilities, with all the listed scenarios taken from bugs found in real user designs. Multiple reset domains are especially common in low-power, high-performance designs, and the solution supports specification using the Unified Power Format (UPF) and debug in the industry-standard Verdi Automated Debug System. Catching RDC errors early in the development process with an exhaustive approach saves time and resources while providing certainty that no issues will escape to silicon.

For more information on finding and fixing reset issues, read the previous post and download a detailed white paper.
