Accelerating Circuit Simulation 10x With GPUs

Increasingly large and complex SoCs put pressure on traditional CPU-based circuit simulation.


By Samad Parekh (Synopsys) and Srinivas Kodiyalam (NVIDIA)

Many aspects of semiconductor design and verification have an ever-growing “need for speed” that has outpaced the performance improvements available by running on CPUs. Electronic design automation (EDA) companies have responded by creating smarter software algorithms that reduce simulation time, sometimes by relaxing accuracy.

As graphics processing units (GPUs) have moved beyond rendering graphics and video into more conventional computing tasks, circuit simulation has emerged as an area of EDA that has proven amenable to speed-up with GPUs without compromising accuracy. The design hyperconvergence embodied in today’s huge system-on-chip (SoC) devices has created massive challenges for traditional CPU-based circuit simulation. Larger and more complex circuits, the reduced margins and increased parasitics of advanced process nodes, integrated analog content, larger and faster embedded memory, and complex high-speed I/O have all put pressure on simulation speed and capacity. With wider adoption of 3D IC technology, the situation is only getting more difficult.

Several attributes of GPUs make them a good fit for circuit simulation. The many parallel computing cores enable thousands of threads, most of which can be scheduled independently, supporting concurrent streams. The large, dedicated memory accommodates simulation data structures while improving bandwidth and latency. Many circuit simulation problems entail solving dense matrices, a process that works well in the GPU architecture. The heterogeneous combination of one or more GPUs and a cluster of parallel CPUs has been shown to speed up simulation by an order of magnitude over CPUs alone. These results have been obtained across a variety of circuit types (PLL, SerDes, SRAM, PHY, etc.) with element counts in the hundreds of millions.
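To make the dense-matrix point concrete, here is a minimal sketch of nodal analysis, the formulation a SPICE-class simulator uses: the circuit is stamped into a conductance matrix G and solved as the linear system G·v = i. The two-resistor circuit and the plain Gaussian-elimination solver below are illustrative assumptions, not PrimeSim's algorithm; at production scale this factorization is exactly the kernel that GPUs parallelize.

```python
# Illustrative only: nodal analysis reduces a resistor network to a
# dense linear system G @ v = i -- the kind of kernel GPUs accelerate.
# Hypothetical circuit: 1 A source into node 1; R1 = 2 ohm from
# node 1 to node 2; R2 = 4 ohm from node 2 to ground.

def solve(G, i):
    """Plain Gaussian elimination with partial pivoting."""
    n = len(G)
    A = [row[:] + [i[k]] for k, row in enumerate(G)]  # augmented matrix
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(A[r][col]))  # pivot row
        A[col], A[p] = A[p], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        v[r] = (A[r][n] - sum(A[r][c] * v[c] for c in range(r + 1, n))) / A[r][r]
    return v

g1, g2 = 1 / 2.0, 1 / 4.0          # conductances of R1, R2
G = [[g1, -g1], [-g1, g1 + g2]]    # stamped conductance matrix
i = [1.0, 0.0]                     # 1 A injected at node 1
v = solve(G, i)
print(v)  # node voltages [6.0, 4.0]: 4 V across R2 plus a 2 V drop over R1
```

A real simulator repeats this solve at every timestep and Newton iteration across millions of nodes, which is why moving the linear algebra onto GPU cores pays off.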

A recent case study looked at the performance gains achieved by running the Synopsys PrimeSim simulator on the NVIDIA Volta V100 and Ampere A100 GPUs. The A100, the most recent NVIDIA GPU (2020), offers many advantages over its predecessor V100 (2017), resulting in better performance for many applications. The Tensor Core architecture supports 64-bit floating-point (FP64) double-precision data types and operations, accelerating general matrix-matrix multiplication (GEMM). This capability is one of the keys to achieving better performance for circuit simulation.
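For readers unfamiliar with the term, GEMM is simply the dense product C = A·B. The reference triple loop below is a sketch of that operation, not how Tensor Cores implement it; in practice a simulator calls a tuned FP64 library routine (e.g., via cuBLAS) that the A100's Tensor Cores accelerate in hardware.

```python
# Minimal reference sketch of GEMM (C = A @ B) in double precision --
# the operation FP64 Tensor Cores accelerate. Production code calls a
# tuned library kernel rather than this loop.

def gemm(A, B):
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for p in range(k):      # loop order keeps A[i][p] in a register
            a = A[i][p]
            for j in range(m):
                C[i][j] += a * B[p][j]
    return C

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
print(gemm(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
```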

The case study looked at the results of running eight different designs in SPICE-level simulations on CPUs only, on V100 GPUs, and on A100 GPUs. In summary, speedups of as much as 10x were measured for the V100 over CPU-only runs, and as much as 2.4x for the A100 over the V100. The designs included up to 7M transistors, 29M resistors, and 90M capacitors.

Additional case studies have shown that the powerful combination of Synopsys PrimeSim and NVIDIA GPUs also accelerates other troublesome circuit simulations, including some that were previously unsolvable. One example is simulating the analog-to-digital converters (ADCs) in the columns of a CMOS image sensor. Even a small voltage drop between the column ADCs can cause image distortion, so it is important to catch any problems before the design is fabricated.

This is far from a trivial problem to solve. In the case study design, the image sensor contains an array of 48M pixels, arranged as columns and rows where thousands of column ADCs must be simulated together. The resulting circuit contains millions of transistors and billions of resistors and capacitors. Despite the high pixel count and the large size of the circuit, the SPICE simulation must be highly accurate to detect small voltage differences. The combination of size and accuracy proved impossible for CPUs to solve without the help of NVIDIA GPUs. The simulation became possible using four V100 GPUs along with 32 CPUs and was accelerated by almost 2x when four A100 GPUs were used.

Another example is running FastSPICE circuit simulations on a memory array with a power delivery network (PDN). The goal is to analyze the voltage drop impact on memory timing and switching currents. As with the previous example, it is essential to find any problems before the design goes to silicon so that they can be fixed without an expensive chip turn. The FastSPICE simulation must be repeated for thousands of vectors and corners, putting a premium on performance, while transistors in the millions and resistors and capacitors in the hundreds of millions stress capacity.

It turns out that this particular circuit simulation challenge is especially suited for the CPU/GPU heterogeneous combination. Using PrimeSim with a V100 GPU provides a 5x improvement over CPUs and passes the threshold that makes this simulation practical to repeat across PVT (process, voltage, temperature) corners and test vectors. For the first time, memory vendors can accurately verify the impact of PDN voltage drop on memory array performance.

These examples show clearly that GPUs provide an excellent solution to the growing challenges to circuit simulation posed by the advances in semiconductor process technology and increase in circuit complexity. The V100 GPU yields up to a 10x speedup over traditional CPU-only simulation and solves some circuits impossible for CPUs alone. The newer and more powerful A100 GPU provides double the performance of the V100 in some cases and runs on average 50% faster. The combination of Synopsys PrimeSim simulation and NVIDIA computing platforms provides the power needed for the next generation of circuit simulation.

Srinivas Kodiyalam is a senior developer relations manager for Industrial HPC and AI at NVIDIA.
