Heterogeneous Computing Model Delivers Order-Of-Magnitude Performance Breakthrough

Using GPUs to accelerate circuit simulation technology.


By Srinivas Kodiyalam (NVIDIA) and Samad Parekh (Synopsys)

With the ever-increasing demand for more computing performance, the HPC industry is moving toward a heterogeneous computing model in which GPUs and CPUs work together to perform general-purpose computing tasks. In this model, the GPU serves as an accelerator to the CPU, offloading compute-intensive work to increase overall efficiency. To exploit this computing model and the massively parallel GPU architecture, application software needs to be redesigned. Synopsys and NVIDIA engineers have been working together to use GPUs to accelerate circuit simulation technology.
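
As a rough illustration of this offload pattern, the minimal CUDA sketch below moves data to the GPU, launches a kernel, and copies results back while the CPU stays free for other work. The kernel and variable names are hypothetical and are not taken from any Synopsys or NVIDIA product.

```cpp
// Minimal sketch of the CPU-host / GPU-accelerator offload pattern.
// Illustrative only; scale_voltages is a hypothetical kernel name.
#include <cuda_runtime.h>
#include <vector>

__global__ void scale_voltages(double* v, double factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] *= factor;   // each GPU thread handles one entry
}

int main() {
    const int n = 1 << 20;
    std::vector<double> host_v(n, 1.0);

    double* dev_v = nullptr;
    cudaMalloc(&dev_v, n * sizeof(double));
    cudaMemcpy(dev_v, host_v.data(), n * sizeof(double), cudaMemcpyHostToDevice);

    // The CPU can continue with other work while the GPU runs the kernel.
    scale_voltages<<<(n + 255) / 256, 256>>>(dev_v, 0.5, n);

    cudaMemcpy(host_v.data(), dev_v, n * sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(dev_v);
    return 0;
}
```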

IC design complexity has continued to grow exponentially. In just the last decade, as process technology has advanced from planar to FinFET transistors, designs have been burdened by very large device counts and interconnect parasitics. For example, moving from a 45nm process to a contemporary 5nm node brings roughly a 10x increase in device count and a similar increase in the number of process, voltage, and temperature corners.

Additionally, as device and component dimensions shrink, more physical effects must be modeled for accurate simulation. Parasitic effects that once had only a marginal influence now have a significant impact on overall circuit performance. Over the same period, CPU performance gains have largely plateaued, while GPU performance has continued to scale well beyond Moore's Law. These trends will only widen the gap between the two computing approaches over time.

Device model evaluation and matrix solution are the two dominant components of circuit simulation. Device model evaluation produces a massive number of independent computing tasks. Comparing the two architectures, CPUs are designed to handle a wide range of tasks quickly but are limited in concurrency, whereas GPUs provide thousands of processing cores running concurrently, which raises their throughput. GPUs therefore excel at massively parallel, independent workloads. In modern designs with very large transistor counts, where each device evaluation is independent of the others, every device instance can be mapped to a GPU thread so that thousands of evaluations run in parallel.
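
To illustrate the one-thread-per-device mapping, here is a minimal CUDA sketch that evaluates a toy exponential diode model for every device instance in parallel. A production simulator evaluates far more elaborate compact models (e.g. BSIM); the structures and names below are purely illustrative.

```cpp
// Sketch: one GPU thread per device instance. A toy diode equation
// stands in for a real compact model; names are not from PrimeSim.
#include <cuda_runtime.h>
#include <math.h>

struct DiodeParams { double is; double vt; };   // saturation current, thermal voltage

__global__ void evaluate_devices(const DiodeParams* params,
                                 const double* v,      // terminal voltage per device
                                 double* i_out,        // computed current per device
                                 int n_devices) {
    int d = blockIdx.x * blockDim.x + threadIdx.x;
    if (d < n_devices) {
        // Each evaluation is independent, so thousands run concurrently.
        i_out[d] = params[d].is * (exp(v[d] / params[d].vt) - 1.0);
    }
}

// Launch with roughly one thread per device instance:
//   int threads = 256;
//   int blocks  = (n_devices + threads - 1) / threads;
//   evaluate_devices<<<blocks, threads>>>(d_params, d_v, d_i, n_devices);
```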

Large post-layout circuits also produce large matrices, which require an enormous number of floating-point operations to solve. A matrix with a dimension of 10 million, for example, can give rise to more than 100 billion floating-point operations. CPUs are not built to deliver floating-point throughput on that scale, which is another reason simulations run so long. With their much higher compute performance and memory bandwidth, GPUs make a far more efficient matrix solution possible; the Tesla V100 GPU, for instance, delivers about 7 TFLOPS of double-precision performance.
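
As a simplified view of where that floating-point work goes, the CUDA sketch below computes a sparse matrix-vector product in CSR format, a basic building block of iterative linear solves. Production simulators rely on tuned sparse direct or iterative solvers (for example, libraries such as cuSPARSE); this kernel is only meant to show why GPU throughput and memory bandwidth matter for the matrix solution.

```cpp
// Sketch: sparse matrix-vector product y = A*x in CSR format.
// Circuit matrices are very sparse, so each row touches few nonzeros,
// and rows can be processed by independent GPU threads.
#include <cuda_runtime.h>

__global__ void csr_spmv(int n_rows,
                         const int* row_ptr,   // size n_rows + 1
                         const int* col_idx,   // column index per nonzero
                         const double* values, // nonzero values
                         const double* x,
                         double* y) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n_rows) {
        double sum = 0.0;
        for (int k = row_ptr[row]; k < row_ptr[row + 1]; ++k)
            sum += values[k] * x[col_idx[k]];
        y[row] = sum;
    }
}
```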

Synopsys’ PrimeSim Continuum offers a next-generation architecture with unique GPU technology, delivering the significant performance improvements needed for comprehensive analog and RF design analysis while meeting signoff accuracy requirements. Benchmarks run on DGX systems with CUDA GPUs show speedups ranging from 4x to 12x over multi-core CPUs. While the gains apply across various circuit types, the largest improvements are seen on large post-layout simulations, and when coupled with long transient run times the improvement is even more pronounced.

PrimeSim achieves its most impressive performance gains by leveraging the massive parallelism of the CUDA GPUs. The core technologies involved are:

  • synchronous parallel computing on a heterogeneous GPU and CPU architecture
  • robust sparse solver for solving the circuit simulation system of equations
  • accurate and efficient IC component modeling
  • compact and efficient data model and management for the GPU (a data-layout sketch follows this list), and
  • fast circuit simulation database build and data processing
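
On the data-model point above, one common GPU-friendly choice is a structure-of-arrays layout, sketched below, so that adjacent threads read adjacent memory locations and accesses coalesce. This layout is an assumption for illustration only, not the actual PrimeSim data model.

```cpp
// Sketch of a structure-of-arrays (SoA) device table. Thread d reading
// width[d] sits next to thread d+1 reading width[d+1], so memory
// accesses coalesce into wide, efficient transactions.
struct DeviceTableSoA {
    int     n_devices;
    double* width;        // one contiguous array per parameter...
    double* length;
    double* vth;
    int*    node_a;       // ...and per terminal connection
    int*    node_b;
};

// Contrast with an array-of-structures layout, where thread d reading
// devices[d].width strides over unrelated fields and wastes bandwidth:
struct DeviceAoS { double width, length, vth; int node_a, node_b; };
```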

It is increasingly clear that the growing complexity of nano-scale IC simulation calls for heterogeneous computing with multiple GPUs connected by extremely fast interconnects.

Srinivas Kodiyalam is Senior Developer Relations Manager for Industrial HPC and AI at NVIDIA.

Samad Parekh is Senior Staff Product Manager for Custom Design and Physical Verification Group at Synopsys.


