SPONSOR BLOG

Next-Generation Distributed Static Timing Analysis On The Cloud

Parallelism is the only scalable solution for making any application faster.

Ever-growing chip size and complexity put pressure on every step and every electronic design automation (EDA) tool in the development flow. More decisions must be made at the architectural stage, stressing virtual prototypes and high-level models. Simulations become slower and consume more memory. Formal verification struggles to achieve full proofs. Logic synthesis and layout have a harder time meeting power, performance, and area (PPA) goals. A wide range of analysis and sign-off tools takes much longer to complete, often delaying tapeout when multiple runs are needed to resolve issues. None of this is surprising, but the magnitude of the ripple effects from larger chips grows with each new generation and design node.

For many years, chip developers could rely on regular increases in performance due to Moore’s Law and the increasing speed of the compute platforms running EDA tools. This is no longer the case; the slowing of Moore’s Law means that other techniques must be used. Of course, clever tool developers are always coming up with enhanced algorithms and better data structures to improve performance and reduce memory consumption. But, ultimately, parallelism is the only scalable solution for making any application faster. Today, many steps in the chip development flow can operate across multiple cores, multiple CPUs, or even multiple compute servers on the cloud. Progress was initially slow, but in recent years the pace of EDA parallelization has accelerated dramatically.

Despite all this innovation and improvement, some tasks in the flow have proven much harder to distribute. Static timing analysis (STA) is one of these outliers, and it’s not hard to see why. Many steps in chip development have natural parallelism that can be exploited by tool developers. For example, simulation is inherently parallel in nature. Assertions tend to use local variables so that formal analysis can often run on only a small subset of the entire chip design. Hierarchical logic synthesis and place-and-route can be distributed, and many analysis tools for the resulting layout can run on portions of the chip in parallel. But the paths traced in STA may span the entire chip, making partitioning the design for parallel analysis extremely challenging. Traditionally, full-chip and signoff STA is run on a flat netlist of the entire design. On a large chip, this takes too long and can run only on a very expensive compute server with maximum memory installed. As with other EDA tools, partitioning and parallelism are the only way to satisfy the demands of today’s designs.
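To make the partitioning difficulty concrete, here is a minimal sketch of worst-case arrival-time propagation over a timing graph, the core calculation in any STA engine. It is a textbook longest-path traversal, not the algorithm used by any particular tool; the node names, delays, and the two-block toy design are all hypothetical.

```python
from collections import defaultdict, deque


def arrival_times(edges, sources):
    """Propagate latest arrival times through a timing graph (a DAG).

    edges:   list of (from_node, to_node, delay) tuples.
    sources: dict mapping each start point to its initial arrival time.
    Nodes with no fan-in that are missing from `sources` start at 0.0.
    """
    fanout = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set(sources)
    for src, dst, delay in edges:
        fanout[src].append((dst, delay))
        indegree[dst] += 1
        nodes.update((src, dst))

    # Start from every node that has no incoming timing arc.
    start = [n for n in nodes if indegree[n] == 0]
    arrival = {n: sources.get(n, 0.0) for n in start}
    ready = deque(start)

    # Kahn-style topological traversal, keeping the latest arrival per node.
    while ready:
        node = ready.popleft()
        for dst, delay in fanout[node]:
            candidate = arrival[node] + delay
            if candidate > arrival.get(dst, float("-inf")):
                arrival[dst] = candidate
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)
    return arrival


# Hypothetical design: block A (in -> a1 -> a2) feeds block B (b1 -> out).
block_a = [("in", "a1", 1.0), ("a1", "a2", 2.0)]
crossing = [("a2", "b1", 0.5)]
block_b = [("b_in", "b1", 1.0), ("b1", "out", 1.5)]

full_chip = arrival_times(block_a + crossing + block_b, {"in": 0.0, "b_in": 0.0})
b_alone = arrival_times(block_b, {"b_in": 0.0})

print(full_chip["out"])  # 5.0: the worst path crosses from block A into block B
print(b_alone["out"])    # 2.5: block B analyzed in isolation misses that path
```

Analyzing block B alone understates the arrival time at `out` because the critical path starts in block A. A naive cut of the netlist would therefore give wrong answers unless the boundary timing is carried across partitions, which is what makes automatic partitioning without accuracy loss a hard problem.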

Any solution for distributed STA in the cloud must satisfy several key requirements:

  • Design partitioning must be automatic, requiring no manual action or intervention
  • The distributed STA jobs must use the same constraint and setup files as full-chip STA
  • The distributed STA jobs must optimize runtime to reduce time for analysis and signoff
  • The distributed STA jobs must run efficiently in existing cloud compute servers
  • There must be no compromise in accuracy, especially for final timing signoff

Although these requirements set a high bar, fortunately a solution that satisfies them is available today. The HyperGrid technology within the Synopsys PrimeTime static timing analysis tool provides full-quality timing signoff in the cloud, with 100% accuracy when compared to traditional flat netlist STA. It achieves maximum parallelism through fully automatic fine-grained partitioning and distribution of the STA jobs across servers in the cloud. This approach yields a speedup of up to 10x and a memory reduction of up to 10x with no loss of accuracy. In addition to accelerating STA and signoff, parallelism can save significant money since it is less expensive to run short jobs on multiple smaller servers in the cloud rather than one long run on a single server with maximum memory.

Ease of use enables rapid and painless adoption of this novel technology. Automated partitioning and load balancing eliminate any need for users to manually break up the design or even to provide guidance. Synopsys PrimeTime HyperGrid uses the identical constraint and setup files already in place for traditional flat full-chip Synopsys PrimeTime STA. It generates identical timing reports, so all scripts and post-processing steps continue to work with no modifications needed. Synopsys provides all the support needed to perform distributed STA on a public cloud, including building containers for both Docker and Singularity job packaging, and using job schedulers such as Sun Grid Engine (SGE), Slurm Workload Manager, and OpenPBS. Recently, Synopsys also announced the FlexEDA pay-per-use business model to run Synopsys PrimeTime on-demand.

Users simply specify the number of design partitions they would like, and Synopsys PrimeTime HyperGrid does the rest. The results, as reported by hands-on engineers at recent Synopsys Users Group (SNUG) events, are impressive. One company designing leading-edge artificial intelligence (AI) chips used Synopsys PrimeTime HyperGrid on a design with 1.2B instances. They reported that the time for each STA run was reduced by 5X with 8 partitions, and by 9X with 20 partitions, over a single flat netlist. For these same two partitioning choices, memory was reduced by 3X and 4X, respectively. This saved significant project resources and made rapid STA iterations possible.
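One way to read these figures is as parallel efficiency, meaning speedup divided by the number of workers. The sketch below uses only the numbers reported above and assumes one worker per partition, which the report does not state; the function name is illustrative.

```python
def parallel_efficiency(speedup, workers):
    """Fraction of ideal linear scaling achieved (1.0 means perfect scaling)."""
    return speedup / workers


# (partitions, runtime speedup, memory reduction) as reported for the 1.2B-instance design.
reported = [(8, 5.0, 3.0), (20, 9.0, 4.0)]

for partitions, speedup, mem_reduction in reported:
    # Assumption: each partition runs on its own worker.
    eff = parallel_efficiency(speedup, partitions)
    print(f"{partitions} partitions: {speedup:.0f}X faster, "
          f"{eff:.1%} efficiency, {mem_reduction:.0f}X less memory")
```

Under that assumption, efficiency falls from 62.5% at 8 partitions to 45.0% at 20, so adding partitions keeps shortening the run but with diminishing returns per worker.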

The dramatic speedup on each STA run is reflected in the overall project results as well. Another company presenting at a SNUG event reported that the runtime of each STA run was reduced by 44%, a speedup of roughly 1.8X, and that the total time spent on STA in the project schedule was reduced by a staggering 30%. This number speaks for itself. Synopsys PrimeTime with its unique HyperGrid technology provides truly parallel, distributed STA in the cloud and delivers high value to teams developing today’s most advanced semiconductor devices.
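The relationship between a percentage runtime reduction and the corresponding speedup factor can be checked with a few lines of arithmetic. This is a small sketch; the function name is illustrative, and it assumes "reduced by 44%" means the new runtime is 56% of the original.

```python
def speedup_from_reduction(reduction):
    """Convert a fractional runtime reduction (0.44 for 44%) to a speedup factor."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


print(f"{speedup_from_reduction(0.44):.2f}X")  # per-run figure from the report
print(f"{speedup_from_reduction(0.30):.2f}X")  # total STA time in the schedule
```

A 44% reduction works out to about 1.8X, and a full 2X speedup would require a 50% reduction. The 30% cut in total STA time corresponds to about 1.4X at the project level.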


