Optimize Physical Verification Cost Of Ownership

Minimize unused compute resources during DRC and LVS runs.


As semiconductor designs continue to grow in size and complexity, they put increasing pressure on every stage of the design process. Physical verification, often on the critical path to tape-out, is especially affected. Design rule checking (DRC), layout versus schematic (LVS), and other physical verification runs take longer as chip size increases. In addition, finer geometries introduce new complexity and require more design rules to be verified. Thousands of complex rules, some with hundreds of discrete steps, are not uncommon. Fortunately, DRC and LVS can be parallelized and executed on multiple CPUs or cores at the same time. Design teams routinely take advantage of this capability by running jobs on multiple hosts in private farms and cloud computing environments.

This approach works well and is often scalable: adding more CPUs to a physical verification run can decrease turnaround time (TAT) and shorten the project schedule. However, the scalability is limited. A job can only be parallelized so far before serial dependencies prevent further speed-up, and beyond that point adding more CPU resources no longer reduces TAT. Designers are unlikely to be able to predict this tipping point, since doing so would require deep knowledge of the foundry rules, the way those rules are implemented, and the details of how the rules interact with the design data.
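The familiar Amdahl's-law relationship makes that tipping point concrete. The short sketch below is purely illustrative: the 5% serial fraction is a hypothetical number, not a property of any real DRC or LVS rule deck, but it shows how quickly the returns from extra CPUs flatten out.

```python
# Illustration of why adding CPUs eventually stops helping.
# Assumes a hypothetical job in which 5% of the work is inherently serial;
# real DRC/LVS dependency structures are far more complex.

def amdahl_speedup(cpus: int, serial_fraction: float) -> float:
    """Ideal speedup for a job with the given serial fraction (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cpus)

for cpus in (8, 32, 128, 512, 2048):
    print(f"{cpus:>5} CPUs -> {amdahl_speedup(cpus, 0.05):5.1f}x speedup")

# The output shows the plateau: 8 CPUs give about 5.9x and 128 CPUs about
# 17.4x, yet even 2048 CPUs still fall short of 20x.
```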

Designers are highly motivated to optimize resource usage since physical verification jobs are compute intensive. A DRC or LVS job for a leading-edge design, with billions of transistors, can run for multiple days using many hundreds of CPU cores. Designers want the shortest TAT since time spent waiting for results is time that can’t be spent debugging those results. However, allocating CPUs that won’t help has real costs to the project. In a local server farm, those CPUs can’t be used for other tasks. In a cloud environment, users end up paying by-the-minute rates for resources that are not used.

Additionally, maximum parallelism is typically possible only for a portion of the overall run; at other times, some of the CPUs sit idle. In many cases, not all of the compute resources requested by a job are available immediately, and the run cannot start until every required CPU is free, often delaying TAT well beyond the actual execution time for physical verification. In addition to compute resources, there are costs for tool licenses that are allocated but sit unused at times, or are never used at all.

What’s needed is an automated way to minimize cost by minimizing unused resources. Synopsys IC Validator provides just such a solution with its elastic CPU management capability. IC Validator is a modern physical verification tool, architected for massive and efficient distributed processing. It understands the serial dependencies in a job, the currently allocated resources, and the job command queue, and it uses this information to identify when adding compute resources will make the run finish faster and to release resources that are not currently needed.

Elastic CPU management optimizes compute and license resources for DRC and LVS runs in three ways (see the conceptual sketch after this list):

  • There is no need to wait for resources; a job can start with available CPUs and additional CPUs are added as they become available
  • CPUs are added only when analysis predicts they can be used to increase the parallelism of the job, so allocation is optimized
  • CPUs are removed and freed for other tasks when they are not currently contributing to the parallelism of the job
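The sketch below illustrates the kind of decision logic described above. It is a conceptual approximation only, not IC Validator's actual algorithm or API; all class, function, and parameter names (PendingTask, usable_parallelism, adjust_allocation, and so on) are hypothetical.

```python
# Conceptual sketch of an elastic allocation policy: grow the CPU pool only
# when queued work can actually use more CPUs, and shrink it when CPUs
# would otherwise sit idle. This is NOT IC Validator's implementation.

from dataclasses import dataclass


@dataclass
class PendingTask:
    name: str
    width: int           # how many CPUs this task can use concurrently
    deps_remaining: int   # unresolved serial dependencies


def usable_parallelism(queue: list[PendingTask]) -> int:
    """CPUs that runnable tasks (no unresolved dependencies) can consume now."""
    return sum(t.width for t in queue if t.deps_remaining == 0)


def adjust_allocation(current_cpus: int, queue: list[PendingTask],
                      available_cpus: int, max_cpus: int) -> int:
    """Return the new CPU count: grow only if work exists, shrink if idle."""
    demand = usable_parallelism(queue)
    if demand > current_cpus:
        # Request more CPUs, but only as many as the queue can use and as
        # the farm or cloud can supply right now.
        return min(demand, max_cpus, current_cpus + available_cpus)
    # Release CPUs that nothing in the queue can use at this point.
    return max(demand, 1)


# Example: two runnable tasks (40 + 30 CPUs of work) and one blocked task.
queue = [PendingTask("drc_rule_A", 40, 0),
         PendingTask("drc_rule_B", 30, 0),
         PendingTask("connect_step", 60, 1)]
print(adjust_allocation(current_cpus=20, queue=queue,
                        available_cpus=100, max_cpus=120))   # -> 70
```

The key property is that the allocation tracks the usable parallelism of the moment rather than the worst-case maximum requested at submission time.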

CPUs are allocated and de-allocated “on the fly” throughout the run, making the approach truly elastic. The following example shows the results of IC Validator elastic CPU management performing LVS checking of a 5nm design. The user specified that the LVS job use no more than 120 CPUs, but, due to serial dependencies, many of these would be idle in the latter half of the run. IC Validator recognized these dependencies, automatically adding CPUs when needed and releasing them when not utilized.

The overhead to analyze and manage the resources was minimal, so the elastic job finished in only 5% more run time than the traditional job with all 120 CPUs allocated up front. The blue lines on the graphs show CPU allocation over time, so the total compute cost for each job is the area under its blue line. The elastic job used less than half (43.5%) of the CPU hours required by the traditional job. This translates into immediate and direct savings in a cloud environment, and into the opportunity to run other tasks in parallel with LVS in a server farm with a fixed number of CPUs.
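To make the cost comparison concrete, the snippet below computes CPU-hours as the area under an allocation-over-time curve. The allocation traces are hypothetical hourly samples invented for illustration; they are not the measured data from the 5nm run and do not reproduce its 43.5% figure.

```python
# Back-of-the-envelope illustration of the cost comparison above.
# Cost in CPU-hours is the area under the allocation-vs-time curve,
# here approximated with one hypothetical sample per hour.

def cpu_hours(allocation_per_hour: list[int]) -> int:
    """Sum hourly CPU allocations, i.e. the area under the blue line."""
    return sum(allocation_per_hour)

# Traditional run: 120 CPUs held for all 10 hours of the run.
traditional = [120] * 10

# Elastic run: hypothetical ramp-up, full-width middle, and tapered tail,
# taking one extra hour of wall-clock time.
elastic = [40, 90, 120, 120, 120, 60, 30, 20, 15, 10, 10]

print(f"traditional: {cpu_hours(traditional)} CPU-hours")   # 1200
print(f"elastic:     {cpu_hours(elastic)} CPU-hours")       # 635
print(f"ratio:       {cpu_hours(elastic) / cpu_hours(traditional):.1%}")
```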

IC Validator elastic CPU management optimizes compute resources for physical verification jobs. Runs start and finish sooner, reducing TAT and leaving more time for users to debug violations. This accelerates physical verification closure and shrinks the overall project schedule. For more details and examples, download the full white paper.


