SPONSOR BLOG

Cloud Characterization

Harnessing the scalability of cloud platforms to speed up library characterization.

Library characterization is a compute-intensive task that takes days to weeks to complete. Runtimes are increasing due to larger library sizes, a higher number of operating conditions to characterize, and the need for statistical variation modeling in libraries at 22/20nm and smaller process nodes. Cloud platforms offer a way to accelerate library characterization significantly.

In addition to the turnaround time improvements enabled by having more CPUs available, cloud platforms provide more flexibility than on-premises compute clusters, as well as more uniform compute resources, which makes runtimes more predictable. Overall, cloud characterization can reduce full library characterization turnaround times from days to hours.

There are several factors to take into account when establishing a cloud-deployed library characterization flow. Aside from tool readiness and scalability, cloud configuration factors such as machine types and storage methods can impact overall runtimes. EDA partnerships with cloud vendors can help customers ramp up quickly by providing optimal cloud configurations and cloud-tested characterization flows.

Library characterization runtime and compute resource challenges
The past year has seen increased interest in running standard cell or memory library characterization on cloud platforms. This is not surprising, since characterization runs for modern libraries may take days to weeks to complete, even when using compute clusters spanning several hundred CPUs. The long CPU runtime and compute resources required for library characterization can be attributed to a few main reasons:

Number of cells or memory configurations. Modern standard cell libraries can contain thousands of standard cells. In addition to the baseline set of combinational and sequential cells, today’s libraries also include multi-bit flops, power management cells, and other specialized, complex-operation cells. The large number of standard cells, multiplied by all the table data required for each cell, results in a large number of simulations needed to produce all results. For memories, characterization involves many different combinations of memory configurations, which also leads to more simulation time.

Number of operating conditions to characterize. Each cell or memory instance has to be characterized over many operating conditions for process, voltage, and temperature (PVTs). New application spaces and chip architectures require reliable operation over a larger range of operating voltages, including ultra-low voltage settings which typically are more prone to timing variation. For more advanced process technologies such as 7nm and 5nm, the number of PVT corners to be characterized can be as high as 100-200, due to factors such as transistor behavior, metal properties, and process manufacturing variability.

Statistical variation modeling. LVF (Liberty Variation Format) for variation modeling has become a practical necessity, especially for libraries at 22/20nm and below. In order to model variation, each data point in the timing model requires the equivalent of thousands of Monte Carlo SPICE simulations to obtain characterized results. Most characterization tools today apply approximations to reduce the number of simulations needed, but still require 5X-10X more runtime compared to nominal value characterization.
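To see how these three factors compound, the following is a rough back-of-the-envelope sketch rather than a model of any particular tool. The class `LibraryProfile`, its field names, the arcs-per-cell count, the table size, and the one-second simulation time are all illustrative assumptions; only the orders of magnitude (thousands of cells, 100-200 PVT corners, a 5X-10X LVF runtime factor) come from the discussion above.

```python
from dataclasses import dataclass


@dataclass
class LibraryProfile:
    """Illustrative library parameters; every value used below is an assumption."""
    cells: int            # number of standard cells in the library
    arcs_per_cell: int    # assumed average number of characterized arcs per cell
    table_points: int     # points per lookup table, e.g. a 7x7 table has 49
    pvt_corners: int      # process/voltage/temperature corners
    lvf_factor: float     # runtime multiplier for LVF versus nominal


def nominal_simulations(p: LibraryProfile) -> int:
    """Count nominal simulations: one per cell, arc, table point, and corner."""
    return p.cells * p.arcs_per_cell * p.table_points * p.pvt_corners


def cpu_hours(p: LibraryProfile, seconds_per_sim: float, with_lvf: bool) -> float:
    """Estimate total CPU-hours, optionally scaled by the LVF runtime factor."""
    seconds = nominal_simulations(p) * seconds_per_sim
    if with_lvf:
        seconds *= p.lvf_factor
    return seconds / 3600.0


# Hypothetical library: 1,000 cells, 20 arcs each, 7x7 tables, 100 corners.
profile = LibraryProfile(cells=1000, arcs_per_cell=20, table_points=49,
                         pvt_corners=100, lvf_factor=5.0)

print(nominal_simulations(profile))  # 98000000
print(round(cpu_hours(profile, seconds_per_sim=1.0, with_lvf=False)))
print(round(cpu_hours(profile, seconds_per_sim=1.0, with_lvf=True)))
```

Even with these modest assumed inputs, the simulation count reaches tens of millions, and the LVF factor multiplies the resulting CPU-hours again.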

For library characterization teams, this means compute resources and performance have become bottlenecks in delivering characterized libraries on time for production schedules. Because the compute resources needed are significant, expanding on-premises clusters for library characterization requires a large investment and careful planning, and that investment is hard to justify if the need for characterization compute arises only intermittently.

Benefits of running library characterization on the cloud
Since most library characterization runtime is spent running millions of SPICE simulations that have little or no interdependency, library characterization tasks are considered mostly uncoupled. This allows them to scale well across a large number of CPUs, making library characterization highly suitable for cloud deployment. Compared to on-premises compute clusters, cloud platforms can provide several benefits:

Turnaround time improvement is the most obvious benefit cloud characterization can bring. Due to the potential for high CPU scalability, running characterization on cloud platforms with large numbers of CPUs (e.g. 10,000 CPUs or more) is a viable method of drastically speeding up high-priority characterization tasks. This, in addition to the ability to rapidly allocate and deallocate virtual machines on the cloud, makes it possible for library teams to “burst” through high-priority characterization tasks with a large number of CPUs when needed. Library characterization jobs that normally take days on on-premises compute clusters can be completed in a matter of hours. This not only speeds up the delivery schedule, but also allows library teams to react more quickly to incremental characterization needs.

Flexibility is another benefit cloud platforms bring to library characterization. On-premises compute resources have upgrade and expansion cycles that require significant planning and lead time. Once set in place, on-premises compute resources are relatively rigid until the next upgrade or expansion cycle. Cloud users, on the other hand, may change resource types almost instantaneously. For example, if library teams find that the current machine type does not offer enough RAM per CPU core, they can easily swap in a new resource type for the next run. CPU type, storage type, and amount of RAM are just a few factors that may be changed without disrupting production schedules or incurring large hardware purchase costs. This allows library teams to fine-tune the compute cluster configuration to suit their needs.

Performance predictability is a further benefit for library characterization teams. Many library teams share on-premises compute resources with other teams and have little control over exclusive usage of CPU time. Most engineers have experienced situations where dispatched jobs take longer than usual to complete because other tasks are using CPU cycles on the same machine. In addition, jobs requiring a large number of CPUs may end up being executed on machines of different specifications, depending on how the job schedulers are configured. Because cloud platforms can provide large quantities of machines of the same specification, reserved for a specific task, they help eliminate these variables and make runtimes much more predictable.
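The turnaround-time benefit described above can be illustrated with a simple scaling estimate. This is a minimal sketch that assumes the simulations parallelize perfectly and that a small serial portion (setup, result merging) does not; the function name `wall_clock_hours` and all numbers are hypothetical, chosen only to show the days-to-hours effect.

```python
def wall_clock_hours(parallel_cpu_hours: float, serial_hours: float, cpus: int) -> float:
    """Estimate wall-clock time for a mostly uncoupled workload.

    parallel_cpu_hours: total CPU-hours of independent simulations (assumed)
    serial_hours: time that does not shrink with more CPUs (assumed)
    cpus: number of CPUs running simulations concurrently
    """
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return serial_hours + parallel_cpu_hours / cpus


# Hypothetical job: 100,000 CPU-hours of simulations plus 0.5 hours of serial work.
on_prem = wall_clock_hours(100_000, 0.5, cpus=500)     # a few hundred CPUs
burst = wall_clock_hours(100_000, 0.5, cpus=10_000)    # cloud burst capacity

print(on_prem)  # 200.5 hours, roughly 8 days
print(burst)    # 10.5 hours
```

The estimate ignores scheduling overhead and storage contention, both of which grow with CPU count, so real speedups will be somewhat lower than this idealized figure.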

Cloud characterization flow considerations


Figure 1: Example cloud characterization configuration.

One of the main aspects of setting up a cloud characterization flow (Figure 1) is choosing the type of virtual machine, storage, and data transfer mechanism to use. Typically, compute-optimized or memory-optimized machines are used as SPICE simulation grid/worker machines, which form the main bulk of execution nodes (and computing cost) for characterization jobs.

For storage, there are several options available from cloud providers, including different NFS and parallel file systems, as well as caching software to speed up disk access. Aside from characterization software being optimized for file access, it is also important that the storage system bandwidth adequately supports file access needs; otherwise it may become the bottleneck that slows down characterization jobs.
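One way to reason about the storage bottleneck mentioned above is to compare aggregate worker I/O demand against what the storage system can sustain. The sketch below is only a sizing heuristic; the function name `storage_check` and the throughput figures are assumptions for illustration and do not describe any specific cloud storage product.

```python
def storage_check(workers: int, mb_per_s_per_worker: int, storage_mb_per_s: int) -> dict:
    """Compare aggregate I/O demand from workers with storage throughput.

    All inputs are in MB/s and are assumed averages, not measured values.
    """
    demand = workers * mb_per_s_per_worker
    return {
        "demand_mb_per_s": demand,
        "is_bottleneck": demand > storage_mb_per_s,
        # Largest worker count the storage could feed at the assumed rate.
        "max_workers": storage_mb_per_s // mb_per_s_per_worker,
    }


# Hypothetical setup: 10,000 workers at 2 MB/s each against 8,000 MB/s of storage.
result = storage_check(workers=10_000, mb_per_s_per_worker=2, storage_mb_per_s=8_000)
print(result)
```

In this hypothetical case demand exceeds storage throughput, so the choice is between a faster file system, a caching layer, or fewer concurrent workers.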

While this may seem like a lot of choices to make in order to fully benefit from cloud characterization, the good news is that most of these choices can be mapped out for library characterization teams. Aside from readiness and scalability testing on the cloud, Mentor also works with cloud providers to identify optimal configurations for cloud characterization, so that customers can benefit fully from large-scale parallelization in the cloud and shorten the ramp-up time to an effective deployment.

Mentor Library Characterization Platform and AFS/Eldo on the cloud


Figure 2: Mentor Library Characterization platform and AFS/Eldo running on the cloud provides burst capacity required for library characterization-driven AMS verification workloads.

Mentor’s Library Characterization Platform, working together with the AFS and Eldo SPICE simulators, provides a high-performance, high-throughput library characterization and validation solution for library teams (Figure 2). Mentor works closely with major cloud providers to test and ensure cloud-readiness and scalability.


