Algorithms written for GPUs can slice simulation time from weeks to hours, but not every workload is optimized for them, and not all benefit equally.
Optimizing EDA hardware for the cloud can shorten the time required for large and complex simulations, but not all workloads will benefit equally, and much more can be done to improve those that can.
Tens of thousands of GPUs and specialized accelerators, all working in parallel, add significant and elastic compute horsepower for complex designs. That allows design teams to explore various architectures with whatever resources are needed at any moment, and to make adjustments based upon targeted workloads and applications.
This approach has proved especially useful for accelerating general matrix-matrix multiplication (GEMM) operations involving large-scale linear algebra calculations. “CPU to GPU is a huge leap,” said Rajath Narasimha, principal product manager for cloud HPC at Keysight. “The need for GPUs exists if everybody wants their simulations — which can take two weeks — to run in two hours. But if you dig a couple layers deeper, you quickly understand the algorithms for simulations don’t absolutely support GPUs. The way GPU compute happens is most of the workloads can be parallelized, and most of them underneath the hood are parallelized by matrix multiplication.”
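As a rough illustration of the CPU-to-GPU leap for GEMM-heavy work, the sketch below times the same large matrix multiplication on a CPU and, when one is available, on a CUDA device. PyTorch and the matrix size are assumptions for the sketch; neither comes from the article.

```python
# Illustrative only: time one large GEMM on the CPU and, if present, on a GPU.
# PyTorch and the 4096x4096 size are assumptions, not details from the article.
import time
import torch

N = 4096
a = torch.randn(N, N)
b = torch.randn(N, N)

start = time.perf_counter()
c_cpu = a @ b                          # GEMM on the CPU
print(f"CPU GEMM: {time.perf_counter() - start:.3f} s")

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()  # move operands to the GPU
    torch.cuda.synchronize()           # make the timing meaningful
    start = time.perf_counter()
    c_gpu = a_gpu @ b_gpu              # same GEMM, spread across thousands of GPU cores
    torch.cuda.synchronize()
    print(f"GPU GEMM: {time.perf_counter() - start:.3f} s")
```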
Simulators are serial, meaning the input for each stage is the output from the previous stage. This sequential nature often limits a chip company’s ability to fully leverage the parallel processing power of GPUs, which are optimized, by default, for tasks that can be broken down and processed simultaneously. To address this, there is a growing focus on re-engineering traditional simulation algorithms to bridge the gap between serial simulation requirements and the parallel processing capabilities of GPUs.
“The only way you can parallelize them is through distributed computing,” Narasimha explained. “If you have three points from zero to a hundred, you can split those individual three points, but that’s not to say you can send each of three points to a GPU and make it faster. Think of it as a big ball when you’re trying to move it faster. It has momentum, but it can’t be broken down. If you can individually break out this ball into smaller balls, meaning one does only data processing, one does the sweep, one does the analysis, then you are able to send these out to the GPU. The majority of EDA vendors are working to reorient their monolithic architecture to a microservices architecture. That way they can take full advantage of GPUs in the future.”
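A minimal sketch of the decomposition Narasimha describes, assuming a hypothetical three-stage flow (data processing, sweep, analysis): once the sweep points are separated from the serial bookkeeping, they can be fanned out to parallel workers, whether those are GPU kernels or cloud nodes. A process pool stands in for that dispatch here.

```python
# Hypothetical decomposition of a monolithic simulation loop into stages so the
# embarrassingly parallel part (the sweep) can be dispatched in bulk.
from concurrent.futures import ProcessPoolExecutor

def prepare(raw_points):
    """Serial data processing: must finish before any sweep point runs."""
    return [float(x) for x in raw_points]

def sweep_point(x):
    """One independent sweep evaluation -- the piece worth offloading."""
    return x * x + 1.0

def analyze(results):
    """Serial analysis over the collected sweep results."""
    return sum(results) / len(results)

if __name__ == "__main__":
    points = prepare(range(100))
    with ProcessPoolExecutor() as pool:      # stand-in for GPU/cloud dispatch
        results = list(pool.map(sweep_point, points))
    print("mean response:", analyze(results))
```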
Although there is still much optimization work to be done across the industry, on-demand access to the latest GPUs for simulations is a key factor driving chip companies to the cloud. Synopsys PrimeSim Continuum, for example, uses NVIDIA A100 GPUs to improve time-to-results by 10X compared to simulations running on CPUs alone, reducing run times from days or weeks to just hours.
Fig. 1: Bar chart demonstrating the acceleration of simulation workloads with cloud-based GPU resources. Source: Synopsys
“There are many workloads that do much better on GPUs — simulation-type workloads, for example,” said Vikram Bhatia, head of cloud product management and GTM strategy at Synopsys. “Not many customers have GPU racks on premises, so typically they will do all sorts of GPU type of jobs on the cloud.”
Heavy compute-intensive workloads are typically first in line to take advantage of GPUs, according to Mahesh Turaga, vice president of business development for cloud at Cadence. “Computational fluid dynamics (CFD) simulations are ideal for GPUs. You have high memory requirements, much longer run times, and many more iterations that are happening within the solvers to converge the solutions. GPUs are really good at parallelizing, and give 10X performance compared to CPUs.”
Fig. 2: CFD software visualizes aerodynamic flow and density distribution around an aircraft model, highlighting the simulation capabilities for analyzing fluid dynamics and performance in aeronautical design. The software achieves 9X throughput increase on GPUs. Source: Cadence
Fine-tuning EDA cloud strategies: Memory, collaboration, cost
Looking beyond GPUs, understanding the specific hardware requirements of different cloud-based EDA tools and workloads is essential for both semiconductor companies and cloud providers. “Some tools require more memory, more reach, or better, faster interconnects between the nodes,” said Turaga. “Optimizing at least these three parameters and coming up with the right instance — say, one that is particularly good for front-end workloads versus backend — is important. That’s what all these cloud providers are doing, which results in better price performance overall. You’re able to run those tools much faster and at a better price point.”
Others agree. “If you look at the cost of cloud-based compute, you’ll see the price scales more with memory than it does with core count,” said Craig Johnson, vice president of cloud solutions at Siemens EDA. “That’s because the memory is quite expensive. Anything we can do to shrink the footprint of the compute is going to be cost savings in a cloud context.”
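To make "the price scales more with memory" concrete, here is a toy comparison with entirely invented instance shapes and hourly rates; the numbers are illustrative and do not reflect any provider's actual pricing. The point is only that trimming a job's memory footprint can drop it into a cheaper instance class.

```python
# Toy cost model with invented prices: under memory-dominated pricing, shrinking
# the memory footprint saves more than shaving cores. Nothing here is real pricing.
instances = {
    # name: (vcpus, memory_gib, dollars_per_hour)
    "mem-heavy": (16, 256, 4.00),
    "balanced":  (16, 128, 2.30),
    "cpu-heavy": (32, 128, 2.90),
}

def cheapest(vcpus_needed, mem_needed_gib):
    """Return the lowest-cost instance that satisfies both requirements."""
    fits = {name: price for name, (vcpus, mem, price) in instances.items()
            if vcpus >= vcpus_needed and mem >= mem_needed_gib}
    return min(fits.items(), key=lambda item: item[1]) if fits else None

print(cheapest(16, 200))  # a 200 GiB job forces the priciest shape: ('mem-heavy', 4.0)
print(cheapest(16, 120))  # trimmed to 120 GiB, it fits a cheaper one: ('balanced', 2.3)
```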
Of course, memory requirements for verification and simulation vary based on workload size and category. “Formal verification or implementation workloads have larger datasets and typically require more memory,” said Mark Galbraith, vice president of productivity engineering at Arm. “The cloud is very suitable for that. When you’re running a project that can ramp up and down over time, you’re not having to kit out your own data center, saying, ‘We think we’re always going to run verification like this.’ A cloud implementation allows you to ramp up and ramp down as needed for a range of EDA workloads, depending on whether it’s running verification, simulation-based verification, or backend implementation type of work.”
Galbraith pointed to a recent verification cycle for one of Arm’s products that highlighted the advantages of cloud-based EDA scalability. “It was a physical design and we were constrained for some of the verification. To put some numbers on it, we were running 4,000 slots for a few months. We couldn’t increase the capacity there because we’ve got so many other demands on our [on-premises] infrastructure. We found that when we took the workload into cloud, these constraints were removed and we could scale up to 20,000 slots. So rather than 4,000 slots running over several months, we’re now at 20,000 slots and we can then run that all in parallel. It’s like a 5X improvement in time for that particular workload.”
Cloud-based EDA significantly improves agility by removing the constraints of fixed, on-premises capacity. “The cloud allows you to run multiple projects or multiple milestones in parallel,” Galbraith continued. “Normally you get all these peaks and troughs from the different capacity needs of your teams. The cloud unlocks efficiency for different workloads with different machines and specifications—you can get the right number of machines at the right time. There’s also the performance per dollar and performance per milliwatt. We’ve been using Arm-based processors in the cloud and found benefits in terms of performance per dollar compared to other architectures.”
Moreover, running EDA-based workloads in the cloud enables Arm engineers to optimize collaboration and accelerate time to market. “Because you deliberately design your environment with access controls in mind for the cloud, it makes that collaboration side easier,” Galbraith added. “Often when companies have an on-prem environment, they may not have it defined in as tight a way. Cloud really enables you to put all those controls in place so you can see what’s needed for an individual activity, your organization, or multiple organizations. It makes access and control much easier.”
Streamlining verification, analysis
AMD likewise has been leveraging the benefits of the cloud for functional verification and SoC analysis. “These workflows lend themselves well to cloud enablement as they have good data handoffs to make their inputs available in a cloud-hosted data center,” said Philip Steinke, AMD fellow. “Enabling those flows for cloud has also made it easier to provide test cases to partners. We kept designer productivity at the forefront of our priorities in our cloud enablement, so the experience for our teams is very similar to the collaborative approach they already enjoyed on-prem, now with the added compute capacity that we can get from our cloud partners.”
While cloud compute hasn’t changed AMD’s planned schedules, it has provided higher confidence on meeting major milestones in the design flow without impacting other programs. “Without cloud compute, when an unexpected demand for compute comes up, tradeoffs would be necessary to figure out which project would get its compute allocation reduced,” Steinke said. “Now, if priority is sufficiently high to justify the spend, additional compute can be added, allowing those unplanned activities to go ahead without any squeeze on other projects. It’s hard to quantify, but a rough guess would be 5% overall reduction [in design time] by not having those disruptions.”
Cloud computing also has shortened time-to-results for cloud verification workloads. “Our biggest benefit here comes from taking advantage of the latest AMD EPYC Genoa-X CPUs, which are available on cloud,” Steinke added. “We see value in cloud for rapid access to new and higher volumes of hardware, both of which translate to faster time-to-results. On-premises costs also become larger when we hit our floorspace footprint and have to add on costs of building out or leasing additional data center capacity.”
Hybrid EDA: Balancing on-premises and cloud computing
Major chip companies such as AMD and Arm have successfully deployed hybrid EDA models that combine on-premises data centers with the elastic computing resources of the cloud. This approach allows them to manage certain workloads and tools locally, while bursting to the cloud when additional compute power is required.
“A hybrid environment might be the best of both worlds where you have your own computers that are always utilized to the degree that you can,” Johnson said. “When you run into a situation where that’s not sufficient, you’d like to take advantage of cloud-based computing, and there are different degrees of intervention that would be required by an engineer to take advantage of that cloud compute. The way that customers accomplish this hybrid environment now is that they have job schedulers that allocate the jobs to the available compute. The compute that’s underneath the hood is mostly on-prem infrastructure. They can also set up environments of their own in the cloud that have their own job schedulers, as well, and those jobs can be sent to a queue and distributed into cloud use cases. We find most people want to manage that a little more carefully at this point.”
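A highly simplified sketch of the hybrid dispatch pattern Johnson describes: the scheduler fills on-prem slots first and only bursts the overflow to a cloud queue. The slot count and job names are hypothetical; real deployments rely on production job schedulers rather than anything this simple.

```python
# Hypothetical hybrid dispatch: keep on-prem machines fully utilized,
# burst whatever does not fit to a cloud queue. Numbers are illustrative.
from collections import deque

ON_PREM_SLOTS = 4  # fixed local capacity (made-up figure)

def dispatch(jobs):
    on_prem, cloud_queue = [], deque()
    for job in jobs:
        if len(on_prem) < ON_PREM_SLOTS:
            on_prem.append(job)       # prefer the infrastructure you already own
        else:
            cloud_queue.append(job)   # overflow is queued for cloud execution
    return on_prem, cloud_queue

local, burst = dispatch([f"sim-{i}" for i in range(7)])
print("on-prem:", local)          # sim-0 .. sim-3
print("cloud:  ", list(burst))    # sim-4 .. sim-6
```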
Ultimately, resources will be intelligently allocated with EDA tools, rather than engineers, managing where and how processes run. As Bhatia noted, EDA engineers want to focus on their core competency, which is designing chips. Fortunately, intelligent handoffs from on-premises infrastructure to cloud resources and vice versa are on the horizon.
“We’re not very far from it,” Bhatia said. “There are customers who are doing this in their own unique ways today, but it is not yet a seamless, complete solution. Typically, customers break up the job or the workload and say: ‘Let me run this part of the job on cloud, that part of the job [on-premises].’ Data is generated [on-premises], data is generated in the cloud, they bring it back, they merge the data, and do verification for the entire set. Then they either go back, because they found too many bugs and repeat the whole process, or go to the next stage because they’re satisfied with the results. So it’s happening today, but it’s happening piecemeal. What we want is a very seamless solution.”
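The piecemeal flow Bhatia describes reduces to a simple control loop: split the workload, run the parts in each environment, merge the outputs, verify the whole set, then either iterate or move on. The sketch below is schematic, with placeholder functions standing in for real EDA tool runs.

```python
# Schematic of the split / run / merge / verify loop. Every function is a
# placeholder standing in for a real tool invocation; this is not a vendor flow.
TOTAL_BLOCKS = 6

def run_on_prem(blocks):   return {b: "onprem-result" for b in blocks}
def run_in_cloud(blocks):  return {b: "cloud-result" for b in blocks}
def verify(merged):        return len(merged) == TOTAL_BLOCKS  # stand-in pass/fail check

blocks = [f"block-{i}" for i in range(TOTAL_BLOCKS)]
satisfied = False
while not satisfied:
    local, remote = blocks[:3], blocks[3:]                   # split the workload
    merged = {**run_on_prem(local), **run_in_cloud(remote)}  # bring results back and merge
    satisfied = verify(merged)                               # repeat if verification fails
print("verification passed -- moving to the next stage")
```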
Although completely seamless handoffs are still pending, many chip companies view the cloud as a flexible, effective extension of existing, on-premises data centers that allows engineers to contract, expand, and shift workloads as needed. “When I think about how we look at the different workloads, we look at where the hardware’s available to allow us to then make the decisions on what we run and where,” Galbraith said. “You architect your solution around where that availability is, and that availability is often specific workloads or machines that are well-suited to a given set of workloads. We’ve got some that map nicely into cloud because of the nature of the available hardware. We also see some workloads suited for on-premises because of the hardware that’s available, and that’s when we think of the workloads requiring FPGAs and emulation. We keep those workloads on-premises.”
Related Reading
IC Tool Vendors Eye Cloud-Native Future
Current design algorithms are hitting their stride in the cloud, but they’ll eventually be replaced by cloud-optimized approaches.
The Good And Bad Of Chip Design On Cloud
Designing chips with the help of the cloud is progressing, but users still want greater flexibility in tools licensing and other issues.