Is Cloud Computing Suitable for Chip Design?

Semiconductor design lags behind other industries in adopting the cloud, but there could be some good reasons for that. Change is difficult.

Is semiconductor design being left behind in a cloud-dominated world? Finance, CRM, office applications and many other sectors have made the switch to a cloud-based computing environment, but the EDA industry and its users have hardly started the migration. Are EDA needs and concerns that different from everyone else? We are starting to see announcements from EDA companies, but few cheerleaders are ready to announce that they do design in the cloud.

Some trailblazers are emerging. “Our transition started about four years ago when Jack Harding, our CEO, asked what we are gaining by having compute hardware,” recalls Naidu Annamaneni, vice president of global IT for eSilicon. “At that time we transitioned to data center as a service. For one chip, we needed to expand that by 2X. There was no way that the data center provider could do that, and so that is when we really challenged the team to find a way to use the cloud. We are marching toward being completely in the cloud by 2020, and we will be a serverless company. We will not own any hardware.”

So far, eSilicon appears to be the exception rather than the rule. “There is a lot of interest, but in general, people are still figuring it out,” says Doug Letcher, president for Metrics. “Most of them have on-site infrastructure that is their legacy. There is a time aspect to it while they figure out what to do with their old stuff.”

Shiv Sikand, executive vice president at IC Manage, agrees. “In making the transition, given the existing heavy investment in on-premise EDA solutions, we expect that for the next several years the majority of companies adopting EDA cloud solutions will deploy a hybrid cloud environment that combines cloud and on-premise workflows.”

So what are the benefits to be gained by a migration to the cloud and how do you maximize performance?

Elasticity
Compute elasticity is probably the biggest draw. “At the highest level, the customers’ environments that we deal with have variable demand and fixed supply,” states Carl Siva, vice president of information technology for cloud solutions at Cadence. “With the cloud you introduce variable supply, with the ability to access the cloud provider’s capacity that they have invested into. This is the kind of money that most customers do not have.”

Fig 1. Compute requirements over time. Source: Cadence.

Such demand cannot even be met by data center as a service. “We are dealing with bubble demand, where the compute resources needed can double during tape-out,” says Mike Gianfagna, vice president of marketing for eSilicon. “You need it for two months, and you need it in two weeks. There is no way that a static data center can respond to that demand, so we were forced to look at the cloud where you can ask for something today and get it today.”

Either way, there are costs. “You have to look at the capital investments, the budget models and ask, ‘Am I prepared to access this variable capacity?’ It is dollars from a different bucket—operational dollars, and reduced capex, or doing some combination of those two,” said Siva.

It all comes down to the expected utilization over the life of the resource. “If I have 5 servers, but I need 20 for 1 week or month, that is what makes the cloud attractive,” says Kirvy Teo, COO for Plunify. “But for day-to-day processing, that is not as attractive because of the costs.”
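
Teo's example can be put into rough numbers. The sketch below is only illustrative: the 5-server baseline and 20-server peak come from the quote, while both prices are hypothetical placeholders rather than figures from any provider. It compares owning enough servers to cover the peak all year against owning the baseline and renting the difference for the peak period.

```python
# Hypothetical prices, assumed purely for illustration.
OWNED_SERVER_USD_PER_YEAR = 4000.0   # assumed amortized yearly cost of one owned server
CLOUD_SERVER_USD_PER_HOUR = 1.50     # assumed on-demand rate for a comparable instance
HOURS_PER_WEEK = 24 * 7


def own_peak_cost(peak_servers):
    """Yearly cost of owning enough servers to cover the peak all year."""
    return peak_servers * OWNED_SERVER_USD_PER_YEAR


def burst_cost(baseline_servers, peak_servers, peak_weeks):
    """Yearly cost of owning the baseline and renting the rest during the peak."""
    extra_servers = max(peak_servers - baseline_servers, 0)
    rented = extra_servers * peak_weeks * HOURS_PER_WEEK * CLOUD_SERVER_USD_PER_HOUR
    return baseline_servers * OWNED_SERVER_USD_PER_YEAR + rented


# Compare a short burst with a year-round "peak" (day-to-day processing).
for weeks in (1, 4, 52):
    print(weeks, own_peak_cost(20), burst_cost(5, 20, weeks))
```

Under these assumed prices, renting 15 extra servers for a week or a month is far cheaper than owning 20 servers, while renting them all year costs more than buying them. That is the pattern Teo describes, though the crossover point depends entirely on the real prices.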

Cloud pricing
Knowing what makes sense requires a certain level of financial modeling. The two primary pieces are cloud pricing and tool pricing. Cloud providers charge for their services in various ways, typically some combination of compute, memory, disk and bandwidth used.

“The biggest unknown in terms of costs that people fail to understand is bandwidth,” warns Teo. “AWS is smart. They provide very cheap storage, cheap CPU power, but bandwidth costs a lot. Putting stuff up is free. Downloading is 10 cents for 1GB. Now imagine for EDA tools how big the files are. They are huge. So if you look at our bills, 30% of the cost is bandwidth.”
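
A back-of-the-envelope estimate shows how download charges add up. This is a minimal sketch using only the rates Teo quotes (free upload, 10 cents per GB downloaded). Actual provider rates vary and change, and the example bill is hypothetical.

```python
# Rates as quoted by Teo above; real provider rates differ and change over time.
EGRESS_USD_PER_GB = 0.10    # downloading from the cloud
INGRESS_USD_PER_GB = 0.00   # uploading to the cloud


def transfer_cost(upload_gb, download_gb):
    """Cost in USD of moving data into and out of the cloud."""
    return upload_gb * INGRESS_USD_PER_GB + download_gb * EGRESS_USD_PER_GB


def bandwidth_share(compute_usd, storage_usd, download_gb):
    """Fraction of the total bill that goes to downloads."""
    egress = transfer_cost(0.0, download_gb)
    total = compute_usd + storage_usd + egress
    return egress / total if total else 0.0


# Hypothetical month: 3,000 GB of results pulled back on-premise.
print(transfer_cost(500, 3000))
print(bandwidth_share(600.0, 100.0, 3000))
```

With these assumed figures, pulling 3,000 GB of results back costs about $300, or roughly 30% of a $1,000 bill, which is the proportion Teo reports.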

The amount of bandwidth that is required is related to the applications put in the cloud and the methodology used. “If I take Cadence’s experience, bandwidth doesn’t tend to be a high percentage of the bill,” says Siva. “More of the cost comes from the storage and compute resources. So while there is a tax or tariff, it is not seen to be substantial as a percentage of the overall spend. But it does affect the way a design team would interact with the cloud. You wouldn’t want to have thousands of transactions every second between AWS and your design center. You want to look at ways to compress or optimize the transfer of data between the two sites.”

The more you know, the more prepared you can be. “We learned the hard way, and today we only send limited data sets that are needed to run the task,” says eSilicon’s Annamaneni. “We send limited datasets, run the tools and locally store all of the results for a period of time. We only push the minimum data needed on demand. That is a small customization that we had to do for our workflow.”

Some companies are retooling their products to enable more of a flow to happen natively in the cloud. “For cost reasons, it makes sense to leave most of the data in the cloud,” says Letcher. “That is why we provide native web-based applications. You log in and see data, but you never transfer it to your local space.”

Vendors are also looking at ways to reduce file sizes. “We are trying to bypass file creation altogether,” says Jean-Marie Brunet, director of marketing for emulation at Mentor, a Siemens Business. “Emulation files are huge. Tools should talk directly via an API so that they are concurrently processing the data. Dumping to disk uses memory and you are charged for transport. To prepare for cost-effective usage in the cloud, you need tools that can talk to each other and process data without dumping to disk.”

Another question is how stable cloud pricing is likely to be. “We have used the cloud for some number of years,” says Michael White, director of product marketing for Mentor’s Calibre. “AWS approached us because we had tools that scaled well and could take advantage of large numbers of CPUs. Because they were a new entrant trying to get EDA and the IC industry into the cloud, they were offering spot market pricing that was very attractive. We ran a series of experiments using DRC and OPC in the cloud and got to the point where we could get OPC scaling to 10,000 CPUs and using the hardware effectively. DRC was also up to one or two thousand CPUs. It looked attractive. Fast forward to the past 18 months. For the types of hardware that we would want to use, the pricing has gone up. We have now shifted away from doing that and have acquired more hardware to add to our internal cloud and growing capacity that way.”

Others are not so worried. “Cloud computing is part of a utility,” says Gianfagna. “It has to be viewed that way and we need to feel comfortable that we can burst to Google or Azure or AWS. Once you have that environment, then supply and demand and competition will ensure that things do not get crazy cost-wise. There is fierce competition. If we did something that would lock us in to Google, that is probably a bad idea.”

Cloud providers still provide attractive deals to get people hooked. “They are asking us to commit, for a discount, but we don’t want to get locked in yet,” adds Annamaneni. “There is still competition between the vendors and so before we commit to anyone, we will ensure that the next 5 years, the cost is going down, not up.”

Cadence’s Siva agrees. “At the moment it is still a race to zero from the providers’ side. Their scale allows them to more efficiently run a datacenter than the majority of companies that are not providers. While they do charge an overhead for those services, they are competing against each other, and will be until they gain more market share. I don’t know if anyone can truly answer the question of whether they will change their pricing. For the foreseeable future, they will continue to be competitive and gain customers’ on-prem environments and go after new markets.”

Tool pricing
The EDA industry is still grappling with pricing for the cloud. “Licensing and pricing for cloud-based tools is very much an evolving topic,” says Tom Anderson, technical marketing consultant for OneSpin Solutions. “A task that might take weeks to run on a small server farm might execute in hours in a massively parallel cloud, but it still costs the vendor the same amount of money to develop and support the tools involved. Users can get higher quality results, reduce infrastructure costs and accommodate peak tool usage while maximizing their verification engineering resources. The business arrangements must reflect this value while being reasonable for both vendors and users.”

One area that is migrating faster is tools and flows for FPGAs. “Unless EDA vendors can find a good way to monetize it, most companies would have difficulty adapting to a SaaS model,” says Teo. “That is less of a problem in the FPGA field.”

“Nobody within the big EDA companies has really jumped into SaaS where I just go to AWS and have an app sitting there, or I run and pay by the hour,” says Mentor’s White. “Most companies maintain their core license models just because the customers understand them, and we as a corporation understand the business model behind those. So nobody has really changed their licensing model.”

EDA is providing a certain degree of license flexibility. “We are doing a certain amount of work and there are a known number of licenses we will always be consuming,” says Gianfagna. “We buy that baseline and then you have a peak. You can contact the EDA vendors and buy a peak license for short-term usage. You can get more licenses fast, so you can expand and contract a little easier with software licenses. So this is not a showstopper. It can be managed.”

A similar situation exists with emulation. “If a customer is above a certain threshold of utilization, doing it off premise, be it either hosted or cloud, it will cost them much more than on-premise,” says Brunet. “That threshold is very similar from customer to customer and is also dependent on the vertical market they are involved with. If a customer is doing a lot of regressions, their usage model tends to be very high—in the 90%-plus area. For them to use the cloud would be a very expensive proposition. Big companies have done this analysis, and they did it when deciding if they wanted to build a data center in-house or move it out.”
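
The threshold Brunet describes can be sketched as a break-even utilization: the fraction of the year at which pay-per-hour cloud usage costs the same as a fixed on-premise resource. The function names and both prices below are hypothetical, and the model deliberately ignores data transfer and license costs.

```python
HOURS_PER_YEAR = 24 * 365


def break_even_utilization(on_prem_usd_per_year, cloud_usd_per_hour):
    """Utilization (0..1) above which the cloud costs more than on-premise."""
    return on_prem_usd_per_year / (cloud_usd_per_hour * HOURS_PER_YEAR)


def cheaper_option(utilization, on_prem_usd_per_year, cloud_usd_per_hour):
    """Return which option is cheaper at a given utilization."""
    cloud_usd = utilization * HOURS_PER_YEAR * cloud_usd_per_hour
    return "cloud" if cloud_usd < on_prem_usd_per_year else "on-premise"


# Hypothetical resource: $438,000 per year on-premise vs. $100 per hour hosted.
print(break_even_utilization(438000.0, 100.0))
print(cheaper_option(0.9, 438000.0, 100.0))   # regression-heavy usage
print(cheaper_option(0.2, 438000.0, 100.0))   # occasional usage
```

In this made-up case the break-even sits at 50% utilization, so a team running regressions at 90%-plus would pay more off-premise, while an occasional user would pay less.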

Others are betting that licensing will change to be more on-demand. “We provide a SaaS model,” says Letcher. “Everything is set up. You just log in and it is ready to use. Within that model, the farm of regression machines that you have access to is dynamically flexible. The cloud is basically renting machines, and you have to pay for them. With the Metrics model, there is the flexibility to use those machines and licenses on demand.”

“Vendors must move to per minute licensing to match the cloud pricing model,” asserts IC Manage’s Sikand. “The cloud is elastic and if the licenses aren’t elastic, the model will fail.”

Performance
How does cloud performance stack up against in-house data centers? “If you are running on old hardware, you do not get the benefits of the latest advances,” points out Annamaneni. “With cloud you can always get that. When we ran our benchmarks, comparing their virtual servers to bare metal, there is about 15% to 30% better performance from Google. While using virtual machines (VM) takes a small performance hit, their clock speeds and other performance optimization, better cache, outperformed the overhead that a VM had.”

Speed isn’t the only advantage, though, or necessarily even the best selling point. “Reducing turnaround time is sometimes not enough to convince someone to use the cloud,” adds Teo. “If you compare a cloud machine and a local machine, you often find it will be faster to use a local machine. It is also more cost effective because you are not renting a car, you are buying it and amortizing it over three years, so simply running faster is not convincing enough.”

In many cases, performance is dependent on application. “When it comes to emulation in the cloud, there are three steps,” says Brunet. “The design is compiled, you run something, then debug. The key is latency. This means that the customer has to understand what they want to do and what they are trying to accelerate. If it takes three days to transfer a design database to the cloud and it takes two hours to run on the emulator, then six days to get results back for debug, the value proposition is not there.”
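
Brunet's scenario is easy to quantify. The sketch below uses his figures (three days to upload, two hours to run, six days to get results back). The function names are illustrative and not part of any tool.

```python
def turnaround_hours(upload_h, run_h, download_h):
    """Total wall-clock time from starting the upload to having results locally."""
    return upload_h + run_h + download_h


def compute_fraction(upload_h, run_h, download_h):
    """Share of the turnaround actually spent running the job."""
    total = turnaround_hours(upload_h, run_h, download_h)
    return run_h / total if total else 0.0


# Brunet's example: 3 days up, 2 hours on the emulator, 6 days back.
upload_h, run_h, download_h = 3 * 24, 2, 6 * 24
print(turnaround_hours(upload_h, run_h, download_h))
print(compute_fraction(upload_h, run_h, download_h))
```

Here the job takes 218 hours end to end, of which less than 1% is spent on the emulator. A faster machine cannot help until the transfer steps shrink.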

Conclusion
It appears that companies are looking to dip their toes in, but are not yet ready to fully submerge. “Customers are now in a state of mind that cloud will play a role in their silicon development over the next one to ten years,” says Craig Johnson, vice president of cloud business development at Cadence. “Who knows how long it will be before the majority are on the cloud. But we see that journey as something that is good for them to begin now.”

Others are not so sure. “The cloud has been around for a number of years and we do not have all of the big fabless companies clamoring to get to the cloud,” says Mentor’s White. “They do not see it changing their world.”



2 comments

Richard Paw says:

When I managed the compute platform strategy at Synopsys, I was mostly skeptical of EDA moving to the cloud. Over the last few years, I’ve seen the cloud and the industry mature and now I think there are benefits. However, I think the benefits for semiconductor design in the cloud are complicated. There’s no one answer for everyone.

For some organizations, the benefit will be to make sure that your EDA licenses are fully utilized. Licenses are much more valuable than compute, so if you wind up spending a little more to make sure your licenses aren’t idle and are running on the fastest machines, you probably come out ahead. You want to be license constrained.

Mike Gianfagna mentioned tape-out. For many organizations, tape-out can be disruptive. Not only does IT have to marshal enough machines for the extra load, but other groups are sometimes de-prioritized from the farm to accommodate the extra load. Bursting to the cloud could help alleviate some of this.

Bursting tape-out also brings up other possibilities. Customers often obtain extra licenses for tape-out. Depending on the critical path in your verification, you could potentially get twice the number of licenses for half the time. To the EDA vendor, it’s still the same total amount of seats/time, but if you could access twice the number of systems, you could run through the verification workload much faster and either catch issues sooner and/or have a little more time to address them.

The cloud also gives the opportunity to access machines you may not have in house. If you need a couple of 2 TB RAM systems for a particularly difficult run, you can access those types of systems in the cloud without having to buy them and wait for delivery and provisioning.

When machine-learning-enabled EDA tools become generally available, they may need compute that isn’t already common in EDA-centric data centers. If these tools need GPUs, FPGAs, etc., many of those are already available in the cloud.

I recently joined Rescale as their semiconductor industry principal because they’ve already solved many of these and other deployment issues (cloud security, provisioning, management, tool orchestration, etc.) for successfully bringing adjacent and similar HPC workflows to the cloud. I don’t foresee many people in semiconductor going all in to the cloud, but a gradual motion is likely.

David Marshall says:

In EDA, the old model of amortizing hardware over 3 years no longer fits my company. The FinFET designs from last year stress the compute resources, requiring considerable new hardware to meet time-to-market or customer commitments. The largest demand is during the “tapeout” process. The estimate is that next year this will be worse.

The cloud bursting we have done avoided considerable 3-year commitments for the short-term “tapeout” need.
