EDA In The Cloud

Experts at the Table, part 1: While the Cloud may be ready for EDA, it is not clear how ready EDA is for the Cloud. What needs to change?

Semiconductor Engineering sat down to discuss the migration of EDA tools into the Cloud with Arvind Vel, director of product management at ANSYS; Michal Siwinski, vice president of product management at Cadence; Richard Paw, product marketing manager at DellEMC; Gordon Allan, product manager at Mentor, a Siemens Business; Doug Letcher, president and CEO of Metrics; Tom Anderson, technical marketing consultant for OneSpin Solutions; and Kirvy Teo, vice president of business development at Plunify. What follows are excerpts of that conversation.

SE: In this year’s predictions articles, one of the most talked about subjects was the Cloud. It is clearly making strides. Where are we today and what changed?

Paw: Customers have decided to kick the tires again. This is largely because of their need to quickly bring up verification resources. There was an aborted attempt eight years ago to bring EDA into the cloud. It may have been too early, given that the Cloud did not have much history at that point. Now, the sales organizations are in the Cloud. You have all of the financial data there and there is so much other stuff there that there is a little less fear and a little more history. This is not to say that they will not have the same fear questions that they ran into before, but enough time has passed with enough success history that they are ready to kick the tires again. Also, a lot of the non-EDA IT have Cloud initiatives where the CIO, who comes from outside of EDA, says that you have to have a Cloud strategy – whatever that means. Some of them are just doing what their boss tells them, looking to see if it can be done or not.

Siwinski: There are two more dimensions. Five years ago, you would talk to customers about the Cloud. At the management level they would think it is fantastic, but you talk to the engineers and they have a different take. When you ask what the problems are, they say that they can’t put their designs online, that it is too insecure. But their patents, their payroll, all of the IP is already there. So what is the problem? Those two views of the world do not necessarily reconcile with each other. What has changed is that Cloud is no longer taboo. It used to be scary. Now it is just the Cloud, the next thing is deep learning, and the Cloud is no longer the newest and most different thing. That makes it appear safer. The second factor is the growth and complexity of verification. We see Cloud demands across the board but fundamentally, the domains that are very compute intensive tend to drive that more. Verification in general, be it simulation, emulation, or characterization, is very compute-intensive. People always need more, and this opens opportunities, which leads to the third thing: a lot of the time we see people ask for peak capacity and peak compute. Before, people thought about Cloud as either on-premises, private Cloud or public Cloud. Now the mindset is changing to a hybrid environment. I have my own on-premises environment with my licensing, and I have additional needs. How do I augment my needs? Some may move to the Cloud fully. Others will keep everything on premises, and some will use a mix, a need driven by temporary demand. The Cloud provides flexibility and elasticity. That is what is driving demand.

Paw: I see two markets for the Cloud. The one you mentioned before, where you need to burst. That is what the big guys use. The traditional way to handle peak demands is to buy the computer, leave it off and when I need it, I turn it on and get peak licenses etc. Nobody wants to buy computers and leave them around for a year. The other thing is getting them online and up and running takes time. With the Cloud, you can provision it faster than you can get the license from an EDA company. For guys with large datacenters, I don’t think they will ever go fully to the Cloud. It is still more expensive than owning the machine. You don’t rent a car for a year. What is different with EDA workloads is that for the large guys, they are running at 80% capacity, 24/7/365. That is not common. Most companies are around 50% capacity. So for them it makes more sense to rent the time as you need it. For large EDA shops running close to capacity, it doesn’t make as much sense. Peak licenses are more expensive.
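
The own-versus-rent argument above comes down to a break-even utilization. The following is a minimal sketch of that arithmetic, not a pricing model: the hourly figures are hypothetical, and it assumes an owned machine costs the same whether busy or idle while a Cloud machine is paid for only while busy.

```python
def breakeven_utilization(owned_cost_per_hour: float, cloud_cost_per_hour: float) -> float:
    """Utilization above which owning is cheaper than renting.

    owned_cost_per_hour: amortized cost of an owned machine per wall-clock
        hour, paid whether the machine is busy or idle (assumption).
    cloud_cost_per_hour: on-demand price per hour, paid only while busy.
    """
    if cloud_cost_per_hour <= 0:
        raise ValueError("cloud_cost_per_hour must be positive")
    return owned_cost_per_hour / cloud_cost_per_hour


def cheaper_option(utilization: float, owned_cost_per_hour: float,
                   cloud_cost_per_hour: float) -> str:
    """Return 'own' or 'rent' for a given average utilization (0.0 to 1.0)."""
    threshold = breakeven_utilization(owned_cost_per_hour, cloud_cost_per_hour)
    return "own" if utilization > threshold else "rent"


# Hypothetical prices: an owned core-hour costs 0.60, a rented one 1.00,
# so the break-even utilization is 60%.
large_shop = cheaper_option(0.80, 0.60, 1.00)    # runs near capacity
typical_shop = cheaper_option(0.50, 0.60, 1.00)  # runs at about half capacity
```

With these made-up prices, the shop running at 80% is better off owning and the one at 50% is better off renting, which is the shape of the argument; real license and staffing costs would shift the threshold.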

Siwinski: That really depends on the pricing model and how the budgets are created. For some companies, they will amortize budgets. They really know when they buy compute and licensing, it is basically peaks and valleys and evens out to a line. So that is a different conversation. For companies that have more limited budgets, or more isolated budgets, the peak and the valley are different decisions. Startups are the same. They might need access right now and they want to deal with the underlying IT spend.

Vel: When you look at the larger corporations, they are running at high capacity, but they are the ones asking how much Cloud capacity is available right now. What is the burst capacity? When they have a project coming online that they don’t know about, suddenly they need another 1,000 cores. How do you provision that within a local farm? How do you do that in a month? That is impossible. Being able to go to the Cloud, with your complete infrastructure set up, with all of the tools set up, is something they all want. As EDA providers, being ready to deploy your product on the Cloud, instantaneously, is what they are looking for. The definition of the Cloud has been changing. Twenty years ago, people used to host websites on premises. Today they don’t. People were accustomed to hosting sites on premises and then moved to the Cloud. The same will happen to EDA.

Siwinski: I ask customers, ‘If you are opposed to Cloud, where are your servers?’ They don’t know. ‘Who is running your servers?’ For many it is outsourced.

Allan: The trust model has changed. Apart from the possibility and capabilities and flexibility and bursting, what has changed is the trust model. Cloud providers are invested in their own business. Amazon is successful because of their infrastructure. Google and Alibaba the same. There is more trust in the reliability and uptime and security because we know these vendors depend on it themselves.

Anderson: And they have the experience with the finance and personnel and the IT applications. You talked about the finances from the customer side. From the vendor side we are beginning to figure out business models that make sense for the cloud. We don’t want to take something that used to be sold for $100,000 and have someone pay $100 per hour for a couple of days and be done with it. You have to adjust the thinking. Second, EDA vendors were talking about running on their Cloud five years ago. It turns out that people are more willing to trust Amazon than they are EDA vendors. That is because it is their livelihood that is at stake and they run many other IT applications in the Cloud. Finally, EDA tools have become better at parallelizing what they do, not just LSFing jobs. That seems to be driving more interest in the Cloud.

Paw: Characterization, simulation, HSPICE – those things have always been parallelizable. Those are the workloads you will see on the cloud. Place and Route – less likely. The large monolithic jobs are less suited for the cloud.

Anderson: Also fault simulation, or formal proofs.

Siwinski: I like the peak notion because sometimes you have a situation where a customer is running four chips. One slips, one got accelerated, and all of a sudden the compute capacity is completely consumed. We also see those types of workloads being pushed to the Cloud. But yes, a lot of the time it is the tasks that can consume a huge number of cores that push you to buy additional capacity.

Paw: There are some tasks that are embarrassingly parallelizable. Jobs that have nothing to do with each other – completely independent. You can throw it into the Cloud and just run.
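
As a rough illustration of what "embarrassingly parallel" means in practice, the sketch below fans out independent jobs to a pool of workers. Everything in it is hypothetical: `run_job` is a stand-in for one simulation or characterization run, and a local thread pool stands in for Cloud machines. The point is only that no job needs anything from any other job.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(seed: int) -> tuple:
    """Hypothetical stand-in for one independent run (e.g. one test seed).

    A real job would launch a tool; here the pass/fail result is a
    deterministic placeholder so the sketch is self-contained.
    """
    passed = seed % 7 != 0
    return seed, passed


def run_regression(seeds, max_workers: int = 4) -> dict:
    """Run every job independently and collect {seed: passed}.

    The jobs share no state, so adding workers shortens the wall-clock
    time without any change to the jobs themselves.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_job, seeds))


results = run_regression(range(10))
failed = sorted(seed for seed, ok in results.items() if not ok)
```

Changing `max_workers` changes how long the batch takes, not what it computes, which is why this class of workload maps so directly onto rented capacity.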

Vel: But the EDA industry has to change to achieve that. Today, you have simulation jobs, you have emulation. Some of them can be parallelized, but some of these large monolithic jobs cannot be parallelized. As EDA providers, we are looking at how we can redo those architectures so that they can be parallelized and made scalable and also elastic, where you can deploy on the Cloud on demand as needed. That is where we need to be.

Siwinski: That journey started four or five years ago. Some of the initial products are now Cloud-ready and fully parallelized. Some of us have products that are public, others are in the works. Now it is about taking it to fruition, and we need to get to the point where we don’t care about the invoicing model. We provide technology that provides both.

Letcher: You talked about the trust issue. The Cloud providers have taken a lot more market share themselves. They have become massive providers, and there are only a handful of them now that are capable of running efficient datacenters with 99.9% reliability. It doesn’t make sense to be running a datacenter yourself unless you are the biggest tier 1. And maybe not even then.

Teo: We make tools for FPGA design. We see a lot of machine learning applications. We have monolithic compilation, and place and route is still a single process. Vendors are looking at ways to use multiple cores to speed up the process, but it is still monolithic. However, if you run many of these single processes, you get a lot of data and machine learning can help you to learn and converge on a result faster. That is another use-case for the Cloud. You don’t have to be just parallelizing all of these processes to make them faster – you can take all of the monolithic ones, analyze them and help users get the chip faster. This is driving a lot of acceptance. You can also use hardware in the Cloud – GPUs and FPGAs. A lot of people are realizing that if they use the hardware in the Cloud, then why not also use the software there as well. Why split the two?

Paw: It is appropriate to talk about which things should be in the Cloud and what is not. The Cloud is typically a little bit slower than running in your own datacenter. When you have expensive licenses, running in the Cloud may be a luxury that you don’t want, because it takes longer and your license cost per minute doesn’t make sense. There are some jobs that will remain in the datacenter for the big guys. And then there are certain jobs that are Cloudable. For the medium type of guys, the equation may be different. If you don’t have the number of designs that would keep your datacenter full all the time, the economics change and then running on the Cloud, even though it is slower and more expensive per minute, makes more sense.

Siwinski: I see very few people trying to push interactive workloads onto the Cloud. Startups do, but in general it is much less prevalent. It is more of the batch types of jobs. That is the power of the Cloud.

Vel: Fundamentally, the definition of Cloud has been changing. If you had a monolithic software architecture then as long as you could run it in the Cloud, people thought they were Cloud ready. Then we moved from that monolithic architecture to a distributed architecture. Now people look to parallelize stuff. But are we really Cloud ready? Unless we get to the level of elasticity where a user does not have to think or understand what it takes to parallelize their jobs and submit them to the Cloud then we are not there. If he is saying I want to run the job in 6 hours versus 12 hours, I just need to use 2X the number of cores. That should be the only decision he needs to make. That is when we can say we are Cloud ready.
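
The "6 hours versus 12 hours with 2X the cores" decision assumes the job scales linearly. The sketch below shows that arithmetic and, for contrast, Amdahl's law, which bounds the speedup when part of a job stays serial. The function names and the 100-core baseline are illustrative assumptions, not anything a particular tool exposes.

```python
import math


def cores_for_deadline(baseline_hours: float, baseline_cores: int,
                       target_hours: float) -> int:
    """Cores needed to hit target_hours, assuming ideal linear scaling."""
    if target_hours <= 0:
        raise ValueError("target_hours must be positive")
    return math.ceil(baseline_cores * baseline_hours / target_hours)


def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup on `cores` when only `parallel_fraction`
    of the single-core runtime can be parallelized."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# A 12-hour job on a hypothetical 100 cores, wanted in 6 hours:
needed = cores_for_deadline(12, 100, 6)

# A fully parallel job doubles in speed on 2 cores; a half-serial job
# never reaches 2x no matter how many cores are added.
ideal = amdahl_speedup(1.0, 2)
half_serial = amdahl_speedup(0.5, 1000)
```

The single "how many cores" knob only works for jobs close to the fully parallel case, which is why the largely serial, monolithic tools discussed earlier need rearchitecting before they can be elastic in this sense.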

Siwinski: Some of that already exists. Some of that is coming. How much control and flexibility you provide – that depends on the user profile and how they approach it.

Paw: There is also the question about how the tool is architected. Right now, most EDA tools were defined in the ’90s. You talk to the file system as if it is a local disk. This is what almost all EDA tools do. The Cloud doesn’t understand that. File systems do not exist in the Cloud. The Cloud is built around object storage, so until you can get an efficient, low-latency file system into the Cloud, EDA tools as they exist today are going to suffer from efficiency issues. This means either rearchitecting on the back-end so that you can work with Cloud storage efficiently, or having a scheme where you have a filer in the Cloud or in the datacenter.
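
One way to picture the mismatch between file-style access and object storage is the toy comparison below. It is a sketch under a simplifying assumption: `ToyObjectStore` is a hypothetical in-memory store that only supports whole-object put and get, whereas real services differ in detail. It contrasts appending a few bytes to a local file with appending to an object that must be fetched and rewritten in full.

```python
import os
import tempfile


class ToyObjectStore:
    """Hypothetical in-memory store offering only whole-object put/get."""

    def __init__(self):
        self._objects = {}
        self.bytes_transferred = 0  # total bytes moved in either direction

    def put(self, key: str, data: bytes) -> None:
        self.bytes_transferred += len(data)
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        data = self._objects[key]
        self.bytes_transferred += len(data)
        return data


def append_to_file(path: str, record: bytes) -> int:
    """File-system append: only the new bytes are written."""
    with open(path, "ab") as f:
        return f.write(record)


def append_to_object(store: ToyObjectStore, key: str, record: bytes) -> int:
    """Whole-object append: download, concatenate, upload again."""
    before = store.bytes_transferred
    store.put(key, store.get(key) + record)
    return store.bytes_transferred - before


# Append 10 bytes to 1,000 bytes of existing data, both ways.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "run.log")
    with open(log_path, "wb") as f:
        f.write(b"x" * 1000)
    file_bytes_moved = append_to_file(log_path, b"y" * 10)
    file_size = os.path.getsize(log_path)

store = ToyObjectStore()
store.put("run.log", b"x" * 1000)
object_bytes_moved = append_to_object(store, "run.log", b"y" * 10)
```

A tool that makes many small, in-place updates pays the whole-object cost on each one in this model, which is the efficiency issue a filer or a Cloud-aware back-end is meant to avoid.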

Siwinski: Some customers already have private Clouds, so the reality is that even for on premises computing, the Cloud dimension has already been coming into the EDA suppliers to drive a different generation of architectural readiness. So this is not completely new, but has not propagated through all of the tools yet.

Paw: People are working on ways to address that.

Allan: You are describing the default case, the 1c per GB per month model for default Cloud data, but as you engineer the storage in your own datacenter to be suitable for EDA, then you also have to do the same in the Cloud. Increasingly we see the Cloud as a distribution model. They are there to sell compute resources and storage resources. They may also sell EDA resources, where the customer chooses ‘I want compute, compute, storage, storage, storage++, EDA, Veloce’ – whatever they need to populate their virtual private network – and rents that as a service.

Paw: That is part of the evolution that has to happen. As things stand today, there are quite a few gaps.

Vel: But if you look at the way in which new architectures are evolving in the EDA space, these architectures have existed in the non-EDA space for over a decade. Look at Hadoop as a Big Data system – such systems exist. But in EDA, have we leveraged that? Are we in a position to say that we have leveraged Big Data architectures where we don’t have to think about solving these huge multi-billion node matrices in a distributed fashion, where we have services that are solving matrices versus graphs, versus geometric services, with all of these available on a service basis as opposed to a tool basis? The moment you can distribute that at the service level in a Hadoop type of architecture, you don’t have to think about what the tool does – it will elastically scale, it will take the resources that it needs and come back with a result. You don’t have to worry about bandwidth, you don’t have to worry about the number of cores that you are using. That is where we need to be.

Letcher: That is more than the Cloud. That is software as a service and for the small and medium sized customers, what they want is something that they can log into, use it for simulation. That must become transparent. We need a GitHub equivalent from the software world. That doesn’t exist in the hardware world right now because you have to worry about expensive licenses and how you manage the data. We have to make that transparent and just work simply for a smaller company who otherwise would have a large investment in the IT aspect. We talked about the machines themselves being cheaper and more reliable in the Cloud but there is also making the Cloud easy to use without having to build it yourself.

Anderson: That has implications for the architecture of the tools and the infrastructure around the tools.

Vel: As EDA providers, we need to hold ourselves to a higher standard. You have Facebook and Google and other companies who are making strides in software architectures, whereas when our customers come to us and they buy software – it is basically old architecture. We should be advancing and using new software architectures that are scalable and can run in the Cloud.

Siwinski: We have started, and there are some products that do leverage that quite well. And yes, there is more that needs to be done.

Allan: We are eating our own dog food. We are doing a lot of our own R&D work using Cloud resources for our own development. That helps us to flesh out the ideas and the value of specialized storage solutions, for example.


4 comments

Jerome McFarland says:

Regarding the lack of performance file systems in the cloud…

That’s certainly been true historically, but it’s something that the company I work for (Elastifile) has now rectified. We have IC design firms running production EDA workloads on our high-performance, distributed file system in the public cloud TODAY. Rearchitecting the backends of complex EDA tools isn’t a viable solution, so practical, cloud-integrated EDA relies on a cloud-based file system.

Kevin Cameron says:

OpenAFS has been around for a while and can probably handle the Cloud-to-Edge as well as anything else, it’s not really a lack of technology slowing things down.

Evan says:

EDA companies need to put their money where their mouths are. Where are the ready-made AMIs containing relevant tools? Why aren’t docker / vagrant files containing everything I need to run tool x (except the binary and the license) publicly available? Why has this group languished with no activity for 3 years now: https://www.linkedin.com/pulse/welcome-eda-containers-jason-andrews ?

I really can’t figure out if this is just due to incompetence, or if they are purposely handicapping external cloud adoption in order to push their own lackluster cloud offerings. (These are doomed to fail, if they haven’t already.)

Kevin Cameron says:

The Cloud works against the licensing model of the big EDA companies, so they don’t really want it to happen. However, there is a long list of dysfunctional practices and deficient tools that need to be replaced, so it seems inevitable it will happen sometime. Open source tools like Xyce possibly open the door to that future – a decent extraction tool and a pile of AI can probably do the rest…
