
How To Justify A Data Center

Tipping the scale in favor of a massive on-premise compute farm is becoming more difficult.


The breadth of cloud capabilities and improvements in cost and licensing structures are prompting chipmakers to consider offloading at least some of their design work into the cloud.

Cloud is a viable business today for semiconductor design. Over the past decade, the interest in moving to cloud computing has grown from an idea that was fun to talk about — but which no one was serious about implementing — to a trustworthy, secure, and increasingly attractive option for many compute applications.

“Some very large companies now discuss cloud as an integral part of their IC or system development strategy, and this has accelerated over the last five years,” said Craig Johnson, vice president for EDA cloud solutions at Siemens EDA. “In the early 2000s, there began to be discussions about what wasn’t yet called ‘the cloud,’ but would turn into the cloud. It was the idea of vast numbers of resources that were easy to access for client-server applications. A decade later that eventually turned into public cloud.”

Even then, it was obvious that early cloud infrastructure was not sufficient to handle EDA workloads. “There were also security concerns,” said Johnson. “But over the last 10 years Azure, AWS, and Google made investments to address security and infrastructure for high-performance computing, opening up the possibility of doing IC design in the cloud, or portions of a system design in the cloud.”

So why do chipmakers still have their own data centers? It depends upon who you ask, when you ask, and where they sit in the design flow.

“The vision for the cloud data center is obviously that you want everyone to move over to your services,” said Steven Woo, fellow and distinguished inventor at Rambus. “There are really good reasons for it. The economies of scale are such that it makes a lot of sense. At the same time, there are still reasonable cases where people may want to operate their own data centers. It’s all a cost/benefit type of decision-making process.”

For startups, almost everything is done in the cloud that can’t be done on a laptop. And even for larger chip companies, deciding whether to move to the cloud may depend upon how much of the investment in a data center has been fully depreciated, how heavily it is utilized and how often, and the licensing structure.

“Who needs their own data center? Not many people,” observed Rupert Baines, chief marketing officer at Codasip. “You need a very good reason to argue why you should. The whole logic of the cloud is everything’s abstracted and software has largely moved. It’s in the cloud somewhere, you don’t know where. Why would you invest in your own? The companies that invest in private data centers today are probably the same class of people who want their own power station. If you look back to 1908, when Henry Ford was building the Model T, that was incredibly vertically integrated. He had his own power station. He had his own coal mine. He had his own cattle ranch to grow the leather to make the seats. I can’t imagine anyone doing those things these days. The cost of a data center, just in terms of what it takes, is akin to what Warren Buffett says if someone has a private plane. You’ve got to ask very hard questions about why. Having your own data center is probably 100 times more extreme than that.”

Chip design companies have been slow to move, though, in part because not all of the EDA tools have been ported to the cloud. But sooner or later, that is likely to change.

“It’s ludicrous that you have a roomful of emulators or a roomful of servers to run your synthesis tools because they don’t want to run in the cloud,” Baines said. “That’s an anomaly. But nearly everything else – Salesforce.com and all your business apps — are hosted by Amazon or Azure or whoever. That’s an example of horizontal utilization and de-layering, and the business processes are separated.”

Cloud security no longer a concern
Still, there are some intangible considerations when an engineering organization decides whether to move to the cloud.

“If you have extreme security requirements, or you have some requirement for data to be located in a certain place, then it makes a lot of sense to operate your own,” said Rambus’ Woo. “But more and more, at least for the companies that feel like they have to have their own data center presence, you’re still seeing companies like that shift some of their usage to cloud data centers, as well. It will take more time for people to be completely comfortable with the whole idea. Fortunately, we are seeing more U.S. government contracts and the like for general cloud computing. This means the security levels are going up.”

While more people are becoming comfortable with putting their IT infrastructure in a shared environment to some degree, factors such as data sensitivity can limit what they actually are willing to put in the cloud. Some of that has to do with regulatory requirements that indicate an organization must maintain very close control of the data assets, particularly where personally identifiable information or otherwise protected information is concerned.

Sandeep Mehndiratta, vice president, Enterprise Go-To-Market & Cloud at Synopsys, recalls his first conversation with a customer regarding cloud computing more than a decade ago. “The company was very gung ho about cloud, but when the rubber hit the road, they didn’t move then due to security concerns. It’s a different story today, and that company started transitioning to public cloud about a year ago. Same customer, same people. Today, very few customers are bothered by the security, because what has happened in the last five years is everything else has moved to cloud — ERP, CRM, etc. From the perspective of our business personas as well as personal personas, we as human beings are comfortable with the information sitting on cloud now.”

“Semiconductor industry sensitivity to IP protection is higher, but this is no longer an insurmountable concern,” Mehndiratta said.

As a result, the only users he sees not moving wholesale to the cloud are the extra-large companies, and for purely economic reasons. “The economies of scale that a customer can get from a data center they run, or a couple of data centers, is pretty sizable. In a case like this, these are higher-end machines because they are running high-performance workloads,” he said.

Other situations that don’t warrant moving to the cloud also follow from those economies of scale.

“If you have a really efficient and optimized system, cloud cannot compete,” he said. “If you’re a large company, you’ve got this scale, and you’re utilizing that data center efficiently because you have multiple projects, the cost of operation on cloud is going to be much higher.”

At the same time, these semiconductor and systems companies also are moving to the cloud in specific situations.

“When they have workloads that burst, such as any kind of verification, electrical verification, functional verification, power verification, physical verification, characterization — those workloads are bursty,” Mehndiratta explained. “When there are multiple people in the company queuing up their jobs and waiting for the machines to be available, these same larger customers are looking at cloud as a way to manage 75% to 80% of the capacity on their data center, and questioning whether to move to a model where 20% of their utilization is operating-expense-based, because it’s cloud where they can burst out and burst back in. That’s with the large- to extra-large companies.”
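The 80/20 split Mehndiratta describes can be sketched as a simple cost model. The numbers below are invented placeholders, not figures from the article, and the function name is hypothetical; the point is only that owning capacity for average demand and renting the bursts can undercut owning for peak.

```python
# Illustrative sketch (invented numbers): size on-prem for ~80% of peak demand
# and rent the bursty verification spikes from a public cloud.

def hybrid_cost(peak_cores, base_fraction, core_year_onprem, core_hour_cloud,
                burst_core_hours):
    """Annual cost of owning base_fraction of peak capacity and renting the rest."""
    owned = int(peak_cores * base_fraction)
    capex = owned * core_year_onprem           # amortized on-prem cost per core-year
    opex = burst_core_hours * core_hour_cloud  # pay-as-you-go burst cost
    return capex + opex

# Fully owned: sized for a 10,000-core peak, no cloud spend.
all_onprem = hybrid_cost(10_000, 1.0, 600, 0.05, 0)
# Hybrid: own 80% of peak, burst ~1M core-hours of verification to the cloud.
hybrid = hybrid_cost(10_000, 0.8, 600, 0.05, 1_000_000)

print(all_onprem, hybrid)  # → 6000000 4850000
```

With these assumed rates the hybrid plant is cheaper even after paying on-demand prices for a million burst core-hours, because the last 2,000 cores would otherwise sit idle between verification pushes.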

While there are some key criteria to the overall decision, such as cost and time-to-market advantages, there are many aspects beyond the obvious that engineering groups must take into consideration when looking to move to the cloud, Johnson noted.

“This is definitely segmented,” he said. “Large IC companies, categorically speaking, don’t have 100% of the work in the cloud. They’re the ones with the biggest legacy investments in data centers, so they’re not the ones leading the all-cloud use model for ICs. That tends to happen a little bit more at the startup level, or small companies, maybe even some medium size companies if they don’t have all of the investment and IT expertise that is required to set up and manage a data center, and all the computers in it. They’re the ones that are most inclined to start off in the cloud, and are said to be ‘born in the cloud,’ as opposed to transitioning from on-prem to cloud.”

Capabilities versus economics
The process for deciding where and how to compute is complex, but the top decision points are capability and economics, said Ketan Joshi, business development group director for Cloud at Cadence.

“Five years ago, the capabilities in a public cloud weren’t as completely available as you would have in the on-prem facility, but that’s no longer the case,” Joshi said. “There is a huge variety of compute, storage, and security services that may not have been there before that are available today, from NFS kinds of services, to scheduling, queuing, you name it. Pretty much every such service is now available in the public cloud. The question then becomes, ‘What capabilities should I be marrying between my on-prem and cloud?’ For instance, for a user who has plenty of machines on-prem that wants to do, say, a library characterization for the latest process nodes, the library characterization literally can take thousands of machines to get it done. Very few users have the resources on-prem to do that. Let’s say this user needed almost 10,000 CPU cores to do the characterization, and they needed it done in a matter of weeks. This is where cloud capabilities can come into play, to access the specific capacity and the kind of scale needed, and which the on-prem didn’t have.”

That changes the economic formula for when to use the cloud. “Day in, day out, as users go through the typical process of calculating the on-prem costs of servers, storage, security, data center costs, etc., they try to compare it with what they can get, for example, in a public cloud or a co-location facility,” said Joshi. “For some companies there is a clear answer that it’s much cheaper to go with public cloud, because the utilization of on-prem isn’t good enough and isn’t cost-effective. In that situation, it’s a clear economics answer. In other cases, some users can say their utilization is really good, or they’ve already invested in a large group of machines, which have yet to be depreciated. There, they are better off keeping most of the workload on-prem. That’s one reason why some people are keeping on-prem. Only when the capability is needed do they go to public cloud. That’s a combination that may be considered because of economics, less so about capability.”

Fig. 1: Different computing models for chip design. Source: Cadence


Decision-making behavior varies by company size, Siemens EDA’s Johnson noted. “Using a large IC company as an example here, they are the ones that have large data center investments today, but they also have evolving needs with regard to the projects. As has been the case for 50 years, every new process node brings more complexity. So by moving ahead, you end up requiring more compute and more EDA capability just to keep pace with your project. That type of customer tends to have to ask themselves a couple of questions. One is, what is their level of trust in a cloud environment? They have to convince themselves that their data is going to be secure, that the access is secure, and that’s a big question.”

The next question concerns raw cost, Johnson said. “Cost is always front and center, and that’s where they independently need to do what the IT world does almost all the time, which is a TCO (total cost of ownership) analysis. They will get out their spreadsheets and break down, with as fine a degree of granularity as possible, the relative expense of doing it the way they’re doing it today on-prem and try to see what the equivalent would be in the cloud. A big part of TCO analysis will boil down to that utilization level of the infrastructure. If their utilization levels are high, it will tend to tell them the cloud is going to be more expensive than continuing to do what they do.”
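The utilization point at the core of that TCO spreadsheet can be made concrete. This is a minimal sketch with assumed rates (none come from the article): idle on-prem hours still cost money, so the effective price of a *used* core-hour falls as utilization rises, eventually crossing below an on-demand cloud rate.

```python
# Hypothetical TCO comparison (illustrative numbers only): on-prem cost per
# used core-hour vs. an assumed cloud on-demand rate.

def onprem_cost_per_core_hour(capex_per_core_year, opex_per_core_year, utilization):
    """Effective cost of one used core-hour; idle hours still incur the fixed cost."""
    hours_per_year = 8760
    total = capex_per_core_year + opex_per_core_year
    return total / (hours_per_year * utilization)

CLOUD_RATE = 0.05  # assumed $/core-hour on-demand price

for util in (0.3, 0.6, 0.9):
    onprem = onprem_cost_per_core_hour(200, 100, util)
    cheaper = "on-prem" if onprem < CLOUD_RATE else "cloud"
    print(f"utilization {util:.0%}: on-prem ${onprem:.3f}/core-hr -> {cheaper} cheaper")
```

At 30% or 60% utilization the cloud wins under these assumptions; at 90% the owned data center does, which matches Johnson's observation that high utilization tends to tip the spreadsheet toward staying on-prem.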

But the ultimate decision for many large chip design organizations is not so straightforward. “They must weigh what they can’t do on-prem, but which they need and would like to do. The reason that it’s not just a cost-only decision is that there are opportunities to complete tasks more quickly, and to have more flexibility in where their projects are done, or the size of teams in different locations by leveraging the cloud,” he said.

Hybrid approaches
For many organizations, the answer appears to be some type of hybrid approach, which allows them to fully utilize their on-prem data centers, while also providing the option for scaling up and down as needed.

“That will give the optimal cost for the investment they’ve made,” Johnson said. “Then, they’ll utilize the cloud for the peaks and the unexpected surprises that come along in the course of a project. They’ll pay for that compute as they go, because it will be unpredictable, and that will allow them to do something they’ve never had the ability to do, which is trade off project time for expense dollars.”

For the stakeholders on the IT side, this starts with the TCO argument. “Once the TCO is completed, then you’ve got a good feel for where you stand relative to the cost of the cloud,” he said. “But then you also need to do an ROI analysis, because the investment to incrementally do some of the workload in the cloud can have a pretty high return in terms of financial benefit. But it’s not just the expense benefit, which is easier for companies to point to. It’s really that intangible of how much more valuable it was to get the product completed two months earlier, vetted for whatever sales cycle it is. Whether it’s a holiday intercept, or a back to school, that’s the part that requires some strategic thought. How do you go to your TCO and then add on to that an ROI, and ask if you really do need to invest? It’s not that it’s going to be a balloon squeeze of one type of cost to the other. You need to add a little more air in the balloon to get the bigger scale. Cost is always easier to model to the right number of decimal points, versus projections on revenue benefit or the value of being able to have performed more verification on a chip. You’ll never know that you actually eliminated what would have been a really expensive error because you didn’t ever experience it. You never have the hard data to say, ‘See, spending that extra $100,000 worked out because it saved us $20 million.'”
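The ROI layer Johnson describes on top of the TCO can be sketched the same way. The figures here are invented for illustration (the $100,000 echoes his hypothetical, and the value of a week of earlier market entry is an assumption a real team would have to estimate):

```python
# Hypothetical ROI on incremental cloud spend (invented numbers): extra
# verification cost in the cloud vs. the value of shipping earlier.

def roi(extra_cloud_spend, weeks_saved, revenue_per_week_earlier):
    """Return multiple on the incremental cloud spend from pulling in the schedule."""
    benefit = weeks_saved * revenue_per_week_earlier
    return (benefit - extra_cloud_spend) / extra_cloud_spend

# Burst $100k of verification to the cloud, ship 8 weeks earlier,
# each week of earlier market entry assumed worth $50k.
print(f"{roi(100_000, 8, 50_000):.1f}x")  # prints "3.0x"
```

As the quote notes, the hard part in practice is not this arithmetic but estimating the benefit side, which is a projection rather than a spreadsheet fact.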

When to revisit the on-prem versus cloud decision
A good time to revisit the cloud vs. on-premises analysis is after resources are fully depreciated, because the cloud offerings are improving. Over the next three to five years, Mehndiratta believes cloud providers will sweeten the offering for large and midsize customers through dedicated high-performance computing applications optimized for storage, compute, access, business models, and location. That, in turn, will shrink the delta between on-prem and cloud.

At the same time, there might still be two or three companies that are so big that they run their own private clouds, Mehndiratta said. “But for medium to large semiconductor systems companies, if you look at those core companies that are ASIC houses, analog, RF specialists, even platform companies, they are headed toward a 50/50 model, and asking, ‘When the hardware gets depreciated, do I look at cloud as a full supplemental capacity because I’m going to have to buy new hardware anyway?’ It’s not just the economics of the per-core mainstream machines. It’s also access to the latest and greatest hardware. ‘I don’t have to buy them. I can get access but for slots of time.’”

This provides a level of flexibility they didn’t have previously. If the cloud provider can give access to the specialized processing that is needed, that flexibility can be extremely valuable.

“When you take the same theory and apply it to smaller companies or new startups, they have a born-in-the-cloud mindset,” Mehndiratta said. “If I’m a startup in AI/ML, networking, or a new whiz-bang Arm-based processor application, first, I’ve got to be well funded because I can’t do that complex a design without deep pockets. And second, why would I invest in IT or CAD? They are born in the cloud. These are the companies saying, ‘I just want somebody else to do it for me. I know it’s going to cost more, but I have a time-to-market problem.’”

Data center and cloud-based tools have been growing in popularity for EDA tools for both hardware and software engineering projects, said Simon Davidmann, CEO of Imperas. “At the same time, not all accelerators are equal, and given the wide range of application and targeted data sets, the debate on the merits to keep a data center private is not over yet. In fact, the data center accelerator market is fast becoming a target for new designs and innovation, as we see in the growing number of customers focused in this space.”

The move to the cloud is no surprise to most, but Rambus’ Woo says there is still the question of whether they can check all the boxes that are really critical for every business. “At least so far, there are still people running their own data centers, but the shift to the cloud has been much faster than a lot of people have suspected it would be.”

Roland Jancke, design methodology head at Fraunhofer IIS’ Engineering of Adaptive Systems Division, has wrestled with these issues internally. “We have a high performance cluster in our institute, but if we buy it next year, it’s out of date,” he said. “We would have to replace it each and every year. And if you want compute resources, the cloud will never age. It will always be the latest performance.”

On the other hand, he said the licensing model still needs work. “Some of the EDA providers are already working on that, but it is not yet completely solved — except for open-source tools, which you can of course bring into the cloud. But not all of the tools are open source,” he explained. “We do have a first glimpse of how this could work in the future with quantum computers. While that is two steps ahead, at the moment one minute costs €10,000, so it is quite expensive to think about. But it does require you to think more thoroughly about what you really want to simulate. You would not go for trial and error, because it is very expensive.”


