Rethinking The Cloud

As the volume of data increases exponentially, companies are rethinking where data gets processed and what gets moved, and they are focusing on the total cost of moving and storing that data.


Data center architectures have seen very few radical changes since the commercial introduction of the IBM System/360 mainframe in 1964. There have been incremental improvements in speed and throughput over the years, with a move to a client/server model in the 1990s, but from a high level this is still an environment where data is processed and stored centrally and accessed globally.

The first whiff that substantive change would be necessary came in the early 2000s. The racks of inexpensive blade servers that had been installed en masse in the 1990s turned out to be rather expensive to power and cool. That wasn’t obvious until corporations made IT departments directly responsible for data center utility costs. Realizing just how expensive data centers had become, they simultaneously clamped a lid on IT budget increases.

This created almost universal panic among CIOs, who in turn leaned on server makers to improve energy efficiency. The server makers leaned on the processor makers, and within a couple of years chipmakers were advertising performance plus lower power. Even that wasn’t sufficient because the servers themselves were underutilized, so virtualization became the next quick fix. That not only improved utilization rates, it also allowed data centers to turn off entire portions of a server farm when servers weren’t in use.
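The consolidation effect can be sketched with a simple packing heuristic. The following is a minimal illustration, not how any particular hypervisor schedules work: the workload percentages are hypothetical, and first-fit decreasing is just one common bin-packing heuristic chosen for brevity.

```python
def consolidate(vm_loads, capacity=100):
    """First-fit decreasing: place each VM on the first host with room.

    Loads and capacity are in percent of one host's CPU (hypothetical units).
    Returns the total load on each host that must stay powered on.
    """
    hosts = []
    for load in sorted(vm_loads, reverse=True):
        for i, used in enumerate(hosts):
            if used + load <= capacity:
                hosts[i] = used + load
                break
        else:
            # No existing host has room, so power on another one.
            hosts.append(load)
    return hosts


# Hypothetical: ten lightly loaded workloads, each formerly on its own server.
loads = [10, 15, 20, 5, 30, 25, 10, 15, 20, 10]
hosts = consolidate(loads)
print(f"{len(loads)} servers before, {len(hosts)} after: {hosts}")
```

With these made-up numbers, ten underutilized servers collapse onto two hosts, and the other eight can be switched off.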

Still, as massive amounts of data from smart devices and the Internet of Things begin to flood into data centers, it’s becoming apparent that even more fundamental changes will be required. The IoT has grabbed headlines primarily in the consumer world, with smart watches or Nest thermostats or smart TVs, but the biggest changes will happen on the big data side: how data is collected, used, processed and stored. Unlike the consumer electronics market, where products are being created at a rapid pace to test consumers’ appetite, when it comes to big data there is a clear business case for adoption. If energy costs can be sufficiently chopped to accommodate a rapid increase in the overall amount of data being processed and stored, then companies will pay for it.

“If you’re running a data center, the worst day of the month is when the electricity bill shows up,” said ARM CEO Simon Segars, who proposed changes that extend far beyond the data center. “The network needs to become an intelligent flexible cloud so there is no longer a distinction between the network and the data center, and computing and storage need to occur closer to the edge. A couple megabytes of code there can save a terabyte at the data center.”

Segars noted that data center architectures are essentially a straight line from the client device to the network to the data center, and have been that way for “a very long time.” He said a better approach is to add intelligence throughout the network so processing can be done where it makes the most sense—improving energy efficiency, decreasing latency and only using those resources that are required.

“The distinction here is between data and information,” he added. “You produce lots of data, but you want to store information.”
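One way to picture the data-versus-information distinction is an edge node that reduces a batch of raw readings to a short summary plus any aberrations before anything crosses the network. This is a minimal sketch under stated assumptions: the function name, the three-sigma threshold and the sample readings are all hypothetical.

```python
from statistics import mean, pstdev


def summarize_at_edge(readings, threshold=3.0):
    """Reduce raw readings to a summary and a list of outliers.

    A reading counts as an outlier when it lies more than `threshold`
    standard deviations from the batch mean (an illustrative rule only).
    """
    mu = mean(readings)
    sigma = pstdev(readings)
    outliers = [
        (i, x)
        for i, x in enumerate(readings)
        if sigma > 0 and abs(x - mu) > threshold * sigma
    ]
    return {
        "count": len(readings),
        "mean": mu,
        "min": min(readings),
        "max": max(readings),
        "outliers": outliers,
    }


# Hypothetical batch: 99 steady temperature samples and one spike.
readings = [20.0] * 99 + [95.0]
summary = summarize_at_edge(readings)
print(summary)
```

Here 100 raw values become five fields, and only the single spike is forwarded in full.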

Making this work requires changes to every facet of the information processing and delivery process. In Segars’ words, “it’s a journey,” and it won’t happen overnight. Just as the Internet of Things is a collection of connected ecosystems, each part of that ecosystem needs to be fully involved. Large companies such as Facebook, Google and Amazon have taken a very public lead with their own unique architectures, which are focused on reducing power while increasing throughput. Less obvious are changes that are underway as you dig deeper into the stack of hardware, software and networking technology.

“One of the big challenges is how do you develop services that have distributed computation,” said Edward Lee, professor at University of California at Berkeley and an IEEE fellow. “We need to be able to deploy computation to put computation close to the physical device.”

The upshot is that we will have many more devices, but much more energy-efficient architectures into which they are deployed. That relieves at least some of the burden on data centers. Numbers vary greatly on just how much energy data centers consume, but the rate of growth may be more telling. The Natural Resources Defense Council said that in 2013 U.S. data centers consumed 91 billion kWh of electricity, with that number expected to reach 140 billion kWh by 2020, a 54% increase. The NRDC noted that most of that energy is consumed by small and midsize data centers, not the most advanced and largest ones. That leaves much room for improvement across a wide range of markets, and it creates enormous opportunity for the electronics industry.
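The NRDC figures can be checked with a few lines of arithmetic. The total increase comes straight from the two numbers quoted above. The compound annual rate is an added illustration that assumes smooth year-over-year growth, which the NRDC projection does not necessarily imply.

```python
def percent_increase(start, end):
    """Total growth from start to end, in percent."""
    return (end - start) / start * 100


def annual_growth_rate(start, end, years):
    """Compound annual growth rate in percent, assuming smooth growth."""
    return ((end / start) ** (1 / years) - 1) * 100


# Billions of kWh, from the NRDC figures cited in the article.
total = percent_increase(91, 140)
yearly = annual_growth_rate(91, 140, 2020 - 2013)
print(f"total increase: {total:.1f}%")
print(f"implied annual growth: {yearly:.2f}%")
```

The total works out to just under 54%, matching the figure in the text, and the implied annual rate is a little over 6%.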

Elemental changes
The biggest energy savings for all of this clearly will come from an intelligent parsing of data processing, but there will be many other changes involved, as well, which makes this shift complex, multi-dimensional and to some extent all-encompassing. For the Internet of Things and the Industrial Internet of Things (IIoT) to live up to their promise, all of these changes need to be put in place.

“The use case for all of this is less about millions of applications and more about millions of users of one application,” said Larry Wikelius, director of ecosystems and partner enabling at Cavium. “This is a combination of compute, I/O, virtualization and accelerators. In the past, you’d see a Web server processing big data, and historically you’d cobble a bunch of things together. What’s new is that this needs to be configurable by choice—compute, storage, networking and security—so you can optimize for the mix based upon the workload.”

Wikelius said that intensive floating-point calculation may be critical to an operation, but it might only account for a small percentage of the entire workload. To balance the workload companies increasingly are looking at network functions virtualization (NFV), which can be used to customize a network in a way that is analogous to using virtualization to run multiple applications and operating systems on a generic server.

A slew of companies, ranging from Cavium at the chip level to HP on the server side, have made NFV a central part of their strategy to revamp the data center. Typically these involve many-core architectures to maximize throughput and processing capabilities using a minimum amount of energy. But it’s just one piece of the giant puzzle that needs to be completed.

“You can think of this as rightsizing,” said Lakshmi Mandyam, director of server systems and ecosystems at ARM. “Due to a lack of choice things have been overprovisioned.”

Mandyam said compute power per square foot has to increase for data center efficiency to improve, but at the same time the explosion of data traffic can’t all end up in the data center. No matter how much server and network efficiency is increased, it will never be enough to handle all the data being produced.

“The intelligence needs to be closer to the end point,” she said. “We also need innovative ways of connecting everything together. So you need security IP, networking IP and storage IP, and then you aggregate that at the rack level and eliminate extraneous components.”

That seems to be the general consensus across the semiconductor industry. “There are two issues here,” said Anand Iyer, director of marketing for the low power platform at Calypto. “One of them is latency. The second is what is the real information being transferred, and how much data is being wasted. And with that you need to do the computing that is necessary, not wasted computing. That’s how you eliminate wasted power.”

Iyer noted that architects at all levels are looking at where they can make changes, and balancing those potential changes against the real costs of those changes. This isn’t a simple decision, because sometimes it may hinge on various memory IP choices, while other times it may hinge on which microarchitecture to use or re-use, what needs to be clock-gated and what doesn’t, and how all of that affects power across the entire system. “The design community needs to understand all the tradeoffs,” he said. “And equally important, knowledge needs to be able to be transferred more effectively about those tradeoffs. Right now there are only very crude ways to do that.”

Other decisions
Improving the über architecture is only one piece of the puzzle, though. It also has to be managed effectively, and IT management is one of the most conservative disciplines in business. The stakes are simply too high to allow rapid changes, yet that kind of change will be essential to realizing the benefits of a massive increase in sensors and data on every front. Ironically, foundries and chipmakers have been able to grasp this issue rather easily because they’ve been wrestling with huge and growing amounts of data at each new process node.

“In 2000 we started feeling pressure to scale the CPU,” said Juan Rey, senior director of engineering at Mentor Graphics. “There were tens of computers, then it became hundreds of CPUs with GPUs to accelerate applications. Then in 2008 to 2009, everyone started slowing the performance per CPU and the industry went to many-core architectures. At 16nm, we expect to see several thousand cores.”

While that works well for some applications, not all applications scale equally, he said. Moreover, he noted that even if companies set up perfectly good private clouds, they don’t always allocate enough resources to properly administer them. No matter how good the architecture, optimization may vary greatly from one company to the next. “Those companies from the manufacturing side have been dealing with this for years, so they’re more likely to address this problem head on and the problem may be very clear to them. But where are other companies going to get these kinds of resources?”

Micro decisions
Dropping down a level, there are other factors to contend with. Computing architectures have always been about solving bottlenecks, and nowhere is this more pressing than in the data center.

“We’ve always seen the drive for more bandwidth, but now we’re seeing a range of newer memory technologies showing up to solve that,” said Loren Shalinsky, strategic development director at Rambus. “We’ve got HBM (high-bandwidth memory) and HMC (Hybrid Memory Cube) and it’s not clear how they will fit into the general processor server. We’re looking at this kind of memory like L4 cache. If you go back to pre-Pentium days, L3 cache was off chip. Then, over time it was integrated into the processor. The next step was DRAM. This new memory is an in-between step. There are aspects that are much lower in power. The power it takes to drive a signal through a PCB and through a connector to memory is a lot more than HBM, which is 1,024 bits wide, or HMC, which is in the same range.”
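The appeal of a 1,024-bit interface shows up in simple peak-bandwidth arithmetic: a very wide bus can run each pin slowly and still move far more data than a narrow, fast one. The sketch below is illustrative only. The transfer rates are assumptions not taken from the article: 1 gigatransfer per second per pin, as in first-generation HBM, and a 64-bit DDR3-1600 channel for comparison. It says nothing about power directly.

```python
def peak_bandwidth_gb_per_s(bus_width_bits, gigatransfers_per_s):
    """Theoretical peak bandwidth in gigabytes per second."""
    return bus_width_bits * gigatransfers_per_s / 8


# Assumed rates: one HBM stack at 1 GT/s per pin, one DDR3-1600 channel.
hbm_stack = peak_bandwidth_gb_per_s(1024, 1.0)
dimm_channel = peak_bandwidth_gb_per_s(64, 1.6)
print(f"HBM stack:    {hbm_stack:.1f} GB/s")
print(f"DDR3 channel: {dimm_channel:.1f} GB/s")
```

Under these assumptions the wide interface delivers ten times the peak bandwidth while each pin toggles more slowly, and its signals never have to cross a PCB trace and a connector.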

There also are other types of memory hitting the market, such as magnetoresistive RAM (MRAM) and resistive RAM (RRAM or ReRAM), which fit somewhere between flash and DRAM, Shalinsky said. “This is a continued evolution of an architecture that has not been fundamentally different for the past 30 or 40 years, but every time they come up with a new tier there is an order of magnitude difference. Now they’re squeezing in more middle opportunities. We’re also dealing with larger files. Big database companies are pushing more memory because the database can now sit in memory itself to speed it up.”

HBM is particularly noteworthy because it requires a change in architecture and packaging. HBM requires an interposer to work, and the obvious place for this kind of design to gain traction is at the network and server level where there is more price elasticity. Even Intel Corp., which has been the standard bearer for Moore’s Law since it was first introduced, has included 2.5D and 3D IC in its road map. And as more chips are made using this kind of packaging technology, the price will drop just as it has for other semiconductor technologies.

“As stacked die and 3D-ICs become more affordable, you’re going to see a lot more heterogeneous integration,” said Aveek Sarkar, vice president of product engineering and support at Ansys-Apache. “So you may access a high-speed architecture on one chip, and use a lower speed for something else on a different chip. This isn’t just application-specific anymore.”

Security and IP
One of the big issues that will have to be addressed in all of this is security. It may be hugely more efficient to only send the aberrations in an industrial process to the server instead of a mass of raw data, but the value of that data goes up significantly after it’s been distilled into good information.

“The weakest link of all of this is managing security so that each piece is secure and the security between the pieces is diverse enough,” said Bernard Murphy, chief technology officer at Atrenta. “There’s also a tradeoff between security and power. If the security is local it does not soak up a lot of power. But if it requires a lot of handholding, that requires huge amounts of power.”

Murphy noted that while virtualization was supposed to be highly secure, it hasn’t lived up to its initial promise. “You can do an attack through cache where you flood the cache with an identifiable signature. When the victim comes back to fix the problem, the attacker comes back in and looks at the cache. So the attacker runs as a virtual machine, the victim has his own virtual machine, and both use the same cache.”
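What Murphy describes resembles the pattern commonly called prime-and-probe. The toy model below only illustrates the logic of a shared cache leaking an access pattern. It is a deliberate simplification with hypothetical names: each cache set remembers only its last owner, and a "miss" is read directly from the model, whereas a real attacker would have to infer it from access timing.

```python
class SharedCache:
    """Toy direct-mapped cache: each set remembers only its last owner."""

    def __init__(self, num_sets):
        self.owner = [None] * num_sets

    def access(self, who, address):
        """Touch an address; return True on a hit, False on a miss."""
        index = address % len(self.owner)
        hit = self.owner[index] == who
        self.owner[index] = who  # a miss evicts whoever was there before
        return hit


def prime(cache, num_sets):
    """Attacker fills every set with its own lines."""
    for s in range(num_sets):
        cache.access("attacker", s)


def probe(cache, num_sets):
    """Attacker re-reads every set and collects the ones that now miss."""
    return [s for s in range(num_sets) if not cache.access("attacker", s)]


NUM_SETS = 8
cache = SharedCache(NUM_SETS)
prime(cache, NUM_SETS)
secret = 5  # hypothetical: the victim's access depends on a secret value
cache.access("victim", secret)
leaked = probe(cache, NUM_SETS)
print("sets evicted by the victim:", leaked)
```

In this model the only set that misses during the probe is the one the victim touched, which is how two virtual machines sharing a cache can leak information without ever sharing memory.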

A second issue that will have to be addressed is on the IP side, which needs to support all of the changes necessary to accommodate unforeseen changes in architectures. IP companies are generally good at this already, but as the IoT begins gaining steam, those kinds of changes may come far more quickly.

Ron DiGiuseppe, senior strategic marketing manager at Synopsys, said one of these changes will happen at the system level. “One server can support 50 to 80 virtual machines, but with Layer 2 switching and up to 4,000 nodes, you will quickly run out of network address space,” he said. “To scale the network you have to allow the virtual machines to scale. That means we need to create new protocols and the semiconductor guys need to support those protocols.”
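The scaling problem DiGiuseppe describes can be put in rough numbers. The server counts below use only the figures in his quote. The identifier widths are an assumption added for illustration: the roughly 4,000 limit is taken to correspond to the 12-bit VLAN ID of IEEE 802.1Q, and a 24-bit identifier, as used by overlay protocols such as VXLAN, stands in for the "new protocols" he mentions.

```python
def servers_to_exhaust(node_limit, vms_per_server):
    """How many fully loaded servers use up a flat Layer 2 node limit."""
    return node_limit // vms_per_server


def segment_ids(id_bits):
    """Number of distinct identifiers an id_bits-wide field can hold."""
    return 2 ** id_bits


# Figures from the quote: up to 4,000 nodes, 50 to 80 VMs per server.
servers_high = servers_to_exhaust(4000, 50)
servers_low = servers_to_exhaust(4000, 80)

# Assumed field widths: 12-bit VLAN ID versus a 24-bit overlay identifier.
vlan_ids = segment_ids(12)
vxlan_ids = segment_ids(24)

print(f"limit reached with {servers_low} to {servers_high} servers")
print(f"12-bit IDs: {vlan_ids:,}  24-bit IDs: {vxlan_ids:,}")
```

On these numbers, a few dozen heavily virtualized servers are enough to hit the limit, while a 24-bit field raises the ceiling by a factor of 4,096.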

DiGiuseppe also noted that with servers, processor performance is not always the bottleneck. In some cases it’s the I/O. “That’s particularly true for Web servers. They don’t need more performance. They need more I/O. That affects power, and if you have thousands of Web servers, power is much more important.”

And finally, there are some other unknowns and areas for future research that could add new dimensions to this shift. One involves better understanding of data types. Chris Rowen, a Cadence fellow, said it’s unclear at this point whether the total amount of traffic associated with video makes it more than just a vertical application. “You can transcode, extract and scale, but do you need to use an architecture that’s more specialized for that?”

A second unknown involves what is known as “deep learning,” which is basically the modern incarnation of artificial intelligence for natural language, object recognition and speech recognition. “As it turns out, the algorithms are very flexible,” said Rowen. “They also work on video and still images. They’re self-training, automatically programmed, and they have unique computational characteristics and numerical convolutions. That provides the potential for security to propagate all the way up to the server world.”