Sweeping Changes Ahead For Systems Design

Demand for faster processing with increasingly diverse applications is prompting very different compute models.

Data centers are undergoing a fundamental change, shifting from standard processing models to more data-centric approaches based upon customized hardware, less movement of data, and more pooling of resources.

Driven by a flood of web searches, Bitcoin mining, and video streaming, data centers are in a race to provide the fastest and most efficient processing possible. But because there are so many different types of data, disparities in lifetimes among components, and so much change in software, the magnitude, speed, and breadth of these changes are unprecedented.

“The big shift now is that people no longer think at the individual server level, especially the cloud providers. They think about racks and whole data centers,” said Steven Woo, fellow and distinguished inventor at Rambus. “This means the thinking around data center architecture has to cover the wide range of people who are using a data center at any point in time. There are people who want small jobs, there are people who want big jobs, and there are people who want enormous jobs. Some of those exceed the data capacity of an individual server. Architects must consider all of those job sizes and decide how to start providing, especially, the memory capability that some of these enormous jobs are going to need in the future.”

Things have changed dramatically since 2012, when Cisco conducted a market study that assumed data traffic would be driven primarily by video consumption. While video still represents a large portion of the data, much of what’s being processed and stored in data centers involves other types of data, as well as the mining and analytics derived from that data.

“Last year when the pandemic hit, the amount of time it took to analyze and come up with a vaccine was in the order of months, and that was a huge data analysis exercise,” said Arif Khan, director of product marketing for design IP at Cadence. “That’s the power of data and AI, put together. And it’s not that they did trials by trying all these molecules out. It was actually a data-centric analysis. Like genomics, bioscience these days is computational. Fluid dynamics is number crunching. These AI and ML applications are happening in the cloud, and this is where all the number crunching happens. This is our data-centric world.”

The infrastructure relies on data that’s been captured, analyzed, and processed in the cloud. But all of these machines rely on data being available, so it no longer can be sitting on traditional hard disks. As the systems operate, the data must be fairly close to the systems that are accessing it, so there are newer memory hierarchies and technologies emerging to bring memory closer to the processing units.

Fig. 1: Global data growth projection. Source: IDC 2018/Cadence

Disaggregation
One approach that is gaining traction is the notion of disaggregated memory and compute infrastructures, in which CPUs, memory, and storage are pooled.

“The data center is moving from a model where each server has dedicated processing and memory, as well as other resources like networking and accelerators, to a disaggregated model that employs pools of shared resources that can be efficiently composed to the needs of specific workloads, whatever their requirements,” said Anika Malhotra, product marketing manager for verification domain IP at Synopsys. “Disaggregation and composability tailor computing resources to the workload, bringing many benefits, including higher performance, greater efficiency, and reduced total cost of ownership (TCO) for the data center.”

Others agree. “Disaggregation will be a key element for the next era of data center computing architectures as it will be critical in ensuring the performance, efficiency, and TCO benefits at scale,” said Kshitij Sudan, senior manager for the Go To Market, Automotive and IoT line of business at Arm.
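Those TCO claims rest on a simple statistical-multiplexing argument: servers provisioned individually must each carry enough memory for their own worst case, while a shared pool only has to cover typical aggregate demand plus some headroom. The back-of-the-envelope sketch below illustrates the arithmetic; all of the numbers are invented for illustration and do not come from the vendors quoted here.

```c
/* Illustrative only: compares DRAM needed when every server is
 * provisioned for its own peak against a shared pool sized for
 * typical demand plus headroom for a few simultaneous peaks. */
#include <stdio.h>

int main(void) {
    const int servers       = 100;
    const int peak_gb       = 512;  /* hypothetical worst-case demand per server */
    const int typical_gb    = 128;  /* hypothetical average demand per server    */
    const int peak_headroom = 4;    /* servers allowed to peak at the same time  */

    int dedicated = servers * peak_gb;
    int pooled    = servers * typical_gb + peak_headroom * peak_gb;

    printf("dedicated DRAM: %d GB\n", dedicated);
    printf("pooled DRAM:    %d GB (%.0f%% of dedicated)\n",
           pooled, 100.0 * pooled / dedicated);
    return 0;
}
```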

Compute Express Link (CXL) is seen as a key component of these next-generation disaggregated server architectures. CXL memory expansion and pooling chips are essential for both traditional and disaggregated architectures.

“How do we compose the right resources for large and small workloads? CXL is a critical enabler of this evolution, providing additional memory beyond direct-attached DRAM to unlock a pooled memory architecture,” said Matt Jones, general manager, interface IP at Rambus. “With the insatiable need for memory, as you get into pooling there’s also the notion of efficiency and the matching of resources between compute and memory. This gives the ability to share, and make best use of, that memory, not only on a per-workload basis, but also to capitalize on the lifecycle of each of those elements.”
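Conceptually, a pooled-memory fabric behaves like a shared allocator: hosts borrow capacity for a workload, return it when the job retires, and freed capacity becomes immediately reusable by other hosts, independent of any one server’s lifecycle. The C sketch below models that behavior at toy scale; the function names and block granularity are assumptions for illustration, not a real CXL interface.

```c
/* Toy model of memory pooling: a shared pool of blocks that multiple
 * hosts borrow from and return. Not a real CXL API. */
#include <stdio.h>

#define POOL_BLOCKS 16          /* total pooled capacity, in blocks */

static int owner[POOL_BLOCKS];  /* -1 = free, otherwise host id */

static void pool_init(void) {
    for (int i = 0; i < POOL_BLOCKS; i++) owner[i] = -1;
}

/* Borrow up to n blocks for a host; returns how many were granted. */
static int pool_alloc(int host, int n) {
    int granted = 0;
    for (int i = 0; i < POOL_BLOCKS && granted < n; i++)
        if (owner[i] == -1) { owner[i] = host; granted++; }
    return granted;
}

/* Return every block held by a host, e.g. when its job retires. */
static void pool_release(int host) {
    for (int i = 0; i < POOL_BLOCKS; i++)
        if (owner[i] == host) owner[i] = -1;
}

int main(void) {
    pool_init();
    printf("host 0 granted %d blocks\n", pool_alloc(0, 6));
    printf("host 1 granted %d blocks\n", pool_alloc(1, 12)); /* only 10 left */
    pool_release(0);                       /* host 0's job retires */
    printf("host 1 granted %d more\n", pool_alloc(1, 4));    /* reuses freed blocks */
    return 0;
}
```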

Fig. 2: CXL memory pooling. Source: Rambus

Big changes ahead
The rapid increase in the number of connected devices, whether those are wearables, automobiles, or security cameras, also is creating new opportunities to extract value from the data.

“I might not know what question I need to ask today, but maybe next week I’m going to realize, ‘Wow, it would be really beneficial and valuable to me to understand X,’” said Scott Durrant, DesignWare IP solutions marketing manager at Synopsys. “‘I’ve got this large data set that I’ve been collecting. I can analyze that, find an answer to X, and gain value from that.’ So people are looking at data as kind of a core value, as opposed to buying an application to accomplish a specific purpose. They build the application and then say, ‘I need a specific data architecture or data model to support what I’m doing with this one application.’ Now, a data-centric architecture says, ‘Let’s build a data model that we can use to classify and describe all of this data that we’re collecting. Then we can utilize various applications to examine and analyze that data.’ How does that impact the SoCs that are being developed for and utilized in data centers today and going forward?”

Giant repositories of data need to be analyzed and shared among various applications, and the SoC has to be able to efficiently load data, analyze it, and write the results back to the data store. “This means the interfaces on these SoCs now need to be able to handle these very large amounts of data in an efficient way,” Durrant said. “Efficiency can mean a few different things, including very low latency and low power. This is where protocols like CXL come into play, along with high speed Ethernet and PCI Express, and all these interfaces that we use to move data around within devices, between devices, and between racks in a data center.”

And because DRAM potentially lasts longer than a CPU, enabling memory devices to be purchased and used for their full lifecycle brings additional efficiency into the equation.

“We envision a memory pooling device that builds upon the expansion device and would support multiple hosts through the important building blocks of the PHY and the controller, allowing multiple hosts to share a single set of DRAM resources,” Rambus’ Jones said. “In the case of DRAM, numerous hosts can be matched with the right amount of memory through the CXL mesh architecture. This opens up the ability to interconnect the right number of pooled CPU resources with the right amount of memory for your workload, retire it, then re-use it as provisioning allows within the network, which drives new software models and new usage. We see this type of hardware unlocking some of that evolution in the data center.”

In the past, system performance essentially was limited by microprocessor performance, said Tony Chen, product marketing director for Design IP at Cadence. “Today, the bottlenecks are actually in the movement of data traffic, whether it’s from the memory or the processor communication between the various plug-in accelerators, and we’re seeing users run into bottlenecks with the movement of data. So they are asking for lower latency and faster speed. Even with the introduction of GDDR and HBM, where the speeds are going up every two years, users want another 2X increase in speed every two or three years. There’s also a cost perspective to this shift. Users are always trying to reduce the costs in their data center, so they are trying to share memory, trying to pool the memory together. How the user makes the architectural tradeoffs is actually a very complicated issue, because there are many things to consider. It’s a balance between cost and performance.”

It also means design and verification are quite challenging to get right for data center architectures, especially with so many systems companies developing their own chips. These companies are pushing the boundaries with their own custom processors, using non-x86 ISAs, and this is impacting what everybody else is doing, because they’re seeing all of this activity.

“These are the kinds of companies that are providing and storing incredible amounts of data,” said Aleksandar Mijatovic, design manager at Vtool. “They were driven by the need to have the concentrated data center as the center of all truth, to put aside the old ways of building architectures, and to structure everything around the idea that data is unchangeable and all calculations are temporary. This is contrary to traditional processing systems, where data comes and goes but processing stays the same. At the core of this is the split of data storage, data processing, and data transport. Those three layers all can be viewed separately, and depending on where you put your emphasis, you will get different types of architectures. People grew so accustomed to one way of processing that they forgot that other parts also can be a focus. Right now, as the amount of data is increasing, and processing along with everything else is becoming less constant, we are coming to data-centricity.”

This requires totally new designs. “Each application that will have access to the actual storage has to have its own resources, at least from its point of view, but we have to manage them,” noted Momcilo Kovrlija, verification engineer at Vtool. “And those resources cannot be flat for each application, even though each application should have the feeling that all the resources are for it alone. We have to create, for example, a storage layer, a distribution and management layer, and an application layer where each application will operate. This is akin to a heliocentric system, in which the sun is the data and all the planets are the applications. A model of this must be created. It may not be as complicated as it would be in the real system, but to some extent it needs to be modeled to check whether everything works correctly from the point of view of the whole architecture — not just whether we can send and receive data, but also whether the management of resources is done the proper way.”
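As a minimal sketch of what such a model might look like, the C fragment below assumes a made-up three-layer split: a storage layer holding the physical capacity, a management layer arbitrating per-application quotas, and an application layer that simply issues writes as if the store were its own. All names and numbers are hypothetical.

```c
/* Toy three-layer model: storage (capacity), management (quotas),
 * application (writes). Checks management behavior, not just data transfer. */
#include <stdio.h>
#include <stdbool.h>
#include <assert.h>

#define STORE_CAPACITY 100                   /* storage layer */
#define NUM_APPS 3

static int quota[NUM_APPS] = { 50, 30, 20 }; /* management layer */
static int used[NUM_APPS];

/* Management layer: each app "sees" the whole store, but a write is
 * granted only while the app stays within its quota. */
static bool app_write(int app, int amount) {
    if (used[app] + amount > quota[app]) return false;
    used[app] += amount;
    return true;
}

int main(void) {
    printf("app 0 writes 40: %s\n", app_write(0, 40) ? "ok" : "refused");
    printf("app 0 writes 20: %s\n", app_write(0, 20) ? "ok" : "refused");
    printf("app 2 writes 20: %s\n", app_write(2, 20) ? "ok" : "refused");

    /* Whole-architecture check: granted writes never exceed physical
     * storage, because the quotas sum to its capacity. */
    int total = 0;
    for (int i = 0; i < NUM_APPS; i++) total += used[i];
    assert(total <= STORE_CAPACITY);
    return 0;
}
```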

Given that one of the goals of these emerging data center technologies is to enable such varied applications, it is no small feat to pick the right mix of storage, compute, and networking, to make those resources composable on demand, and to let them be mixed, matched, and able to exchange information with each other.

Gary Ruggles, senior product marketing manager at Synopsys, noted this is one of the reasons CXL comes up so often in this conversation. “CXL offers the promise of just letting all the memory in the whole system be shared amongst anybody who wants it. The idea is you’ve got all these CPUs and host systems that have memory. You’ve got all these memory add-in cards that have memory. You’ve got accelerators that have memory so that they can work on problems, even smart NIC cards. All of these things have memory, and you’ve got a tremendous amount of redundancy and waste when you’ve got a lot of it sitting idle. CXL allows the system resources to be shared and utilized.”

The bigger picture
One of the biggest changes is that data centers are no longer just a collection of computers.

“In the past you just had a server with processors, and then a virtual machine, etc.,” said Simon Davidmann, CEO of Imperas. “Data centers are very different now. In the public ones, you can get different processor types, whether it is x86, Arm or otherwise. Or if you want to do simulation or emulation on FPGAs, you can do that in the cloud. People are also using it for some FinTech applications for doing algorithm analysis in FPGAs rather than running on an instruction processor. In private data centers, some of the world’s leading companies — Amazon, Apple, Facebook, Google, Microsoft — have been building their own chips for some time to do their own type of computing, whether that is inference engines for their own businesses, such as computation for recognition of pictures and images or otherwise. This is akin to what people have been talking about for years, having highly parallel processing capabilities with lots of cores. And because they tend to be almost application-specific, such as an inference engine for these types of machine learning applications, they can design the architecture for that.”

So while the software is available for AI/ML, the computation may be too slow. “As a result, people are trying to share multiple processors, building these chips/processing elements/designs that are highly parallel and highly focused on machine learning and AI inferencing,” Davidmann said. “They’ve got software that runs too slowly, so they can target the hardware. But they’re doing it not as just a proprietary machine. They’re doing it so they can run it in their data centers to make use of it at scale. The data center is changing to provide high-performance solutions for different application areas like machine learning, and the agents they’re putting in the data center have architectures for those application areas. Those architectures tend to include the network-on-chip for communication between the multiple cores, the cores themselves, and the accelerators on them, like the vector engines.”

This activity just underscores the rise of x86 alternatives, from Arm to RISC-V, to any number of proprietary new instruction set architectures that are now being developed.

“We will see RISC-V in the HPC or server domains in a couple of years,” said Zdenek Prikryl, CTO of Codasip. “We are not there yet, but RISC-V will be there soon. It may be a better option for AI or machine learning because this is still a moving target, and for many people it means many things. RISC-V can be really important there because it allows these innovations, and changes according to the algorithms that people need to have.”

Prikryl said this shift is already beginning for RISC-V. “It’s going to traverse to the high end. Right now we can see that it’s mature for embedded. We can see that other vendors have CPUs there, they are doing well, and we will see the shift to the high end in a couple of years. We will see CPUs that are able to run Android, and in fact we have had some of them in prototypes already, and we will see the adoption later on. We will also see growth in the ecosystem, because the ecosystem is important for RISC-V.”

Others echo that point. “It’s not about the hardware instruction set architecture,” said Kurt Shuler, vice president of marketing at Arteris IP. “It’s about the software for the hardware instruction set architecture. This is why, if you look at the x86 ISA, one of the reasons it’s still around today is that so much software has been created for it. With Arm, and others, all of that has had to be created. Once it reaches a certain inflection point, things go nonlinear. I don’t know that we’re there just yet, but things are getting there. A shift is happening, and what this means, big picture, is that in terms of integration of companion chips to these x86 chips, generally they’re not in the same package because it’s an Intel chip product oftentimes going into an Intel or AMD motherboard. Then there is a PCIe card with the accelerator, so you’re kind of limited by what PCIe can give you. When things go to the Arm side, these guys aren’t just saying they’re going to have a whole bunch of Arm cores and it’s just going to be CPUs. They say, ‘We’re doing this because there are custom functions that we want to do — maybe search algorithms or otherwise that we want to do more efficiently than we can with the x86 server with a whole bunch of PCIe cards.’ They’re innovating, but they’re innovating in the chip.”

Shuler said what’s interesting here is that with the RISC-V ISA, hardware acceleration basically can be built into the CPU instruction set architecture. “You do this opcode, send some data to it, and then something happens outside the CPU. And that could be something like in-memory computing. Instead of playing with something within the CPU stack, it plays with data somewhere else. You have the flexibility to do that with the RISC-V architecture, or if you have an Arm architecture license. And not only can you do that, you could have your own chip with a whole bunch of memory attached to it. So you can have your RISC-V or Arm cores, and you’ve got your hardware accelerator and a bunch of memory around it, and you’re basically doing in-memory computing. At the high level, basically all you’re doing is memory accesses, but at the low level there’s actually computation going on under the hood. And this works really well for some of the neural net activity. You’re still in this von Neumann architecture, where the CPU is the brain, or the CPU is at least scheduling everything. It’s a renaissance in SoC design.”
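The ‘do this opcode, send some data to it’ pattern maps onto the custom opcode spaces that the RISC-V ISA reserves for exactly this purpose. The fragment below is a sketch of how such an instruction might be exposed to C code through the GNU assembler’s .insn directive; the opcode and funct values are placeholders, and it will only execute on a core or simulator that actually implements the custom instruction.

```c
/* Sketch of invoking a custom RISC-V instruction from C. The encoding
 * (custom-0 opcode 0x0B, funct3 0, funct7 0) is a placeholder; real
 * designs choose their own values in the reserved custom opcode spaces.
 * Requires a RISC-V toolchain; runs only where the opcode is implemented. */
#include <stdint.h>

static inline uint64_t accel_op(uint64_t a, uint64_t b) {
    uint64_t result;
    /* .insn r encodes an R-type instruction: opcode, funct3, funct7,
     * destination register, and two source registers. The hardware
     * behind this opcode could be an accelerator outside the CPU
     * pipeline, as described above. */
    asm volatile(".insn r 0x0B, 0x0, 0x0, %0, %1, %2"
                 : "=r"(result)
                 : "r"(a), "r"(b));
    return result;
}

/* Usage (hypothetical): uint64_t r = accel_op(x, y); */
```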

How far this will go with RISC-V is anyone’s guess at the moment. “Anytime you start doing things to the instruction set architecture, you have to have compilers and debuggers and software stacks and tools. All it would take is for a company like one of the hyperscalers to say they will invest $100 million or $200 million into a software tool chain for whichever ISA. That’s the blocking and tackling that needs to be done to make this more ubiquitous. Right now people are playing with it. Modifying an instruction set architecture to do custom instructions that are actually a whole bunch of hardware, outside of that CPU complex — that’s something that will become more prevalent as things get more competitive between these guys, because it’s actually a pretty small cost for somebody who’s dealing with these huge data centers, and this many searches or this many purchases,” Shuler noted.

Conclusion
The industry is at a point where there is a huge amount of computation that needs to be done, and hardware, architecture, and silicon are all coming together to provide good solutions for it.

“Often you get people complaining, ‘What do we do with Moore’s Law? We’ve got more silicon than we know what to do with,’” said Imperas’ Davidmann. “‘How do we get the clock frequency so fast, because we’ve got these applications which are serial or have to go faster?’ But actually, we’ve got processor architectures, networks on chip, and accelerators that are all coming to work well for these AI things. Some people are doing it with generic stuff like RISC-V or Arm, but some people are building their own everything.”

Today’s data centers, whether they’re commercial or private, are adding much more compute power, with better technology to do that computing more quickly and more efficiently.

“It’s a very interesting time where the data centers are getting more and more use, but there are areas of those data centers which are very specialized to specific usage,” Davidmann said. “One part of that is these AI inferencing engines, which from an EDA point of view is great because they’re having to build these new types of chips. Anybody who really wants to get ahead really needs some sophisticated, custom silicon in their data center to do a good job. Whether they buy a whole card or they build their own, they need much more compute than is available with standard processors. That’s the key. There’s more software needed to run the applications than can run on standard processor architectures and chip architectures, so they’re having to go to new architectures and multiple new accelerators like vectors, and new SoCs with NoCs in them.”

Related
Improving Power & Performance Beyond Scaling
How to reduce total cost of ownership in data centers.
Usage Models Driving Data Center Architecture Changes
The way the world is accessing data is changing. The data center is evolving to adapt.
Challenges For New AI Processor Architectures
Getting an AI seat in the data center is attracting a lot of investment, but there are huge headwinds.
RISC-V Targets Data Centers
Open-source architecture is gaining some traction in more complex designs as ecosystem matures.


