Knowledge Center

Servers

Description

A server is a device connected to a network to run computing workloads and store data. A server can range from a single computer to a rack full of server blades, comprising memory, processors, storage, interconnecting cables, power supplies, cooling systems, and operating systems.

The idea of a data center started as a place where multiple servers could be co-located and called on demand for computing. As the computing done in data centers has become more intensive, however, it can exceed the capacity of a single server. That has been addressed by allowing multiple servers to be engaged on a job, in theory limited only by the number of accessible servers.

Each server blade has a CPU, memory, and storage, and may also include a GPU. Servers have different uses that are defined by their software and operating system: email servers, database servers, web servers, and generative AI servers.

The large hyperscale companies (Google, Facebook, IBM, Microsoft, etc.) used to buy servers from PC manufacturers. Now they build their own and can tweak the architecture to save money. Key server resources and design considerations include:

• Storage
• High-performance CPUs
• Data-processing units (DPUs)
• AI accelerator chips
• Graphics processing units (GPUs)
• Memory
• Security
• Power supplies, such as switched-mode power supplies (SMPS) built with SiC or GaN, which keep electronics cooler at higher voltages.

As data centers are interconnected, the number of accessible servers is no longer restricted to the number in a particular building or campus. With fiber connecting different locations, greater distances no longer carry the latency penalties they once did. All of this has helped scalability, the ability to scale resources in accordance with the needs of any particular job. With this scalability, however, the next level of inefficiency crops up: the mix of resources tapped for a given job may not be fully used. For instance, a given blade may have a fully utilized CPU, with a GPU helping perform some of the work. If the GPU is only 30% utilized, 70% of its capacity sits idle with no return on investment.
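As a rough illustration of that stranded capacity, the short sketch below tallies how much of each blade resource goes unused for a partially loaded job. The blade configuration and utilization figures are assumptions made up for this example, not measurements from any real system.

```python
# Stranded capacity on a single server blade (illustrative numbers only).
blade = {
    "cpu_cores": 32,   # hypothetical blade configuration
    "gpus": 1,
    "dram_gb": 256,
}

# Fraction of each resource a hypothetical job actually uses.
utilization = {
    "cpu_cores": 1.00,  # CPU fully busy
    "gpus": 0.30,       # GPU only 30% utilized
    "dram_gb": 0.50,
}

for resource, total in blade.items():
    used = total * utilization[resource]
    idle_pct = (1 - utilization[resource]) * 100
    print(f"{resource}: {used:g} of {total} used ({idle_pct:.0f}% idle)")
```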

Different approaches to disaggregating server and data-center resources are being tried out. Disaggregation in the data center doesn’t mean the same thing to everyone, as there are multiple drivers for departing from the server-as-unit model.

One effort disaggregates networking from the rest of the server, moving all of the networking functions onto dedicated data-processing units (DPUs) and relieving the CPUs of any need to execute communications code so they can concentrate on the actual data workload. “Network disaggregation is now gaining traction in both cloud and enterprise data centers as a way to lower total cost of ownership,” said Eddie Ramirez, vice president of marketing for Arm’s Infrastructure line of business.

The server is basically a unit of computing. In the traditional CPU-centric view of computing, scaling by adding servers has made sense because it scales the computing power. But CPUs don’t operate on their own. At the very least, they need memory, storage, and a way to talk to other internal or external entities. They also may be assisted by alternative computing resources such as GPUs or other purpose-built accelerators.

With the current server model, the only way to provide flexibility is to have different server blades with different mixes of resources. But that can become complicated to manage. For example, let’s say that a particular server comes with a four-core CPU and 16 GB of RAM. If a particular job needs the four cores but only half the memory, then 8 GB of memory will sit unused. Alternatively, let’s say the job needs 24 GB of RAM, and no server with that configuration is available. That means one of two things: either storage must be used to hold the less frequently accessed data, slowing performance, or a second blade is needed.

If a second blade is needed, it comes with its own CPU. Either that CPU does nothing but manage access to the extra DRAM, which is a waste of computing power, or the program must be partitioned to run across two CPUs. The latter approach is not trivial and, depending on the application, might not even be possible.

Each of these examples shows a mismatch between the needed computing power and the required memory. The same could hold true for other resources, as well.[1]
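The arithmetic behind those two cases can be captured in a few lines. The sketch below, built around the hypothetical four-core/16 GB blade from the example, shows how whole blades must be allocated even when only one resource runs short, stranding the rest.

```python
import math

# Hypothetical fixed blade configuration from the example above.
BLADE_CORES = 4
BLADE_RAM_GB = 16

def blades_needed(job_cores: int, job_ram_gb: int) -> dict:
    """Allocate whole blades for one job and report what is left stranded."""
    n = max(math.ceil(job_cores / BLADE_CORES),
            math.ceil(job_ram_gb / BLADE_RAM_GB))
    return {
        "blades": n,
        "stranded_cores": n * BLADE_CORES - job_cores,
        "stranded_ram_gb": n * BLADE_RAM_GB - job_ram_gb,
    }

# Needs all four cores but only half the memory: 8 GB sits unused.
print(blades_needed(job_cores=4, job_ram_gb=8))
# Needs 24 GB of RAM: a second blade (and its idle CPU) is pulled in just for DRAM.
print(blades_needed(job_cores=4, job_ram_gb=24))
```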

In its purest sense, the idea of the disaggregated architecture is to break the server up and pool like resources with like. So CPUs all go into one bucket, memory goes into another bucket, and perhaps GPUs go into yet a different bucket.

 

Fig. 1: A simplified view of multiple server blades, each with a prescribed set of resources. Source: Bryon Moyer/Semiconductor Engineering
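By contrast, a disaggregated pool lets each resource be granted independently. The following sketch illustrates the idea; the pool sizes and the allocate helper are assumptions made up for this example, not any vendor’s API. A memory-heavy job can take extra DRAM without also consuming another CPU.

```python
# Disaggregated resource pools (illustrative sizes, not a real system).
pools = {"cpu_cores": 256, "dram_gb": 4096, "gpus": 32}

def allocate(job: dict) -> dict:
    """Draw each requested resource from its own pool; fail if any pool is short."""
    if any(amount > pools[resource] for resource, amount in job.items()):
        raise RuntimeError("insufficient pooled capacity")
    for resource, amount in job.items():
        pools[resource] -= amount
    return dict(job)

# The 24 GB job from the earlier example no longer drags in a second CPU.
lease = allocate({"cpu_cores": 4, "dram_gb": 24})
print("granted:", lease)
print("remaining pools:", pools)
```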

Resources:

  1. Changing Server Architectures In The Data Center
    Sharing resources can significantly improve utilization and lower costs, but it’s not a simple shift.

Multimedia

• Reducing Power In Data Centers
• Very Short Reach SerDes In Data Centers
• New Challenges For Data Centers