The Four Pillars Of Hyperscale Computing

Data center design has changed in key ways over the past decade, driving a reshuffling of the value chain.


In his keynote at CadenceLIVE Americas 2020, Facebook’s Vijay Rao, Director of Technology and Strategy, described the four core elements the team considers when designing its data centers—compute, storage, memory, and networking. Wait a minute. Facebook? How did we get here? Wasn’t EDA supposed to be focused on chip design? As indicated in a previous blog, electronic value chains are definitely undergoing major changes, and EDA and computational software are key enablers.

Facebook’s keynote was fascinating, outlining in some detail the custom silicon design considerations for the inference acceleration needed to meet their custom compute requirements. As the former product management lead for all aspects of emulation, FPGA-based prototyping, and virtual prototyping at Cadence, my heart always beats faster when customers show pictures of their emulation and prototyping environments that include Cadence tools! So what do data centers look like these days? What happened, and when?

The figure below illustrates some of the changes that occurred in the last two decades. It’s inspired by a set of four slides that Mellanox Technologies’ Kevin Deierling presented at the Linley Spring Processor Conference in April 2020.

Figure: Data Center Evolution, 2008 to 2020

On the software side, programmability has changed the picture fundamentally. Around 2008, virtual machines computed user workloads on CPU-centric architectures connected as networks within the data center at 10Gbps. By about 2013, software-defined networking had found its way into the data center and network speeds had improved to about 40Gbps. Containerization replaced the classic virtual machine model at that time as well.

Today, storage has become software-defined too, with smart storage in the hardware and network speeds increasing to up to 100Gbps. The key change of the last couple of years—as outlined by Hennessy and Patterson in their Turing lecture—is that we have entered the era of domain-specific architectures (DSAs) and the domain-specific languages to program them. The requirements for hyperscale data centers have become so specific that in recent years we have seen headlines like “Cisco Enters Chip Market, Supplying Microsoft, Facebook” and “Facebook Plans To Develop Its Own AI Chips”, and now the CadenceLIVE 2020 keynote detailed the specific requirements that drive Facebook’s development priorities.

Value chains are being reshuffled. Industry structures are changing.

The second customer keynote at CadenceLIVE came from Annapurna Labs co-founder Nafea Bshara, now a VP/Distinguished Engineer at AWS. The title of the keynote said it all: “How the Cloud and Industry Collaboration help Bend the Curve for Chip Development.” Nafea described Amazon’s custom infrastructure with compute and storage servers, routers, load balancers, and—you guessed it—custom silicon, known within the AWS cloud as Graviton2. The different levels of workloads presented were quite interesting, and, of course, EDA tools like Xcelium and Liberate—among the biggest consumers of compute cycles—are already available in the cloud on AWS Graviton Arm-based instances as well.

Where is all of this going? Networks will have to become faster, storage latencies will have to go down, and storage volumes will have to go up. Compute will become even more domain-specific. The Next Platform’s Timothy Prickett Morgan’s discussion with NVIDIA’s Jensen Huang nicely illustrates how the transformation of the data center goes well beyond the changes in the semiconductor design chain, the dynamics of the processor ecosystem, and the Tier 1 hyperscale companies doing their own chip design. The data center will become fully programmable. Design for power efficiency, thermal optimization, and the integration of multiple chiplets using 3D-IC technologies will be key enabling technologies, worthy of their own future blog.

It’s an exciting time to be in the electronics development ecosystem, enabling advanced compute!
