Building A Production-Ready Optically Connected Rack For AI Scale-Up


By Nandita Aggarwal and Nicholas Chang

As AI models drive compute demand, servers keep getting bigger. Rack‑scale AI systems (such as the 72-GPU systems from NVIDIA or AMD) enable many GPUs to work together through system-level optimization. They push beyond the limits of single-chip performance and meet the soaring compute needs of the AI era. But this is just the beginning. The next s... » read more

SOCAMM2: Bringing LPDDR5X Benefits To AI Servers


The rapid scaling of artificial intelligence is reshaping nearly every dimension of data center design. While much of the focus has been on GPUs, accelerators and advanced packaging, another constraint is emerging as equally critical: power. As AI models grow larger and more complex, power consumption, not raw compute, is increasingly the limiting factor in system scalability. Modern AI work... » read more

The AI Server Challenge: Testing Power At Scale


Artificial intelligence is most often framed as a story of compute advancements: faster GPUs, denser accelerators, and advanced process nodes. But behind every AI workload, the most fundamental constraint is power.

Fig. 1: AI server market. Source: Grand View Research

As AI servers scale to meet data center demand, power delivery is becoming one of the most critical and complex engine... » read more

IBM Power Processor, This One Goes to 11


Hot Chips 2025 was held August 24-26 on the Stanford University campus again this year, with many exciting and interesting presentations. I’ve noticed a trend toward more focus on whole systems rather than the socket. As the conference name suggests, there’s a history of showcasing chips, but with the increased emphasis on AI and related large-scale computing, efficiency... » read more

BOLT Optimization Technology Could Bring Obvious Performance Uplift On Arm Server


BOLT is a post-link optimization technology built on the LLVM framework. It uses the perf tool to collect sampling data and converts the executable into an optimized version. After evaluating BOLT on several workloads, such as MySQL, Redis, memcached, and nginx, on Arm servers, we saw a clear performance uplift. This blog post illustrates the methods used to enable BOLT and per... » read more

Closing The Performance Gap Between DRAM And AI Processors


As the workhorse of semiconductor memory, DRAM holds a unique place in the industry thanks to its large storage capacity and ability to feed data and program code to the host processor quickly. Lately, this unsung hero of the circuit board has been taking a backseat to its logic counterparts, as a wave of high-performance FPGAs, CPUs, GPUs, TPUs and custom accelerator ASICs emerges to meet t... » read more

Nginx Performance On AWS Graviton3


In this blog we explore the performance of an Nginx Reverse Proxy (RP) and API Gateway (APIGW) on AWS Graviton3-based instances. We will also refer to these collectively as RP/APIGW. We compared AWS Graviton3-based instances to Intel Xeon 'Ice Lake'-based instances and AWS Graviton2-based instances to demonstrate the leadership performance available with AWS Graviton3.

Summary

Compared to AWS ... » read more

Performance & Efficiency Cores For Servers


Hot Chips 2023 was held August 27-29, 2023, at Stanford University in California and was the first in-person edition of the conference in four years. It ran in a hybrid format, with over 500 participants in person and over 1,000 attending online. Topics covered a broad range of advancements in computing, connectivity, and computer architecture. Both AMD and Intel gav... » read more

Improved Arm Server Price-Performance For HPC


The availability of Amazon EC2 Hpc7g instances with the AWS Graviton3E and Elastic Fabric Adapter (EFA) is opening new opportunities in key areas: manufacturing, aerospace, automotive engineering, and weather prediction. The new AWS EC2 instance types have AWS Graviton3E’s 64 Arm Neoverse V1 cores and 8 channels of DDR5 memory. This is alongside the AWS Nitro v5 card with EFA deliver... » read more

Enabling New Server Architectures With The CXL Interconnect


The ever-growing demand for higher-performance compute is motivating the exploration of new compute offload architectures for the data center. Artificial intelligence and machine learning (AI/ML) are just one example of the increasingly complex and demanding workloads that are pushing data centers to move away from the classic server computing architecture. These more demanding workloads can be... » read more
