Data Centers Need High Reliability Semiconductors


“In the world of designing cars, planes, AI factories … you’ve got to be perfect,” said Nvidia CEO Jensen Huang on CNBC last month. “And the reason for this is because there is so much at stake.” Cars and planes need to be extremely reliable because people die if they aren’t. In AI data centers, no one dies when systems fail, but the economic impact is gigantic because Amazon, Goog... » read more

Chip Innovation Will Bridge The Gap For USA Data Center Power


Heading to meetings in Silicon Valley, I often drive through Santa Clara, passing boxy buildings with few windows. They are data centers for local customers willing to pay for low latency. Data centers cluster in Santa Clara because that city's power has been the cheapest in Silicon Valley. The San Jose Mercury News recently reported that two data centers in Santa Clara are empty, waiting for a... » read more

AI Effort And Money Misplaced


While it is early days, and innovation is important, hyperscalers cannot afford to keep throwing money away forever. They need to work out how AI will earn money, and that relies on inference. For some time, I have been intrigued by the amount of money being spent on model development and AI training compared to the investment in inference. Models are an enabler, and every new model is attem... » read more

Complex Mix Of Processors At The Edge


With AI changing so fast, it’s a juggle for companies to ensure they can deliver the best performance now while also future-proofing for unknown AI models or a completely different approach to training and inference that may emerge. There are a slew of options for high-end and budget phones, hyperscalers, and low-cost, low-power edge devices, and while GPUs keep making headlines, many designe... » read more

Chiplets: A Technology, Not A Market


Chiplets are big business, and that business is growing. The total chiplet market today is roughly $40 billion annually. Chiplets account for roughly 15% of TSMC's revenues, and they account for about 25% of all DRAMs. All of the major AI/HPC semiconductor companies (NVIDIA, AMD, Marvell, Broadcom) and the major hyperscalers (Amazon, Google, etc.) are looking to chiplets to build superior... » read more

Architecting Chips For High-Performance Computing


The world’s leading hyperscaler cloud data center companies (Amazon, Google, Meta, Microsoft, Oracle, and Akamai) are launching heterogeneous, multi-core architectures specifically for the cloud, and the impact is being felt in high-performance CPU development across the chip industry. It's unlikely that any of these chips will ever be sold commercially. They are optimized for specific ... » read more

Strategies For Detecting Sources Of Silent Data Corruption


Engineering teams are wrestling with how to identify the root causes of silent data corruption (SDC) in a timely and cost-effective way, but the solutions are turning out to be broader and more complex than simply fixing a single defect. This is particularly vexing for data center reliability, availability, and serviceability (RAS) engineering teams, because even the best tools and methodolo... » read more

CXL: The Future Of Memory Interconnect?


Momentum for sharing memory resources between processor cores is growing inside of data centers, where the explosion in data is driving the need to be able to scale memory up and down in a way that roughly mirrors how processors are used today. A year after the CXL Consortium and JEDEC signed a memorandum of understanding (MOU) to formalize collaboration between the two organizations, suppor... » read more

Hunting For Hardware-Related Errors In Data Centers


The semiconductor industry is urgently pursuing design, monitoring, and testing strategies to help identify and eliminate hardware defects that can cause catastrophic errors. Corrupt execution errors, also known as silent data errors, cannot be fully isolated at test — even with system-level testing — because they occur only under specific conditions. To sort out the environmental condit... » read more

The Data Center Journey, From Central Utility To Center Of The Universe


High-performance computing (HPC) has taken on many meanings over the years. The primary goal of HPC is to provide the needed computational power to run a data center – a utilitarian facility dedicated to storing, processing, and distributing data. The beginning of HPC: Historically, the data being processed was the output of business operations for a given organization. Transactions, custome... » read more
