WHITEPAPERS

Outsmarting Silent Data Corruption In AI Processors With Two-Stage Detection

ML-powered outlier detection for semiconductor defects and real-time monitoring for in-field predictive and prescriptive maintenance.

November 12th, 2024 - By: proteanTecs

Silent data corruption (SDC) is on the rise as semiconductor technology advances. The explosion of AI for speech, image, video, and text processing is driving growing complexity and diversity in hardware systems, which increases the risk to data integrity.

The SDC rate is much higher than software engineers expect, undermining the hardware reliability they used to take for granted. Recent publications by hyperscalers such as Google and Meta shed some light on the extent of the problem: they report that approximately one in a thousand machines in their fleets is affected by SDC. While those figures are only estimates, Alibaba recently published exact statistics revealing 361 defective parts per million (DPPM) that caused SDCs in its cloud system.

Why should anyone worry about 361 DPPM, or even several thousand? At small scale, such numbers are considered normal. However, in fleets that can have millions of servers, SDC occurrences are frequent enough to threaten the integrity of vital services. For example, if a generative AI data system runs on thousands of devices simultaneously, a single error in one of them can lead to many system-level failures.
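To make the scale argument concrete, here is a minimal back-of-the-envelope sketch. It assumes that each device is defective independently with probability equal to the quoted DPPM figure, and it uses a hypothetical fleet of one million devices and a hypothetical job spanning 1,000 devices. Neither the function names nor those sizes come from the white paper, and real defect populations are unlikely to be perfectly independent.

```python
# Back-of-the-envelope SDC exposure estimate.
# Assumption: every device is defective independently with probability
# dppm / 1e6. Fleet and job sizes below are illustrative, not from the source.

def expected_defective(fleet_size: int, dppm: float) -> float:
    """Expected number of SDC-causing devices in a fleet."""
    return fleet_size * dppm / 1_000_000


def prob_any_defective(devices_per_job: int, dppm: float) -> float:
    """Probability that a job spanning several devices touches at least one defective device."""
    p = dppm / 1_000_000
    return 1.0 - (1.0 - p) ** devices_per_job


if __name__ == "__main__":
    dppm = 361  # figure quoted above for Alibaba's cloud system
    print(f"Expected defective devices in 1,000,000: {expected_defective(1_000_000, dppm):.0f}")
    print(f"P(at least one defective device in a 1,000-device job): {prob_any_defective(1_000, dppm):.1%}")
```

Under these assumptions, a rate that looks negligible per device means a few hundred affected devices across a million-device fleet, and roughly a 30% chance that any given 1,000-device job includes at least one of them.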

Undetected manufacturing defects, accelerated aging, or environmental factors can corrupt data during storage, transmission, or processing, resulting in unintended changes to information. Traditional approaches to preventing SDC during silicon manufacturing and data center operation fail to provide adequate reliability.

Find out more here: the white paper details a two-stage detection approach that offers SDC prevention solutions for different stages of a chip’s lifespan.
