Why Chips Fail, And What To Do About It


Experts at the Table: Semiconductor Engineering sat down to discuss reliability of chips in the context of safety- and mission-critical systems, as well as increasing utilization due to an explosion in AI data, with Steve Pateras, vice president of marketing and business development at Synopsys; Noam Brousard, vice president of solutions engineering at proteanTecs; Harry Foster, chief verificat...

Outsmarting Silent Data Corruption In AI Processors With Two-Stage Detection


Silent data corruption (SDC) is on the rise following advancements in semiconductor technology. The explosion in AI for speech, image, video, and text processing leads to growing complexity and diversity of hardware systems, bringing an increased risk to data integrity. The SDC rate is much higher than software engineers expect, undermining the hardware reliability they used to take for granted. Rec...

Redefining RAS in Datacenters with Real-Time Health Monitoring


Abstract: Hyperscale datacenters require intense computational power for compute-intensive tasks, such as AI, data analytics, machine learning, and big data processing. They leverage parallel processing across multiple computers, in high-density servers, to handle complex tasks efficiently. This relies on specialized, powerful processors, including GPUs or ASICs built specifically for training and inference. Such c...

Droop And Silent Data Corruption


By Aakash Jani and Lee Vick

Let me set the scene. You are a child psychologist (played by, let’s say, Bruce Willis for illustrative purposes), and you are sitting next to a frightened kid. He turns to you and whispers, “I see dead bits.” Okay, I grant you that’s not exactly the quote, but data center operators are seeing transient errors at an alarming rate, and at scale. These error...

Heat-Related Issues Impact Reliability In Advanced IC Designs


Heat is becoming a much bigger problem in advanced-node chips and packages, causing critical electrons to leak out of DRAM, timing and reliability issues in 3D-ICs, and accelerated aging that varies by workload. All types of circuitry are vulnerable to thermal effects. Heat can slow the movement of electrons through wires, cause electromigration that shortens the lifespan of...

Functional Compaction for Functional Test Sequences (Purdue University, I. Pomeranz)


A new technical paper titled "Functional Compaction for Functional Test Sequences" was published by IEEE Fellow Irith Pomeranz at Purdue University. Abstract: "The occurrence of silent data corruption because of hardware defects in large scale data centers points to the advantages of applying functional test sequences to detect hardware defects that escape scan-based tests. When using funct...

What’s Missing In Test


Experts at the Table: Semiconductor Engineering sat down to discuss how functional test content is brought up at first silicon, and the balance between ATE and system-level testing, with Klaus-Dieter Hilliges, V93000 platform extension manager at Advantest Europe; Robert Cavagnaro, fellow in the Design Engineering Group at Intel (responsible for manufacturing and test strategy of data center...

Silent Data Corruption Considerations For Advanced Node Designs


Ensuring reliability, availability, and serviceability (RAS) has long been an important consideration for many types of electronic systems, with major implications for chip design. Clearly, military hardware must be very reliable, and servers and automotive systems are also expected to be available constantly. Some amount of failure is inevitable, so being able to repair, avoid, or mitigate fau...

Strategies For Detecting Sources Of Silent Data Corruption


Engineering teams are wrestling with how to identify the root causes of silent data corruption (SDC) in a timely and cost-effective way, but the solutions are turning out to be broader and more complex than simply fixing a single defect. This is particularly vexing for data center reliability, availability, and serviceability (RAS) engineering teams, because even the best tools and methodolo...

Memory’s Future Hinges On Reliability


Experts at the Table: Semiconductor Engineering sat down to talk about the impact of power and heat on off-chip memory, and what can be done to optimize performance, with Frank Ferro, group director, product management at Cadence; Steven Woo, fellow and distinguished inventor at Rambus; Jongsin Yun, memory technologist at Siemens EDA; Randy White, memory solutions program manager at Keysight; a...
