How To Build Resilience Into Chips


Disaggregating chips into specialized processors, memories, and architectures is becoming necessary for continued improvements in performance and power, but it's also contributing to unusual and often unpredictable errors in hardware that are extremely difficult to find. The sources of those errors can include anything from timing errors in a particular sequence, to gaps in bonds between chi... » read more

Hunting For Hardware-Related Errors In Data Centers


The semiconductor industry is urgently pursuing design, monitoring, and testing strategies to help identify and eliminate hardware defects that can cause catastrophic errors. Corrupt execution errors, also known as silent data errors, cannot be fully isolated at test — even with system-level testing — because they occur only under specific conditions. To sort out the environmental condit... » read more