Ensuring AI Reliability: Mitigating Silent Data Corruption Risks


Silent Data Corruption (SDC) is an industry challenge affecting data centers worldwide with increasing frequency. This phenomenon stems from untraceable hardware failures that make detection notoriously difficult. SDCs don’t leave any record in system logs or trigger exception mechanisms. The corrupted data they produce can propagate unnoticed, causing cascading failures that often demand ext...

The Demise Of Static Timing Verification?


The chip industry traditionally has relied on margins to mitigate timing problems, but a growing array of factors now influences timing. Can static timing analysis evolve to address these problems? Static timing analysis (STA) was a cornerstone technology for the acceptance of the register transfer level (RTL) abstraction. It showed that functionality would not be impa...

From Reaction To Prevention In Data Center RAS


The rise of artificial intelligence (AI), cloud services, and IoT has fueled the rapid expansion of hyperscale data centers. These massive facilities house thousands of servers, all working to support an increasingly digital world. But as the scale of data centers grows, so too does the need for reliable and high-performance semiconductors. Semiconductor failures and inconsistencies can cause s...

Improving DRAM Performance, Security, and Reliability by Understanding and Exploiting DRAM Timing Parameter Margins


Abstract: "Characterization of real DRAM devices has enabled findings in DRAM device properties, which has led to proposals that significantly improve overall system performance by reducing DRAM access latency and power consumption. In addition to improving system performance, a deeper understanding of DRAM technology via characterization can also improve device reliability and security. The...