Ensuring AI Reliability: Mitigating OCP’s Silent Data Corruption Risks


Silent Data Corruption (SDC) is an industry challenge affecting data centers worldwide with increasing frequency. This phenomenon stems from untraceable hardware failures that make detection notoriously difficult. SDCs don’t leave any record in system logs or trigger exception mechanisms. The corrupted data they produce can propagate unnoticed, causing cascading failures that often demand ext... » read more

Resilient And Optimized GenAI Systems


AI and data center systems are being pushed to their limits, with soaring complexity, nonstop inference workloads, and rising energy demands. Addressing these pressures requires more than incremental improvements; it calls for collaboration across the ecosystem. That’s why proteanTecs has joined forces with Arm, bringing our real-time monitoring technology into Arm’s Neoverse Compute Subsys... » read more

Critical Optimization Factors For GenAI Chipmakers


Today’s GenAI arms race is fought with novel chip architectures and packaging. Specialized hardware designs are proliferating in the form of GPUs, TPUs, NPUs, and more, all tuned for parallelism and matrix-heavy AI math. In this hyper-competitive landscape, chip vendors scramble to differentiate their products on multiple fronts. They promise some mix of better performance, efficiency, or ... » read more

Same Chip, Two Destinies: How Power Profiles Improve With On-Chip Monitoring


What happens to critical power-related considerations when the same chip is handled two different ways, with or without visibility from within? This article begins by examining how the absence of on-chip monitoring impacts peak power, average power, and di/dt noise (rate of current change), as illustrated in the diagram below and the subsequent discussion. It then details how these aspects c... » read more

Margin Sensors In The Wild


Back in March, I wrote an article here that looked at how a proxy circuit could be used to measure variations in circuit performance as conditions changed in the operating environment. Here we’ll look at a couple of recent presentations on margin sensors from two of the big EDA vendors' customer engineering forums, as well as another product with an upcoming presentation at DAC. Ma... » read more

Application-Specific Power Performance Optimizer Based On Chip Telemetry


As datacenter power consumption continues to pose cooling and cost challenges, and battery-driven devices are expected to last longer between charges, the search for advanced power management mechanisms continues. A modern design must balance maximizing performance, minimizing power consumption, and guaranteeing no failures in the field. The latter requires safety margins which tr... » read more

Mission Profile Analytics For The Automotive Industry


The automotive industry is undergoing a major transformation with the rise of electrification, connectivity, and autonomous driving capabilities fueling the need for a greater number of more advanced semiconductors. The associated regulatory expectations are also creating challenging safety and reliability requirements for automotive-grade silicon that need to be understood and managed over a w... » read more

proteanTecs On-Chip Monitoring And Deep Data Analytics System


High reliability applications in service-critical markets, such as autonomous driving and cloud computing, demand maximum performance and minimal power and cost. Reducing design margins while maintaining high reliability becomes imperative. State-of-the-art silicon processes offer mainly logic density improvements at limited speedup. Worst-case design analysis is not cost effective anymore. ... » read more

Chiplet Planning Kicks Into High Gear


Chiplets are beginning to impact chip design, even though they are not yet mainstream and no commercial marketplace exists for this kind of hardened IP. There are ongoing discussions about silicon lifecycle management, the best way to characterize and connect these devices, and how to deal with such issues as uneven aging and thermal mismatch. In addition, a big effort is underway to improve... » read more

Chip Monitoring For Max Performance And Security


In a semiconductor market dominated by SoCs for high-performance computing, AI, automotive and 5G, semiconductor companies face myriad challenges and device requirements. The specific challenges for any given SoC vary but can include issues around performance debug and security against hacking. At the top of the list is the need to ensure quality, enhance safety, optimize performance, and increa... » read more
