SPONSOR BLOG

Improving Reliability Monitoring Of High-Bandwidth Memory

How deep data analytics enable in-field HBM monitoring and repair.

December 10th, 2019 - By: Eyal Fayneh

As the quest for higher bandwidth and speed continues, multi-die technologies with advanced memory architectures are being introduced. As these heterogeneous packages grow more complex, new reliability challenges arise.

A new approach to HBM subsystem monitoring and repair provides advanced in-field reliability assurance. By applying analytics to data created by on-chip Agents, proteanTecs’ Proteus provides actionable insights and alerts on the system’s lifetime operation.

What is High-Bandwidth Memory (HBM)?
High-Bandwidth Memory (HBM) is a specialized form of stacked memory architecture that is integrated with processing units to increase speed while reducing latency, power, and size. It is a premium DRAM offering for high-bandwidth applications such as next-generation supercomputers, graphics systems, and artificial intelligence (AI).

HBM is rapidly evolving to meet the changing needs of the datacenter and networking industries. The technology has already gained significant market adoption, and the HBM market is expected to grow at a CAGR of 32% through 2022¹. HBM was adopted by JEDEC² as an industry standard in October 2013, and its second generation, HBM2, was accepted in January 2016.

The HBM reliability challenge
Visibility into HBM subsystems is inherently limited by their 3D integration technology, and signal integrity problems are difficult to debug, validate, and monitor. Multi-die HBM packaging introduces new reliability challenges that can lead to functional device failures in the field. These technologies are both inherently complex and expensive, so system failures inflict significant losses on manufacturers and service providers alike.

HBM PHYs do not allow for micro-bump redundancy because of the high-density routing, so a single micro-bump per signal is used for the entire HBM connectivity. A problem in any of the PHY or HBM micro-bumps will lead to a chip operational failure. A typical 4xHBM2 configuration includes 13,600 micro-bumps for connectivity, creating a reliability challenge and a risk of full HBM subsystem failure. During manufacturing test, a failed module incurs significant monetary losses for manufacturers. In lifetime (field) operation, a failure in the HBM subsystem may affect the whole system and lead to an abrupt operational failure and unplanned downtime.
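To see why one bump per signal is a concern at this scale, the worked sketch below treats the 13,600 micro-bumps as a series system: the subsystem survives only if every bump survives. It assumes, purely for illustration, that bumps fail independently with the same probability p; the p values are hypothetical and not taken from any measured data.

```python
# Hypothetical series-reliability sketch: every micro-bump is a single point
# of failure, and bumps are ASSUMED to fail independently with probability p.
# The p values used below are illustrative, not measured data.

BUMPS_4XHBM2 = 13_600  # micro-bump count for a typical 4xHBM2, from the text


def subsystem_survival(p_bump_fail: float, n_bumps: int = BUMPS_4XHBM2) -> float:
    """Probability that no micro-bump fails when there is no redundancy.

    Without redundancy the bumps behave as a series system: the subsystem
    survives only if all n bumps survive, i.e. (1 - p) ** n.
    """
    if not 0.0 <= p_bump_fail <= 1.0:
        raise ValueError("p_bump_fail must be between 0 and 1")
    return (1.0 - p_bump_fail) ** n_bumps


if __name__ == "__main__":
    for p in (1e-7, 1e-6, 1e-5):
        print(f"p = {p:.0e}: P(all {BUMPS_4XHBM2} bumps survive) = {subsystem_survival(p):.4f}")
```

With an assumed per-bump failure probability of 1e-5, the expression gives roughly 0.87, meaning about a 13% chance that the subsystem loses at least one connection even though each individual bump is very reliable.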

Testing of the HBM subsystem is performed with industry-standard detection tools that lack parametric sensitivity, so marginal lanes go undetected. Such lanes may degrade over time and ultimately fail during lifetime operation. Furthermore, detecting faulty lanes requires the device to be placed in test mode, so degradation that develops during mission-mode operation is not monitored.
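The difference between a binary detector and a parametric one can be illustrated with a toy comparison. In the sketch below, each lane reports a normalized margin (a hypothetical metric where 0.0 means "just fails" and 1.0 is nominal); the lane names and the 0.3 guard band are also illustrative assumptions. A pass/fail check only reports lanes that have already failed, while a guard-band check also surfaces lanes that pass but have little margin left.

```python
# Hypothetical comparison of a pass/fail test and a parametric margin check.
# ASSUMPTIONS: per-lane normalized margin (0.0 = just fails, 1.0 = nominal)
# and an illustrative 0.3 guard band; neither comes from the original text.
from dataclasses import dataclass


@dataclass
class Lane:
    name: str
    margin: float  # normalized margin, hypothetical metric


def pass_fail(lanes):
    """Binary test: only lanes with no margin left are reported."""
    return [lane.name for lane in lanes if lane.margin <= 0.0]


def marginal(lanes, guard_band=0.3):
    """Parametric check: lanes that still pass but sit inside the guard band."""
    return [lane.name for lane in lanes if 0.0 < lane.margin < guard_band]


if __name__ == "__main__":
    lanes = [Lane("DQ0", 0.95), Lane("DQ1", 0.12), Lane("DQ2", 0.88), Lane("DQ3", -0.05)]
    print("fails pass/fail test:", pass_fail(lanes))
    print("passes but marginal:", marginal(lanes))
```

Here DQ1 passes the binary test yet would be flagged by the parametric check, which is the class of lane the paragraph above describes as going undetected.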

In-field reliability monitoring
proteanTecs’ Proteus brings visibility to HBM, mitigating the inherent limitations and complexities of heterogeneous packaging. The software platform applies analytics to data created by on-chip Agents (IPs) that are custom-tailored to represent and automatically cover a specific design. By continuously monitoring signal integrity per pin and in mission mode, Proteus detects degradation trends and provides actionable insights for reliability monitoring and repair.
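One simple way to turn periodic mission-mode samples into a per-pin degradation trend is a least-squares line fit, extrapolated to a margin limit. This is an illustrative stand-in, not the analytics Proteus actually uses; the signal-quality metric, the sample times, and the limit are all hypothetical.

```python
# Hypothetical per-pin trend detection from periodic mission-mode samples.
# ASSUMPTIONS: one scalar signal-quality metric per pin (higher is better),
# known sample times, and a straight-line fit as a simple stand-in for the
# platform's real analytics.


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sample times must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def degradation_trend(times, values, limit):
    """Return (slope, projected_time_at_limit or None).

    A negative slope means the metric is falling; in that case the fitted
    line is extrapolated to the time at which it would reach `limit`.
    """
    slope, intercept = fit_line(times, values)
    if slope >= 0:
        return slope, None
    return slope, (limit - intercept) / slope
```

For a metric falling from 1.00 to 0.94 over 300 hours of samples, the fit projects that a 0.8 limit would be reached at about 1,000 hours, which is the kind of lead time that makes scheduled repair possible.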

Fig. 1: Proteus for HBM reliability.

Proteus provides a new method of correlating lane degradation to Far-End (FE) and Near-End (NE) insights, which are a function of ASIC and DRAM driver strength, NE and FE micro-bump integrity, Rx sensitivity, and interposer characteristics.
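As a deliberately simple illustration of combining the two views of a lane, the sketch below tags which end (if either) shows a significant drop from its bring-up baseline. The per-end metric, the baseline, and the 10% threshold are hypothetical, and the sketch does not attempt the mapping from a tag to a physical cause (driver strength, micro-bump integrity, Rx sensitivity, or interposer) that the paragraph above describes.

```python
# Hypothetical Near-End / Far-End comparison for one lane.
# ASSUMPTIONS: each end reports a scalar signal-quality metric (higher is
# better), a baseline was recorded at bring-up, and a relative drop of 10%
# or more counts as significant. Root-cause attribution is out of scope.


def relative_drop(baseline: float, current: float) -> float:
    """Fractional drop of the metric relative to its baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - current) / baseline


def localize(ne_base, ne_now, fe_base, fe_now, threshold=0.10):
    """Tag which end(s) of the lane show a significant drop."""
    ne_drop = relative_drop(ne_base, ne_now) >= threshold
    fe_drop = relative_drop(fe_base, fe_now) >= threshold
    if ne_drop and fe_drop:
        return "both"
    if ne_drop:
        return "near-end"
    if fe_drop:
        return "far-end"
    return "none"
```

Keeping the two ends separate, rather than collapsing them into one lane score, is what allows a later analysis step to reason about where along the path the degradation originates.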

By alerting on marginal Near-End or Far-End signal performance, Proteus enables service providers to perform predictive maintenance. It identifies potential candidates for faulty-lane replacement and passes this information to the Lane Repair mechanism, which replaces marginal lanes with redundant ones during scheduled maintenance cycles. This prevents system failures caused by signal-quality degradation beyond margin limits.
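The hand-off from alerts to lane repair can be sketched as a ranking problem. The example below assumes, hypothetically, that each lane has a margin score (lower is worse), that lanes below an alert level are repair candidates, and that only a fixed number of redundant lanes is available, so the worst candidates are repaired first. It does not model any real Lane Repair interface.

```python
# Hypothetical repair planning for one maintenance window.
# ASSUMPTIONS: per-lane margin scores (lower is worse), an alert level that
# defines repair candidates, and a fixed number of spare lanes. No real lane
# repair API is modeled here.


def repair_plan(margins, alert_level, spares):
    """Return (to_repair, unrepaired) lists of lane names, worst first."""
    if spares < 0:
        raise ValueError("spares must be non-negative")
    candidates = sorted(
        (name for name, margin in margins.items() if margin < alert_level),
        key=lambda name: margins[name],
    )
    return candidates[:spares], candidates[spares:]


if __name__ == "__main__":
    margins = {"DQ0": 0.90, "DQ1": 0.10, "DQ2": 0.20, "DQ3": 0.05}
    to_repair, leftover = repair_plan(margins, alert_level=0.25, spares=2)
    print("repair now:", to_repair)
    print("still marginal:", leftover)
```

Repairing the worst lanes first is one reasonable policy when spares are scarce; any leftover candidates would remain under monitoring until the next window.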

Fig. 2: Degradation monitoring and alerts.

At system bring-up and characterization, the tool enables virtual probing of the signal amplitude and slew rate for each pin, serving as an embedded “scope” without impacting the measured signal. This provides per-pin visibility of HBM signal parameters during system characterization and validation, reducing time-to-market, helping optimize the product, and increasing confidence during ramp-up.
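The amplitude and slew-rate readings mentioned above can be illustrated on a sampled rising edge. The sketch below uses the common 20%-to-80% convention with linear interpolation between samples; the (time in ns, volts) sample format and the thresholds are assumptions, and the code says nothing about how the on-chip Agents actually take their measurements.

```python
# Hypothetical amplitude and 20%-80% slew-rate extraction from one edge.
# ASSUMPTIONS: samples are (time_ns, volts) pairs covering a single monotonic
# rising edge; threshold crossings are found by linear interpolation.


def crossing_time(samples, level):
    """Interpolated time at which the waveform first rises through `level`."""
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if v0 <= level <= v1 and v1 != v0:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError(f"edge never crosses {level} V")


def edge_metrics(samples):
    """Return (amplitude_V, slew_rate_V_per_ns) for a rising edge."""
    v_low = min(v for _, v in samples)
    v_high = max(v for _, v in samples)
    amplitude = v_high - v_low
    t20 = crossing_time(samples, v_low + 0.2 * amplitude)
    t80 = crossing_time(samples, v_low + 0.8 * amplitude)
    if t80 <= t20:
        raise ValueError("expected a rising edge")
    # 60% of the swing is traversed between the 20% and 80% points.
    return amplitude, 0.6 * amplitude / (t80 - t20)
```

Reporting slew rate over the 20%-to-80% window rather than the full swing avoids sensitivity to the slow, noisy tails at the start and end of an edge.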

As the complexity of heterogeneous packaging continues to develop, Proteus offers a revolutionary approach to HBM in-field monitoring, for unprecedented reliability assurance. Service providers now have the visibility they need to perform predictive maintenance, detecting and repairing faults in systems before they become failures.

Sources:
1. Market Reports World, “High-bandwidth Memory Market 2019 Research.”
2. JEDEC, JESD235A: High Bandwidth Memory (HBM) DRAM. https://www.jedec.org/standards-documents/docs/jesd235a

Eyal Fayneh

Eyal Fayneh is a co-founder of proteanTecs and the company's Silicon Technologies senior principal engineer. Before founding the company, Fayneh had a 20-year career at Intel, where he served as senior principal engineer in PLLs and clock generation. Fayneh is the inventor of more than 40 patents and holds a B.Sc. in electrical engineering from Tel Aviv University.
