Speeding Down Memory Lane With Custom HBM

Integrating the functionality of the HBM base die into a logic die provides greater flexibility and additional control.


With the goal of increasing system performance per watt, the semiconductor industry is always seeking innovative solutions that go beyond the usual approaches of increasing memory capacity and data rates. Over the last decade, the High Bandwidth Memory (HBM) protocol has proven to be a popular choice for data center and high-performance computing (HPC) applications. Even more benefit can be realized as the industry moves toward custom HBM (cHBM), which gives system-on-chip (SoC) designers the flexibility and control to achieve higher performance, or lower power and smaller area, depending on the application.

Why HBM is winning

HBM is increasingly used in data centers for AI/ML and other compute-intensive workloads in demanding applications. Support from all three major vendors means that end customers can have true multi-sourcing, although accelerated demand has put pressure on the supply chain. According to a recent Bloomberg Intelligence report, the HBM market is set to grow at an annual rate of 42%, from US$4B (2023) to US$130B (2033), driven mainly by AI computing as workloads expand, and will occupy more than half of the total DRAM market by 2033.
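As a quick sanity check on those figures, the minimal sketch below (not part of the Bloomberg analysis, just simple compounding arithmetic) confirms that a 42% annual growth rate over ten years takes roughly US$4B to roughly US$130B.

```python
# Compound the quoted 42% annual growth rate over the 2023-2033 window.
start_value_busd = 4.0     # 2023 HBM market size, US$ billions (as quoted)
cagr = 0.42                # compound annual growth rate (as quoted)
years = 2033 - 2023

projected_busd = start_value_busd * (1 + cagr) ** years
print(f"~US${projected_busd:.0f}B")   # ~US$133B, in line with the ~US$130B projection
```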

HBM offers significantly higher memory bandwidth, lower latency, and a compact form factor, using 3D vertical stacking to increase density and shorten data paths. Since the original HBM standard in 2013, JEDEC has maintained an aggressive roadmap. As noted in a recent blog post, the current HBM4 doubles the channel count per stack with a 2,048-bit interface, offers speed bins up to 6.4 Gb/s, and supports options for 16-high through-silicon via (TSV) stacks. Figure 1 shows a typical SoC design with a processor die and an HBM stack.

Fig. 1: Traditional 2.5D HBM-based SoC with silicon interposer.
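As a rough, back-of-the-envelope illustration of what those HBM4 numbers imply, the sketch below multiplies the 2,048-bit interface by the 6.4 Gb/s per-pin speed bin to estimate raw peak bandwidth per stack; protocol overheads and actual product configurations will vary.

```python
# Raw peak-bandwidth estimate for one HBM4 stack, using the figures quoted above.
interface_width_bits = 2048    # HBM4 interface width per stack
pin_rate_gbps = 6.4            # per-pin data rate, Gb/s (top quoted speed bin)

peak_gbps = interface_width_bits * pin_rate_gbps   # aggregate Gb/s per stack
peak_gBps = peak_gbps / 8                          # convert to GB/s

print(f"~{peak_gBps:.0f} GB/s per stack")          # ~1638 GB/s, i.e. roughly 1.6 TB/s
```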

The connection between the processor die and the memory die stack is made through a silicon interposer, using the physical (PHY) layer defined by the HBM standard. The use of an interposer means that, while the HBM stack is fully 3D, the overall SoC is only 2.5D, reducing the potential space savings. The propagation delay through the interposer also increases memory access time, limiting potential performance. In addition, the entire HBM memory stack, including the base die, is provided today by the DRAM vendor, which constrains design flexibility and is the motivation for cHBM.

Benefits of customization

The key to custom HBM is integrating the functionality of the base die into a logic die designed by the SoC team. This includes controlling the I/O interfaces, managing the DRAM stack, and hosting Direct Access (DA) ports for diagnostics and maintenance. Integration requires close cooperation with the DRAM vendor, but it gives SoC designers greater flexibility and additional control over how the HBM core die stack is accessed. They can now integrate the memory and processor dies tightly and optimize for power, performance, and area (PPA) depending on the application.

SoC designers have the freedom to configure and instantiate their HBM memory controller to interface directly with the HBM DRAM stacks using a DFI2TSV bridge. The logic die can incorporate augmented functionality such as a programmable, high-quality built-in self-test (BIST) controller, a die-to-die (D2D) adapter, and a high-speed interface, such as Universal Chiplet Interconnect Express (UCIe), to communicate with the processor die in a full 3D stack. Existing designs can be reused since the die is manufactured in the logic process, not the DRAM process. Figure 2 contrasts the HBM and cHBM approaches.

Fig. 2: A look at the HBM DRAM stack.
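To make the composition of such a logic die more concrete, here is a purely illustrative configuration sketch of the blocks named above (HBM controller behind a DFI2TSV bridge, programmable BIST, UCIe D2D adapter, and Direct Access ports). All class and field names are hypothetical and do not correspond to any actual vendor IP or JEDEC definition.

```python
# Hypothetical, illustrative model of what a cHBM logic die integrates.
from dataclasses import dataclass, field

@dataclass
class HbmController:
    channels: int = 32             # HBM4 doubles the channel count per stack
    dram_bridge: str = "DFI2TSV"   # controller reaches the DRAM TSVs via a bridge

@dataclass
class D2dAdapter:
    protocol: str = "UCIe"         # high-speed die-to-die link to the processor die

@dataclass
class CustomBaseDie:
    """Logic die that absorbs the traditional HBM base-die functions."""
    controller: HbmController = field(default_factory=HbmController)
    bist: str = "programmable MBIST"            # in-system DRAM test engine
    d2d: D2dAdapter = field(default_factory=D2dAdapter)
    direct_access_ports: int = 2                # DA ports for diagnostics/maintenance

die = CustomBaseDie()
print(die.controller.dram_bridge, "->", die.d2d.protocol)
```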

cHBM has the advantage of significantly reducing the interposer-induced delays on the datapath and the associated power and performance penalties. It effectively brings the memory and processor dies closer together by reusing existing high-speed die-to-die interfaces such as UCIe. The resulting flexibility can be leveraged in different types of scenarios:

  • Use by cloud providers for edge AI applications where cost and power consumption are key criteria
  • Use in complex AI/ML-based compute farms where capacity and throughput are pushed to the limit

cHBM challenges

cHBM is still a very new, emerging technology, and as with any innovation there will be challenges ahead. Integrating base die functionality into a logic die means that end users must consider the complete lifecycle, including design, production ramp, and in-field operation, from a Silicon Lifecycle Management (SLM) perspective. For example, the burden of screening for DRAM cell defects after HBM die stacking at the wafer level now falls to the end user. This raises questions such as:

  • How will the user handle any vendor-specific custom DRAM algorithms that may be recommended?
  • Can the user perform comprehensive in-field HBM test and diagnostics during scheduled downtime?
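To illustrate the kind of screening responsibility that now shifts to the end user, below is a minimal behavioral model of a March C- test, a classic memory test algorithm of the type a programmable BIST controller could run over the DRAM cells. This is a sketch only; the vendor-recommended algorithms referenced above are an assumption here and would be considerably more involved (addressing orders, retention and repair flows, etc.).

```python
# Minimal behavioral model of a March C- memory test over a list-backed memory.
def march_c_minus(memory):
    """Run March C- and return the addresses that failed a read comparison."""
    n = len(memory)
    fails = set()

    def element(order, ops):
        addresses = range(n) if order == "up" else range(n - 1, -1, -1)
        for addr in addresses:
            for op, bit in ops:
                if op == "w":
                    memory[addr] = bit            # write the expected value
                elif memory[addr] != bit:         # read and compare
                    fails.add(addr)

    element("up",   [("w", 0)])                   # (w0)
    element("up",   [("r", 0), ("w", 1)])         # up (r0, w1)
    element("up",   [("r", 1), ("w", 0)])         # up (r1, w0)
    element("down", [("r", 0), ("w", 1)])         # down (r0, w1)
    element("down", [("r", 1), ("w", 0)])         # down (r1, w0)
    element("up",   [("r", 0)])                   # (r0)
    return sorted(fails)

# A fault-free 16-cell memory reports no failing addresses; detecting real
# defects would require modeling faulty cells (e.g., stuck-at or coupling faults).
print(march_c_minus([0] * 16))   # -> []
```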

Successful cHBM deployment will require a fully supported ecosystem bringing together IP providers, DRAM vendors, SoC designers, and automated test equipment (ATE) companies. For example, conventional ATE cannot be used to test cHBM because of both the number and density of the interconnects. Nevertheless, the additional flexibility that cHBM promises has clearly attracted industry attention, as shown by a recent announcement from Marvell in conjunction with the three major DRAM vendors.

Future

Choosing the right partner will be critical for success with cHBM. Synopsys SLM solutions have already been deployed successfully for HBM subsystems by partners such as Socionext, and an example of using the Synopsys SLM ext-RAM and SHS solutions is available. Several enhancements to support cHBM are on the Synopsys roadmap, and the company is collaborating with DRAM vendors, SoC providers, ATE companies, and end users to accelerate cHBM adoption and usage. Stay tuned for more announcements throughout 2025.


