What’s Next For High Bandwidth Memory

Different approaches for breaking down the memory wall.


A surge in data is driving the need for new IC package types with more and faster memory in high-end systems.

But there are a multitude of challenges on the memory, packaging and other fronts. In systems, for example, data moves back and forth between the processor and DRAM, the main memory for most chips. This exchange adds latency and consumes power, a bottleneck sometimes referred to as the memory wall.

To address these issues in high-end systems like servers, OEMs can use faster DRAMs. Another solution is to stack and integrate the logic and memory in an advanced package. The idea is to bring logic and memory closer together to speed up the process and break down the memory wall.

At the high end, there are several packaging options on the table. In many cases, these packages may incorporate a logic die along with a technology called high bandwidth memory (HBM). A 3D device that resembles a small cube, HBM stacks DRAM dies on top of each other to boost the memory bandwidth in systems. So far, HBM has carved out a niche in high-end systems, although the technology is gaining steam amid the push toward data-intensive workloads like gaming, machine learning and other applications.

“HBM provides higher bandwidth and better power efficiency,” said Jan Vardaman, president of TechSearch International. “HBM is said to provide a better picojoule per bit than other memory options for AI accelerators.”

Sold in the market for some time, HBM is moving toward a new and faster specification. Still, HBM, along with the various package types, will remain expensive, and driving down the costs will remain difficult amid several manufacturing challenges. HBM isn’t used in PCs and smartphones because it’s too large and expensive.

Today’s HBM, however, may not be nearly enough to keep up with future bandwidth requirements. So, vendors are working on new technologies. Among them:

  • Vendors are sampling a new version of HBM based on the HBM2E specification. The next version, called HBM3, is in R&D.
  • HBM is typically found in a high-end package type called 2.5D, but there are other packaging options as well, including fan-out and bridges.
  • In R&D, the industry is working on advanced HBMs using a new bonding process. Vendors are also developing new 3D DRAM technologies, namely 3DS (three die stacked) DRAMs.

Fig. 1: Future HBMs and 3D DRAMs with hybrid bonding. Source: Xperi

More data
The amount of data is exploding in the network. Worldwide Internet protocol (IP) traffic, or the flow of data in the Internet, is projected to reach 4.8 zettabytes (ZB) per year by 2022, up from 1.5ZB per year in 2017, according to Cisco. A ZB is 1 trillion gigabytes.
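Those Cisco figures imply a steady compound growth rate, which is easy to check with a few lines of Python (the `cagr` helper is illustrative, not from any cited source):

```python
# Implied compound annual growth rate of global IP traffic,
# using the Cisco figures cited above: 1.5 ZB in 2017 -> 4.8 ZB in 2022.
def cagr(start, end, years):
    """Compound annual growth rate, as a fraction."""
    return (end / start) ** (1 / years) - 1

growth = cagr(1.5, 4.8, 2022 - 2017)
print(f"Implied IP traffic growth: {growth:.1%} per year")  # roughly 26% per year
```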

Data is generated by PCs, smartphones and other systems, and is then transported and processed by servers in data centers. Many organizations have their own data centers. In addition, Amazon, Google, Microsoft and others operate hyperscale data centers, which are larger facilities that provide third-party cloud services.

Hyperscale data centers are facilities with at least 10,000 square feet and a minimum of 5,000 servers, according to IDC. The total number of hyperscale data centers will grow from 338 in 2016 to 628 by 2021, according to Cisco. These centers will account for 55% of IP traffic by 2021, up from 39% today, Cisco says.

The surge in traffic is fueling the demand for faster servers with more memory, but there’s one problem. “Memory bandwidth is a critical bottleneck for next-generation platforms,” said Manish Deo, senior product marketing manager at Intel, in a recent white paper.

Simply put, DRAM is unable to keep up with the bandwidth requirements in systems. But DRAM vendors are taking steps to solve these issues by moving toward a new data transfer spec.

DRAMs incorporate the DDR4 interface standard. Double-data-rate (DDR) technology transfers data twice per clock cycle. DDR4 operates at up to 3,200Mbps. Now, DRAM vendors are ramping up devices based on the new DDR5 standard. DDR5 supports up to 6,400Mbps.
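The jump from DDR4 to DDR5 can be put in bandwidth terms with back-of-envelope arithmetic. A minimal sketch, assuming a standard 64-bit channel (the function name is illustrative):

```python
# Peak theoretical bandwidth of a DDR channel:
# transfers per second x bytes per transfer.
def ddr_peak_bandwidth_gbs(transfer_rate_mtps, bus_width_bits=64):
    """Peak bandwidth in GB/s for a given transfer rate in MT/s."""
    return transfer_rate_mtps * 1e6 * (bus_width_bits / 8) / 1e9

print(ddr_peak_bandwidth_gbs(3200))  # DDR4-3200 -> 25.6 GB/s per channel
print(ddr_peak_bandwidth_gbs(6400))  # DDR5-6400 -> 51.2 GB/s per channel
```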

There are other changes. The industry is looking for ways to bring the memory and processing functions closer together in systems.

For years, OEMs generally placed separate components, such as processors and memory, on the board in systems. But for servers, placing discrete chips on a board takes up too much space and is inefficient in moving data from one device to another.

Some 90% of energy consumed by memory is used to transfer data, according to Applied Materials. “Moving memory closer to compute can alleviate this,” said Sean SK Kang, a director in the Semiconductor Products Group at Applied Materials, in a recent blog. “Multiple strategies are being worked on to increase the power and performance efficiency of memory and compute, including memory optimized for edge and storage applications, new system on chip (SoC) packaging schemes, 3D packaging using TSVs, and in-memory compute, which has the potential to deliver an 8X reduction in energy.”

There are several options here. One option is to integrate multiple dies in the same package. “Because the memories are stacked vertically, 3D memory solutions provide maximum capacity with a smaller form factor,” Intel’s Deo said.

HBM packaging options
Traditionally, in high-end systems, packaging houses integrate logic dies and HBM in a 2.5D package. ASICs, FPGAs or GPUs can be used for the logic die. Samsung and SK Hynix are the main suppliers of HBM, while others are looking at the market. Several different packaging houses incorporate HBM in a package.

In 2.5D, the logic and HBM are placed side-by-side on top of an interposer, which incorporates through-silicon vias (TSVs). The interposer acts as the bridge between the chips and a board. This, in turn, brings logic closer to the memory, enabling more bandwidth.

But 2.5D is also an expensive solution. The package size is large, and it comes with thermal management challenges. That’s why 2.5D is relegated to high-end applications.

It’s difficult to reduce the costs. 2.5D is a steady market, but the volumes are relatively small and not enough to offset what is a complex manufacturing process.

HBM has many of the same issues. Originally announced in 2013, the first HBMs were 4-die stacked DRAM products with 1GB capacities. HBM has 1,024 I/Os, where each I/O or pin has a speed of 1Gbps. This equates to 128GB/s of bandwidth. I/Os are intermediate structures or pads. They connect the signals from the chip to the pins of the package.

Today’s HBM products, based on the HBM2 spec, enable 4/8GB capacities. They have the same number of I/Os (1,024), but the pin speed is 2.4Gbps, equating to 307GB/s of bandwidth.

Three years ago, HBM cost about $120/GB. Today, the unit price for HBM2 (16GB with a 4-die DRAM stack) is roughly $120, according to TechInsights. That doesn’t even include the cost of the package.
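The price decline implied by those TechInsights figures is easy to quantify; the per-GB numbers below are derived arithmetic, not from the cited sources:

```python
# Implied price-per-gigabyte decline, from the figures above:
# ~$120/GB three years ago vs. ~$120 for a 16GB HBM2 stack today.
old_price_per_gb = 120.0        # $/GB, circa three years earlier
new_price_per_gb = 120.0 / 16   # $120 for a 16GB stack -> $/GB today

print(new_price_per_gb)                      # 7.5 $/GB
print(old_price_per_gb / new_price_per_gb)   # a 16x decline
```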

The latest HBM version is based on the HBM2E spec, which has 8/16GB capacities. It has 1,024 I/Os with 3.2Gbps transfer rates. “That means you get more bandwidth,” said Jeongdong Choe, an analyst at TechInsights.

HBM2E, which delivers 410GB/s of bandwidth, is now sampling. “The HBM2E era will occur in the first half of 2020,” Choe said.

The next version, HBM3, has 4Gbps transfer rates with 512GB/s bandwidth. “HBM3 will be released in 2H of 2020,” Choe said. “After HBM3, there is no concrete roadmap yet.”
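The bandwidth figures quoted for each HBM generation follow from one simple formula: 1,024 I/Os times the per-pin rate in Gbps, divided by 8 bits per byte. A quick sketch (the function name is illustrative):

```python
# HBM stack bandwidth = I/O count x per-pin data rate (Gbps) / 8 bits per byte.
def hbm_bandwidth_gbs(pin_speed_gbps, io_count=1024):
    """Peak bandwidth of one HBM stack in GB/s."""
    return io_count * pin_speed_gbps / 8

for gen, rate in [("HBM", 1.0), ("HBM2", 2.4), ("HBM2E", 3.2), ("HBM3", 4.0)]:
    print(f"{gen}: {hbm_bandwidth_gbs(rate):.0f} GB/s")
# HBM: 128, HBM2: 307, HBM2E: 410, HBM3: 512 -- matching the figures above
```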

In all cases, HBM stacks DRAM dies on top of each other and connects them with TSVs. For example, Samsung’s HBM2 technology consists of eight 8Gbit DRAM dies, which are stacked and connected using 5,000 TSVs. In total, HBM2 enables 307GB/s of data bandwidth, compared to 85.2GB/s with four DDR4 DIMMs.

Recently, Samsung introduced a new HBM version that stacks 12 DRAM dies, which are connected using 60,000 TSVs. The package thickness is similar to the 8-die stack version. “This is for data-intensive applications, such as AI and high-performance computing,” said Jim Elliott, senior vice president of sales and marketing at Samsung. “That gives us 24 gigabytes of density. That’s a 3x improvement over the prior generation.”

HBM2E and eventually HBM3 are faster. But at each iteration, 2.5D/HBM packages become more difficult to make.

In the manufacturing flow, the logic and DRAM dies are fabricated separately in the fab. Then, in the HBM flow, tiny TSVs are formed in each DRAM die using an etch process, followed by a copper fill. Critical dimension (CD) uniformity is crucial here.

Then, tiny copper microbumps are formed on top of the die. Bumps are solder-based interconnect structures, which provide small, fast electrical connections between different dies. In HBM, the bumps are 25µm in diameter with 55µm pitches.

Bumps are formed on the TSVs using a series of deposition, lithography and other steps. “Due to the need to stack DRAM chips, TSV technology will be required, and this will involve improved resolution and tighter overlay from a lithography standpoint. The bigger challenge is die-to-die variation due to stacking, which can impact yield. This is not an issue for lithography, but it is for other processes,” said Shankar Muthukrishnan, senior director of technical marketing at Veeco.

Indeed, there are several challenges with fine-pitch structures. “With higher bandwidth of the chip-to-chip interconnection, the interconnect pitch and consequently the solder volume must shrink,” said Thomas Uhrmann, director of business development of EV Group. “For these reasons, it is more challenging to control the liquefying systems such as solder during chip assembly, where shorts between contacts due to uncontrolled solder squeeze out is the most common failure mechanism.”

Meanwhile, once the bumps are formed, the die is flipped and placed on a temporary carrier. The backside of the structure is thinned, which exposes the TSVs. Then, on the backside, microbumps are formed on the die. The temporary carrier is debonded, resulting in a die with bumps on each side. Stress is an issue in bonding/debonding.

Finally, the DRAM dies are attached and bonded to each other, and an underfill material is inserted between each die.

For fine-pitch requirements, the industry uses thermal compression bonding (TCB), which is a slow process. A TCB bonder picks up a die and aligns the bumps to those from another die. It bonds the bumps using force and heat.

TCB is improving, though. “Pre-fluxing the substrate has improved the productivity of TCB, but it’s still slower than standard flip-chip,” said Bob Chylak, vice president of global engineering at Kulicke & Soffa (K&S).

Then, the HBM stack and logic die are mounted on a silicon interposer in the 2.5D package. There are more steps after that, as well.

Besides 2.5D, meanwhile, there are other packaging options that could reduce the cost of packaging and perhaps even HBM. Intel, for example, has developed a silicon bridge, which is an alternative to the interposer. A bridge makes use of a tiny piece of silicon with routing layers that connects one chip to another in a package.

Intel refers to its bridge as the Embedded Multi-die Interconnect Bridge (EMIB). Using EMIB, Intel can combine four HBM2 stacks and a 10nm FPGA in a system-in-package (SiP), enabling 512GB/s of bandwidth.

Fan-out, meanwhile, is another packaging option. Fan-out packages are typically used in automotive applications, servers and smartphones.

“(Fan-out is) a cost-effective way to achieve lower-profile packages without using an inorganic substrate to produce chip packages that are thinner and faster without the need for interposers or through-silicon-vias (TSVs),” said Shelly Fowler, a principal applications engineer at Brewer Science, in a blog.

ASE and others are developing fan-out packages with HBM. “The electrical performance is better than a 2.5D interposer solution,” said John Hunt, senior director of engineering at ASE. “You have less insertion loss, better impedance control and lower warpage than 2.5D. It’s a lower cost solution with better electrical performance. The difference is that 2.5D can do finer lines and spaces. But we can route the HBM2 dies with our current 2μm line and space.”

In fan-out, the chips on a wafer are diced. The dies are then placed in a wafer-like structure and encased with an epoxy mold compound. A number of fan-out packages are built up in this reconstituted wafer-like structure.

In production, the wafer-like structure is prone to warpage. And when the dies are embedded in the wafer, they tend to move, causing an unwanted effect called die shift. This impacts the yield.

Advanced HBMs
In R&D, meanwhile, the industry is working on new technologies to overcome the limitations with today’s packages.

In 2.5D, for example, the most advanced microbumps are tiny structures with a 40μm pitch. A 40μm pitch equates to a 25μm bump diameter with 15μm spacing.

Going forward, the industry can scale the bump pitch down to 20μm, possibly 10μm. Beyond that, the aspect ratios of the bumps and pillars become difficult to control.
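Pitch, bump diameter and spacing are related by simple geometry: pitch is the center-to-center distance, so the edge-to-edge spacing is pitch minus diameter. A small sketch using the figures in the text (the helper name is illustrative):

```python
# Bump pitch is center-to-center distance, so:
# edge-to-edge spacing = pitch - bump diameter.
def bump_spacing_um(pitch_um, diameter_um):
    """Edge-to-edge spacing between adjacent bumps, in microns."""
    return pitch_um - diameter_um

print(bump_spacing_um(40, 25))  # advanced microbumps: 15 um spacing
print(bump_spacing_um(55, 25))  # today's HBM bumps: 30 um spacing
```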

So at bump pitches of 20μm to 10μm, the industry needs a new interconnect solution, namely copper hybrid bonding. Here, the idea is to stack and connect dies directly using a copper-to-copper diffusion bonding technique, thereby eliminating the need for bumps and pillars.

This enables a new class of 2.5D packages, 3D-ICs and HBMs. These may appear by 2021 or sooner. A 3D-IC is a system-level design that mimics a traditional system-on-a-chip (SoC). But unlike SoCs, which integrate all functions on one die, 3D-ICs integrate smaller dies in a package. Potentially, 3D-ICs have lower costs and better yields.

Copper hybrid bonding isn’t new. For years, the technology has been used for CMOS image sensors. But migrating the technology for advanced chip stacking, such as memory on memory and memory on logic, is challenging and involves complex fab-level processes.

“People are trying to design chip-to-wafer bonding with this process,” K&S’ Chylak said. “This is challenging because it requires a Class 1 cleanroom mini-environment with an assembly machine. The machine needs to have high accuracy of 0.2μm at 3-sigma. This will be an expensive machine.”

Nonetheless, TSMC and others are developing copper hybrid bonding technology. Others are licensing Xperi’s version of the technology, called Direct Bond Interconnect (DBI), which enables pitches down to 1μm.

Several image sensor vendors have licensed DBI. UMC and others have also licensed it. “We believe wafer bonding is one of the major technology trends in the future,” said Steven Liu, vice president of corporate marketing at UMC.

Hybrid bonding can be used to bond two wafers together (wafer-to-wafer bonding) and a chip to a wafer (die-to-wafer bonding).

In the flow, metal pads are recessed on a wafer. The surface is planarized, followed by a plasma activation step. The process is repeated with a separate wafer. The wafers are bonded using a dielectric-to-dielectric bond, followed by a metal-to-metal connection.

In R&D, using hybrid bonding, vendors are working on new forms of HBM, such as 16-die stacks with low profiles. Some are working on stacking three DRAM dies, which is called 3DS. TSMC has demonstrated a four DRAM die stack.

With Xperi’s technology, there is no underfill between each die. “Today, the way they are stacking high-performance DRAM is they are using either flip-chip or a thermal compression bond. The problem that they are having is: 1) scaling to a finer pitch; 2) it’s dealing with underfill,” said Craig Mitchell, president of Invensas, which is part of Xperi. “When you fill it with underfill, it is not highly thermally conductive. What ends up happening is the die on the bottom of the stack and the die on the top of the stack are at different temperatures than the die in the middle.”

Eliminating the underfill enables a low-profile stack. “The stack acts more like one die. We get much more uniform thermals in the stack,” Mitchell said.

As stated, though, copper hybrid bonding is a challenging process for advanced packaging. “Wafer-to-wafer level bonding has two fatal flaws,” K&S’ Chylak said. “The chips need to be the same size. In most applications, this will not be the case, so you lose valuable silicon space.”

Plus, wafer yields aren’t always perfect. Let’s say the yields of both wafers are 80%. “The resulting packages will yield 64%,” Chylak said. “You end up bonding some good die to bad ones.”
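Chylak’s 64% figure is simply compounded yield: a good package requires a good die from every wafer in the stack, so the per-wafer yields multiply. A minimal sketch (the function name is illustrative):

```python
# Compound yield for wafer-to-wafer bonding: a good package needs a
# good die from every wafer, so the per-wafer yields multiply.
def stacked_yield(*wafer_yields):
    """Fraction of good packages after bonding the given wafers."""
    y = 1.0
    for w in wafer_yields:
        y *= w
    return y

print(stacked_yield(0.80, 0.80))        # two wafers at 80% -> 64%
print(stacked_yield(0.95, 0.95, 0.95))  # three wafers at 95% -> ~86%
```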

Clearly, there are several packaging options for HBM. HBM, of course, is not for all applications. You won’t see it in smartphones, nor will it replace DRAM.

But HBM is becoming more critical in high-performance applications. The question is whether it can keep up amid the exploding demand for data.



