Stacked Die, Phase Two

Production versions of this packaging approach are hitting the market, with many more on the horizon. Problems remain from tooling to cost, and opinions differ widely about what’s next.


The initial hype phase of 2.5D appears to be over. There are multiple offerings in development or already on the market from Xilinx, Altera, Cisco, Huawei, IBM and AMD, all focused on better throughput over shorter distances with better yield and lower power. Even Intel has jumped on the bandwagon, saying that 2.5D will be essential for extending Moore’s Law.

The next phase will build upon that experience to transform a pricey, highly customized packaging architecture into one that is geared toward high volume. While performance and power still will be key elements, the next phase will focus much more on pushing prices down and reliability up.

“A couple of years ago companies started looking at the performance of stacked die but they couldn’t overcome the BOM (bill of materials) cost,” said Brandon Wang, engineering group director at Cadence. “We’re now seeing production chips, and that’s having an impact on the rest of the industry. We’ve already seen the high-speed switching datacom industry move to 3D from the end of last year into this year. So we’ve got the performance category, which includes supercomputing, datacom and gaming, all moving there.”

And over the next half decade, there is talk across the semiconductor industry that 2.5D packaging will become a mainstay for the Internet of Things, where the real value is the ability to quickly package standard parts and IP, regardless of which process technology was used to develop them, to fit customized applications.

“This is the second category that will benefit from this,” said Wang. “It will be wafer-level packaging and it will connect a few thousand nodes (I/O’s for cross-package signals). From the design side, the first category involved designing the whole SoC. The second category will take existing designs, whether it’s a 180nm PLL or an ADC, and shrink the PCB into the package. That will also minimize risk and add flexibility because you can source from different vendors. The third category will be where low power and scalability are the true selling points.”

The key to this riddle won’t be changing the packaging or the components. It’s having enough of those components available to integrate into a system, and if they don’t all have to be integrated onto the same die then vendors can take a 90nm analog sensor, mix it with a 28nm microcontroller, and connect it all to a memory block using an interposer or some other low-cost connector.

This will take some time, of course. But the first inklings that this will even be possible are beginning to show up on the market in real products.

High-bandwidth memory
One of the key pieces that was missing until late last year is high-bandwidth memory (HBM), which is now available from SK Hynix, with others reportedly working on similar solutions. HBM is a multi-layer DRAM that adds connections for thousands of I/Os using through-silicon vias. The first version provides up to 128GB per second bandwidth using a four-chip stack. The next version, HBM 2.0, will provide up to 256GB per second.

“We did the math, and that’s about 32 times the bandwidth of LPDDR3,” said Loren Shalinsky, strategic development director at Rambus. “HBM has been touted more as a graphics DDR replacement, but you’re also going to see a pickup in the data center and networking.”
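The bandwidth figures quoted above can be sanity-checked with simple arithmetic. The numbers below are assumptions, not from the article: first-generation HBM is widely described as a 1,024-bit-wide interface running at 1Gbps per pin, with HBM 2.0 doubling the per-pin rate.

```python
# Back-of-envelope check of the HBM bandwidth figures.
# Interface width and per-pin rate are assumed, not from the article.

hbm_width_bits = 1024        # total I/O width across a four-die stack
hbm_pin_rate_gbps = 1.0      # 500MHz DDR -> 1Gbps per pin (assumed)

hbm1_bw_gbs = hbm_width_bits * hbm_pin_rate_gbps / 8   # bits -> bytes
hbm2_bw_gbs = hbm_width_bits * (2 * hbm_pin_rate_gbps) / 8

print(hbm1_bw_gbs)   # 128.0 GB/s, matching the first-generation figure
print(hbm2_bw_gbs)   # 256.0 GB/s, matching the HBM 2.0 figure
```

The same math shows why thousands of through-silicon-via connections are needed: the bandwidth comes from width, not from an exotic per-pin data rate.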

HBM is fast enough that it can function like an L4 cache. Unlike a dual inline memory module (DIMM), at this point it doesn’t have the flexibility in terms of the amount of memory needed for a particular task, so it can’t be upgraded or migrated to the next version. “The advantage is that it is more power efficient, partly because of the way it is integrated,” said Shalinsky. “You also can use a relatively lower speed connection (using multiple requests at the same time), and you get a benefit at the system level on the power needed to drive signals because of the shorter channel and fewer connectors.”

Tools and challenges
HBM has another benefit, as well. It is the first standard to allow design across multiple dies using a conventional design methodology, noted Cadence’s Wang.

But there are some challenges to deal with, as well. If you think about power as CV²F, there is little that can be done with the capacitance and limits as to what can be done with frequency within a given power budget. That leaves voltage as the big knob to turn, and that raises many challenges when it comes to memory.
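The "big knob" argument can be made concrete. Dynamic power scales as P = C·V²·f, so voltage enters quadratically while frequency enters only linearly. The numbers below are purely illustrative:

```python
# Dynamic power P = C * V^2 * f. Voltage is quadratic, frequency linear,
# so the same 10% cut buys more when taken from Vdd than from the clock.
# All values are illustrative, not measured data.

def dynamic_power(c_farads, v_volts, f_hz):
    return c_farads * v_volts**2 * f_hz

base    = dynamic_power(1e-9, 1.0, 1e9)    # 1.0 W baseline
lower_f = dynamic_power(1e-9, 1.0, 0.9e9)  # 10% slower clock -> 0.90 W
lower_v = dynamic_power(1e-9, 0.9, 1e9)    # 10% lower Vdd    -> 0.81 W

print(base, lower_f, lower_v)
```

That quadratic payoff is exactly why voltage is the knob designers reach for first, and why lowering it collides with memory margins, as discussed next.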

“The foundry will guarantee nominal voltage,” said Farzad Zarrinfar, managing director of the Novelics business unit at Mentor Graphics. “But there are cases where you have dual Vdd where the Vdd of the core is different than the Vdd of the peripheral.”

In the IoT space, this becomes even more complex, he said, because many devices will be manufactured at established process nodes—90, 65/55 and 40nm, where leakage gets progressively higher. That makes it harder to fully characterize memory IP, a problem that is exacerbated by mixing memory with other components that weren’t part of the initial design, or with custom use cases or corners for devices such as wearable and implantable electronics, where always-on features and regular use can generate too much heat.

On top of that, not all tools work equally well with 2.5D—or at least not yet. And even where the tools do work, engineers don’t necessarily know how best to utilize them with this kind of chip architecture. Consider electrostatic discharge, for example. While ESD is well understood within an SoC, mapping the potential for damage across dies that can be swapped in and out is not, and an unanticipated discharge can damage or kill a device.

“Some customers analyze different dies, but what happens if they use different microbumps to connect to the I/O interface between dies,” said Karthik Srinivasan, principal applications engineer at Ansys-Apache. “The problem has not been well defined for EDA. You can test the individual die, but stacking changes the ESD requirements. Companies building these chips need to get a good grasp of all the issues that they’re dealing with. After that it will trickle down to EDA.”

He said electromigration is much better defined, and existing tools can model the impact of one die on another over time, both at the planar and stacked-die level. Charged device model (CDM) effects, in contrast, are not well understood in the context of the overall assembly process.

Other tools will need to be added, as well. “There are gaps in the flow today for 2.5D,” said Anand Iyer, director of marketing for the low power platform at Calypto. “With new architectures you need to see first how they work, then how you can reduce power. This needs to be ironed out at the RTL level, but the savings can be significant. The actual cost of a transistor increases about 5% when you move from planar to finFET.”

He said the costs of 2.5D are deceptive because in terms of total chip, there are fewer transistors needed to improve performance because the throughput is so much greater. By reducing the logic portion and adding to the I/O, which is essentially what stacking does, the costs are comparable. But that requires a system-level cost analysis, and many companies don’t look at it that way.
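The system-level cost argument can be sketched with a toy model: a large monolithic die suffers lower yield, while a 2.5D approach trades a smaller, higher-yielding logic die against the added cost of an interposer and assembly. Every number below is hypothetical, chosen only to illustrate the trade-off, not drawn from the article:

```python
# Toy system-level cost comparison of the kind described above.
# All areas, costs and yields are hypothetical placeholders.

def die_cost(area_mm2, cost_per_mm2, yield_frac):
    # Effective cost per good die: bad dies are paid for but discarded.
    return area_mm2 * cost_per_mm2 / yield_frac

# Monolithic SoC: one big die, lower yield.
monolithic = die_cost(600, 0.10, 0.55)

# 2.5D: smaller logic die (better yield) plus interposer and assembly.
logic = die_cost(350, 0.10, 0.80)
interposer_and_assembly = 30.0
stacked = logic + interposer_and_assembly

print(round(monolithic, 2), round(stacked, 2))
```

The point is not the specific numbers but the structure: the interposer adds a fixed cost, while yield improvement scales with die size, so the comparison only comes out in 2.5D's favor when analyzed at the system level.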

Continuing Moore’s Law
Moore’s Law isn’t standing still, either, and there is work underway to ease some of the problems that have made shrinking features a challenge below 28nm, notably on the analog side.

“The trick is to do analog in the non-linear regions, which may sound counterintuitive,” said Bernard Murphy, chief technology officer at Atrenta. “What you’re doing is digitally assisted, so you’re basically compensating for the non-linearity. If you look at RF, for example, this is one technique to use with smaller feature sizes. But that also results in non-linear behavior. You can compensate for that by applying a reverse transform to the signal before transmit.”
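The "reverse transform" idea is essentially digital predistortion: if the analog path's non-linearity is known, the digital side applies an approximate inverse before the signal enters it. The sketch below assumes a simple cubic distortion model and a first-order inverse; both are illustrative assumptions, not Atrenta's method:

```python
# Sketch of digitally assisted analog: compensate a known non-linearity
# by pre-distorting the signal. The cubic model and its coefficient
# are hypothetical, chosen only to show the principle.

A = 0.1  # strength of the cubic non-linearity (assumed)

def channel(x):
    """Analog path with a weak cubic non-linearity."""
    return x + A * x**3

def predistort(x):
    """First-order inverse: subtract the expected distortion term."""
    return x - A * x**3

x = 0.5
raw_error = abs(channel(x) - x)                    # uncorrected output error
corrected_error = abs(channel(predistort(x)) - x)  # after predistortion

print(raw_error, corrected_error)  # correction shrinks the error markedly
```

The residual error after correction is second-order in the distortion strength, which is why the technique works well when the non-linearity is weak and well characterized.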

He noted this approach is still at the academic research level, but it does allow engineers to estimate power models and timing constraints earlier and more easily, rather than waiting for the analog portion to be finished.

There also is a wide divergence in opinion when it comes to Moore’s Law versus stacked die.

Aart de Geus, chairman and co-CEO of Synopsys, believes the 2.5D and 3D effort has “dramatically slowed down. It’s complex, expensive, and there are legal questions and difficult collaboration efforts.”

De Geus said the one exception is in the memory space, but for advanced chips Moore’s Law shows no sign of abating. “Multi-patterning is a fundamental requirement for some layers, and from a design point of view we’ve completely hit this.”

Others disagree. Herb Reiter, president of eda2asic consulting, said that once costs are reduced 2.5D designs will explode. “We’ve already seen 50 to 100 interposer-based designs,” he said. “Many are complete and in silicon, but they’re not in production yet because of the cost. People are waiting for prices to come down, but the technical advantages are there. You can get three times the bandwidth for the same amount of power. We’re also seeing interposers becoming more and more capable. Capacitors and power management is actually being moved onto the interposer, which means you can move some of the metal layers. This technology will succeed.”

And still others believe the answer lies somewhere in the middle. Frankwell Lin, president of Andes Technology, said that initially everything will be separate in a package, but that ultimately the same consolidation trend will continue inside the package. “Everything ultimately will come together because of cost,” he said. “You’ll see a gradual evolution back to the SoC.”

It may take more than a decade to see who is right, but it’s clear that there are some sharp differences of opinion forming over this approach.
