Choosing The Right Interconnect

Packaging options increasing as chipmakers vie for higher performance, lower power and faster time to market.


Efforts to zero in on cheaper advanced packaging approaches that can speed time to market are being sidetracked by a dizzying number of choices.

At the center of this frenzy of activity is the interconnect. Current options range from organic, silicon and glass interposers, to bridges that span different die at multiple levels. There also are various fan-out approaches that can achieve roughly the same high performance and low-power goals as the interposers and bridges.

What’s driving all of this activity is a recognition that the economic and performance benefits of shrinking features are dwindling. While this has been apparent on the analog side for some time, it’s now beginning to impact ASICs for a different reason—the immaturity of applications for which chips are being designed.

In artificial intelligence, deep learning and machine learning, which collectively represent one of the hot growth markets for chips, the training algorithms are in an almost constant state of flux. So are the decisions about how to apportion processing between the cloud, edge devices and mid-tier servers. That makes it far more difficult to commit to building an ASIC at advanced nodes, because by the time it hits the market it already may be obsolete.

The situation is much the same in the automotive segment, where much of the technology is still in transition. And in burgeoning markets such as medical electronics, augmented and virtual reality, IoT and IIoT, no one is quite sure what architectures will look like or where the commonalities ultimately will be. Unlike in the past, when chipmakers vied for a socket in a mobile phone or a PC or server, applications are either emerging or end markets are splintering.

That has helped push advanced packaging into the mainstream, where there are several important benefits:

• Performance can be improved significantly by routing signals through wider pipes—TSVs, bridges or even bonded metal layers, rather than thin wires.
• Distances between critical components can be reduced by placing separate chips close to each other in a package, rather than routing signals across a single large die, thereby reducing both the energy required to send signals and the time it takes to move data.
• Components can be mixed and matched from multiple process nodes, which in the case of analog IP can be a huge time saver because analog circuitry does not benefit from shrinking features.
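The energy argument in the second bullet comes down to simple arithmetic: the power a link consumes is its energy cost per bit multiplied by the number of bits it moves each second. The Python sketch below makes that concrete. The picojoule-per-bit values are hypothetical placeholders chosen for illustration, not measured figures for any particular package or interface.

```python
def link_power_watts(bandwidth_gb_per_s: float, energy_pj_per_bit: float) -> float:
    """Power needed to sustain a bandwidth at a given energy cost per bit.

    bandwidth_gb_per_s: payload bandwidth in gigabytes per second.
    energy_pj_per_bit: energy to move one bit across the link, in picojoules.
    """
    bits_per_second = bandwidth_gb_per_s * 1e9 * 8      # GB/s -> bits/s
    return bits_per_second * energy_pj_per_bit * 1e-12  # pJ -> J, so J/s = W


# Hypothetical energy-per-bit figures, for illustration only.
for label, pj in [("short in-package link", 1.0), ("longer off-package link", 10.0)]:
    print(f"{label}: {link_power_watts(256, pj):.2f} W at 256 GB/s")
```

At the same bandwidth, every reduction in energy per bit translates directly into lower interconnect power, which is why shortening the distance a signal travels matters.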

Still, advanced packaging adds its own level of complexity. There are so many options in play in the packaging world that it isn’t clear which approaches will win. The outcome depends largely on the choice of interconnect, which serves as the glue between different chips.

“The key here is to shorten the time to development, particularly for AI,” said Patrick Soheili, vice president of business and corporate development at eSilicon. “On one side, you can’t afford not to do the chip right away because you can’t be left behind. But you also have to worry about future-proofing it. The goal is to get both.”

DARPA has been pushing chiplets as a way to standardize the assembly of components. The first commercial implementation of this sort of modular approach was developed by Marvell Semiconductor with its MoChi architecture. Marvell still uses that internally for its own chips, which it can customize for customers using a menu of options. DARPA’s CHIPS program takes that one step further, allowing chiplets from multiple companies to be mixed and matched and combined through an interposer.

“Chiplets are absolutely part of the solution,” said Soheili. “But this isn’t so easy. If a 7nm ASIC has to sit in the middle and connect to 180nm chiplets, something has to line up the data and send it over a link.”

Different types of interposers
As companies working with advanced packaging have discovered, this can be time-consuming and expensive. It is assumed that once these various approaches can be vetted and standardized, this process will become quicker and cheaper. That could involve sidestepping silicon interposers, which can run as high as $100 for the interposer itself in complex devices that require stitching of multiple reticles.
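Part of the reason large silicon interposers get expensive is geometric. A lithography scanner can expose only a limited field at once, typically about 26 mm × 33 mm, so an interposer larger than that has to be patterned in several stitched exposures. The rough Python sketch below counts the fields needed. The grid-tiling model is a simplification that ignores real stitching layouts and overlap rules, and the interposer sizes are hypothetical.

```python
import math

# Typical maximum exposure field of a lithography scanner, in millimeters.
RETICLE_W_MM = 26.0
RETICLE_H_MM = 33.0


def reticle_fields_needed(width_mm: float, height_mm: float) -> int:
    """Rough count of reticle fields needed to pattern a rectangular interposer.

    Simplification: tiles the interposer with whole fields on a grid and keeps
    the better of the two orientations; stitching overlap is ignored.
    """
    def tiles(w: float, h: float) -> int:
        return math.ceil(w / RETICLE_W_MM) * math.ceil(h / RETICLE_H_MM)

    return min(tiles(width_mm, height_mm), tiles(height_mm, width_mm))


# Hypothetical interposer sizes, for illustration only.
for size in [(20, 30), (30, 40), (50, 50)]:
    print(size, "->", reticle_fields_needed(*size), "field(s)")
```

Anything above one field needs stitching, and each additional exposure adds processing steps and alignment risk to the interposer.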

“There is overall agreement that silicon interposers are expensive,” said Ram Trichur, director of business development at Brewer Science. “The question is what to replace it with. The challenge with organic interposers has been warpage. There are a lot of companies addressing these challenges and working with certain formats for organic interposers. Some are directly mounted, others need a substrate.”

Kyocera, Shinko Electronics and Samsung independently have been developing organic interposers using epoxy films that can be built up using standard processes. One of the key issues here has been matching the coefficient of thermal expansion (CTE) with that of silicon. This isn’t a problem with silicon interposers, of course, but it has been an issue with organic laminates and underfill. Reducing the thickness of the interposer layer has been found to help significantly, according to several technical papers on the subject.
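To first order, the CTE problem can be expressed as a differential expansion: two bonded materials with different CTEs grow by different amounts over the same temperature swing, and the gap increases with distance from the neutral point. The Python sketch below computes that free displacement. Silicon's CTE of roughly 2.6 ppm/K is a well-known value, but the organic CTE, temperature swing and distance are hypothetical inputs chosen for illustration.

```python
def thermal_mismatch_um(cte_a_ppm_per_k: float, cte_b_ppm_per_k: float,
                        delta_t_k: float, distance_mm: float) -> float:
    """Free (unconstrained) differential expansion between two bonded materials.

    Returns the relative displacement, in micrometers, at distance_mm from the
    neutral point for a temperature swing of delta_t_k kelvin.
    First-order estimate: ignores bending, layer thickness and stiffness.
    """
    mismatch = abs(cte_a_ppm_per_k - cte_b_ppm_per_k) * 1e-6  # ppm/K -> 1/K
    return mismatch * delta_t_k * distance_mm * 1000.0         # mm -> um


SILICON_CTE = 2.6   # ppm/K, approximate value for silicon
ORGANIC_CTE = 17.0  # ppm/K, hypothetical value for an organic build-up layer

# 10 mm from the neutral point, 100 K temperature swing (both hypothetical).
print(f"{thermal_mismatch_um(ORGANIC_CTE, SILICON_CTE, 100, 10):.1f} um")
```

Mismatch that the structure cannot absorb freely shows up as stress and warpage. A material whose CTE is closer to silicon's shrinks the result in proportion to the reduced mismatch. This simple formula does not capture the thickness effect reported in the papers, which would require a stiffness-aware model.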

Fig. 1: Organic interposer. Source: NVIDIA/SEMCO

It’s still not clear if this will be a commercially viable alternative to silicon interposers, however. “With an organic interposer you get the same lines and spaces as a silicon interposer, but by the time you address all of the issues you come up with basically the same cost at the end,” said Andy Heinig, a research engineer at Fraunhofer EAS. “The problem is that you need a system-level study to find out which is the best solution for a design. One of the variables is that you need to transfer a huge amount of data on these devices. If you reduce that to a certain point, you can use an organic interposer. But it’s more of a task to find that out than with a silicon interposer.”

Organic interposers aren’t the only alternative. “There is also work on glass interposers, which are tunable,” said Brewer’s Trichur. “The CTE of glass matches silicon, so you get low loss, which is suitable for high-frequency applications. Glass is also good for panel-level processes, and the cost is low.”

Fig. 2: Glass interposer in test vehicle. Source: Georgia Tech

Interposer alternatives
One of the big attractions of 2.5D silicon interposers, or “2.1D” organic interposers, is improved throughput using arrays of TSVs rather than skinny wires. That allows a multi-pipe connection to stacks of DRAM, known as high-bandwidth memory.

The current HBM 2 JEDEC standard, introduced in 2016, supports up to 8 stacked DRAM chips with an optional memory controller, which is similar to the Hybrid Memory Cube. HBM 2 supports transfer rates of up to 2 GT/s, with up to 256 GB/s bandwidth per package. Over the next couple years that will increase again with HBM 3, which will double the bandwidth to 512 GB/s. There is also talk of HBM 3+ and HBM 4, although exact speeds and time frames are not clear at this point.
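The HBM2 figures follow directly from the width of the interface. Each stack exposes a 1,024-bit bus, and at 2 GT/s per pin that works out to 256 GB/s. The Python sketch below does the arithmetic. The 4 GT/s case is a hypothetical illustration of how doubling the per-pin rate on the same bus would reach the 512 GB/s mentioned for HBM 3. It is not a statement of how that standard will actually get there.

```python
def hbm_peak_bandwidth_gb_per_s(bus_width_bits: int, gbit_per_s_per_pin: float) -> float:
    """Peak bandwidth of one HBM stack: interface width times per-pin data rate."""
    return bus_width_bits * gbit_per_s_per_pin / 8  # Gbit/s -> GB/s


HBM_BUS_WIDTH = 1024  # bits per stack (8 channels x 128 bits)

print(hbm_peak_bandwidth_gb_per_s(HBM_BUS_WIDTH, 2.0))  # HBM2 at 2 GT/s
print(hbm_peak_bandwidth_gb_per_s(HBM_BUS_WIDTH, 4.0))  # hypothetical doubled pin rate
```

A bus that wide is only practical over very short, dense connections, which is why HBM is tied so closely to interposers, bridges and high-end fan-outs.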

The goal of all of these devices is to be able to move more data between processor and memory more quickly, using less power, and 2.5/2.1D are not the only approaches in play at the moment. Numerous industry sources say that some new devices are being developed using pillars—stacked logic/memory/logic—on top of fan-outs. TSMC has been offering this capability for some time with its InFO (Integrated Fan-Out) packaging technology.

Other high-end fan-outs use a different approach. “Fan-out takes the place of the interposer,” said John Hunt, senior director of engineering at Advanced Semiconductor Engineering (ASE). “Chip-last is closer to an inorganic interposer, and the yield right now is as high as 99% using 4 metal layers and 2.5 spacing. The real objective of an interposer is to increase the pitch of active devices so you can route HBM2. High-end fan-outs perform better thermally and electrically because the copper RDL is thicker and the vias are less resistive. But they only work in cases where you don’t need 1 micron lines.”

There are a number of options available with fan-out technology, as well, including chip first, chip last, die up, die down. There also are flip-chip, system-in-package, and fan-out on substrate.

What’s important is that there are many ways to tackle this problem, and high-speed interconnects are now available using multiple packaging approaches. Until a couple of years ago, the primary choices were fan-out, fan-in, 2.5D, 3D-IC and multi-chip modules, and there were distinct performance and cost differences between them. Each of those approaches now comes in more variants, and the number of options continues to expand, blurring the lines between them.

Another approach uses low-cost bridges. Intel has its Embedded Multi-die Interconnect Bridge (EMIB), which it offers to Intel Foundry customers as an option for connecting multiple die.

Fig. 3: Intel’s EMIB. Source: Intel.

Samsung, meanwhile, has announced an RDL bridge for its customers, which accomplishes the same thing inside the redistribution layer (RDL).

Fig. 4: Samsung’s interconnect options. Source: Samsung

Both of those approaches can certainly cut the cost of advanced packaging, but they are more limited than an interposer. So while a bridge can provide a high-speed connection between two or more chips, there is a limit to how many HBM stacks can be connected to logic using this type of approach.

Moreover, while the bridges themselves are less expensive than interposers filled with through-silicon vias, they can be challenging to assemble because the connections are planar. The same kinds of warpage issues that affect multi-die packaging apply with bridge technology, as well.

Future goals and issues
One of the reasons in-package and inter-package interconnect technology is getting so much buzz lately is that the amount of data that needs to be processed is increasing significantly. Some of that must be processed locally, using multiple processors or cores, and some of it needs to be processed remotely, either in a mid-tier server or in the cloud. All of these compute models require massive throughput, and trying to build that throughput into a 7/5nm chip is becoming much more difficult.

The rule of thumb used to be that on-chip processing is always faster than off-chip processing. But the distance between two chips in a package can be shorter than routing signals from one side of an SoC to another over a skinny wire, which at advanced nodes may encounter RC delay. None of this is simple, however, and it gets worse in new areas such as 5G.
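The RC-delay point can be made concrete with the Elmore approximation for a distributed RC wire, whose delay grows with the square of its length. The Python sketch below uses hypothetical per-millimeter resistance and capacitance values rather than data for any real process. Shrinking a wire's cross-section raises its resistance per millimeter, which scales the delay up further.

```python
def unrepeated_wire_delay_ps(r_ohm_per_mm: float, c_ff_per_mm: float,
                             length_mm: float) -> float:
    """Elmore delay of a distributed RC wire with no repeaters: 0.5 * r * c * L^2.

    Ohms times femtofarads gives femtoseconds; divide by 1000 for picoseconds.
    """
    return 0.5 * r_ohm_per_mm * c_ff_per_mm * length_mm ** 2 / 1000.0


# Hypothetical per-mm resistance and capacitance, for illustration only.
R, C = 100.0, 200.0
for length in (1.0, 5.0, 10.0):
    print(f"{length:4.1f} mm -> {unrepeated_wire_delay_ps(R, C, length):8.1f} ps")
```

In practice, designers insert repeaters on long wires, which makes delay grow roughly linearly with length at a cost in area and power. The unrepeated case is shown only to illustrate why a long, thin on-die route can be slower than a short hop between chips in the same package.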

“There are several materials and process challenges,” said Brewer’s Trichur. “First, you’ve got the structural package issues. Then, when we get into 5G, you’ve got a gap in materials with integrated dielectrics. 5G will be the next materials challenge. So now you’ve got to integrate new materials and new processors, all in a small package. You’ve got more switches, and you also have to integrate antennas, which requires a new process and new materials in itself. This is a whole new challenge.”

Another market where advanced packaging will play a critical role is AI/ML/DL. The key metrics there are performance and power, but the bigger challenge is being able to churn out new designs quickly. Because the training algorithms are in an almost constant state of flux, the ability to add new processors or IP is time-sensitive. An 18-month development cycle will not work if the processor or memory architecture needs to change every six months.

Trying to utilize off-the-shelf components for a single-chip solution can cause its own set of issues. “One of the problems we’ve been seeing in big SoCs is that companies are trying to glue everything together and the IP models are at different levels of abstraction and different speeds,” said Kurt Shuler, vice president of marketing at ArterisIP. “That requires you to shim and hack the interconnect model to get it to work. Even then, because of the ancestry of the models, they weren’t developed for pins or TCM (tightly coupled memory) interfaces, or they are cycle-accurate or approximately timed or loosely timed. So we’re seeing things that were not developed on a large scale. They were developed as a point problem.”

Advanced packaging can help with that, up to a point. But most advanced packaging so far has been more about a particular application and a particular project, rather than developing a platform that can be used by many companies.

“If it works well, you can do great things,” said Raymond Nijssen, vice president of systems engineering at Achronix. “But there are many forks in that road. There are solutions with interposers or without. There are different data rates, so you have some solutions with very high data rates. And if you are doing chiplets, it depends on why you are doing chiplets. Is it because you can’t afford that many balls on a package, or is it an issue of power efficiency because you have a hard ceiling on power usage?”

So far, there are no clear answers to any of these questions. But the good news is that there are plenty of options, and many of them have been proven in real products in the market and shown to work.

The next challenge will be to build economies of scale into the packaging world. That will require the industry to narrow down its choices. Until now, many of these packaging approaches have been expensive to implement, which is why they have shown up mainly in smartphones, where there are sufficient volumes to offset the development cost, and in networking chips, where price is less of an issue.
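The volume argument is simple amortization: the one-time cost of developing a packaging flow is divided across every unit shipped and added to the recurring per-unit cost. The Python sketch below illustrates this with hypothetical dollar figures, not real costs for any packaging approach.

```python
def cost_per_unit(nre_dollars: float, unit_cost_dollars: float, volume: int) -> float:
    """Per-unit cost once one-time development (NRE) cost is spread over volume."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return nre_dollars / volume + unit_cost_dollars


# Hypothetical numbers: $5M of packaging development and $10 of per-unit cost.
for volume in (10_000, 1_000_000, 100_000_000):
    print(f"{volume:>11,} units -> ${cost_per_unit(5e6, 10.0, volume):,.2f} per unit")
```

At low volume the development cost dominates; at smartphone-scale volume it nearly disappears, which is why narrowing the number of packaging choices matters for broader adoption.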

In the future, advanced packaging will need to become almost ubiquitous to drive widespread applications of AI/ML/DL inference at edge nodes and in automotive and a variety of other new market segments. That requires repetition with some degree of flexibility on design—basically the equivalent of mass customization. This is the direction the packaging world ultimately will take, but it will require some hard choices about how to get there. The interconnect will remain the centerpiece of all of these decisions, but which interconnect remains to be seen.



BillM says:

Two areas that I was surprised were not included were power dissipation as well as signal and power integrity issues. As distances and geometries shrink, supplying the voltages required for functionality is critical (power distribution networks: PDN). As the geometries enable denser packing, signal integrity becomes a bigger issue. Dense arrays of layer-to-layer interconnects, such as TSVs, create an environment for signal integrity issues and must be accurately analyzed. The same can be said for power dissipation, where generated heat does not have efficient mechanisms to rapidly exit the structure.

Ed Sperling says:

Hi Bill,
Those are certainly interesting challenges. We have written a number of stories on PDNs, the impact of density and various types of noise (power, thermal, digital-to-analog, electromagnetic, etc.) and other physical effects on signal integrity, as well as thermal effects caused by gate leakage and various packaging approaches, particularly 3D-IC. Intel said at ISS two years ago it didn’t see a way forward for logic on logic, sandwiched between memories in a 3D-IC design, because the inner logic layer would be performance-constrained by heat. One of the interesting strategies early on was to use dedicated TSVs for ESD and heat dissipation. There also has been talk about microfluidics for removing heat. But all of this adds cost, time and reliability issues, and unless there is enough volume and demand they’ll never be solved. Many of those issues can be avoided with the more popular packaging approaches, but none of this is easy. —Ed

Gary Huang says:

The D-2-D interconnect of TSV or EMIB still needs two solder joints between Die and Path. I think that is the origin of the heat source at high-frequency bandwidth. Maybe the next package can solve this problem.
