System-Level Packaging Tradeoffs

Growing complexity is creating an array of confusing options.


Leading-edge applications such as artificial intelligence, machine learning, automotive, and 5G all require higher bandwidth, higher performance, lower power, and lower latency. They also need to deliver this at the same or lower cost.

The solution may be disaggregating the SoC onto multiple die in a package, bringing memory closer to processing elements and delivering faster turnaround time. But the tradeoffs for making this happen are becoming increasingly complex, regardless of which advanced packaging approach is used.

In PCB-based systems, many devices sit on one board. As scaling allowed tighter integration and higher density, it became possible to bring everything together onto a single die. But this is also swelling the size of some designs, which can exceed the reticle size even at the most advanced process nodes.

But choosing what to keep on the same die, what components to put on other die, and how to package them all together presents a lot of choices.

“2.5D and 3D is making this very complex, and people cannot always understand what’s going on,” said Rita Horner, senior staff, product manager at Synopsys. “How do you actually go about doing that? How do you do the planning? How do you do the exploration phase before knowing what to put in what, and in what configurations? Those are the challenges a lot of people are facing. And then, which tool do you use to optimally do your planning, do your design, do your implementation validation? There are so many different point tools in the market it’s unbelievable. It’s alphabet soup, and people are very confused. That’s the key. How do you actually make this implementation happen?”

Fig. 1: Some advanced packaging options. Source: Synopsys

Historically, many chipmakers have taken an SoC approach that also included some analog content and memory. This was likely a single, integrated SoC, where all of the IP was on a single die.

“Some people still do that because for their business model, that makes the most sense,” said Michael White, product marketing director at Mentor, a Siemens Business. “But today, there are certainly others putting the CPU on one die, having that connected [TSMC] InFO-style, or with a silicon interposer connected up to an HBM memory. There may be other peripheral IP on that same silicon interposer to perform certain functions. Different people use different advanced packaging approaches, depending on where are they going. Are they going into a data center, where they can charge a real premium for that integrated package they’ve created? Or are they going in mobile, where they’re super cost-sensitive and power-sensitive? They might tend to drift more towards the InFO-style package approach.”

Alternatively, if the configuration contains relatively few die and is relatively symmetric, TSMC’s InFO may be used to connect everything without a silicon interposer, with everything sitting on an organic substrate. There are many different configurations, White said. “It’s really driven by how many die do you have? Is the layout of the die or chiplets that you’re trying to connect up relatively few and a rather symmetric configuration? InFO-style technology as well as other OSAT approaches are used versus a silicon interposer, if possible, because the silicon interposer is another chunk of silicon that you have to create/manufacture so there’s more cost. Your bill of materials has more cost with that, so you use it if your performance absolutely drives the need for it, or the number of chiplets that you have drive you in that direction. Another approach we’re seeing is folks trying to put some active circuitry in that silicon interposer, such as silicon photonics devices in the silicon interposer. Big picture, packaging technology being used is all over the map, and it’s really a function of the particular application, the market, and the complexity of the number of chiplets or dies, and so on.”
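The selection logic White describes can be sketched as a simple rule of thumb. The Python below is only an illustration under stated assumptions, not any vendor's methodology. The function name, the option labels, and especially the `max_fanout_die` cutoff are hypothetical, since the article gives no number for "relatively few die."

```python
from dataclasses import dataclass


@dataclass
class PackagePlan:
    die_count: int              # number of die or chiplets in the package
    symmetric_layout: bool      # is the die arrangement roughly symmetric?
    performance_critical: bool  # does performance absolutely demand an interposer?


def suggest_packaging(plan: PackagePlan, max_fanout_die: int = 3) -> str:
    """Return a coarse packaging suggestion.

    max_fanout_die is a hypothetical cutoff for "relatively few die".
    """
    # The interposer adds silicon and bill-of-materials cost, so it is
    # chosen only when performance or chiplet count forces the issue.
    if plan.performance_critical or plan.die_count > max_fanout_die:
        return "silicon interposer"
    # Few die in a symmetric arrangement: fan-out on an organic substrate.
    if plan.symmetric_layout:
        return "fan-out (InFO-style)"
    # Anything else falls outside the two cases described in the text.
    return "needs further exploration"


print(suggest_packaging(PackagePlan(2, True, False)))
print(suggest_packaging(PackagePlan(6, True, False)))
```

A real decision would also weigh the target market, such as a data center versus cost-sensitive mobile, which this sketch folds into the single `performance_critical` flag.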

System-level problems
All of this speaks to the ongoing quest for system-level gains, which is playing out in architecture choices and compute engine decisions as well as packaging options. Paramount in the decision-making process is cost.

“Chiplets are very interesting there. That seems to be the preferred solution now for dealing with the cost,” said Kristof Beets, senior director of technical product management at Imagination Technologies. “We’re seeing a lot of GPUs that have to become scalable by just copying multiple and connecting them together. Ramping up nodes is great in one way, but the cost isn’t so great. The question is really how to effectively create one 7nm or 5nm chip, and then just hook them together with whatever technique is chosen to ideally double the performance. Engineering groups are looking to do this to create multiple performances out of just a single chip investment.”

The first question to ask is whether you know exactly which application the chip is being created for.

“This market is fast-moving, so you’re never quite sure that requirements which were set for the application when you started designing the chip will be the same at the end of the design and verification cycle,” said Aleksandar Mijatovic, design engineer at Vtool. “That is the first concern, which has to give you some push towards including more features than you actually need at the time. Many companies will try to look forward and get ready for a change in standards, change in protocols, change in speeds in order not to lose that one year of market which will arrive, when other components in that application get upgraded, and they are just put on the market something, which is based on a one-year-old standard.”

Other issues demand attention as well. “Full design, full verification plus mask making, and manufacturing is quite an expensive process so something you may think as too much logic might be there just because it turned out to be cheaper than to produce two or three flavors of one chip doing full verification, full masks and the whole packaging lines,” Mijatovic said. “Sometimes it’s just cheaper not to use some features than to manufacture a lot of them similar to when AMD came out with dual core processors which were quad cores, just with two cores turned off because it was already set up and nobody cared to pay for the expense of shrinking. A lot comes down to architecture, market research, and bean counting.”
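The "bean counting" Mijatovic describes can be made concrete with a little arithmetic. The sketch below compares one superset chip, where some features go unused, against several separately designed flavors. Every figure is a made-up placeholder rather than industry data, and the function names are hypothetical.

```python
def total_cost(nre_per_design: float, unit_cost: float, units: int,
               designs: int = 1) -> float:
    """One-time cost (design, verification, masks) per design, plus unit cost."""
    return designs * nre_per_design + unit_cost * units


def break_even_units(nre_per_design: float, designs: int,
                     unit_superset: float, unit_flavor: float) -> float:
    """Volume at which the smaller flavors repay their extra one-time cost."""
    extra_nre = (designs - 1) * nre_per_design
    return extra_nre / (unit_superset - unit_flavor)


# Placeholder assumptions, not real figures:
NRE = 50e6            # one-time cost per distinct design
UNIT_SUPERSET = 40.0  # larger die carrying unused logic
UNIT_FLAVOR = 30.0    # smaller, right-sized die
UNITS = 1_000_000     # total volume across all markets

one_chip = total_cost(NRE, UNIT_SUPERSET, UNITS)
three_flavors = total_cost(NRE, UNIT_FLAVOR, UNITS, designs=3)
print(f"one superset chip: {one_chip:,.0f}")
print(f"three flavors:     {three_flavors:,.0f}")
print(f"break-even volume: {break_even_units(NRE, 3, UNIT_SUPERSET, UNIT_FLAVOR):,.0f}")
```

Under these placeholder numbers the superset chip wins until volume climbs well past the assumed one million units, which is the pattern behind shipping parts with features turned off.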

When it comes to chiplets, the biggest challenge for pre-silicon verification is that there might be more complexity (i.e., a bigger system), along with more interfaces and package-level fabric to verify, suggested Sergio Marchese, technical marketing manager at OneSpin. “On the other hand, if you have a bunch of chiplets fabricated on different technology nodes, those different nodes should not affect pre-silicon verification. One thing that is not clear is: if you figure out that there is something wrong, not with a specific chiplet, but with their integration, what’s the cost for a ‘respin?’”

One-off solutions
Another aspect of advanced packaging today is that many approaches are unique to a single design. Even a company building the most advanced chip no longer gets enough out of scaling in terms of power and performance. It may get the density, but not the other benefits scaling used to deliver, so it turns to packaging to get extra density with an architectural change. While this is possible for leading-edge chipmakers, what will it take for mainstream chipmakers to access this kind of approach for highly complex and integrated chips, chiplets, and SoCs?

That requires a thorough understanding of how these systems are connected together.

“Typically, the architect, chip designers, package designers, PCB designers all operate independent of each other, sometimes sequentially, in their own silos,” said Synopsys’ Horner. “But die-level complexities are increasing so much that they no longer can operate independently of each other because of the complexity. If they really want to make multi-die integration go mainstream, be affordable, more reliable, with faster turnaround time to get to the market, there needs to be a more collaborative environment where all these different disciplines, individuals actually can work together from early stages of the design to implementation, to validation, and even to the manufacturing. If there is a new next-generation iteration happening, it would be nice to have a platform to go back to and learn from to be able to further optimize for the next generation without going back to paper and pencil, which a lot of people are using to do their planning and organizing before they start digging in.”

However, there is no platform to allow all these different disciplines to collaborate with each other and to learn from each other. “This collaborative environment would allow people to even go back to the drawing board when they realize they must disaggregate the die because it’s getting too large,” Horner said. “And because the die has to be disaggregated, additional I/O must be added. But what type is best? If the I/O is put on one side of the chip versus the other, what’s the impact of the substrate layer design. What that means translates to the package and the board level, where the balls are placed in the package, versus the C4 bumps in the substrate, or the micro bumps in a die.”

She suggested that the ideal situation is to have a common unified platform to bring all the information into the simulation environment. “You could bring the information from the DDR process, or from the HBM, or the CMOS technology where the CPUs may be sitting, and then have enough information brought in to extract the parasitics. And then you can use simulation to make sure the wiring that you’re doing, the spacing and the width of traces that you’re using for your interconnect, or the shielding you’re doing, are going to be able to meet the performance requirement. It is no different than past approaches, but the complexity is getting high. In the past when you did a multi die in the package, there were very few traces going between every part. Now, with the HBM you have thousands just for one HBM connection. It’s very complex. That’s why we are seeing silicon interposers enabling the interconnect, because it allows the fine granularity of the width and spaces that are needed for this level of density.”
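Horner's point about trace density can be illustrated with a rough estimate of the routing width needed for one memory connection. This is a back-of-the-envelope sketch only. The signal count, the line and space values, and the layer count are illustrative assumptions, not process data, and the estimate ignores shielding, vias, and bump escape patterns.

```python
import math


def routing_width_mm(signals: int, line_um: float, space_um: float,
                     layers: int) -> float:
    """Approximate width needed to route `signals` parallel traces side by side."""
    tracks_per_layer = math.ceil(signals / layers)  # spread evenly across layers
    pitch_um = line_um + space_um                   # one trace plus one gap
    return tracks_per_layer * pitch_um / 1000.0     # micrometers to millimeters


# Illustrative assumptions:
SIGNALS = 1700  # "thousands" of traces per connection, per the article
LAYERS = 2      # routing layers available for this connection

# Hypothetical line/space values for a coarse and a fine routing medium.
coarse = routing_width_mm(SIGNALS, line_um=10.0, space_um=10.0, layers=LAYERS)
fine = routing_width_mm(SIGNALS, line_um=0.4, space_um=0.4, layers=LAYERS)

print(f"coarse line/space: {coarse:.2f} mm of routing width")
print(f"fine line/space:   {fine:.2f} mm of routing width")
```

With these assumed values the coarse case needs far more routing width than the fine one, which is the kind of gap that pushes dense connections toward silicon interposers.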

Looking past Moore
While there may be more attention being paid to advanced “More Than Moore” approaches, Mentor’s White said there are plenty of engineering teams still following Moore’s Law. They are building an SoC with GPUs and memory on a single die.

“We still see that, but we definitely see folks who also are looking at building that assembly with multiple die,” Mentor’s White said. “Today, that might include a CPU sitting next to a GPU connected up to HBM memory, and then some series of other supporting IP chiplets that may be high speed interfaces, or RF, to Bluetooth or something else. We certainly see lots of companies talking about that approach, as well, of having multiple chiplets, including a GPU, and all of this assembly — and then using that as an easier way to then substitute in or out some of those supporting chiplets differently for different markets. In one particular market, you’re going to have RF and whatever. Another market may demand a slightly different set of supporting chiplets on that SoC, maybe more memory, maybe less memory, for whatever that marketplace needs. Or maybe I’m trying to intercept different price points. For the top of the line, I’ve got tons of HBM and a larger processor. Then, for some lower-level technology node, I’ve got a much more modest memory and a smaller GPU because the market pricing would not support that first one.”

Other must-have requirements for package design include being able to understand all the different ways you can attach chips to package designs, whether it’s wirebond, flip-chip, stacking or embedding.

“You have to have a tool that understands the intricacies of that cross section of the design,” said John Park, product management director for IC packaging and cross-platform solutions at Cadence.

Park noted that the connectivity use model is often overlooked. “This is important because the chip designer may use something, where they take the RTL and netlist it to Verilog. And that’s the connectivity. Board people use schematics for their connectivity. Then, packaging people sit somewhere in the middle, and for a lot of the connectivity, they have the flexibility to assign the I/Os based on better routing on the board level. They need the ability to drive the design with partial schematic, and the flexibility to create their own on-the-fly connectivity to the I/O that are somewhat flexible. They need the ability to work in spreadsheets. It’s not a single source for the connectivity, but it can be, and some people would like it that way. But it’s more important to have a really flexible connectivity model that allows you to drive the schematics, spreadsheets, build connectivity on-the-fly.”

Park added that it’s important to have tight integration with mask-level sign-off tools to improve routing, with specific knowledge of metal fill and RDL routing to create higher yielding designs. “The most important aspect is that it be a traditional BGA tool, but with the ability to integrate with mask-level physical verification tools for DRC and LVS. So I can take the layout, point to a rule deck in my verification tool, and any errors are fed back into the layout so I can correct those. That’s an important flow for people who are extending beyond BGA into some of these fan-out wafer level packages,” Park concluded.

Fig. 2: Basic chiplet concept. Source: Cadence
