Mixed-signal content at advanced nodes and in packages is prompting alternative approaches.
Foundries and packaging houses are wrestling with how to control heat in the testing phase, particularly as devices continue to shrink and as thermally sensitive analog circuits are added into SoCs and advanced packages to support everything from RF to AI.
The overriding problem is that heat can damage chips or devices under test. That’s certainly true for digital chips developed at advanced nodes, because as dielectric films become thinner and the transistors themselves continue to shrink, they become more susceptible to physical damage from heat. But it is even more of an issue for analog circuits, which are becoming more common in designs, either on-chip or in-package, as data generated by sensors in the analog world is processed using digital technology.
Chipmakers are starting to utilize different approaches to deal with these issues. They are greatly increasing their efforts in simulation and verification in the design phase, modifying their approaches to finding defects during manufacturing, and adding capabilities to continue monitoring chips and systems even after they leave the fab or packaging house.
“If you think about a system in 2015, which is not very old, there was an SoC, DRAM connected to it on the PCB, and the SoC was mostly digital with embedded SRAM,” said Hany Elhak, group director of product management and marketing at Synopsys. “Now, advanced SoCs include all of these analog functions. With a 5nm SoC today, we have lots of true analog blocks that were their own separate mature-node ICs a few years ago for things like power management and data conversion. So these are much bigger, more complex analog circuits on these chips.”
So instead of designing and testing the analog separately, it needs to be considered in the context of a much more complex system. That makes testing using traditional approaches more difficult, more complicated, and potentially more destructive, requiring a plan at the architectural stage for how these devices will be designed, laid out, partitioned, verified, validated, and ultimately tested and monitored.
“The team designing the memory, the team designing analog, those designing custom digital and foundation IP, and those working with signal integrity — they all need to have a unified workflow so they can actually work with each other,” Elhak said. “Hyper-convergence is opening the door to solving certain problems that didn’t need solving before. The good news is we have technology that has been established in certain areas that can be re-used for other applications.”
This kind of convergence in tooling has become essential. A recent study commissioned by Siemens EDA found that the biggest driver of recent re-spins is analog circuitry. “What is going on is that the industry continues its migration to advanced nodes, where variability is extremely difficult to model,” said Harry Foster, chief scientist for verification at Siemens EDA. “On top of that, these models are evolving with the process. There are lots of corners to verify. But a more interesting trend is that we are seeing an increase in the number of complex mixed-signal designs, regardless of technology node, as companies try to optimize the area footprint to include analog.”
Integration itself is a challenge. “In the past, the analog circuitry — even though there was some analog circuitry on the chip — was discrete in many cases,” said Aveek Sarkar, vice president of engineering at Synopsys. “It was at a mature node, when the data rates used to be much lower. And most of these chips used to come together on a PCB and were designed and evaluated in that manner. We are now aggregating a lot of different design types and a lot of different applications on the same SoC.”
Fig. 1: Scale and systemic complexity have increased with advanced-node SoCs that integrate analog. Source: Synopsys
With IC design converging, testing of analog parts becomes an issue. Thermal issues already plague complex digital designs. Adding analog only exacerbates the problems.
“With some products, especially on the high end, they can’t test everything with a probe,” said Mike Kelly, vice president of advanced packaging development and integration at Amkor. “Everyone does their best to weed out failures or bad parts. Prior to us getting those packages, we get a wafer map, and everything on there is supposed to be good. Then those parts get assembled into a system test, which is basically emulating a product. If you’ve got more silicon content going into a package that isn’t 100%, then you will have some package yield fallout. But yields and probe testing are good enough that the economics still work.”
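That fallout compounds with every die in the package. As a rough illustration (a minimal sketch with hypothetical yield numbers, not Amkor data), package yield is approximately the product of the known-good-die yields and the assembly yield:

```python
# Illustrative arithmetic only: package yield is roughly the product of the
# per-die yields of every die placed in the package, times assembly yield.
# All numbers below are hypothetical.

def package_yield(die_yields, assembly_yield=0.99):
    """Compound yield of a multi-die package."""
    y = assembly_yield
    for dy in die_yields:
        y *= dy
    return y

# Four die, each 99.5% likely to be good despite a clean wafer map:
print(f"{package_yield([0.995] * 4):.3f}")   # ~0.970
# Eight die at the same confidence -- fallout grows with silicon content:
print(f"{package_yield([0.995] * 8):.3f}")   # ~0.951
```

More silicon content means more multiplied terms, which is why wafer-map and probe quality have to stay high enough for the economics to keep working.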
From a design standpoint, advanced packaging opens up all sorts of architectural possibilities, both for improving performance and for lowering power and improving heat dissipation. But testing becomes more complex because not all of the devices in a package are exposed to the testers. In addition, driving signals through a package to test digital components sufficiently can damage sensitive analog components.
So in addition to more up-front verification and simulation, packaging also requires more customized approaches to testing. Some parts may be accepted as known good die prior to packaging, and that may be enough. Others may require more testing because they are used for safety-critical or mission-critical functions.
“You certainly cannot get every I/O inside the channel to fan out, so some companies may skip a row of bumps, and if one of them is bad, they will show ‘Fail’ and go into default mode,” said Alan Liao, director of product marketing at FormFactor. “You can test everything you want, but the cost will be very high. Sometimes you can offset that by spending a little bit more money at an early stage in R&D, and then when you move into production, you may want to balance the cost of testing.”
Fig. 2: Power dissipation levels of different devices. Source: Amkor/MEPTEC talk.
Automated test equipment
Likewise, automated test equipment (ATE) can limit the heat, but the calculation has to be part of the test plan.
“Unlike the ‘real world,’ we have quite a bit more control in ATE than in a field application. We can ‘walk up’ to the max thermal requirements and test around the edges rather than using some brute-force method of testing,” said Tim Burnett, applications engineer consulting manager at Advantest. “Even in digital testing, when you test, say, hot, you are not going to heat the die up to 125°C in a handler and then run a thermal load that takes it well past 200°C internally. That being said, the internal die temperature will exceed the external handler chamber temperature. But in general, we monitor internal die temp so as not to exceed maximum die temp values (which can be quite a bit higher than the package part spec). Anyone who has inadvertently loop-run die on a large BIST pattern, only to weld the socket or burn needles at probe, knows this all too well.”
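Burnett’s “walk up” approach amounts to a simple control loop. Below is a minimal sketch of the idea; the hooks (read_die_temp, step_load) are hypothetical stand-ins for instrument calls, not a real ATE API:

```python
# Hedged sketch of "walking up" to a thermal limit instead of applying a
# brute-force load. read_die_temp and step_load are hypothetical hooks.

MAX_DIE_TEMP_C = 200.0   # internal die limit (often well above package spec)
GUARDBAND_C = 10.0       # stop stepping before the hard limit

def walk_up_thermal(read_die_temp, step_load, max_steps=50):
    """Increase the test load stepwise while monitoring internal die temp."""
    for step in range(max_steps):
        temp = read_die_temp()                    # on-die sensor readout
        if temp >= MAX_DIE_TEMP_C - GUARDBAND_C:
            return step, temp                     # test around the edges here
        step_load(step)                           # apply the next increment
    return max_steps, read_die_temp()

# Demo with a fake ramp: the die heats 8°C per load step starting at 60°C.
temps = iter(range(60, 260, 8))
steps, final = walk_up_thermal(lambda: float(next(temps)), lambda s: None)
print(steps, final)   # stops once the guardband is reached
```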
Analog circuitry developed at mature process nodes has more resilience. “Analog is no different when testing for, say, overcurrent or full load on things like buck, boost, and buck-boost converters, or Class D-type amplifiers that have internal switches,” Burnett said. “ATE gives us the control to not run a die for extended periods of time that would overheat it internally, and still accomplish the specific requirement. Internal temp testing is also common for many of these die. An example would be an overcurrent test for a Class D amplifier at 20A, where we may be looking for a comparator trip on some GPIO pin after exceeding the high current in, say, 10 or 15µs. A 20µs pulse at 20.1A should be more than enough to accomplish this task without overheating the die in the process. Heat management in ATE is about control of the die and understanding how to accomplish a task with finesse. Brute force will invariably get you into trouble if you don’t understand the device you are working on. And trust me, I have let out my fair share of factory-installed smoke in the past 25 years.”
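Burnett’s overcurrent example translates into a short pass/fail routine. The following is a hedged sketch with hypothetical instrument hooks (force_current_pulse, time_to_gpio_trip), not actual Advantest test code:

```python
# Sketch of the Class D overcurrent test described above: force a brief
# current pulse just above the 20A trip point and verify the comparator
# flags it on a GPIO pin within spec, keeping the pulse short enough that
# the die never overheats. Both hooks are hypothetical stand-ins.

TRIP_CURRENT_A = 20.0
PULSE_CURRENT_A = 20.1    # just above the trip point
PULSE_WIDTH_US = 20.0     # long enough to trip, short enough to stay cool
MAX_TRIP_TIME_US = 15.0   # spec: comparator must trip within 10-15µs

def overcurrent_test(force_current_pulse, time_to_gpio_trip):
    force_current_pulse(amps=PULSE_CURRENT_A, width_us=PULSE_WIDTH_US)
    trip_us = time_to_gpio_trip()      # None if the comparator never fired
    passed = trip_us is not None and trip_us <= MAX_TRIP_TIME_US
    return passed, trip_us
```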
ATEs capable of testing complex SoCs exist, of course, and have for a while. Advantest’s 93000-L ATE — at the top of the 93000 family of scalable SoC test equipment — can test digital, analog, and mixed-signal devices.
Other issues
But functional testing will only get you so far, particularly with a heterogeneous design. Inspection can reveal problems that may never show up in a functional test, and it can be done without damaging the chip or package.
“Even if you’ve checked out a chip before it leaves the fab, as it goes through downstream processing things are heating up,” said Subodh Kulkarni, CEO of CyberOptics. “There is soldering, wirebonding, and other processes, and things shift. Even at 150°C, where nothing is supposed to move, things move enough to create added problems. So you do the I/O check as chips or packages are coming down the line and everything looks okay. But by the time it is shipped out in the field, you may have infant mortality or a failure six months down the road, because the I/O check was done before everything settled into place. That happens because not everything is as settled as you think it should be. That’s why companies are telling us they want to add 100% inspection before these devices leave the factory. It gives them one more chance to catch a problem before it becomes a really big issue in the field.”
Chipmakers have added another step after that, as well. Unlike in the past, when the only way to tell what went wrong in the field was through post-mortems of a defective device, monitors can be inserted into chips to track a chip’s health throughout its lifecycle. This can involve such measurements as temperature or vibration. That data then can be looped back into the manufacturing and design flows to avoid future problems, as well as to do preventive maintenance or proactively provide replacements in the field.
“We actually sit in between the digital and analog — in monitoring the entire ecosystem itself,” said Uzi Baruch, chief strategy officer at proteanTecs. “Our monitors are embedded on the digital side, but can monitor the entire performance and behavior of the IC altogether, as well as the effects of the system and application. The monitors are sensitive to the entire surroundings, and by fusing the measurements in the cloud and applying machine learning algorithms, we are able to provide deep data on the digital and analog performance through impact on the digital environment. That’s how we approach the problem.”
During tests, such as oven tests, having a monitor in the chip produces valuable data quickly. “Our analytics platform provides deep data on timing margins, so you can monitor the degradation over time,” Baruch said. “You can use this new data to predict how a chip will behave inside the oven, based on actual margin measurements. You can compare the expected behavior against what you actually saw from within the chip. Normally, you would have needed to run many chips in that oven and wait for the test to end before you could understand which one was failing, and under which circumstances. But with Universal Chip Telemetry from the monitors, you can see the actual behavior. You can see what thresholds are coming, and you can find issues much earlier. It’s not only for predicting and analyzing the lifetime. It also saves a lot of test time, because you know at a very early stage what the parameters will be and can set specific limits accordingly. You don’t need to wait until the end to understand what’s going on.”
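The early-flagging idea reduces to extrapolating a measured margin trend. The sketch below is purely illustrative and is not proteanTecs’ platform or the Universal Chip Telemetry API; the readout times, margins, and fail limit are all hypothetical:

```python
# Illustrative only -- not proteanTecs' actual product or API. The idea from
# the quote: fit the measured timing-margin trend during burn-in and flag a
# part early if it is projected to cross its limit before the test ends.

import statistics  # linear_regression requires Python 3.10+

def projected_margin(hours, margins_ps, end_hour):
    """Linear fit of margin vs. time, extrapolated to the end of burn-in."""
    slope, intercept = statistics.linear_regression(hours, margins_ps)
    return slope * end_hour + intercept

hours = [0, 24, 48, 72]                 # hypothetical readout times
margins_ps = [85.0, 81.0, 76.5, 72.0]   # hypothetical timing margins
LIMIT_PS = 40.0                         # hypothetical fail limit

if projected_margin(hours, margins_ps, end_hour=500) < LIMIT_PS:
    print("flag part early: projected to fail before burn-in ends")
```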
Example challenges
Sensors add their own set of complexities. “A sensor may need some other stimulus,” said Andrei Berar, senior director of test business development at Amkor, during a recent presentation. “You may need to mix temperature with gas with pressure. So that makes our test life much more interesting.” The added complications of analog mean test companies are working on new solutions.
Testing the RF devices used in 5G means testing signal strength and dealing with many filters for carrier aggregation. FormFactor developed probe cards for difficult RF measurements of those filters, said Daniel Bock, RF applications engineer at FormFactor.
Probe cards and probes have gone through innovations that make high-parallelism wafer probing possible. FormFactor took a machine-made spring and used a semiconductor wire bonder to attach a MEMS-made tip to the spring. That configuration achieved highly accurate probe placement and high reliability, an improvement over the old way of handcrafting probes. When copper columns started to be used as interconnects, FormFactor had to probe copper, which was new for the company. “The requirements imposed on us meant we had to achieve very high current carrying capability (CCC) because these new chips were power-hungry, we needed to achieve very low contact resistance, and we needed to lower the contact force because the number of probes was tripling,” said Jarek Kister, FormFactor’s CTO, in a video. “Because of accessibility to MEMS technology, we were able to create a new type of probe that utilized three different metals. Each metal’s function was very specific and different from the others. With those three separate functions, we were able to create a new probe that delivered very high CCC, very low contact resistance, and very low bulk resistivity.”
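The interplay Kister describes between current carrying capability and resistance comes down to I²R self-heating in each probe. A back-of-the-envelope sketch, with hypothetical resistance and current values:

```python
# Back-of-the-envelope illustration (hypothetical values) of why contact and
# bulk resistance matter as CCC goes up: heat in each probe is P = I^2 * R.

def probe_heating_mw(current_a, contact_res_ohm, bulk_res_ohm):
    total_r = contact_res_ohm + bulk_res_ohm
    return current_a**2 * total_r * 1000.0   # milliwatts per probe

# 1.5A through a probe with 50mΩ contact plus 30mΩ bulk resistance:
print(f"{probe_heating_mw(1.5, 0.050, 0.030):.0f} mW")   # 180 mW per probe
# Halving both resistances halves the self-heating at the same current:
print(f"{probe_heating_mw(1.5, 0.025, 0.015):.0f} mW")   # 90 mW per probe
```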
Focusing on temperature at the ATE is another route. “When it comes to test, with these heterogeneous devices we get hot spots, because the multiple chips in one device have different thermal characteristics,” said Amkor’s Berar. “And we also have spikes — spikes due to the way we apply the vectors during test. Considering all these spikes and different devices, our challenge is to keep the junction and case temperatures around 125°C to 150°C. This is the primary challenge in test, and one size does not fit all. We have different sizes and different power levels that need to be managed.”
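The budget Berar describes follows from the standard first-order relation Tj = Tc + P × θJC, where θJC is the junction-to-case thermal resistance. A quick sketch with hypothetical numbers (not Amkor data) shows how vector-driven power spikes eat into that 125°C to 150°C window:

```python
# First-order junction temperature: Tj = Tc + P * theta_jc.
# All values below are hypothetical, for illustration only.

def junction_temp_c(case_temp_c, power_w, theta_jc_c_per_w):
    return case_temp_c + power_w * theta_jc_c_per_w

# Steady 80W through a 0.3°C/W path with the case held at 100°C:
print(junction_temp_c(100.0, 80.0, 0.3))    # 124.0°C, inside the budget
# A 150W test-vector spike pushes the junction toward the limit:
print(junction_temp_c(100.0, 150.0, 0.3))   # 145.0°C
```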
Conclusion
With analog circuits now the cause of many re-spins, the test industry will keep working the problem. The complexity and variety of chips will inspire more test innovation, and analog will also take advantage of the kind of visibility now being added to digital chips.
— Ed Sperling contributed to this report.
Related stories
Preventing Chips From Burning Up During Test
Scaling, packaging and a greater push for reliability add new challenges for testing chips.
Digital Test Bulks Up – Or Down
Tackling growing chip size, rising test cost, and much more complexity.
Steep Spike For Chip Complexity And Unknowns
Increased interactions and customization drive up risk of re-spins or failures.