More and better screening of diced dies is essential to meet the quality and cost goals of the 2.5D/3D-IC era.
The move to multi-die packaging is driving chipmakers to develop more cost-effective ways to ensure only known-good die are integrated into packages, because the price of failure is significantly higher than with a single die.
Better methods for inspecting and testing these devices are already starting to roll out. High-throughput infrared inspection is capable of catching more sub-surface defects that can cause device failure. And active temperature control, once only possible at final test, now can be implemented prior to assembly to improve yield and reliability earlier in the manufacturing process. These approaches address the manufacturing realities that come with singulated die, multi-die product cost and performance targets, and a quality target of 10 defective parts per million (DPPM).
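The arithmetic behind that 10 DPPM target is worth making explicit. The Python sketch below assumes that each die escapes screening independently and with the same probability. The function names and the eight-die example are illustrative assumptions, not figures from the sources quoted here.

```python
def package_dppm(die_dppm: float, n_die: int) -> float:
    """Defective packages per million when each of n_die dies
    independently escapes screening at a rate of die_dppm."""
    p_die = die_dppm / 1e6
    # A package is defective if at least one of its dies is defective.
    p_package = 1.0 - (1.0 - p_die) ** n_die
    return p_package * 1e6


def required_die_dppm(package_target_dppm: float, n_die: int) -> float:
    """Per-die escape rate needed to hold a package-level DPPM target
    (the inverse of package_dppm under the same independence assumption)."""
    p_package = package_target_dppm / 1e6
    p_die = 1.0 - (1.0 - p_package) ** (1.0 / n_die)
    return p_die * 1e6


# Hypothetical example: an eight-die package.
print(package_dppm(10.0, 8))        # package DPPM if every die is at 10 DPPM
print(required_die_dppm(10.0, 8))   # die DPPM needed for a 10 DPPM package
```

Under these assumptions, an eight-die package built from die that each meet 10 DPPM would ship at roughly 80 DPPM, and holding the package to 10 DPPM would require each die to be near 1.25 DPPM.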
Underlying this activity is an insatiable demand for more compute horsepower and more functionality, largely driven by AI and high-performance computing applications. And the best way to meet that performance demand is to increase the number of processing elements and memories, which no longer fit in a reticle-sized SoC.
In doing so, dies need to be thinned to reduce the distance that signals need to travel and the amount of power needed to drive those signals. But that requires backside grinding of a 300mm wafer from 775µm down to 100µm or less, which makes die more vulnerable to cracking during die singulation (a.k.a. wafer dicing) [1,2] and the handling processes that follow.
Die cracks at the dicing site, on the die surface, below the surface, or on the die backside can propagate during assembly steps including during thermal cycles used during package test, package-level burn-in and field use. Often these cracks result in product failures. To make matters worse, mechanical stresses to a die may be small enough to pass inspection techniques, yet still cause a measurable change in electrical behavior.
“Throughout my career, we’ve seen different problems after die singulation,” said Nitza Basoco, technology and strategist for SoC at Teradyne. “One time we witnessed a sudden percentage shift in the device’s PLL (phase-locked loop) performance from wafer sort to final test. This made zero sense to us. Eventually, we found it was due to an issue when pushing die off of the blue tape (wafer-sized film). Instead of four prongs pushing the die, there were three prongs. At the corner of the missing prong the die dipped, and sometimes it hit the next die over it. So in that case, two dies could become problematic.”
Fig. 1: Manufacturing process steps from tested wafer to assembly process. Source: A. Meixner/Semiconductor Engineering
Prior to assembly, screening of singulated die can occur in two places – inspection and electrical test. These steps also provide engineering teams with the data from the associated manufacturing steps to determine actions for improving process control, yield, and reliability.
Physical inspection of singulated die has been necessary for more than two decades. But the introduction of 2.5D and 3D assembly processes prompted research into ways to improve physical inspection by using alternative wavelengths and supporting 100% sampling.
In addition, singulated die testing while the die are still on the dicing frame has been performed for at least 15 years. But this approach has limited thermal control when testing high power die such as a GPU. As reported at conferences from 2016 to 2024 [3,4,5,6], two engineering teams developed handling and active thermal control (ATC) solutions specifically for testing AI/HPC devices.
“Picking up the singulated die, especially a thinned device, requires a carefully engineered tool,” said Dave Armstrong, an independent consultant. “Holding the device to the chuck solidly requires a high level of vacuum and a carefully engineered chuck surface. The engineering of the thermal system needs to be done in a way that hot-spots from one area of the chip do not impact the temperature in another area of the chip.”
Others concur about the handling and thermal challenges for bare die test. “Intel has been in production with singulated die testing for 10 years, so many of the initial alignment issues are now well understood, with robust visual alignment mechanisms between the die pick-and-place and the probe head,” said Mark Gardner, vice president and general manager of packaging and test business group at Intel Foundry. “Similar to wafer sort, there are considerations for maximum temperature to prevent bump deformation and bump melting, which has also been a challenge early in the singulated die testing journey through tight temperature control.”
Die inspection
Finding cracks and scratches on a singulated die requires inspection tools that cover a full spectrum of wavelengths, in addition to highly accurate computer vision algorithms with throughput times that allow for 100% sampling.
Optical inspection has been the workhorse for detecting die chips and cracks, and it can support 100% sampling due to its high throughput. For such automated optical inspection methods, defect detection has been on the order of 100 DPPM. [7] But optical inspection has difficulty discerning micro-cracks, according to multiple industry sources, and needs to be supplemented with other inspection methods.
“In addition to visual inspection of sawn die edges, most OSATs have methods for inspecting (eligible) die with techniques like infrared (IR) imaging or X-ray to check for sub-surface damage,” said Rick Reed, senior director of advanced product technology integration at Amkor Technology. “Sub-surface cracking could lead to failures later in the assembly flow or in the field.”
Infrared wavelengths are more capable, but until recently such equipment was too slow to allow for 100% sampling.
“Cracks located beneath the silicon surface or on the backside of the wafer are invisible to normal visible light,” said Nathan Peng, senior product marketing manager for Onto Innovation. “One solution to this problem is infrared imaging, which can detect cracks not visible with traditional methods. Although productivity has been a bottleneck for many years, IR imaging has typically been used for review purposes. Our new IR inspection technology performs crack detection right after the die-sawing process. With advanced IR design and algorithms, this technology overcomes productivity issues and is applicable to high volume manufacturing environments with 100% sampling to ensure die-level yield improvement. Additionally, supporting modules like the film frame inverter and non-contact chuck enable IR inspection from the backside of the wafer for backside crack inspection.”
At the same time, methods for optical die-edge inspection continue to evolve. They are proving useful for identifying damage caused by the pick-and-place mechanical process. At a 2022 conference, Besi engineers shared experimental results from adding an optical module to a pick-and-place tool.[8] With 6µm/pixel and 6-sided looking optics, they achieved 5,000 units per hour throughput. Their empirical study specifically looked at detecting defects following four different dicing techniques — single cut, step cut, laser full ablation and laser stealth ablation.
Electrical testing
A traditional test and assembly manufacturing flow proceeds as follows:
As always, increasing packaged part yield and reducing overall test costs have prompted engineering teams to innovate on test solutions for singulated die prior to the assembly process. This requires attention to the mechanical sensitivity of thinned die to probing force, whether the die are on a wafer frame or handled as bare die. In the past 10 years, semiconductor manufacturers of HPC products have developed probing and thermal solutions for bare die. These enable new test sequences that have the potential to affect other industry sectors.
Probing and alignment
To successfully perform singulated die testing, engineers need to consider both probe alignment and the mechanical force exerted on the die. Both are necessary to ensure a low-contact-resistance connection.
Die alignment requires a visual alignment system with sufficient precision to align probe tips to bumps. As bump sizes decrease, the demands on this visual system naturally increase, and probing becomes much more challenging. But wafer-level tests for computing products rely on DFT-based methods that support testing through fewer pads/bumps, which can help mitigate the probing challenges through careful routing of device signals to those pads/bumps.
Fig. 2: Bump or pad pitch ranges. Source: A. Meixner/Semiconductor Engineering
Performing wafer testing only after singulation has been an option for wafer level package products for the past 15 years. This represents a shift right in the test and assembly process flow, which permits electrical test to detect post-dicing defects that could escape inspection.
Fig. 3: Wafer level packaging process flow showing shift right of wafer testing. Source: A. Meixner/Semiconductor Engineering (after Ref. 9).
“The singulated die is placed on the wafer frame, which is the wafer form’s metal frame with blue tape,” said Wataru Hoshikawa, group leader of test cell solutions at Advantest. “Singulated die test is used with the same probe card as wafer test. Therefore, the probe challenges are the same as the challenges encountered during wafer test.”
However, multi-site testing with the standard wafer frame and tape has probing challenges, as described by an ASE engineering team in their ECTC 2015 paper. [9] Until recently, the best-known method was to reduce the number of sites. But that results in additional development costs and a reduction in parallel test cost benefits.
“The die position is unpredictable due to the tension released after wafer singulation on the film-frame,” wrote ASE engineers. “The die position gets changed but the prober card layout is fixed, so the pin is hard to contact with the bump properly in testing process. Due to the improper contact condition, the test result should be poorer than chip probing.”[9]
To control the die position for multi-site testing, the ASE engineering team switched to mounting the wafer on a glass carrier with double-sided adhesive tape. This enabled preserving the multi-site test solution used at wafer test. Furthermore, the adhesive tape has thermal release characteristics that eliminate the needle ejection step used in pick-and-place processes, reducing the mechanical stress on thinned die that can cause die cracks.
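The multi-site contact problem can be illustrated with a simple geometric check. The sketch below is a simplification under stated assumptions: the prober can correct only a translation common to all sites, rotation is ignored, and the offsets and tolerance are hypothetical values rather than data from the ASE paper.

```python
import math


def multi_site_contact(die_offsets_um, tolerance_um):
    """Return one boolean per site: True if the probes still land on the bumps.

    die_offsets_um: (dx, dy) shift of each die from its nominal position, in microns.
    tolerance_um:   hypothetical maximum probe-to-bump misalignment that still
                    makes contact.
    """
    n = len(die_offsets_um)
    # Assumption: the prober removes the average shift by moving the whole frame,
    # because the probe card layout itself is fixed.
    mean_dx = sum(dx for dx, _ in die_offsets_um) / n
    mean_dy = sum(dy for _, dy in die_offsets_um) / n
    return [
        math.hypot(dx - mean_dx, dy - mean_dy) <= tolerance_um
        for dx, dy in die_offsets_um
    ]


# All four dies shift together: a single frame correction recovers every site.
uniform_shift = [(10.0, 0.0)] * 4
# Each die drifts in a different direction after the tape tension is released.
independent_shift = [(10.0, 0.0), (-10.0, 0.0), (0.0, 10.0), (0.0, -10.0)]

print(multi_site_contact(uniform_shift, 5.0))
print(multi_site_contact(independent_shift, 5.0))
```

In this toy model a common shift costs nothing, while independent shifts of the same size push every site out of tolerance. That is consistent with site-count reduction having been the earlier workaround, and with a rigid carrier preserving the multi-site solution.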
A decade ago, both Intel and IBM engineering teams developed singulated die test solutions for high power ICs destined for 2.5D and 3D packages. In contrast to singulated die testing on a wafer frame, these solutions involve mounting the bare die on a thermal chuck. In general, the same probe technology can be used, but for bare die on a chuck the probe force on microbumps needs to be carefully considered. Die planarity is a factor in probing. Another factor is applying sufficient force for low contact resistance without damaging the thinned individual die.
“Our experiments used a rigid probe system requiring high contact force. It is advised that a more compliant probe system be used for die level test on thin die. This will be even more important when probing smaller pads and microbumps,” wrote IBM and Advantest engineers.[3] “Less compliance in the smaller solder bumps will require a compliant probe with reduced probe force to still maintain full contact. This is required during expansion and contraction from power fluctuations during test and during the rapid thermal transitions. This was the primary cause of the contact resistance variations and outliers we observed during singulated die testing. A more compliant low-force probe will also reduce the force required to maintain full contact.”
Thermals
When testing singulated die on a wafer frame, the thermal solution remains wafer-based, i.e., passive thermal control. When an individual die is tested, the smaller thermal mass enables using active thermal control (ATC), which previously has only been possible at final test.
Such is the case with recent developments for high-performance computing chips as documented in conference presentations by Advantest and IBM engineers [3,4] and Intel engineers. [5,6] In both technology descriptions a die is placed directly on a chuck, which contributes mechanical stability during probing and facilitates ATC solutions with direct contact to the backside of the die.
Fig. 4: Conceptual diagram depicting the differences in wafer and die test thermal solutions. Source: Intel Foundry
For heating and cooling the die chuck, Advantest and IBM engineers described a dual liquid system. [4]
Fig. 5: The left diagram shows the use of thermally controlled liquid for ATC; the image on the right shows a die on a chuck. Source: Advantest
The thermal management of a significantly smaller chuck-to-DUT interface enables rapid adjustment of the chuck temperature. Comparing wafer and singulated die thermal solutions, Intel engineers shared their data on the temperature response as a result of die power increases, showing a significant increase in thermal control. [6] This capability subsequently reduces test execution time for a single temperature test. Moving to singulated die with active thermal control results in each die being tested at a consistent on-die temperature.
Fig. 6: Comparison of temperature response between wafer passive thermal control and die active thermal control solutions. Source: Intel Foundry
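The qualitative difference between passive and active control can be reproduced with a first-order lumped thermal model. The sketch below is not the Intel or Advantest/IBM implementation. It assumes a single thermal resistance and capacitance between die and chuck, an idealized chuck whose temperature can change instantly, a PI feedback loop, and made-up parameter values.

```python
def simulate_die_temperature(power_w, target_c, r_th_c_per_w, c_th_j_per_c,
                             active, kp=4.0, ki=20.0, dt_s=0.001, t_end_s=10.0):
    """Euler simulation of die temperature after a step in die power.

    Returns (final_temperature_c, peak_temperature_c).
    All parameters are hypothetical; kp and ki are illustrative PI gains.
    """
    temp_c = target_c      # die starts at the target temperature
    integral = 0.0         # accumulated error for the integral term
    peak_c = temp_c
    steps = int(round(t_end_s / dt_s))
    for _ in range(steps):
        error = temp_c - target_c
        if active:
            # PI feedback: lower the chuck temperature when the die runs hot.
            integral += error * dt_s
            chuck_c = target_c - kp * error - ki * integral
        else:
            # Passive control: the chuck simply stays at the setpoint.
            chuck_c = target_c
        # Lumped model: C * dT/dt = P - (T - T_chuck) / R
        temp_c += dt_s * (power_w - (temp_c - chuck_c) / r_th_c_per_w) / c_th_j_per_c
        peak_c = max(peak_c, temp_c)
    return temp_c, peak_c


# Hypothetical 100 W die, 85 C target, R = 0.5 C/W, C = 2 J/C.
passive_final, passive_peak = simulate_die_temperature(100.0, 85.0, 0.5, 2.0, active=False)
active_final, active_peak = simulate_die_temperature(100.0, 85.0, 0.5, 2.0, active=True)
print(passive_final, passive_peak)
print(active_final, active_peak)
```

With these parameters the passive case settles about P × R = 50 °C above the setpoint, while the feedback case returns to the setpoint after a brief overshoot. A real chuck cannot change temperature instantly, so actual responses will be slower than this idealization suggests.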
Solving thermal management needs enables engineering teams to shift left test content from final test to singulated die test prior to final assembly.
“Due to this challenge (controlling thermal conditions on a wafer), we see a trend of customers moving high heat test generation test items from wafer test to final test,” said Advantest’s Hoshikawa. “Singulated die test can help with the thermal management, as the die form is independent, so there is no need to be concerned about thermal leaks to the next untested die.”
In addition, the thermal control can be optimized according to a die’s power-frequency performance capability. With the wafer-based thermal solutions, higher-power die run hotter than lower-power die on the same wafer. At the 2024 SW Test Conference, Intel engineers shared empirical data comparing wafer sort temperature of a single die against four die with different peak test power levels. [5]
Fig. 7: Temperature response for four unique die, comparing passive vs. active control thermal solutions. Source: Intel Foundry
New test flows enabled
Die-level test coverage remains a function of die test pad or bump layout and device DFT, so actual test coverage at a singulated die insertion is beholden to these constraints. This remains true for singulated die on a wafer frame, bare die, and final test of packaged devices. The ATC test solution for bare die provides test engineering teams several benefits.
“ATC capability can enable additional stress testing and test conditions more closely resembling package testing for higher voltage and frequency corner testing,” said Intel’s Gardner. “This can boost good die into packaging, which in turn can improve package test yields. Singulated die testing’s ability to run higher frequency testing can also enable binning and performance segregation before packaging. This, in turn, can enable better die pairing of chiplets during the assembly process, which can be an additional benefit to advanced packaging products.”
For industry sectors that require guaranteed operation over a large range of temperatures — automotive, aerospace, or any mission-critical system — the test cost could decrease. Three test insertions, each with a unique temperature, become one test insertion when the test set is applied at three unique temperatures.
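A rough timing model shows where the saving comes from. The sketch below assumes that test cost scales with total time on the test cell and that every temperature uses the same test set. The function names and the timing values are hypothetical.

```python
def multi_insertion_time(test_s, handling_s, n_temps):
    """Total time when each temperature gets its own insertion:
    the part is handled (loaded, aligned, soaked) once per temperature."""
    return n_temps * (handling_s + test_s)


def single_insertion_time(test_s, handling_s, n_temps, transition_s):
    """Total time when one insertion steps through all temperatures:
    the part is handled once, with a thermal transition between temperatures."""
    return handling_s + n_temps * test_s + (n_temps - 1) * transition_s


# Hypothetical values: 10 s test set, 4 s handling, 1 s temperature transition.
print(multi_insertion_time(10.0, 4.0, 3))          # 42.0
print(single_insertion_time(10.0, 4.0, 3, 1.0))    # 36.0
```

The difference between the two totals is (n_temps - 1) × (handling_s - transition_s), so in this simplified model a single insertion wins whenever a temperature transition is faster than re-handling the part.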
With rapid ATC response, a test engineering team can consider new test sequences that swing temperature and supply voltages during a single test insertion. Such sequencing could benefit first silicon characterization and validation activities.
“From a test perspective, you now can create a sequence of events from a thermal aspect and shock the device,” said Teradyne’s Basoco. “You can put the device into the conditions in which it would be at the end product, because sometimes it’s not just a certain temperature that causes a problem. Sometimes it’s when the device goes from ‘this temperature’ to ‘this temperature’ that you have a problem.”
With ATC for singulated die, engineers can develop new test scenarios for hard-to-detect time-zero and latent defects. Such is the case for silent data corruption failures, which continue to plague hyperscalers. A number of industry experts have said these failures manifest due to a combination of circuit activity and the local thermal and voltage environments occurring during that circuit’s activity. [10]
Conclusion
The dicing process can cause die cracks that can propagate to active die areas, reducing packaged product yield at time zero and causing reliability failures in customers’ systems. Screening for these defects always has mattered, but it has become even more critical with the advent of 2.5D and 3D assemblies. As a result, inspection and pick-and-place OEMs have improved optical and IR detection methods to meet the throughput and detection accuracy required for 100% sampling.
To reduce overall test costs and improve KGD quality, OSATs and IDMs have used electrical testing after die singulation. However, these wafer-based test solutions lack tight thermal control. Hence, in the quest for KGD for HPC applications, engineering teams developed active thermal control solutions for a bare die. With such capability, the full final test suite can be run at the die level for these high-power devices. These ATC solutions can rapidly change temperature settings. Consequently, prior to assembly into 2.5D and 3D packages, new test sequences can be created to further screen out defective die and match die performance attributes, such as frequency and power.
This raises the question: Where might ATC singulated die testing be applied next?
“Low DPPM devices and high performance of any kind might benefit,” said Teradyne’s Basoco. “But it really is dependent on the type of performance, as well as reliability, that the company is delivering. In the example of the PLL performance shifting, that could have been on any device. It’s how sensitive the device is to a performance shift. If you want only 1 DPPM, then you might want to do this.”
References