Experts at the Table: One side is well-behaved and predictable, the other is not. How to ensure reliability across both is a looming challenge.
Semiconductor Engineering sat down to discuss 5G reliability with Anthony Lord, director of RF product marketing at FormFactor; Noam Brousard, system vice president at proteanTecs; Andre van de Geijn, business development manager at yieldHUB; and David Hall, head of semiconductor marketing at National Instruments. What follows are excerpts of that conversation. To view part one of this discussion, click here.
SE: Interference with signals is a serious issue with millimeter wave. If beamforming misses the target, is that considered a failure? And if so, where is that failure? Is it on the transmission side or on the receiving side, and how do you determine that?
Hall: That’s one of the hardest issues to solve, and it ends up being a network design challenge. Ultimately what you have to do is apply intelligence to switch to a band that’s known to be delivering excellent quality of service whenever you get into a situation where millimeter wave doesn’t deliver the performance because you have a tree in the way, or the beamforming isn’t precise enough to get the necessary signal-to-noise quality. We ought to look at beamforming as sort of a pressure relief valve on network capacity, which allows you to deliver extremely high throughput in the situations where you have an optimal connection. But the absolutely reliable communication is going to happen below 6GHz because the signal propagation characteristics are just so much easier. Millimeter wave is extremely important because it allows for dramatic increases in network capacity, but I don’t expect us to use millimeter wave to handle some of the mission-critical traffic.
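To make that concrete, here is a minimal sketch of the kind of band-fallback logic described above, assuming a hypothetical link monitor that reports per-band signal quality; the threshold and band labels are illustrative, not from the discussion:

```python
# Hypothetical band-fallback sketch: prefer mmWave for capacity, but drop to a
# sub-6 GHz anchor band whenever measured link quality falls below a floor.
MIN_MMWAVE_SNR_DB = 15.0  # illustrative threshold, not a 3GPP value

def select_band(mmwave_snr_db: float, sub6_snr_db: float) -> str:
    """Return the band to serve traffic on, treating mmWave as a capacity
    'pressure relief valve' and sub-6 GHz as the reliable fallback."""
    if mmwave_snr_db >= MIN_MMWAVE_SNR_DB:
        return "FR2 (mmWave)"
    return "FR1 (sub-6 GHz)" if sub6_snr_db > 0 else "no usable link"

print(select_band(mmwave_snr_db=22.0, sub6_snr_db=18.0))  # -> FR2 (mmWave)
print(select_band(mmwave_snr_db=6.0, sub6_snr_db=18.0))   # -> FR1 (sub-6 GHz)
```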
SE: What’s different from a test perspective for 5G versus 4G?
Lord: Sub-6GHz testing is pretty much the same. There's nothing new there, nothing particularly challenging. So for FR1 (410 MHz to 7.125 GHz), it's the same. For FR2 (24.25 to 52.6 GHz), the millimeter-wave work is challenging, primarily because of the frequency. It's up to 40GHz for the IC frequencies, which means you have to characterize the transistors to 80GHz. Most people are doing process characterization to 120GHz to understand the broadband performance of the transistors in each part of the components that are going into the IC. So there's more investment in doing the high-frequency characterization, which is mainly S-parameters. You need a lot more knowledge of doing accurate calibrations at the probe tip, so when you measure you're measuring just the transistor or just the inductor, and you're taking out all the errors of the probes, the cables, and the network analyzer itself. The other big challenge with 5G millimeter wave is having a good understanding of the device under non-50-ohm impedances, the real-world input and output impedances of the devices. To do that you match the impedance by doing load pull on the input and output, or sometimes on the output only. You change the impedance from 50 ohms to something non-50 ohms. You actually change the impedance to all the different points to measure the S-parameters, and then you can work out the optimal power efficiency, power output, or gain of the device. Now you can tune the transistor in the circuit to hit that optimal requirement, whether that's power efficiency or maximum output power.
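As a rough aid for readers, the non-50-ohm load a tuner presents maps to a reflection coefficient through the standard Smith-chart relation; here is a minimal numpy sketch of that conversion (the variable names and example value are ours, not from the discussion):

```python
import numpy as np

Z0 = 50.0  # reference impedance in ohms

def gamma_to_impedance(gamma: complex, z0: float = Z0) -> complex:
    """Convert a reflection coefficient presented by a tuner into the
    equivalent load impedance: Z = Z0 * (1 + gamma) / (1 - gamma)."""
    return z0 * (1 + gamma) / (1 - gamma)

def impedance_to_gamma(z: complex, z0: float = Z0) -> complex:
    """Inverse mapping, used when a target impedance must be dialed in."""
    return (z - z0) / (z + z0)

# Example: a tuner presenting gamma = 0.3 at 45 degrees
gamma = 0.3 * np.exp(1j * np.deg2rad(45))
print(gamma_to_impedance(gamma))  # the non-50-ohm load seen by the device
```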
SE: Is that stressing the device to its limits?
Lord: It’s not so much stressing it as optimizing where the impedance is best suited to the device. You have to change impedance by using some sort of filtering circuit on the input and output. That way you can change it from 50 ohm impedance to something else, which is more optimized performance of the device. It’s more involved in terms of the measurement. We now have these big mechanical tuners on the input and output to the transistor to actually change that impedance. Millimeter wave makes it even more difficult.
SE: So that’s a way to optimize performance for these chips?
Lord: Yes, the device will behave differently under different impedances on the input and output. You might want the lowest noise, or maximum output power, or the best efficiency, and you'll see different circles on the Smith chart where it behaves best for that particular parameter. Then you can decide, 'Okay, I want the maximum output power, so I'm going to tune this to be at this point on the Smith chart on the output impedance, and that way I get the best power.'
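As a rough illustration of what Lord describes, a load-pull sweep yields a measured figure of merit for each candidate load impedance, and the tuning point is simply the best point of the sweep. A hedged numpy sketch with invented data:

```python
import numpy as np

# Hypothetical load-pull results: candidate load reflection coefficients and
# the output power (dBm) measured with each one presented to the device.
gammas = np.array([0.1 + 0.0j, 0.2 + 0.1j, 0.3 - 0.2j, 0.15 + 0.25j])
pout_dbm = np.array([23.1, 24.0, 23.6, 24.4])  # illustrative numbers only

best = int(np.argmax(pout_dbm))                            # best point of the sweep
best_z = complex(50.0 * (1 + gammas[best]) / (1 - gammas[best]))
print("Tune the output match toward Z =", best_z, "ohms",
      "for max Pout =", pout_dbm[best], "dBm")
```

The same sweep can be repeated with efficiency or noise figure as the merit function, which is how the different "best" circles on the Smith chart arise.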
SE: Data is the other piece of this, right?
Hall: Yes, and a lot of the work that the OptimalPlus team has done is to take device failures and map those all the way back to the specific wafer that device was a part of. That gives us a lot more intelligence about identifying process failures, and potentially identifying other devices from the same lot that may be susceptible to similar failures. That’s added intelligence on the predictive failure side.
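A minimal sketch of the kind of lot-level lookup Hall describes, assuming device records that already carry wafer and lot identifiers; the schema and IDs are hypothetical:

```python
# Hypothetical device records keyed by a unique device ID, each carrying the
# lot and wafer it came from plus its current field status.
records = {
    "D001": {"lot": "L42", "wafer": "W07", "status": "field_failure"},
    "D002": {"lot": "L42", "wafer": "W07", "status": "deployed"},
    "D003": {"lot": "L43", "wafer": "W01", "status": "deployed"},
}

def siblings_of_failures(records: dict) -> set[str]:
    """Return deployed devices that share a wafer with any field failure,
    i.e. candidates for proactive screening."""
    failed_wafers = {(r["lot"], r["wafer"])
                     for r in records.values() if r["status"] == "field_failure"}
    return {dev for dev, r in records.items()
            if r["status"] == "deployed" and (r["lot"], r["wafer"]) in failed_wafers}

print(siblings_of_failures(records))  # -> {'D002'}
```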
van de Geijn: Having traceability available is important, too, so you know exactly where your failing parts are coming from. You want to be able to map them onto the wafer. It's also important not just to calculate the raw numbers, but to use offsets, because cabling and other factors can influence your test. You can recalculate the test results based on those kinds of things. And then you can correlate between pieces of equipment, so if one system is behaving a little bit differently from another, you can still see how your final products are behaving.
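The offset and tester-to-tester correlation van de Geijn mentions can be sketched as follows: add a characterized fixture/cabling loss back onto each system's raw readings, then check how well two testers track each other on the corrected values (all numbers are invented):

```python
import numpy as np

# Hypothetical insertion-loss offsets (dB) characterized per test system.
offsets_db = {"tester_A": 0.8, "tester_B": 1.1}

# Raw output-power readings (dBm) of the same sample set on both testers.
raw = {"tester_A": np.array([23.5, 24.1, 23.8, 24.4]),
       "tester_B": np.array([23.9, 24.5, 24.1, 24.8])}

# Add the characterized path loss back so both testers report device-level power.
corrected = {name: vals + offsets_db[name] for name, vals in raw.items()}
r = np.corrcoef(corrected["tester_A"], corrected["tester_B"])[0, 1]
print(f"tester-to-tester correlation after offset correction: r = {r:.3f}")
```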
Brousard: We're providing deep data to correlate between the different stages and take in many data points. That data is sourced from the chip itself, using UCT (Universal Chip Telemetry), which means it's measurement-based. It's not circumstantial, where you have to correlate between different factors to come to a high-likelihood conclusion. It fits into the overall scheme of collecting different data sets, but the data we provide is directly mappable to the state of the circuits. The fact that it's transmitted from the chip also allows us to provide one common 'language' throughout the lifecycle, so when you read out our Agents from the chip at system-level testing, it's the same data you're monitoring at chip-level test, so you can easily correlate back and forth. The practical implications can be many. For instance, if you're correlating the behavior of these data points at the chip level, but they consistently transition to a certain offset when you're at the system level, then you may have an escape or an anomaly where a part does not behave like the rest of the population. That's a convenient way to identify outliers when transitioning between stages or systems. It's really an anchor that lays the groundwork when collecting many data points for these correlations.
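A toy illustration of the outlier check Brousard describes, assuming the same on-chip telemetry parameter is read at chip-level test and again at system-level test; a part whose stage-to-stage shift deviates strongly from the population's typical offset gets flagged (data and threshold are invented):

```python
import numpy as np

# The same on-chip telemetry parameter read at two insertions for a population.
chip_test   = np.array([1.02, 0.98, 1.01, 1.00, 0.99, 1.03])
system_test = np.array([1.07, 1.03, 1.06, 1.05, 1.04, 1.21])  # last part shifts oddly

shift = system_test - chip_test              # expected: a consistent offset
z = (shift - shift.mean()) / shift.std()     # how far each part deviates from it
outliers = np.where(np.abs(z) > 2.0)[0]      # simple 2-sigma flag
print("suspect parts (indices):", outliers)  # -> [5]
```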
SE: Obviously, we’re going to need to understand where this stuff came from. What’s necessary to manage the supply chain to make sure we understand where everything came from, that it works as expected, and that it doesn’t do something it’s not supposed to do?
Hall: Part of what we're doing is using metadata tagging so that every time a device is tested, whether that's in an end device, at the wafer level, or at the packaged-part level, we're able to compare the data and correlate it. So you can take the data from a field failure and trace it back. There's a lot of data analytics and machine learning and AI technology involved to stitch all that together. That's part of the solution.
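A minimal sketch of that metadata tagging: every measurement record carries the device ID plus insertion metadata, so a field failure can be joined back to its wafer-level and package-level history (the schema and values are hypothetical):

```python
# Hypothetical measurement log: each record is tagged with the device ID and
# the insertion at which it was taken, so records can be joined later.
log = [
    {"device_id": "D001", "insertion": "wafer_sort", "lot": "L42", "wafer": "W07", "pout_dbm": 23.8},
    {"device_id": "D001", "insertion": "final_test", "pout_dbm": 23.5},
    {"device_id": "D001", "insertion": "field",      "event": "link_failure"},
]

def history(device_id: str, log: list[dict]) -> list[dict]:
    """Pull every tagged record for one device, from wafer sort to the field."""
    return [rec for rec in log if rec["device_id"] == device_id]

for rec in history("D001", log):
    print(rec)
```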
Lord: It depends on whether you're talking about IDMs, which make their own devices, or companies that use foundries. The foundries add value, and they can charge more when they have really good, known, and trusted processes. But customers still don't entirely trust their processes and their models. They do their own reliability testing, and even their own device characterization, to make sure they can match up the results they get from the foundry to what they measure when they get the device in house. So people are taking steps to qualify what the foundries are telling them in terms of their processes and what they can achieve.
Hall: That's a really important point. You can guarantee a chip is reliable if you spend enough time and money, but there's a big question about how willing our industry is to do that, based on the reliability requirements. There are discussions happening right now among chipmakers doing millimeter-wave 5G production test as to whether they're actually going to test the RF through parametrics, whether to test only at the wafer level, whether to test packaged parts in an over-the-air setup (which is far more expensive), or whether to just go to a sample-test scenario and only test a handful of parts. I don't think we've figured out as an industry what kind or level of test is acceptable, but we could see issues because it's expensive to ensure that millimeter-wave devices are reliable.
Lord: And if you don't make sure they're reliable and they go into a package, that causes other problems, because the packages are expensive, and so are the systems. Is it at all cost-effective to test 100% on-wafer, or just do a DTCO (design technology co-optimization) test and then hope it works in the package? We need more information on the yield of these chips before we can make a decision.
van de Geijn: It also depends on the application. You want higher reliability if it's not so easy to replace.
Brousard: One of the benefits of UCT, meaning that you are reading out data from the chip itself, is the ability to correlate what you expected from the models with the de facto parametric results. If you have UCT embedded in each chip, you can practically correlate the expected performance from emulating the pre-silicon models with the post-silicon results, and you have actionable insight to go back and see where the gaps are.
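A small sketch of the model-to-silicon comparison Brousard describes, assuming pre-silicon simulation yields an expected value per monitored parameter and the telemetry readout provides the measured one; the names and numbers are purely illustrative:

```python
# Expected values per monitored parameter from pre-silicon simulation,
# versus post-silicon readouts from on-chip telemetry (illustrative only).
expected = {"ring_osc_mhz": 950.0, "leakage_ua": 12.0, "vdroop_mv": 35.0}
measured = {"ring_osc_mhz": 915.0, "leakage_ua": 12.4, "vdroop_mv": 52.0}

for name, exp in expected.items():
    gap_pct = 100.0 * (measured[name] - exp) / exp
    flag = "  <-- investigate" if abs(gap_pct) > 10.0 else ""
    print(f"{name}: expected {exp}, measured {measured[name]} ({gap_pct:+.1f}%){flag}")
```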
SE: We’ve got different markets and different locations that these chips will be used in, and it’s likely they won’t all be added in at the same time. Repeaters on walls that are exposed to heat and cold will be replaced at different times. So do we have to worry about reliability of each element, or do we have to think of the reliability of all of these components in aggregate as a complete system?
Hall: I can foresee a situation, particularly in millimeter wave, where we’re not going to get the system-level performance because of other factors that are unrelated to the chip design itself. It seems like there’s a tremendous amount of value in being able to correlate device behavior, and device-to-device behavior, to help identify if it’s a device failure or if it’s a network failure.
Brousard: Part of the reason for embedding UCT into chips is not limited to understanding how the chip itself is behaving, but also how the chip behaves as part of a system. You need to know what is happening at the system level and how that affects the chip, and vice versa. These miniature monitors sense how the system affects the chip, be it voltage, temperature, or workload. If you have two systems at the same temperature, same deployment, same batch, but one of them is being significantly overstressed 24/7 while the other is off most of the time, relevant parametric readouts will be indicative of the rate of failure, which will be highly correlated to the usage of the chip itself. We try to look at this at the system level. We use the chip as a 'system sensor,' and we see the effects of the environmental conditions or real-world applications on it.
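As a rough sketch of that usage-to-wear correlation: if each deployed unit reports cumulative high-load hours alongside a parametric readout, the drift per unit can be correlated against usage (the fleet data below is invented):

```python
import numpy as np

# Hypothetical fleet data: cumulative high-load hours per unit and the drift
# observed in an on-chip parametric readout over the same period (mV).
active_hours   = np.array([8760, 700, 4300, 8500, 1200])  # 24/7 units vs. mostly-off units
param_drift_mv = np.array([14.2, 1.1, 6.8, 13.5, 2.0])

r = np.corrcoef(active_hours, param_drift_mv)[0, 1]
print(f"usage vs. drift correlation: r = {r:.2f}")  # high r -> wear tracks usage
```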
Related
The Quest To Make 5G Systems Reliable (Roundtable part 1)
Experts at the Table: Where are the challenges and what can be done to solve them.
Huawei: 5G Is About Capacity, Not Speed
One-on-one with CTO Paul Scanlan.
5G Brings New Testing Challenges
Millimeter-wave and beamforming capabilities present the biggest testing challenges.
5G Knowledge Center