Error rates rising because design complexity now requires persistent expertise updates.
Rising complexity in developing chips at advanced nodes, and an almost perpetual barrage of new engineering challenges at each new node, are making it more difficult for everyone involved to maintain consistent skill levels across a growing number of interrelated technologies.
The result is that engineers are being forced to specialize, but when they work with other engineers with different specialties they frequently don’t understand where the gaps are. Not everyone is speaking the same language—sometimes literally—and the skills at one process node may be markedly different from another. That allows errors to creep in at every level, increasing the number of re-spins and overall costs, decreasing yield, and stretching out time to market.
Semiconductor Engineering conducted more than 20 interviews over the past three months involving all sides of the semiconductor ecosystem. Many people interviewed did not want to talk for attribution because yield and error rates, as well as the causes of those errors, are considered competitive information. But there is almost universal agreement that for each new node, the ability to share knowledge is becoming more problematic at a time when it also is becoming more essential.
Skills transfer always has been a headache for companies. In the past, though, it generally has been a matter of requiring refresher courses for engineers and scientists, as well as briefings about what is new or changing. Below 28nm, this has turned into a much more serious issue for nearly every segment of the semiconductor supply chain.
“The toughest challenge is increasing complexity,” said Jim Jozwiak, workforce development engineering supervisor at Micron. “Ten years ago, most engineers could be trained in a five-hour [update/refresher] class. Now it’s 10-plus hours, and months later the content is obsolete and irrelevant. That makes it harder to provide training because the expertise lies in a large number of engineers. Very few people have all the expertise. So you need to tap into dozens of people, and then disseminate that knowledge. But to pull people away for 10 to 12 hours, and then have that information obsolete a year later, is not practical. It also makes it harder to keep training documents current because they are subject to perpetual revision.”
The big picture
Skills transfer spans every facet of the semiconductor supply chain, from design through manufacturing, but the problems get worse at each new node. There has been much discussion about the need to cross-train embedded software and hardware teams, as well as analog and digital engineers. Some of that has been automated, and some is managed inside big chipmakers by multidisciplinary team leaders who understand the challenges faced by more than one team.
But at advanced nodes this is becoming much more difficult because the next node is not just another shrink. The technology is changing significantly. A chip developed at 28nm is far different from one developed at 16/14nm. But one developed at 10/7nm also may be far different from one developed at 16/14nm, even though both use finFETs and some form of multi-patterning. There are new materials, new processes and different lithography challenges, and all of them require a shift in tools.
“The question becomes how you manage the process and ensure quality, because in the middle of this are human beings,” said Selim Nahas, technical marketing manager for automated software solutions at Applied Materials. “A fab will drive quality initiatives, but then they wonder how they take a beating at every weekly review. It’s because it’s hard to get a handle on all the pieces. There is an enormous amount of data, but it’s all disparate. And if you look at fault detection, the SPC (statistical process control) tool takes measurements on a production wafer, then you have electrical tests, but every one of these systems is different.”
Nahas said the assumption is that people share the same knowledge across systems and across data sets, but frequently that turns out not to be the case. “The implication is that on 5nm and 3nm, you can’t do it with what we have today. There already is an ambiguity of the source at 28nm and below, and that’s very significant. Tool matching, fault detection and in-line SPC are all different. And every time you experience an event, that can be propagated across more material, so the damage can be greater.”
There is a real dollar cost to this, as well. (See fig. 2, below.)
New problems at advanced nodes
On top of these issues, the training of teams on the manufacturing side is becoming increasingly node-specific. So rather than trying to retrain teams working at 16/14nm, foundries are replacing them with entirely different teams at the next nodes. That makes sense because it reduces downtime, but it makes it much harder for engineers trained on different processes to communicate. As a result, when they are thrown together at some future node, they may make assumptions about what the others know, when in reality those engineers missed the transition step.
This is one of the reasons that large chipmakers have developed chips at every process node, even though they don’t necessarily go to market with those chips. The knowledge is so different at each node that it’s important to at least understand the changes. But on the foundry side, that kind of training is harder to justify because it can slow down throughput and raise costs in a highly competitive segment of the market.
Prior to 28nm, this was less of an issue because processes were not that dissimilar from previous nodes. Concerns at the design level were largely about power, electromigration and floor planning, while at the manufacturing level the focus was on optimizing processes for power and performance. At 20nm and below, lithography hit a wall, inspection began running into problems, double patterning was introduced, finFETs were added to control current leakage, and dynamic power became a problem. At 10/7nm, quadruple patterning will be required for some layers, and leakage current returns after a brief respite at 16/14nm. There also are new techniques such as air gaps, and new materials such as ruthenium and cobalt are on the horizon. Even the basic transistor structure is being rethought, although exactly when it will change over to gate-all-around FETs is still being debated.
“The 7nm development team is entirely new people who didn’t learn the lessons from 16/14nm,” said David Fried, chief technology officer at Coventor. “The problem is that now, with every new generation of technology, there are 50 earth-shattering innovations. So with big-ticket items, those are all going 3D. And it goes all the way down. When we were moving from 0.25 microns to 0.18, there were a handful of changes. Now there are nearly an infinite number, but if you don’t keep track of them you can lose a product on the tester.”
It’s unclear at this point whether industry consolidation will help or hurt this trend. The number of companies moving to 7nm, for example, is lower than the number of companies that moved to 28nm. There are a couple of reasons for this. First, there are simply fewer companies designing chips. And second, more companies are waiting to figure out whether they want to shrink features, use an advanced packaging scheme such as fan-out or 2.5D, adopt a different technology such as FD-SOI, or use some combination of two or more of those.
“There are four companies working on 7nm and the teams are massive,” said Fried. “The vast majority are just doing one node at a time, though, so the lessons learned at 10nm don’t necessarily get transferred. Training is more complex. With new hires, you need to show, ‘Here’s what we do.’ That’s one of the first things we’re pulled in for—an application to make that happen.”
Things will get even more complicated at 5nm, when quantum effects begin entering into the picture. Rather than just dealing with electrical issues, design teams and foundries will have to start wrestling with quantum considerations. Electrons behave differently enough at that node that it becomes a noticeable problem, according to early reports from a number of companies.
Dealing with the magnitude and rapidity of these changes requires a fundamental mindset shift inside of companies, though.
“Production operators are trained to follow a sequence of events over and over,” said Michael Ford, senior marketing development manager in Mentor Graphics’ Valor Division. “At one company, if you could pass a test they gave you, you actually didn’t get a job because they wanted a certain mindset. The more modern approach is that operators make decisions that are more flexible. But to do that you need higher-level operators, and engineers tend to be more specific in their skills, so that flexibility goes away again.”
The solution, at least for the time being, is to automate more operations. But that also displaces operators at the bottom and creates a need for much more training for those who remain. And even with that training, differences emerge, particularly as complexity increases.
“What we’re looking at is human error and human variability,” said Applied’s Nahas. “With variability, the question is how much depends on differences from one person to the next. In a fab, you have very distinct fingerprints, which allows you to determine the reason why people are behaving a certain way. But we’ve found that if you address this, within three months you will see a 5% decrease in variability. That means building a mentoring environment and working through specific problems. A good starting point is your suppliers’ quality management. If you give a fab a wafer, how do you know it’s good? And how good is it?”
He said this becomes particularly important in advanced packaging, where the discussion needs to go well beyond known good die. It needs to span companies, a problem that also crops up with M&A. “If you’re going to marry different companies together, you have huge risks if you don’t establish standards like this. The quality side may be lacking. One bad event can kill you, and your ability to pick up additional customers can disappear in one quarter.”
These quality issues began surfacing in discussions about advanced packaging several years ago, particularly with heterogeneous integration. The focus at the time was on what happens when two good die are put together and the combined system no longer works. But the problem is broader than that. It encompasses people, methodologies and unknowns, which crop up in every design.
“This is a very important problem in terms of skills,” said William Chen, senior technical advisor and fellow at ASE. “When we first started fan-outs, there was a transfer of technology from another company. They declared they had the process ready to go to manufacturing and that we would do the production. It turned out to be a painful year and two months. It was a readiness problem. That was one piece of the problem. The other was trying to understand what is the process. Some people think, ‘This is the way to do it.’ Others have to learn how to do it. A lot of us think about process as empiricism, not as a discipline.”
Chen noted that much of what is required is not being taught in universities. “It’s about understanding the materials, and the physics and chemistry behind the materials. The key is that you need to pay a lot more attention and be more tolerant because this is not just about process parameters. If something is manufactured by one company and the tools are manufactured by another, how do you transfer that?”
“There are a lot of cross-dependencies and learning curves,” said Mike Gianfagna, vice president of marketing at eSilicon. “This is a new business that is ripe for standardization, but there is no standard specification for a 3D stack. We need an interchangeable specification. You also need a general contractor on the job who can direct subcontractors to hit a target and take responsibility for integration. Whoever steps up to that needs to have technical depth and confidence in what they’re creating. So at least for the initial version of a design, you really need subject matter experts. You need the best people for DFT, signal integrity and power integrity. And when that goes from the cutting edge to mainstream, you need a way to transfer those skills that is repeatable and reliable. You can’t have the best experts on every chip. You need to develop training, and that requires a good, sophisticated expert system so that others can take advantage of expertise without picking up the phone.”
It also requires some really good teachers, because not all of this can be automated.
“The first challenge is to find the right people, and for that type of role it’s getting harder to find the right combination of skills,” said Micron’s Jozwiak. “You’re taking engineering concepts and teaching them to a broad audience, from Ph.D.’s to associate degrees. Then you need very solid business rules, which are not a 100% current resource. You need to update that quarterly, or at least every six months, and you have to be honest about that in training sessions. Many details will change over time. Node to node is a challenge, but so is within the same node—especially in R&D, where the node can change significantly over time.”
Measuring results isn’t simple, either. There are no clear numerical values assigned to this kind of training, and the benefits play out over time. If companies are successful, yield and quality improve. But how quickly that can be discerned, and against what metrics, will never be clear. That makes selling these concepts into organizations extremely difficult, particularly without clear support from the very top. Yield and quality are measurable, and improving both is an obvious goal. But achieving perceptible improvements in both is a lot harder than it looks.