More Sigmas In Auto Chips

Increasing robustness levels will require significant changes throughout the entire automotive ecosystem.

The journey to autonomous cars is forcing fundamental changes in the way chips are designed, tested and tracked, from the overall system functionality to the IP that goes into those systems.

This includes everything from new requirements for automotive-grade chips to longer mean time between failures. But it also makes it far more challenging, time-consuming and complicated to create these devices, because they have to be designed in the context of other systems. Some of those systems are still deep in the development phase.

The new reliability goal circulating in the automotive market is seven sigma. While that may be mathematically possible, it’s certainly not clear how realistic that goal is. Six sigma, long considered the gold standard in manufacturing, is the equivalent of 3.4 defects per million. But those numbers also need to be viewed in the context of complex systems over extended periods of time. At the very least, carmakers are looking to substantially improve on existing failure rates as they scramble to develop vehicles that can drive themselves. Assisted and autonomous driving add new liability concerns into the automotive world, and part failures are preventable, at least on paper, with better simulation, verification, prototyping and testing.
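The arithmetic behind these sigma figures can be sketched in a few lines. The conversion below assumes a normally distributed characteristic and the conventional 1.5-sigma long-term shift used in Six Sigma tables, which is where the commonly quoted figure of about 3.4 defects per million comes from. The function name and the default shift are illustrative choices, not something specified by the carmakers.

```python
import math

def defects_per_million(sigma_level, shift=1.5):
    """One-sided defect rate (parts per million) for a given sigma level.

    Assumes a normal distribution and, by default, the conventional
    1.5-sigma long-term mean shift used in Six Sigma tables.
    """
    z = sigma_level - shift
    tail = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z > z) for a standard normal
    return tail * 1e6

for level in (3, 4, 5, 6, 7):
    print(f"{level} sigma: {defects_per_million(level):.4g} defects per million")
```

Under these assumptions, six sigma works out to roughly 3.4 defects per million and seven sigma to roughly 0.02, which gives a sense of how large the step from six to seven is.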

“In the semiconductor business we develop products that are robust, meaning we have a specification and we have different levels of quality, which is usually measured in sigma,” explained Will Chu, vice president and general manager of the automotive business unit at Marvell. “This all comes back to statistics—3, 4, 5, 6 sigma. They are the targets, depending on the industry. The higher the level of sigma, the more confidence you have that you’re meeting your specifications under your specific requirements. In automotive, this translates to robustness. And specifically for autonomous driving, it’s a statistical representation.”

In the past year, carmakers have told suppliers they expect zero defects for up to 15 years.

“You want your autonomous vehicle to work under very adverse conditions,” said Chu. “With that in mind, if you tie that to the different levels of autonomy, you could see it as a matrix and try to figure out what the target should be. As such, the automotive semiconductor company has a decision point about how it wants to architect the system, and if it is some level it wants to claim as its robustness level.”

Achieving this level of robustness becomes particularly critical for autonomous vehicles because an estimated 4 terabytes of data will flow in, out, around, and through an autonomous vehicle, he said.

But designing the electronics to handle the extreme temperatures and stresses of these vehicles is a daunting task, and one for which there is only limited precedent.

“The EDA industry has for many decades created great solutions to resolve very complicated design routing algorithms,” observed Burkhard Huhnke, vice president of automotive strategy at Synopsys. “If you look into the aging factor of modern SoCs, and then into automotive grade requirements, we are talking about ASIL as a safety integrity standard in automotive, and ISO 26262, where you define functional-safety relevant aspects in the design of your automotive systems.”

Huhnke, who previously worked at Volkswagen, said 50 billion semiconductors were delivered to the automaker for the 2014 model year—a number that is growing rapidly as more electronics are added into cars.

“This is a very high number, and what I realized when I was in charge of the integration side of software and hardware was that we had field problems caused by semiconductors because the wafer didn’t work properly, and so on,” he said. “The question was what is automotive grade in the semiconductor world? How do we design that correctly so it will last for 15 years, because that’s the average lifetime of a car? You need 3+ years in development time and then 7 years production time and in the worst case, 8 years in the market, so 7 + 8 in the worst case. Now you have a 15-year-old system in the car, and the electronics are supposed to work properly. Applying all of this to autonomous driving cars, now we’re talking about liability, and that puts a lot of pressure on to the entire ecosystem of OEMs. What is available and what can be provided by the semiconductor world?”

Built-in, self-testing algorithms can help. “An error-correction mechanism is already established. Even the safety integrity redundancy with a dual-core lockstep processor solution is being provided. I was surprised that we have the redundancy already implemented on the SoC level. These are in the most sophisticated SoCs, but it’s available. How can you apply that throughout the entire ecosystem that’s available on the car side? If I would have looked from the OEM standpoint, I would have built up redundancy through a second computer. If I use the SoC and the semiconductor opportunities by talking with them directly, I can reduce the cost dramatically.”

This depth of understanding about solutions from the semiconductor world is relatively new in the automotive sector. “Car manufacturers are outdated, because when they begin to design a car they look for availability of system components usually five years before a vehicle is launched. So they pick what’s available on the shelf, take the processor which is available at that time, maybe something which is used for a while under the automotive grade, as well as being very reliable and robust. This means they are picking five years before the launch of the product, and maybe an outdated processor from the innovation standpoint, but a very reliable and robust system. If you want to improve, and the pressure is from all of the automotive OEMs, more computing power is needed, along with more memory, more connectivity, and more intelligence from more sophisticated computer power,” Huhnke pointed out.

These changes already have begun showing up in automotive design. In the case of Audi, its developers were able to decouple hardware and software development by using a virtual prototype of the hardware to begin the software development. But high-sigma automotive design is happening on a scale that can be staggering to comprehend. For example, an Audi A7 has more than twice as many lines of software code as an airplane.

“The question is who is going to be able to deal with that complexity,” Huhnke said. “The car manufacturers are not experts on software. If one automotive OEM makes up to 10 million cars a year in 100 factories around the world, this requires a huge shift in their organizational structure. Plus, it requires standardization because nobody can deal with the exponential growth rate of the number of software lines of code.”

This means software must be standardized, the operating system must be standardized, platforms have to be created, and software IP must be reused. This is a big game changer that will demand that thousands of software engineers be hired to make sure the OEM is ready for the future. A part of this will be the entire redesign of the automotive electronics architecture.

Higher complexity
The complexity only grows from there. While electrostatic effects such as noise, electromigration, and on-chip variation could be ignored at older nodes, carmakers are looking at the most advanced technology nodes for AI chips, the central brains of an autonomous vehicle. This is partly because the design cycle is so long, but it’s also partly because those systems will require maximum performance per unit of area and per watt.

“High-sigma sampling is tricky,” said Christoph Sohrmann, a member of the Advanced Physical Verification group in Fraunhofer’s Division of Engineering of Adaptive Systems (EAS). “A lot of new methods have been published trying to improve the efficiency of sampling from high-sigma distributions. Less attention is usually paid to the correct handling of parameter correlations. In real statistical data, you find complex spatial dependencies between all parameters. To reduce this complexity, the notion of local and global variation has been introduced, meaning uncorrelated or fully-correlated, respectively. In the near future we may see that this model overestimates or underestimates the variations at the system-level. It is not sufficient to account for the variations accurately, and more advanced correlation models are required. This will bring about another set of questions to be addressed, such as how to measure those statistical models, and from the design perspective, how to sample those highly-correlated, high-sigma distributions.”
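The gap between the two simplified models can be illustrated with a toy calculation. The sketch below assumes a system metric that is simply the sum of N identical device parameters sharing one pairwise correlation coefficient, rho. That equal-correlation structure, the function name and the numbers are all hypothetical, chosen only to show how the uncorrelated (local) and fully correlated (global) assumptions bracket an intermediate case.

```python
import math

def system_sigma(n_devices, device_sigma, rho):
    """Standard deviation of the sum of n equally correlated parameters.

    Var(sum) = n * s^2 * (1 + (n - 1) * rho), where rho is the pairwise
    correlation (0 = purely local variation, 1 = purely global variation).
    """
    variance = n_devices * device_sigma**2 * (1 + (n_devices - 1) * rho)
    return math.sqrt(variance)

n, s = 100, 1.0  # hypothetical: 100 devices, unit sigma each
for rho in (0.0, 0.3, 1.0):
    print(f"rho={rho}: system sigma = {system_sigma(n, s, rho):.2f}")
```

In this toy case the purely local model gives a system-level sigma of 10 and the purely global model gives 100, so a partially correlated reality would be underestimated by the first and overestimated by the second.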

Traceability adds another level of complexity. “If all of the above has been taken care of during the design, field failure may be related to a temporary manufacturing outlier,” Sohrmann said. “The new regulations may require the OEM to recall an entire batch of affected products. This immediately poses the traceability issue: Where, when and under which conditions has the semiconductor been manufactured? Where are the other products from this batch? How do you optimize the recall? One can imagine new challenges regarding the bookkeeping for each of the semiconductor parts inside the final product.”
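The bookkeeping Sohrmann describes can be pictured as a small data structure. The sketch below is a minimal illustration only. The `Part` and `TraceLog` names, their fields and the sample records are hypothetical, and a real system would sit in a database shared across the supply chain rather than in memory.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Part:
    serial: str
    lot_id: str    # manufacturing batch
    fab_site: str  # where it was made
    made_on: str   # when it was made (ISO date string)

class TraceLog:
    """Hypothetical in-memory record of which part went into which vehicle."""

    def __init__(self):
        self._part_by_serial = {}
        self._serials_by_lot = defaultdict(set)
        self._vin_by_serial = {}

    def register(self, part, vin):
        self._part_by_serial[part.serial] = part
        self._serials_by_lot[part.lot_id].add(part.serial)
        self._vin_by_serial[part.serial] = vin

    def recall_scope(self, failed_serial):
        """VINs of every vehicle holding a part from the failed part's lot."""
        lot_id = self._part_by_serial[failed_serial].lot_id
        return sorted({self._vin_by_serial[s] for s in self._serials_by_lot[lot_id]})

log = TraceLog()
log.register(Part("S1", "LOT-A", "fab-1", "2018-03-01"), vin="VIN001")
log.register(Part("S2", "LOT-A", "fab-1", "2018-03-01"), vin="VIN002")
log.register(Part("S3", "LOT-B", "fab-2", "2018-03-05"), vin="VIN003")
print(log.recall_scope("S1"))  # vehicles sharing the failed part's lot
```

Indexing by lot is what lets a recall be limited to the affected batch instead of every vehicle that uses the same part number.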

That bookkeeping may still be evolving, given that automakers are trying to add as many features as they can.

“It’s always a question of how much you’re going to be able to innovate to know how much you’re willing to risk,” said Ranjit Adhikary, vice president of marketing at ClioSoft. “European and American automakers try to put in more and more features, all of which has to be robustly tested. There are various new technologies in combination, which varies between different models of cars, and keeping track of all of the information is going to be very challenging.”

This is a data management problem, and that data needs to be accessible at least across a company, and in some cases across the entire supply chain. That requires restricting who can see what and setting up permissions and authentication, but it also requires an understanding of how all of these pieces go together. That used to be done on a spreadsheet, but the data management problem has grown far beyond what a spreadsheet can handle in an autonomous vehicle.

“When you’re dealing with a lot of customers, you want all the information, the knowledge base, everything in one place,” Adhikary said.

Better verification
Verifying a high-sigma level requires lots and lots of simulations and Monte Carlo runs, all of which are very expensive and time-consuming.
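A rough calculation shows why brute-force Monte Carlo becomes so expensive. The sketch below assumes a normally distributed metric with a one-sided failure region and no long-term shift, and asks how many random samples are needed to have a given chance of seeing even one failure. The function names and the 95% confidence default are illustrative assumptions.

```python
import math

def tail_probability(sigma):
    """One-sided P(Z > sigma) for a standard normal (no mean shift)."""
    return 0.5 * math.erfc(sigma / math.sqrt(2))

def samples_to_see_failure(sigma, confidence=0.95):
    """Samples needed so that P(at least one failure) >= confidence."""
    p = tail_probability(sigma)
    # Solve 1 - (1 - p)**n >= confidence for n; log1p keeps tiny p accurate.
    return math.ceil(math.log(1 - confidence) / math.log1p(-p))

for sigma in (3, 4, 5, 6):
    print(f"{sigma} sigma: about {samples_to_see_failure(sigma):.3g} samples")
```

Under these assumptions a 3-sigma check needs a few thousand samples, while a 6-sigma check needs on the order of billions, each of which is a full circuit simulation.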

High-sigma verification is especially critical in 1) automotive on the safety, reliability side of things; 2) mobile chips, and designing down to the most advanced nodes; 3) lower power designs, where higher sigma becomes challenging; 4) IoT-type applications, and making sure the chips are working and reliable in the field; and 5) high-performance computing where power and the advanced nodes are very important, explained Amit Gupta, general manager of the Solido group in the IC Verification Solutions Division at Mentor, A Siemens Business. He has observed that high-sigma pain points are top-level challenges for users.

Memory design is particularly prone to the pressures of high-sigma issues. “From bit cells to control logic, the whole array for the memory is very important to achieve higher-sigma type defect rates to prevent failures,” Gupta said. “This also is evolving beyond memory into standard-cell library design, where you run all of the flip-flop types or inverter designs, such as your entire simple and complex library of standard cells, to high sigma. This is very important, garnering a lot of usage from foundries and IP providers, along with fabless semiconductor companies. And then there’s analog, which is very important, too, because the traditional methodology is to run it across the process, voltage and temperature corners to find the worst-case corners. You might do a few hundred Monte Carlo samples across those worst-case corners in order to do a final check.”
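The traditional flow Gupta describes, sweeping corners and then running a few hundred Monte Carlo samples at the worst one, can be sketched with a toy model. Everything below is hypothetical: the delay formula, the corner values and the 3% local variation are made up for illustration and stand in for what would be a circuit simulator and foundry models.

```python
import itertools
import random

PROCESS = {"ff": 0.9, "tt": 1.0, "ss": 1.1}  # hypothetical speed factors
VOLTAGE = (0.9, 1.0, 1.1)                     # volts
TEMPERATURE = (-40, 25, 125)                  # degrees C

def toy_delay(process, voltage, temp_c, local=0.0):
    """Made-up delay model: slower at slow process, low voltage, high temp."""
    return PROCESS[process] / voltage * (1 + 0.001 * (temp_c - 25)) * (1 + local)

def worst_corner():
    """Sweep every process/voltage/temperature combination for the slowest."""
    corners = itertools.product(PROCESS, VOLTAGE, TEMPERATURE)
    return max(corners, key=lambda c: toy_delay(*c))

def monte_carlo_at(corner, n=300, local_sigma=0.03, seed=0):
    """Re-run the worst corner n times with random local variation."""
    rng = random.Random(seed)
    return [toy_delay(*corner, local=rng.gauss(0.0, local_sigma)) for _ in range(n)]

corner = worst_corner()
delays = monte_carlo_at(corner)
print("worst corner:", corner)
print("max delay over", len(delays), "samples:", round(max(delays), 3))
```

A few hundred samples only probe out to roughly 3 sigma, which is why this kind of final check says little about behavior at 6 sigma and beyond.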

Chasing the tail
However, it is getting even more difficult to trace failures because the worst-case corners found without statistical variation may differ from the ones found when Monte Carlo analysis takes variation into account. As such, which tool can you rely upon to correctly get you the value of that Gaussian tail bit, especially when there is no simple calculation to show what that tail bit will look like?
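One textbook way to reach far into a tail without billions of samples is importance sampling, which draws samples near the failure region and reweights them. The sketch below is a generic illustration, not the method used by any tool or company named in this article. It uses a plain Gaussian as a stand-in precisely because the exact answer is known there and the estimate can be checked. The function names and sample count are assumptions.

```python
import math
import random

def exact_tail(t):
    """Exact P(Z > t) for a standard normal, used only as a reference."""
    return 0.5 * math.erfc(t / math.sqrt(2))

def importance_sampled_tail(t, n=20000, seed=0):
    """Estimate P(Z > t), Z ~ N(0, 1), by sampling from N(t, 1) instead.

    Each sample is reweighted by the density ratio
    phi(x) / phi(x - t) = exp(-t * x + t**2 / 2).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(t, 1.0)   # sample centered on the failure boundary
        if x > t:               # sample landed in the "failure" region
            total += math.exp(-t * x + 0.5 * t * t)
    return total / n

t = 6.0
print("estimate:", importance_sampled_tail(t))
print("exact   :", exact_tail(t))
```

With a real circuit there is no closed-form reference, and choosing where to center the sampling is itself the hard part.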

“It’s a combination of all these variations that could occur as a foundry is manufacturing this transistor,” said Deepak Sabharwal, vice president of IP engineering at eSilicon. “How much dopant are they putting in? What kind of fluctuation are they going to have? What other mask variations are they running into? What foundries do is they try to build all these models, then build a composite model to represent all these variations. And then you need special tools to really extract the information in the high-sigma domain from these models. I often see people calculate 3 sigma, and then just extrapolate for 6 sigma. They say, ‘If 3 sigma is 15% variation, 6 sigma should be 30,’ but that is not true. That’s why you have to use special techniques, which we typically refer to as high-sigma analysis, to get to those tail bit numbers. And then you apply the margins in the design for those tails.”
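A small worked example shows why doubling the 3-sigma number does not give the 6-sigma number once the distribution is not Gaussian. The sketch below assumes, purely for illustration, a lognormal quantity whose 3-sigma point sits at +15%. Real device distributions come from foundry models and may deviate from linear extrapolation by much more, or in the other direction.

```python
import math

def deviation_at(k_sigma, s):
    """Fractional deviation of a lognormal quantity X = exp(s * Z) at Z = k."""
    return math.exp(k_sigma * s) - 1.0

s = math.log(1.15) / 3.0  # chosen so that the 3-sigma point is +15%
three = deviation_at(3, s)
six = deviation_at(6, s)
print(f"3 sigma: {three:.2%}")
print(f"6 sigma: {six:.2%} (linear extrapolation would give {2 * three:.2%})")
```

Even in this mild case the 6-sigma deviation is 32.25% rather than 30%, and the gap widens as the distribution becomes more skewed.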

Fortunately, the foundries today do give guidelines, he said. “They tell you, ‘When you’re doing timing closure, apply such-and-such extra margin in your design. Watch out for this effect, and that effect.’ So they are helping you out, and the reason is that they are seeing the variations happening and want to make sure people can build designs that work, because ultimately it’s in their best interest to get these things into production.”

The variation part itself is not going to go away. “It’s here to stay,” said Sabharwal. “The densities that people are integrating on these chips are going crazier and crazier, and sometimes I think it is a miracle that we can even get these designs to run.”

So what about 6 or 7 sigma? And what does 98% correct mean in automotive systems?

“With the 2% that’s left over, or the 1% or even the 0.5% that gets left over, you can visualize the Gaussian curve, and high sigma is about that behavior in the tail,” said Steven Lewis, mixed-signal marketing director at Cadence. “We become interested in what happened in the tail. We’re producing so many chips that even a small amount of failures still represents a lot of chips that could fail. We want to make sure that whatever the behavior is in the far tail, we’re still accounting for that in our design. It may become a safety issue when it comes to automotive or medical devices. It’s nice that only one in 10,000 or 100,000 fails, but if it happens to be your pacemaker or your antilock brakes, you don’t care that it was one in a million. It’s very rare, but people do win the lottery. When it comes to looking at these statistics, we want to make sure that if you’re the lottery winner and it’s your antilock brakes, it’s still going to be okay. Chips are designed to handle that. Of course, this leads into other topics of functional safety and a variety of other things, but when we’re looking again at the high-sigma applications, we’re looking at the extreme cases in the extreme parts of the tail, and we want to make sure that we’ve accounted for those.”
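The point about volume is simple multiplication, sketched below with the failure rates Lewis mentions. The shipment figure is a hypothetical assumption: one safety-critical chip in each of 10 million cars, which is the annual output the article attributes to a large OEM.

```python
def expected_failures(units_shipped, failure_rate):
    """Expected number of failing units for a given per-unit failure rate."""
    return units_shipped * failure_rate

units = 10_000_000  # hypothetical: one safety-critical chip per car, 10 million cars
for one_in in (10_000, 100_000, 1_000_000):
    count = expected_failures(units, 1 / one_in)
    print(f"1 in {one_in:,}: about {count:,.0f} failing chips")
```

Even at one in a million, that volume still implies about ten failing chips, each in a different vehicle.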

More sigmas
While the six-sigma approach has been in use for more than 30 years now, its adoption has expanded considerably.

“The initial focus was on manufacturing and the elimination of defects,” said Tom Anderson, technical marketing consultant at OneSpin Solutions. “More recently, six sigma’s disciplined, data-driven principles have been applied to improve many types of processes, including design, product development, and even services. As such, it is an entirely appropriate approach to improve the verification of complex chips. Many safety-critical applications adopted six sigma in their verification processes, notably automotive electronics striving to meet the ISO 26262 standard.”

Along with this, reliability and safety requirements are driving the need for better verification metrics and more comprehensive verification in many types of chip designs.

“During pre-silicon verification, design analysis tools and formal apps find a wide variety of systematic failures (design bugs) automatically,” Anderson said. “User-specified assertions about design intent feed into deeper formal analysis to find corner-case bugs that escape traditional simulation-only verification flows. Formal equivalence checking ensures that functionality is not corrupted by tools that transform the design, including logic synthesis and place-and-route.”

Further, safety standards including ISO 26262, IEC 61508, and DO-254 set a high bar for systematic verification so that design errors do not make it into silicon. “Many such standards also have requirements to mitigate errors occurring during operation of the chips themselves once they are in use in the field,” Anderson said. “Formal tools provide a solution here as well by considering possible random faults and determining which will not affect functionality, which will be detected, and which will be corrected. When combined with strong processes for manufacturing, the Six Sigma approach can minimize systematic failures and silicon manufacturing defects while mitigating the effects of random faults.”

Looking ahead, Mentor’s Gupta expects more designs will be verified to high sigma. “Right now, there’s a subset of the designs that are being run, but we’ve been seeing an increase in the types of designs, as well as the number of designs, going through these tools over the last three and a half years. Also, we expect to see a lot more companies doing automotive design. There was a lot of venture capital funding in 2017 in the semiconductor industry for startups, so there has been a lot more design activity in this area. We’re seeing the foundries developing IP for automotive applications, running their IP for high sigma as a result, as well as IP companies themselves. We see semiconductor companies doing higher-sigma design, too, going from 5 or 6 sigma and now doing a 6.5- or 7-sigma type of verification.”

The bottom line: While electronics will still fail, at least they will fail less often and in ways that can be more closely tracked and fixed more quickly.

Related Stories
FPGAs Drive Deeper Into Cars
Automotive OEMs are leveraging programmability for algorithms, evolving safety standards, and market differentiating features.
Verification Of Functional Safety
Part 2 of 2: How should companies go about the verification of functional safety and what tools can be used?
Verification As A Flow
Experts at the Table, part 3: How will Portable Stimulus impact SoC verification and what adoption approaches are likely to catch on first?


