Making Sure A Heterogeneous Design Will Work

Why the addition of multiple processing elements and memories is causing so much consternation.


An explosion of various types of processors and localized memories on a chip or in a package is making it much more difficult to verify and test these devices, and to sign off with confidence.

In addition to timing and clock domain crossing issues, which are becoming much more difficult to deal with in complex chips, some of the new devices include AI, machine learning or deep learning. As a result, a chip may develop unique behavior as it applies training data to a particular use case. These chips also may be updated over the air, as in the case of automotive or IoT-based designs.

“There are a lot of applications for heterogeneous computing, including AI and machine learning, 5G, sensor fusion, and high-performance computing,” said Raik Brinkmann, president and CEO of OneSpin Solutions. “You want to map algorithms to the hardware, which is being done today in the cloud. But you can’t go after the cloud without addressing latency, performance and power. You also have issues with IC integrity, and beyond that functional safety and security. So the big question is how do you address all of this in the design flow. Do you implement it on some programmable fabric or do you do it with a heterogeneous platform? And when you verify this, do you do a bottom-up metric analysis with flexible code coverage or do you use a top-down approach?”

All of this raises a slew of new challenges. Rather than dealing with a single compute element and memory, verification and test increasingly include a combination of programmable and non-programmable hardware, firmware, and a complex software stack that affects everything from security to control logic.

“We’ve never seen heterogeneity on this scale before,” said Adam Sherer, group director of marketing at Cadence. “You’re dealing with a heterogeneous compute environment to begin with, and then the design itself is heterogeneous. So now you have different memories, AI and machine learning parameters, a broad set of IP, including legacy IP blocks, a whole new breadth of characterizations and multiple processors. Straight simulation using a UVM test style doesn’t work anymore.”

In some cases this requires different tools, but the real challenge is in the methodology and flow, as well as the amount of time and effort required to achieve sufficient coverage.

“We’re seeing a shift toward narrowly defined system-level tests, where you run multiples of those rather than a single test,” said Sherer. “The problem is that you don’t mimic a real-world environment. The alternative is to reduce the test scale by narrowing the test function in order to keep it in the time window that you have.”

There also are a number of issues involving interfaces between chips, because many of these chips are expected to operate as part of a much larger system.


Fig. 1: AMD’s heterogeneous “Raven Ridge” APU. Source: AMD/Hot Chips 30


Different architectures
Speeding up the verification process becomes more difficult as accelerators and memories are added into chips to deal with specific types of data. That approach is becoming much more common in data center training and inferencing chips for AI, machine learning and deep learning, but it also is creeping into safety-critical markets such as automotive and into a variety of chips used both in data centers and at the edge.

Those chips can be very large and complex, and the big concern there is latency. At the outset of the verification process, which may be very early in the design process as more of this shifts further left in the flow, engineers need to recognize potential interactions and use models, some of which may not be evident to the design teams.

“The way we have done this in the past is there is always a CPU core and memory subsystem,” said Gopal Hegde, vice president and general manager of the server processor business unit at Marvell. “This was all standard stuff, and there was always talk about what interfaces you had to support. But how do you design the pipeline when the address space is much larger, and latency may involve going through an external fabric?”

Various interconnect standards exist for off-chip accelerators and memories, but supporting all of them is difficult.

“The industry needs to come together on standards-based interfaces for better data flow,” said Hegde. “We have Gen-Z and CCIX (Cache Coherent Interconnect for Accelerators), and we strongly support those, but we really want to support one interface. You can’t support those plus Gen5 PCIe.”

Others report similar challenges. “We’re in the middle of bringing up a 7nm chip,” said Mike Gianfagna, vice president of marketing at eSilicon. “The big concerns we’re seeing are interoperability, validation at the system level, and the idiosyncrasies of IP and different voltage levels.”

To address this problem, eSilicon has basically developed IP “platforms” for both AI and networking and switching. “These are groups of IP we know work together,” Gianfagna said. “We also have developed platforms for the metal stack to ensure testability, voltage range and reliability. You want to make sure all of the IP can use the same metal stack, and you want to make sure all of the third-party IP you add in does the same. You can get rid of a lot of problems that way for interoperability. We think that’s going to be a requirement going forward. You want the best-in-class IP that can interoperate with everything else.”

Fig. 2: An NXP heterogeneous “crossover” chip architecture. Source: NXP

Exploring unknowns
A variety of new applications require a lot of exploration on the part of the design team. The slowdown in device scaling due to rising costs and reduced power/performance benefits after 16/14nm has forced chipmakers to look at different architectures and new materials, in addition to possibly pushing to the next node. Some approaches work better for certain applications than others.

“Exploration is very important with AI architectures,” said Ron Lowman, strategic manager for IoT at Synopsys. “This is happening in the data center, whether it’s for training or inferencing. And beyond that, it’s separated by market. We’ve seen companies create beachheads in digital TV and voice, where you have AI for voice recognition and CNNs for vision. These new approaches offer as much as 100X improvement versus GPUs. Those GPUs were used in the past because they were available, but they were not designed for this.”

One increasingly popular alternative is heterogeneous architectures. “We’ve seen startups doing interesting things with new non-volatile memory technologies and smaller processors, basically modeling the brain,” said Lowman. “We’re also beginning to hear about single-bit quantization. So if you can take 32-bit floating point and compress that to 8 bits or even a single bit, what do you lose by doing that? How compressed can you make all of this?”
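Lowman’s question of what is lost when 32-bit floating point is compressed to 8 bits or a single bit can be explored numerically. The sketch below is a minimal illustration, assuming symmetric per-tensor int8 quantization and a sign-plus-mean-magnitude binarization (one common 1-bit scheme among several). The weight values and function names are hypothetical, not taken from any real network or library.

```python
import math

def quantize_int8(weights):
    """Symmetric per-tensor linear quantization of floats to signed 8-bit codes."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # one float scale for the whole tensor
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]

def binarize(weights):
    """1-bit quantization: keep only each sign, scaled by the mean magnitude."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    return [alpha if w >= 0 else -alpha for w in weights]

def rms_error(original, approx):
    """Root-mean-square difference between two equal-length lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(original, approx)) / len(original))

# Made-up weights for illustration only.
weights = [0.82, -0.41, 0.05, -0.97, 0.33, 0.0, -0.12, 0.64]
codes, scale = quantize_int8(weights)
print("int8 RMS error :", rms_error(weights, dequantize(codes, scale)))
print("1-bit RMS error:", rms_error(weights, binarize(weights)))
```

On these values the 8-bit error stays below half a quantization step, while the 1-bit version is off by a large fraction of a typical weight. Whether a given network tolerates that loss is something only retraining and accuracy measurement can answer.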

The challenge is figuring out what gets put on a single die and what gets put somewhere else, potentially another die in the same package.

“We’re seeing a lot of new architectures, some including HBM with 512 Gbps bandwidth or more in a stack,” said Frank Ferro, senior director of product management at Rambus. “The problem is that this typically requires more processing power, and this is what architects have to look at. If you have four stacks of HBM at 2-plus Gbps, how do you make sure you balance the processing and the bandwidth? Too much of either is not good, and at the moment a lot of these are going dark because the users can’t keep them busy enough.”
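Ferro’s balancing problem is often reasoned about with a roofline-style calculation: delivered throughput is capped either by peak compute or by memory bandwidth multiplied by how many operations the workload performs per byte fetched. A minimal sketch, assuming the 1,024-bit interface per HBM stack, reading “2-plus Gbps” as a per-pin data rate, and using a hypothetical 50 TOPS accelerator. None of these numbers describe a specific product.

```python
HBM_BUS_WIDTH_BITS = 1024  # interface width of one HBM stack

def hbm_bandwidth_gbytes(stacks, gbps_per_pin):
    """Aggregate peak bandwidth in GB/s for a number of HBM stacks."""
    return stacks * HBM_BUS_WIDTH_BITS * gbps_per_pin / 8

def attainable_gops(peak_gops, bandwidth_gbytes, ops_per_byte):
    """Roofline model: throughput is limited by compute or by memory traffic."""
    return min(peak_gops, bandwidth_gbytes * ops_per_byte)

bw = hbm_bandwidth_gbytes(stacks=4, gbps_per_pin=2.0)  # 1024.0 GB/s
peak = 50_000  # hypothetical accelerator peak: 50 TOPS = 50,000 GOPS
ridge = peak / bw  # ops per byte needed before compute, not memory, is the limit
print(bw, ridge)   # prints 1024.0 48.828125

for intensity in (10, 100):  # hypothetical workloads, ops per byte
    print(intensity, attainable_gops(peak, bw, intensity))
```

A workload well below the ridge point leaves most of the compute idle, while one far above it leaves bandwidth unused, which is the imbalance Ferro describes.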

That opens up some interesting possibilities, because it allows chipmakers to basically increase data density to the point where more can be achieved with each compute cycle, and more can be done locally using small processors versus running everything through a massive multi-core processor. Cornell University published a paper to this effect in Feb. 2017, examining possible ways to improve speech recognition with sparse long short-term memory and load-balanced quantization.

“If you need 100 multiply-accumulates (MACs) per cycle, you would make different architectural choices than if you could do this with fewer MACs per cycle,” said Gerard Andrews, product marketing director for audio/voice IP at Cadence. “So for speech recognition you might use a DSP on the front end and hook it up to an accelerator. There are a lot of cores targeting image analysis and image recognition, and those neural networks are much bigger than for speech recognition. But alongside of this we’re hearing people complain about the tool chain and flow.”
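Andrews’ point about MACs per cycle comes down to simple arithmetic. The sketch below assumes every MAC unit does useful work on every cycle and uses hypothetical workload sizes: 5 million MACs per 10 ms speech frame at 500 MHz, and 2 billion MACs per camera frame at 30 frames per second and 1 GHz. These are illustrative placeholders, not figures for any particular network, and `mac_units_needed` is a hypothetical helper.

```python
def mac_units_needed(macs_per_inference, inferences_per_second, clock_hz):
    """Smallest number of MACs per cycle that sustains the required rate.

    Assumes 100% utilization of every MAC unit, which real designs rarely achieve.
    """
    macs_per_second = macs_per_inference * inferences_per_second
    return -(-macs_per_second // clock_hz)  # integer ceiling division

# Hypothetical workloads, not measurements of real networks.
speech = mac_units_needed(5_000_000, 100, 500_000_000)      # 100 frames/s at 500 MHz
vision = mac_units_needed(2_000_000_000, 30, 1_000_000_000)  # 30 frames/s at 1 GHz
print(speech, vision)  # prints 1 60
```

Under these assumed numbers the vision workload needs roughly 60 times the MAC throughput of the speech workload, the kind of gap that separates a DSP-class front end from a wide accelerator array.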

Testing changes
That is especially true for the verification and test portions of the flow. Verification has always consumed the lion’s share of design time. Test, meanwhile, was largely overlooked until the past several years, but that has changed. There have been a number of big shifts on the test side as a result of more complex designs and new architectural approaches.

“In the past, you had wafer sort and final test, but these devices are becoming so complex that is no longer sufficient,” said Ira Leventhal, vice president of Advantest‘s New Concepts Products Initiative. “So companies doing phones, for example, rely on system-level test. But now you have to think about where to insert that system-level test. With deep learning and software updates, you need a sufficient amount of self-test because you can’t predict everything that will change up front. And with hardware and software, you need functional test to make sure you are developing to spec. So now you’ve got new problems. To cover applications, there have to be BiST (built-in self-test) methodologies to provide sufficient coverage, but that also has to be efficient enough. With processor chips, you need to figure out how and where to add in system-level test because you’re not going to get rid of other tests. So where do you get the money?”

Leventhal said that vertically integrated systems companies may be able to dilute those costs across the entire system design. But for complex collections of die in a package, it becomes much more difficult.

“If a company is providing known good die for a stack or module, it has to work,” he said. “Otherwise it’s very expensive. Right now customers are grappling with how to pay for that. One approach is adaptive test, where you take test results from one insertion and apply them to a different insertion. What we’re finding is that AI can play a role here, too. If you have a series of 50 tests, and those tests take 3 minutes, maybe there’s a way to do a subset of tests with AI. That’s a way to keep test costs under control. But there’s also always going to be a level of mistrust here because AI is a black box.”
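The idea of trimming a 50-test, 3-minute flow using results from an earlier insertion can be illustrated without AI. The sketch below swaps in a plain greedy heuristic rather than the machine-learning approach Leventhal describes: it ranks tests by failures caught per millisecond of tester time at an earlier insertion and fills a fixed time budget. The fail counts, the 60-second budget, and the names `ProdTest` and `select_subset` are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ProdTest:
    name: str
    ms: int             # tester time in milliseconds
    fails_per_10k: int  # failures per 10,000 parts seen at an earlier insertion

def select_subset(tests, budget_ms):
    """Greedy selection: most upstream failures per millisecond of tester time first."""
    ranked = sorted(tests, key=lambda t: t.fails_per_10k / t.ms, reverse=True)
    chosen, used = [], 0
    for t in ranked:
        if used + t.ms <= budget_ms:  # skip any test that would overrun the budget
            chosen.append(t)
            used += t.ms
    return chosen, used

# Hypothetical flow: 50 tests of 3.6 s each, 3 minutes in total.
flow = [ProdTest(f"t{i:02d}", 3_600, i % 7) for i in range(50)]
subset, used_ms = select_subset(flow, budget_ms=60_000)
print(len(subset), "tests in", used_ms / 1000, "s")
```

Dropping tests that rarely failed upstream saves tester time but risks letting defective parts escape, so any real selection policy would need to bound that risk.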

Adding to that mistrust, some of these testing approaches are as new as the devices being tested.

“With 5G, AI and autonomous driving, we’re seeing much larger SoCs,” said Matthew Knowles, silicon learning product marketing manager at Mentor, a Siemens Business. “Customers were coming to us saying, ‘We can’t afford weeks of testing and all the people that are required.’ So they want the EDA and tester folks to come together. The ATE connects are over a simple protocol, and in the past they have tried to have EDA software running on the tester. That turned out to be an unsupportable situation. What’s needed is a well-defined interface.”

Knowles noted that system-level test increasingly involves a test of the package rather than the SoC. “With the big mobile companies, the top concern is time to market. But with more and more analog IPs to test, now you need ATE capability and you have to implement that with iJTAG (internal JTAG) now. Basically you need to connect the dots, which means you need to put a lot more thought into test. If you’re testing characterization, you need to look at the power envelope carefully for always-on components. That characterization needs to happen at all different levels of power. A single parametric test is not sufficient.”

Safety-critical/security
Rigorous verification and test are especially important for complex chips used in applications such as assisted and autonomous driving, as well as in industrial applications where safety is involved. In the past, most of the chips used in these markets were simple actuators and microcontrollers, or in some cases FPGAs. But as more intelligence is added, some of these chips and packages are becoming extremely complex, and they must be verified and tested to a level never required even for chips developed at the most advanced nodes.

“What we had in the past was bring-up, integration, validation and verification,” said UltraSoC CEO Rupert Baines. “We’re now heading into a world of predictive maintenance where you have companies like NXP, ST, Infineon and Intel writing sophisticated code on-chip. And then you’ve got Tier 1s like Bosch and Delphi layering on more software, and they want validation, verification and analytics. So they’re looking for preventive maintenance that goes beyond what a chip company can do on its own. If you have a bug, you need to trap it so it doesn’t cause a problem. And if the system is hacked, you want to know about it. But that means you have to detect bugs, hacks, and safety issues while the chip is running. You see this with exascale computing, too. It’s no longer just unit testing.”

Conclusion
While tools always could be faster, particularly in verification prior to signoff, as well as for multi-parameter and multi-chip testing of complex heterogeneous architectures or multi-die packages, the real culprit appears to be systemic complexity coupled with the newness of these different approaches.

Exploration of different design approaches and architectural possibilities makes the job of design teams much more interesting, but it also makes it much harder to establish best-practice methodologies and flows for these designs. Those require real-world use-case data, which takes time to accumulate, and at this point such data is extremely sparse.



