Best Practices For Multicore SoC Test And Debug

There is no simple solution and technology doesn’t solve everything; complexity creates more complexity.


By Ann Steffora Mutschler
In increasingly complex SoC designs, many of which contain multiple cores and multiple modes, determining best practices for testing and debugging is a moving target.

Jason Andrews, architect at Cadence Design Systems, said multicore debug is a huge issue. It isn’t easy to do, and there aren’t many good ways to do it.

He suggested that one approach is to use virtual platforms to do multicore debug in the context of the software running on multiple cores plus the state of the hardware system. “I look at it and try to provide an environment that is a programmer’s view for all the different cores in order to see what’s going on with each core, including the importance to the hardware and the peripherals. The programmers are reading and writing registers and interacting with the hardware behavior. Today there is not an easy way to do that, so even if you have a good software debugger it only sees software. It doesn’t really see hardware,” he explained.

Complexity also can be exacerbated by the type of multicore design, as Frank Schirrmeister, director of product marketing for system-level solutions at Synopsys, noted.

“As you look at homogeneous multicore where you just are adding compute resources and you basically don’t have any dependency of the functions running on those cores between each other, there’s not much challenging there. Every core does the same thing in principle. You’re just feeding the data into it step by step. So if you have that decoupled I think the current techniques will work because you do single core debug and then multiply it because you don’t have much dependency,” he said. “The challenges come in when you do heterogeneous multicore and you are trying to distribute functions across those cores. Now you have very intricate dependencies.”

With heterogeneous multicore developments on the hardware side, software debug very likely will make virtual platforms essential because traditional techniques are breaking. If you are doing this on a real chip and your hardware stops, you’re stuck, Schirrmeister said. “What you can do in simulation is backtrack. You can reproduce it in simulation, you can do things that are simply not possible in the final chip. You can hold one processor and let the others continue to hone in on the dependencies. Multicore and debug on virtual platforms gives an additional push because the traditional techniques are breaking.”
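As a rough illustration of what Schirrmeister describes, the following Python sketch models a toy simulated platform in which one core can be held while the others keep running, and in which a saved checkpoint can be restored to backtrack. The `Core` and `Platform` classes and their method names are hypothetical, invented for this example; they do not correspond to any vendor's virtual platform API.

```python
class Core:
    """Toy core: the program counter simply advances one step per cycle."""

    def __init__(self, name):
        self.name = name
        self.pc = 0
        self.halted = False

    def step(self):
        if not self.halted:
            self.pc += 1


class Platform:
    """Hypothetical simulated platform holding several cores."""

    def __init__(self, names):
        self.cores = {name: Core(name) for name in names}

    def hold(self, name):
        # Freeze one core; the others continue to run.
        self.cores[name].halted = True

    def run(self, cycles):
        for _ in range(cycles):
            for core in self.cores.values():
                core.step()

    def snapshot(self):
        # Capture the (pc, halted) state of every core.
        return {n: (c.pc, c.halted) for n, c in self.cores.items()}

    def restore(self, snap):
        # Backtrack: put every core back into a previously saved state.
        for name, (pc, halted) in snap.items():
            self.cores[name].pc = pc
            self.cores[name].halted = halted


platform = Platform(["cpu0", "cpu1"])
platform.run(5)
checkpoint = platform.snapshot()

platform.hold("cpu1")          # hold one processor...
platform.run(3)                # ...and let the other continue
after_hold = platform.snapshot()

platform.restore(checkpoint)   # not possible on a real chip
after_restore = platform.snapshot()

print(after_hold)
print(after_restore)
```

In a real virtual platform the saved state would also cover memories and peripherals; the sketch keeps only the per-core state to show the idea.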

In terms of visualizing SoC complexity, Andrews believes the best approach is to give the person doing the debugging a visual picture of the state of the system. “A big part of it, in terms of system and multicore, is to visualize the state of the system in an easy way. We do this a lot either with programs that give you a list of hardware, or transaction-level views to get a concise summary of the behavior that just happened or is happening.”

Cadence has embraced SystemC and TLM-2.0, which it said provide the interconnect between the components so that all of the behavior between the cores and between the different peripherals in the hardware can be automatically extracted, he explained. The company also extended SystemC a bit to define classes that make items such as the state of a particular peripheral, or the registers in that peripheral, easy for tools to visualize.

Still, more is needed to allow better visualization of the hardware. “A lot of it is extracting the information so that you can make sense out of it because for a particular operation it can have many low-level transactions and it’s really hard to sit and sift through one at a time. By providing tools that automatically extract transaction sequences which say, ‘These last 20 things you did are setting up a DMA between the video controller and the memory,’ and show you the parameters and behaviorally what it is doing, then it makes it a lot easier to understand,” he continued.
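The kind of extraction Andrews describes can be sketched in a few lines of Python. This is a minimal illustration, not Cadence's implementation: the `RegWrite` record, the `dma` peripheral name and the register names in `DMA_SETUP` are all hypothetical, and a real tool would work from the platform's own transaction data rather than a hand-written list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegWrite:
    """One low-level register write observed on the interconnect."""
    peripheral: str
    register: str
    value: int


# Hypothetical register sequence that programs an illustrative DMA controller.
DMA_SETUP = ("SRC_ADDR", "DST_ADDR", "LENGTH", "CTRL")


def summarize(trace):
    """Collapse writes matching DMA_SETUP into a single summary line."""
    out = []
    i = 0
    n = len(DMA_SETUP)
    while i < len(trace):
        window = trace[i:i + n]
        is_dma_setup = (
            len(window) == n
            and all(w.peripheral == "dma" for w in window)
            and tuple(w.register for w in window) == DMA_SETUP
        )
        if is_dma_setup:
            src, dst, length, _ctrl = (w.value for w in window)
            out.append(f"DMA setup: src=0x{src:08x} dst=0x{dst:08x} len={length}")
            i += n
        else:
            w = trace[i]
            out.append(f"write {w.peripheral}.{w.register} = 0x{w.value:x}")
            i += 1
    return out


trace = [
    RegWrite("uart", "BAUD", 0x1A),
    RegWrite("dma", "SRC_ADDR", 0x80000000),
    RegWrite("dma", "DST_ADDR", 0x90000000),
    RegWrite("dma", "LENGTH", 4096),
    RegWrite("dma", "CTRL", 0x1),
]
summary = summarize(trace)
for line in summary:
    print(line)
```

Five low-level writes collapse into two lines, one of which names the operation and its parameters, which is the readability gain the quote points to.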

In essence, the information provided during the construction of the hardware platform is also used at run time for analysis and visualization.

The market for virtual platforms has developed such that customers are partnering closely with tool providers. The tool companies do some amount of model building and platform development to bring to customers, and then may extend the platform and add more models based on customer requirements.

The first wave of virtual platforms involved fixed simulation configurations. However, the current generation of customers wants to have more flexibility, Andrews said. “They want somebody to come with the tools, with a good library of models and connections to the right IP providers but then they also want to have something they can extend and work with.”

Managing capacity and size
At the other end of the spectrum but just as critical is manufacturing test of SoCs.

Complex SoCs today have hundreds of millions of gates, and managing the sheer capacity and size of designs puts tremendous pressure on test engineers, said Greg Aldrich, director of marketing for Mentor Graphics’ silicon test solutions group. “In manufacturing test, one of the things that you have to deal with is concern about creating the test patterns in a reasonable amount of time: Can I create the data? How long does it take to create? How many machines are needed? How much horsepower is needed on the machines to create that?”

Even more critical is whether or not the test data will provide adequate coverage and will fit on the test equipment, he said. “It’s really a throughput in manufacturing issue because that’s where it’s not a one-time cost—it’s a recurring per device cost. So every additional second you have to sit on the tester to test this SoC is an additional cost that you will incur for every chip that gets manufactured.”
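Aldrich's point about recurring cost is simple arithmetic, sketched below in Python. The tester rate and production volume are hypothetical numbers chosen only to make the calculation concrete; they are not figures from Mentor or from any real test floor.

```python
def added_test_cost_cents(extra_seconds, cents_per_tester_second, devices):
    """Recurring cost of extra tester time, summed over every device built."""
    return extra_seconds * cents_per_tester_second * devices


# Hypothetical inputs: 3 cents per second of tester time, one million devices.
one_extra_second = added_test_cost_cents(1, 3, 1_000_000)
print(f"1 extra second per device costs ${one_extra_second / 100:,.2f} in total")
```

Because the cost scales with both test time and volume, shaving test time per device pays back on every chip manufactured.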

To deal with the volume of test data, test engineers use embedded compression, a technology that has been around for almost 10 years and that embeds a piece of logic at the front of the scan chains. Another way test engineers are dealing with the volume of data and the cost of test is built-in self-test (BIST), which is mainstream for memories, Aldrich said.

Taking a hierarchical approach to test can also be employed to manage the sheer size of SoCs. “Test has been a back-end process where you don’t do anything until the complete gate-level netlist is done, and then you do it all at once. With a few hundred million gates, that becomes increasingly more difficult and in fact most large SoCs are being partitioned when they are being designed anyway,” he explained. “In a lot of cases, customers are looking to use a partitioned approach or hierarchical approach to doing all of the test as well: generating the test patterns instead of the whole design at one time or doing one partition of the design and then leveraging those test patterns and propagating them up to the top level of the SoC when it gets compiled.”

Challenges on the horizon
Not surprisingly, power is an issue at every turn. “It doesn’t matter if it’s a low-power cell phone application or a high-power server chip. If you’ve got a few hundred million gates, power can be an issue with a chip running at several watts or several milliwatts,” Aldrich noted.

In addition, customers today are struggling with figuring out why they are getting failures in manufacturing. “You might have a process that is at 70% yield, which means you are getting, if you are in high-volume, thousands and thousands of failures that are coming from the test patterns. The challenge a lot of people are looking to now is how to take that data and figure out why this thing is failing. Is it a yield problem? Is it a design problem? Is it one of my design-for-manufacturing rules that isn’t quite right?”

Over the last couple of years, Mentor has seen the diagnosis of scan pattern failures emerge as a technology area. The idea is to take the failures coming from the tester during manufacturing test and feed them back into a diagnosis tool along with all of the design information (including the physical layout information) in order to identify what caused each failure and where it is located.

Being able to leverage that data for statistical yield analysis, to determine whether the failures are all just random particle effects or whether there are systematic issues within the manufacturing or design process, is significant. “This is happening in high-volume design and especially as customers are moving to smaller technology nodes,” Aldrich said.
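One very simple way to picture the random-versus-systematic question is to count how often each diagnosed location shows up across failing devices. The Python sketch below is a crude heuristic under stated assumptions, not the method used by any commercial diagnosis tool: the location labels, the `min_share` and `min_count` thresholds, and the sample data are all hypothetical.

```python
from collections import Counter


def systematic_suspects(diagnosed_locations, min_share=0.2, min_count=3):
    """Return locations that account for an outsized share of failures.

    Random particle defects should scatter across many locations, so a
    location that keeps recurring is a candidate systematic issue.
    """
    total = len(diagnosed_locations)
    if total == 0:
        return []
    counts = Counter(diagnosed_locations)
    return sorted(
        loc for loc, n in counts.items()
        if n >= min_count and n / total > min_share
    )


# Hypothetical diagnosis results: one suspect location per failing device.
failures = ["via_M3_M4@blockA"] * 6 + [f"net_{i}" for i in range(8)]
suspects = systematic_suspects(failures)
print(suspects)
```

A production flow would weigh each location against how much of the design it covers and apply proper statistical tests; the sketch only shows why pooling diagnosis results across many failing devices reveals patterns that a single failure cannot.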

Mentor’s diagnosis and analysis tools in this area all fall under the Tessent brand name, following the company’s acquisition last year of LogicVision.