Drowning In Data

Multicore and advanced nodes bring a whole new kind of problem—too much data.

By Ed Sperling
The old adage, “Be careful what you wish for,” has hit the SoC design market like a 100-year storm. After years of demanding more data to understand what’s going on in a design, engineering teams now have so much data that they’re drowning in it.

This is most obvious at advanced process nodes, of course. But it’s also true these days at more mainstream nodes such as 40nm and 65nm, and in multicore designs at all nodes where cache coherency is critical and where multiple power islands are used to keep silicon dark when it’s not in use. And with more IP being characterized for as many configurations as possible to avoid potential contextual problems, the amount of data has grown exponentially.

Dozens of interviews conducted over the past several months show that engineering teams tend to work at two extremes with this mass of data. They either try to understand as much as possible to improve performance and minimize energy consumption, or they ignore it and try to fix functional problems late in the design cycle, hopefully with software. Which approach chipmakers take often depends on the market they are targeting, the complexity of their design, and the risk of failure. There is a middle ground, as well, which is focusing on the critical data paths and ignoring the others until the end. But in all cases, the amount of data everyone is encountering is mind-boggling, slowing down the design cycle and adding uncertainty every step of the way, even beyond signoff.

The volume of data also is beginning to affect one of the key strategies of design teams, which is to divide and conquer by developing portions of an SoC independently and then verifying the whole. Consider some implementations of ARM’s big.LITTLE dual-core processor, for example, which can be optimized for a variety of tasks due to the different power and performance profiles of the different cores. When four of these big.LITTLE configurations are implemented together in a design, cache coherency can become much more difficult to achieve than in a dual-core configuration of one big and one little processor.

“The problem is that the software checks out with the hardware early in the design when you run it on one big.LITTLE configuration,” said Jim Kenney, director of marketing for emulation at Mentor Graphics. “But when you verify all the code on all the cores, that creates issues with cache coherency and sometimes even deadlock. It’s hard to debug, and you don’t want to find this out after silicon.”

This particular problem isn’t unusual in complex multicore systems, but it’s made worse by the amount of data, in part because the cores are heterogeneous. “When you connect an out-of-order system to cache you get interesting coherency bugs,” said Rob Aitken, an ARM fellow. “So you fix the cache coherency, but then you have to validate it, and there are a number of very obscure use cases that don’t match your software expectation. What’s important to understand is that the system validation is different from the core validation, and multiple cores acting on the same data can produce definitions of time that don’t make sense. You have one process writing data and three reading it, which can create synchronization issues.”
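The one-writer, three-reader situation Aitken describes can be illustrated with a deliberately simplified sketch. It is a toy model written for this article, not a real coherency protocol or anything used by ARM or the vendors quoted here. The Memory and Core classes, the address 0x10, and the invalidate switch are all hypothetical names chosen for illustration.

```python
# Toy model of private caches; every name here is hypothetical.
class Memory:
    def __init__(self):
        self.data = {}  # address -> value


class Core:
    def __init__(self, name, memory):
        self.name = name
        self.memory = memory
        self.cache = {}  # private cache: address -> value

    def read(self, addr):
        # Serve from the private cache on a hit; fetch from memory on a miss.
        if addr not in self.cache:
            self.cache[addr] = self.memory.data.get(addr, 0)
        return self.cache[addr]

    def write(self, addr, value, peers=()):
        # Write through to memory, then invalidate the copies held by
        # whichever peers were passed in (none by default).
        self.cache[addr] = value
        self.memory.data[addr] = value
        for peer in peers:
            peer.cache.pop(addr, None)


def run(invalidate):
    mem = Memory()
    writer = Core("writer", mem)
    readers = [Core(f"reader{i}", mem) for i in range(3)]
    for r in readers:
        r.read(0x10)  # each reader caches the initial value 0
    writer.write(0x10, 42, peers=readers if invalidate else ())
    return [r.read(0x10) for r in readers]


stale = run(invalidate=False)  # readers keep serving their old copies
fresh = run(invalidate=True)   # readers refetch after invalidation
print(stale)  # [0, 0, 0]
print(fresh)  # [42, 42, 42]
```

Without invalidation, all three readers keep returning the old value after the write has reached memory. Real designs add out-of-order execution, timing and many more agents, which is where the obscure cases come from.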

That creates more data, too, which is why emulation sales have been growing so spectacularly for Mentor and Cadence, and why Synopsys bought EVE last October. The amount of data involved is too large for simulation to process in a reasonable amount of time, and emulation can be used to verify software as well as hardware, independently or together.

“We have one customer that created an embedded testbench to bring up hardware, software, and hardware plus software issues,” said Michal Siwinski, group director for product marketing for Cadence’s Systems and Software Realization Group. “That way they can bring in the full SoC. But if you’re looking at an A15 or A7, you’re not really worrying about the core RTL. What you care about is the interaction and how it all comes together with debug and the use model. The biggest dependencies are between the SoC and the lower-level software and firmware.”

Margin vs. optimization
The alternative to fixing some of these problems through optimization and wading through enormous amounts of data is to add margin to a design. While that doesn’t actually help deal with the data overload issue, it is a way for design teams to ignore at least some of it, as long as the core functionality is intact. If you can count on an IP partner, then at least some of the data is someone else’s problem.

Just reading through and understanding data can be enormously time-consuming. John Koeter, vice president of marketing for Synopsys’ Solutions Group, said the intent of all the data that comes with IP is to make the integration simpler by identifying all the “gotchas.”

“DVFS (dynamic voltage and frequency scaling), which has been popular for the past five years, alone can be 200 pages of data,” Koeter said. “Then you add in things like cache coherency and how you go about partitioning which applications on which processor and the amount grows further. One of the things we’ve done for big.LITTLE is to create a virtual prototype with a hypervisor so you can step through the possibilities task by task and determine how much to run on the big processor or the little processor. That’s not always obvious.”

He’s not alone in that assessment. Mike Gianfagna, vice president of corporate marketing at Atrenta, said the real problem is the interaction of blocks, not the blocks themselves.

“The dream is a consistent vocabulary of what quality means so that packaging of deliverables is easier to do in the future,” Gianfagna said. “Right now there’s no way to deliver that to the end user. There are assertions that help to fix blocks, but when you hook it up, how do you run enough data to prove it? If you have to run the software and the video and all the other functions, that’s a lot of data. So how do you coherently manage that information and run scenarios? And for the sub-pieces, how do you come up with ways of developing high-level power models for energy consumption and then create accurate power estimation by combining all of these models? If the data is too granular, you have a data explosion, which makes it useless. And if it’s too coarse, there’s no real value.”

Raising the abstraction level
Adding a level of abstraction makes sense in theory, but the reality is that no single abstraction level does everything. Raising the abstraction level on data is also tricky, particularly when it involves so many interactions and possible tradeoffs. There are several possible approaches.

One option is to mine the data, an approach that has become popular on the metrology side of wafer manufacturing with critical dimension-scanning electron microscopy, or CD-SEM. The files there are so large that even the mined data requires 1 terabyte of RAM to process. What needs to be extracted from IP data to manage interactions has never been fully determined or standardized, though.

A second approach is to add more assertions into the mix, which helps explain why Jasper Design Automation just teamed up with Duolog. Assertions act like vectors throughout a design, but they’re particularly effective when used across a complex SoC for things like power, heat, noise, signal integrity and even security.

“The key here is matching black box to white box,” said Oz Levia, vice president of marketing at Jasper. “Do they match? And if you make a change, you run them again and determine if they still match. That way you can keep the spec and the IP together.”
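The match-and-rerun idea Levia describes can be sketched in a few lines. This is an illustration only: it uses exhaustive comparison of a tiny function rather than formal assertions, and it does not represent Jasper’s or Duolog’s tools. The 8-bit saturating adder and every function name below are hypothetical stand-ins for a spec (black box) and an implementation (white box).

```python
from itertools import product


def spec_sat_add(a, b):
    # Black-box view: the documented behavior, an 8-bit saturating add.
    return min(a + b, 255)


def impl_sat_add(a, b):
    # White-box view: a stand-in for the implementation, written the way
    # hardware might compute it (9-bit sum, then check the carry bit).
    total = (a + b) & 0x1FF
    return 255 if total & 0x100 else total & 0xFF


def changed_impl(a, b):
    # A hypothetical later change that drops the saturation step.
    return (a + b) & 0xFF


def mismatches(spec, impl, width=8):
    # Compare the two views over every input pair and collect disagreements.
    values = range(1 << width)
    return [(a, b) for a, b in product(values, values)
            if spec(a, b) != impl(a, b)]


before = mismatches(spec_sat_add, impl_sat_add)  # original implementation
after = mismatches(spec_sat_add, changed_impl)   # rerun after the change
print(len(before))  # 0
print(after[0])     # (1, 255)
```

The original implementation matches the spec on every input, while the rerun after the change flags the first overflowing pair. Exhaustive comparison only works for something this small, which is why real flows lean on assertions and formal tools instead.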

A third approach is to create models of the design at a very high level and keep those models updated throughout the design process and all the way through to verification and manufacturing. While TLM and power models do help with the data management, the complexity of designs means the models aren’t always complete. As a result, engineers still have to sift through data to optimize the design and debug it.

Conclusions
Data overload won’t stop progress on SoCs and complex chips, but it will slow it down. Too much data makes the design and verification process more difficult—particularly in markets where reliability is critical, and in devices where some functions are critical.

Being able to process data on bigger machines is helpful, and so are abstraction levels, assertions and models. But really optimizing data requires a deep understanding of that data and all the possible interactions, and with too much data that’s impossible. It will be up to the EDA industry to make sense of this and automate some of it, but as the growing number of complaints about data attests, this problem is ballooning faster than solutions are arriving.