Each new node and architectural change results in an explosion of data, making optimization of designs significantly harder.
Moving to the next process nodes will produce far more data, forcing chipmakers to adopt more expensive hardware to process and utilize that data, more end-to-end methodologies, and tools and approaches that in the past were frequently considered optional.
Moreover, where that data needs to be dealt with is changing as companies adopt a “shift left” approach to developing software, and verifying and debugging the chip. It now must be considered much earlier in the design flow. But combined with an increase in the number of transistors and complex IP blocks, more design rules, multi-patterning, and many more standards, even the best-equipped organizations are finding themselves swimming—and sometimes drowning—in data.
All of this makes it harder to optimize chips for performance and power, as well as to ensure that enough corner cases are covered to have confidence that a design will turn out as planned, that it will perform reliably, and that it can be manufactured with sufficient yield.
Architecture
This data explosion stretches from one end of the design process to the other, and well beyond that, and it affects every facet of chip design along the way. Architects used to be able to visualize what functions needed to be included and then hand their specs off to designers. There are now so many choices, rules and considerations that many engineers question whether new chips will work at all—particularly when it comes to ECOs, new processes and transistor types, and the integration of more IP from more sources. That often leads to panic before signoff and extra costs for debug and verification, so the chips do function correctly, but that approach doesn’t always produce optimal results.
“To optimize a design, the architect really has to know what they want up front,” said Kurt Shuler, vice president of marketing at Arteris. “But it’s becoming a real bear because with each node they have more and more data, more and more libraries at each new node.”
This is particularly apparent as chipmakers shift from homogeneous multicore processing to heterogeneous cores in an effort to reduce power and right-size a chip’s resources to fit a particular application. “As we move into heterogeneous computing, you’re creating data based on one read or write, but you’re dealing with up to 10 coherency messages saying, ‘Where is the data?’ You’ve got more reads, writes and load stores, and more metadata above that. And that’s where the real complexity comes in. If the data isn’t consistent, then you risk reading dirty data as if it is correct.”
Designing for this is a challenge in its own right. Optimizing it in a rising tide of data is an even bigger problem.
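The fan-out described above is easy to see in a toy model. Below is a minimal sketch in Python, assuming a simplified directory-based protocol with made-up message types (no vendor interconnect or real coherence protocol is modeled); it only counts the invalidations and acknowledgments triggered when one core writes a line that several other cores have cached.

```python
# Toy directory-based coherence model that counts the messages generated
# by a single write. Message types and costs are illustrative only.

class Directory:
    def __init__(self):
        self.sharers = {}      # cache line -> set of cores holding a copy
        self.messages = 0      # running count of coherency traffic

    def read(self, core, line):
        self.messages += 1                     # read request to directory
        owners = self.sharers.setdefault(line, set())
        if owners:
            self.messages += 1                 # data forwarded from a current holder
        owners.add(core)

    def write(self, core, line):
        self.messages += 1                     # write/upgrade request
        owners = self.sharers.setdefault(line, set())
        for _other in owners - {core}:
            self.messages += 2                 # invalidate + acknowledgment
        self.sharers[line] = {core}            # writer becomes sole owner


d = Directory()
for c in range(8):
    d.read(c, line=0x40)      # eight cores pull the same line into their caches
before = d.messages
d.write(0, line=0x40)         # one write now has to invalidate seven sharers
print("messages for one write:", d.messages - before)   # 1 + 7*2 = 15
```

Even in this toy model, a single write fans out into 15 messages once eight cores share the line, which is the kind of metadata amplification that makes heterogeneous designs hard to reason about.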
“Automation is the key here,” said Sundari Mitra, CEO of NetSpeed Systems. “Humans are good at analyzing to a certain degree, but if there is too much data they make mistakes. We need algorithms and machine learning. Right now we use judgment and make calls. And we need this at every step. You have to make intelligent decisions about which IP blocks to use and how to put them together. You don’t want a powerful processing engine with a weak memory controller. Everything has to be matched, but it’s being optimized with too many constraints.”
That often leads to overdesign, which can negatively impact performance and power, as well as area.
“With legacy design, it was all spreadsheets. You could tweak them, and even then there was a lot of room for lapses,” said Mitra. “Heterogeneity kills you. An incremental change can have a ripple effect.”
It also produces even more data, which in turn leads to the need for higher levels of abstraction and more automation. But there’s another gotcha here, too. Not all of the data is in the same format, which makes abstracting all of that data into models problematic.
“The model is of limited use if you can’t put the data out,” said Bill Neifert, director of models technology at ARM. “It depends on what format the data is in. We need to take models and open them up so that everyone can use them, but there is no standard format for defining or accessing that data.”
Neifert noted this is the reason Google began offering APIs with all of its new services. Rather than trying to define everything itself, it sought input from users, much as the open source community does. “They may not know all of the different ways it can be used,” he said.
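As a rough illustration of the normalization problem Neifert describes, the sketch below flattens model output delivered in two different formats into one shape before analysis. The field names and formats here are invented for illustration, not a proposed standard.

```python
# Minimal adapter sketch: model output arrives in different formats, so it
# is normalized into one record shape before analysis.
import csv, io, json

def load_records(raw, fmt):
    """Return a list of {'signal': str, 'cycle': int, 'value': int} dicts."""
    if fmt == "json":
        rows = json.loads(raw)
    elif fmt == "csv":
        rows = list(csv.DictReader(io.StringIO(raw)))
    else:
        raise ValueError(f"unsupported format: {fmt}")
    return [{"signal": r["signal"],
             "cycle": int(r["cycle"]),
             "value": int(r["value"])} for r in rows]

json_trace = '[{"signal": "axi_awvalid", "cycle": 12, "value": 1}]'
csv_trace = "signal,cycle,value\naxi_awvalid,12,1\n"
assert load_records(json_trace, "json") == load_records(csv_trace, "csv")
```

Every team ends up writing adapters like this in the absence of a standard, which is exactly the duplication an agreed format or API would remove.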
Implementation
How and where to apply automation across different segments in a design flow isn’t always clear, though, particularly as more things need to be considered at the same time. This is basically a big data problem, but with a lot of dimensions and dependencies.
“While there’s certainly a lot of promise—and corresponding hype—about big data, the fact is that it’s not an easy fit for chip design,” said John Lee, general manager and vice president of Ansys. “The easiest fit for big data is what I call ‘log file mining.’ It’s easy to harvest log files and report files from EDA tools, and process those results, such as tracking DRC violations or timing convergence across multiple groups and projects. But this application has relatively low value to customers and can usually be done without real big data architectures. The harder fit is to make big data actionable. In customer terms, actionable means something that fundamentally improves performance, cost or reliability. Actionable data requires multi-domain analytics.”
This would include such areas as power integrity, timing, routing, and reliability analysis.
“When you map it out, multi-domain analytics requires that you do a serious amount of scientific computing, such as logic/timing graph traversals, circuit matrix solutions, geometric searches,” Lee explained. “So big data also requires ‘big compute.’ But big compute has to be done in a way that customers can afford.”
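The “log file mining” Lee mentions is the simplest case to picture. The following is a minimal sketch that tallies DRC violations per block and rule across a set of tool logs; the log line format and file paths are hypothetical, since real EDA reports vary by vendor.

```python
# Sketch of log file mining: scan tool logs for DRC violation lines and
# tally them per block and rule. Format and paths are hypothetical.
import glob
import re
from collections import Counter

# Hypothetical line format:  DRC-VIOLATION block=<name> rule=<rule>
PATTERN = re.compile(r"DRC-VIOLATION\s+block=(\S+)\s+rule=(\S+)")

def tally_drc(log_paths):
    counts = Counter()
    for path in log_paths:
        with open(path) as f:
            for line in f:
                m = PATTERN.search(line)
                if m:
                    counts[m.groups()] += 1
    return counts

if __name__ == "__main__":
    totals = tally_drc(glob.glob("logs/*.log"))
    for (block, rule), n in totals.most_common(10):
        print(f"{block:20s} {rule:15s} {n}")
```

Harvesting counts like this is easy, which is Lee’s point: the value lies in tying those results to timing, power and reliability analysis, not in the parsing itself.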
Verification and debug
What’s affordable is relative, often defined by specific markets and sometimes by the same market at different times. A maturing smartphone market, for example, is going to be far more cost-conscious in design than a market that is on the upswing and that can leverage one design for billions of units. Likewise, equipment in data centers may be more price-sensitive if the application is not mission-critical, and an automotive chip may have more perceived value if it provides a unique competitive advantage rather than competing in a crowded field.
That has a direct bearing on what companies are willing to pay for tooling to get chips out the door, which is why companies typically use a mix of simulation, emulation and FPGA prototyping to verify their designs rather than buying only emulators. But as the amount of data continues to rise, they also are recognizing that technology needs to be applied more efficiently and effectively, no matter how resource-rich the engineering organization may be.
“The challenge is how to verify in the environment in which this is being used,” said Jean-Marie Brunet, marketing director for Mentor Graphics’ Emulation Division. “It’s the same script environment, but it’s time to verify within the context of a live application. It’s no different with power. You verify in the context of how that power is utilized.”
This is harder than it sounds, because to get to that point requires processing huge amounts of data earlier in the design cycle. “In networking infrastructure, you have to look at the traffic being run through a switch or a router,” Brunet said. “Post-silicon is too late for that. It has to be done pre-silicon with simulation or emulation.”
Lauro Rizzatti, an independent emulation consultant, agrees. “The addition of embedded software to the hardware design mix creates large data sets that need substantial verification resources to analyze and debug them,” he said. “Hardware emulation has the ability to debug big data, including embedded software, when other verification tools can’t. It offers an accurate representation of the design before silicon availability, since it is based on an actual silicon implementation. Today, processors with embedded graphics and networking chips approach 1 billion ASIC-equivalent gates. Add to that the embedded software that would require billions of verification cycles for exhaustive testing. This setup leads to a great deal of data that needs to be sifted through and analyzed.”
That’s one slice of it. Formal verification is gaining in popularity as the amount of data grows, as well. Long ignored, misunderstood, or underutilized, formal has turned out to be a highly efficient use of verification resources.
“Verification has always been an issue with big data,” said Dave Kelf, vice president of marketing at OneSpin Solutions. “Engineers get a mass of data and they have no time to go through it all. If a chip doubles in size, there is a massive amount of new data. And now physical effects are making their way into verification.”
Kelf said that formal is basically a huge database, and as security issues become more pronounced, producing their own volumes of data, it is being used to track the transitions between states that can be used for side-channel attacks. “What you’re doing is characterizing the propagation of information through a device. So if there is a secret key, you can propagate that through a device on every signal the key goes through. The formal tool figures it out. This is big data. What you’re doing is monitoring the path directly. It acts like a big filter. The machines are powerful enough to handle this, and big data is contained in the database.”
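The path monitoring Kelf describes can be pictured as a reachability question over the design’s signal dependencies. The sketch below is a plain graph traversal over a hypothetical fan-out netlist, purely for illustration; a formal tool proves such properties exhaustively rather than walking an explicit graph like this.

```python
# Illustration of key propagation: treat the design as a graph of signal
# dependencies and check which observable points the secret key can reach.
from collections import deque

# Hypothetical fan-out netlist: signal -> signals it drives.
fanout = {
    "key_reg":    ["aes_round", "dbg_bus"],
    "aes_round":  ["cipher_out"],
    "dbg_bus":    ["jtag_tdo"],          # unintended leak path
    "cipher_out": [],
    "jtag_tdo":   [],
}

def reachable(src, graph):
    seen, queue = {src}, deque([src])
    while queue:
        for nxt in graph.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

tainted = reachable("key_reg", fanout)
print("key propagates to:", sorted(tainted - {"key_reg"}))
# A hit on an observable pin such as jtag_tdo flags a potential side channel.
```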
Conclusions
Whether these kinds of issues can be avoided in the first place is debatable. Correct-by-construction proponents have argued for years that one of the reasons there is an explosion of data and the need for so much verification is that design methodologies are flawed.
“People try to search for the needle in the haystack,” said Anupam Bakshi, CEO of Agnisys. “Why put the needle there in the first place? For example, debugging through deciphering information from a large log of messages can be avoided and the whole debug step eliminated if a proper specification-driven methodology is followed.”
No matter how good the design methodology and execution, though, there is no question that the amount of data that needs to be considered by design teams is on the rise. There may be less in a perfect design, but there will be more at each new node. While this is good for tools vendors in the short term, the bigger question is whether some of this data can be handled more effectively through other means, such as platforms, subsystems, advanced packaging of discrete elements, and a revamping of some of the silos within chipmakers to deal with data at different points during the design flow.
Still, the challenge isn’t the ability to process all of the data. The problem is how to use all of this data more efficiently, ignoring what is not essential in one part of the design while understanding that it may be useful somewhere else in the flow. The push to heterogeneous multicore, for example, makes sense in terms of performance and power, but it has a multiplicative effect on the number of possibilities that need to be tested based upon RTL-derived models.
“It all needs to be tested,” said ARM’s Neifert. “So we can generate models automatically and we can determine what to look at. But as you get more data the question is where you draw the line between what needs further analysis and what doesn’t. Not every user wants to do it the same way. Sometimes that can vary by market segment. It depends upon the skill of the designer and feedback of the customer. In some cases, they want to expose all of the data and come up with their own plug-ins.”
In effect, what is needed is a comprehensive understanding of all of the data being produced and how it can be used for maximum effectiveness at what time and by whom. This is the equivalent of architecting the data behind a chip design, and so far no one has successfully done that.