As the amount of data in designs explodes, Big Data tools and techniques are being added into EDA and manufacturing.
Scaling, rising complexity, and integration are all contributing to an explosion in data, from initial design to physical layout to verification and into the manufacturing phase. Now the question is what to do with all of that data.
For SoC designs, that data is critical for identifying real and potential problems. It also allows verification engineers working the back end of the design flow to understand what happened at the RTL and place-and-route phases, and it allows chipmakers to figure out why their chips didn’t yield well at a foundry. But as data volumes continue to rise, filtering that data to extract useful information requires more effort, even though the more data there is, the more useful information can be extracted.
“Big data can change the way designs are done,” said Mike Gianfagna, vice president of marketing at eSilicon. “You can harvest all the information for chips and set boundaries and store that knowledge for later use. For IP management, you can save information, set schedules and nail down those schedules and actually predict when you do a tapeout. We’re all now big data companies. If you think about the tapeout of a monster chip, that may be petabytes of data.”
In general, there are several approaches being taken to deal with that growing mass of data. There are more tools being used to mine data for highly specific purposes at more points along the design flow. There is ever-faster hardware and new hardware architectures being used to run those tools and extract essential data. And there is more effort being put into creating consistency in the data.
However, none of this is going exactly according to plan—even in cases where there really is a plan—and results vary widely from one design to the next, one vendor to the next, and one tool to the next. What’s important to understand on the SoC design side is that many of the techniques used in mining Big Data—algorithms that can pick out individual words or sets of words, or identify bank account numbers, for example—don’t work in the design world. RTL, SystemVerilog, and UML contain very different kinds of data, while the text, numbers, and digitized images and videos stored in large commercial databases are more or less homogeneous, a key difference when it comes to generating what is known as clean data.
“The first step is being able to extract data from different things, which could be snoop activity on a bus fabric, but that data is different than a code coverage metric and it’s different from how many simulators you need,” said Mark Olen, verification marketing manager at Mentor Graphics. “You need to extract data from a wide number of sources, not just an Oracle SQL (structured query language) database of 1’s and 0’s. Then you have to look at time/trend data and create a centralized database. But that collection has to be really efficient because just the first part of this can generate terabytes of data.”
The next challenge is to filter all of this data into a single dashboard for systems engineers. While there is no push-button solution, a dashboard would at least allow design teams to figure out which tools need to be applied where and when. So far, the translation and correlation of all the data is proving to be one of the thorniest parts of this effort. It’s not a straightforward binary translation, because much of what makes the data useful is context and correlation.
“The question is how you correlate, ‘At such and such a time on a master network this transaction was initiated,’ and then, ‘At such and such a time later, something happened,'” said Olen. “This is more than just 1’s and 0’s. There is much more to this than just translating data. You’re correlating data over time.”
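Olen’s time-based correlation can be sketched as a merge of timestamped events from unrelated sources. The sketch below is illustrative only; the event fields, values, and 100ns window are invented, not taken from any tool:

```python
from dataclasses import dataclass

# Hypothetical event records from two different sources: a bus-trace log
# and a downstream monitor. All field names and values are illustrative.
@dataclass
class Event:
    time_ns: int
    source: str
    detail: str

bus_events = [Event(100, "bus", "master0 read initiated"),
              Event(250, "bus", "master1 write initiated")]
monitor_events = [Event(140, "monitor", "cache miss"),
                  Event(900, "monitor", "interrupt raised")]

def correlate(initiators, effects, window_ns):
    """Pair each initiating event with effects observed within window_ns."""
    timeline = sorted(effects, key=lambda e: e.time_ns)
    pairs = []
    for init in initiators:
        for eff in timeline:
            if init.time_ns < eff.time_ns <= init.time_ns + window_ns:
                pairs.append((init.detail, eff.detail,
                              eff.time_ns - init.time_ns))
    return pairs

# Pairs "master0 read initiated" with the "cache miss" seen 40 ns later.
print(correlate(bus_events, monitor_events, window_ns=100))
```

The point of the exercise is exactly what Olen describes: the output is not a translation of either log, but a new, time-correlated view that exists in neither source on its own.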
He calls this one of the big races in EDA—building a dashboard that can fuse together different types of data and make sense of it, so that tools can be applied wherever and whenever necessary.
At the very least, it adds a competitive sense of urgency to this problem.
“Big Data is a huge problem and it’s getting worse,” said Michal Siwinski, vice president of product management in the system and verification group at Cadence. “We’ve gone from terabytes to petabytes, and we’re heading to exabytes. Design size and complexity is growing exponentially. And with hardware and software together, you have more vectors that have to be put into context to find more problems.”
In a complex SoC, this includes waveforms, transactional data, and a host of other data in a mix of formats. Add in validation, analysis, and optimization, and any one of these areas can quickly get out of control. Or they can all get out of control, which isn’t all that uncommon.
“Basically you have a gigantic haystack and you’re trying to figure out how to find a few needles,” said Siwinski. “Most times you don’t even know what you’re looking for, so you have to look for the cause of the problem versus the source. Formal is one technique to minimize the problem, but the best way to understand the problem is a preventive approach.”
He described data at the back end of the flow as a whole new nightmare. “System verification has added complexity that is exponential from node to node. There’s a software impact, a power/performance impact, and then you have security and safety fault injection for the whole chip.”
While data centers have become smarter about dealing with Big Data, they also still rely on faster hardware. Emulation and FPGA prototypes are EDA’s answer to faster hardware, but the more complete picture in handling Big Data involves everything from memory to storage to networking and extends well beyond a single machine or even a cluster of them.
Commercial data centers have been gearing up for this for the past couple of decades—roughly coinciding with the widespread adoption of the Internet in the mid-1990s. In SoC design, this is still a relatively new phenomenon, aside from the simulation farms at large chipmakers that are now transitioning to hybrid verification farms. So what can chipmakers learn from the big data centers?
“It all starts with server virtualization,” said Ron DiGiuseppe, senior strategic marketing manager at Synopsys. “If you look at what VMware is doing, it’s providing the ability to scale across multiple machines. That’s something you would not have to do without big data.”
DiGiuseppe pointed to a number of things that need to be addressed to make that happen in the chip world:
1. Faster server processing, relying heavily on co-processors for specific jobs such as security encryption/decryption;
2. Faster access to memory, including new types of low-latency memory and caching schemes;
3. Adding network overlays, such as Virtual Extensible LAN (VXLAN) and Network Virtualization using Generic Routing Encapsulation (NVGRE) to push communication to a higher level in the networking stack;
4. Intelligent scheduling and prioritization of compute resources, and
5. Faster pre-written queries based on the MapReduce concept.
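The MapReduce concept in item 5 can be illustrated with a toy pre-written query: a map phase that emits key/value pairs from each log shard, and a reduce phase that aggregates them. The log records and bin names below are invented for illustration:

```python
from collections import defaultdict
from itertools import chain

# Hypothetical test-log shards; "bin" labels and results are made up.
shards = [
    ["bin3 FAIL", "bin1 PASS", "bin3 FAIL"],
    ["bin2 FAIL", "bin3 FAIL", "bin1 PASS"],
]

def map_phase(record):
    bin_id, result = record.split()
    if result == "FAIL":
        yield (bin_id, 1)       # emit one key/value pair per failure

def reduce_phase(pairs):
    counts = defaultdict(int)
    for key, value in pairs:    # the shuffle/sort step is implicit here
        counts[key] += value
    return dict(counts)

pairs = chain.from_iterable(map_phase(r) for shard in shards for r in shard)
print(reduce_phase(pairs))      # {'bin3': 3, 'bin2': 1}
```

In a real deployment the map phase runs in parallel across shards on separate machines, which is what makes such a query fast on terabyte-scale logs; this single-process version only shows the shape of the computation.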
One key element in all of this is the ability to split up processing so that the biggest and most critical tasks are handled by the most powerful processors. A second is the need to remove any bottlenecks for shifting data back and forth between memories and within a memory, probably with a follow-on to DDR4 such as high-bandwidth memory or the Hybrid Memory Cube. And outside of a single device, it requires faster network speeds between processors and storage, as well as across high-bandwidth networks.
While these kinds of data volumes may be new to chip companies, particularly outside of the verification stage, the same principles are in play whether it’s a financial institution or a cloud services provider. Pinch points need to be eliminated wherever data is moving or being processed, and for chipmakers in particular, data needs to be available at any point in the flow more quickly than in the past. The challenge is that it’s not just one type of data, and it all has to feed into central points to be managed the same way as data in a data center.
Where chip design differs greatly from the commercial Big Data world, though, is the amount of data that needs to be collected and acted upon quickly across so many different interrelated operations. The only way to do that is to extract data at multiple points and figure out how that data is correlated. That requires as many touch points for that data as possible, frequently as many tools as a chipmaker can afford, and a really strong working knowledge of how to use those tools.
“In the past, brute force was applied to everything,” said Cadence’s Siwinski. “The question now is how you make it easier and more compact. To do that you need to be smart about how you deal with data to have a more useful outcome. A lot of that comes down to whether you are using the full benefits of the technologies you already have. Some customers, until a few years ago, were still using more typical simulation-based metrics. Now they’re using more emulation, metric-driven verification, and smart bug. The feedback we’re getting is that productivity is increasing even though the amount of data is increasing. So you’re seeing a methodology shift and a tooling shift.”
But getting different pieces to work together isn’t always as straightforward as it sounds, and it requires some work on the part of technology users to bridge the gaps.
“We’re seeing that with our own internal data analytics,” said Kurt Shuler, vice president of marketing at Arteris. “We use OLAP (online analytical processing), which can take data from any source. But getting clean data is a different matter. And from there, you have to figure out the metrics that are meaningful. Then you have to figure out how to display it, because every audience is different. So if you’re doing data management for chip status, you have to know which version of LEF (library exchange format) and DEF (design exchange format) files they’re using.”
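The kind of OLAP-style roll-up Shuler describes, aggregating one fact table along different dimensions for different audiences, can be sketched in a few lines. The fact table, field names, and views below are hypothetical, not Arteris’ actual schema:

```python
from collections import Counter

# Hypothetical chip-status records; block and milestone names are invented.
facts = [
    {"block": "noc", "milestone": "rtl_freeze", "status": "done"},
    {"block": "noc", "milestone": "timing",     "status": "open"},
    {"block": "ddr", "milestone": "rtl_freeze", "status": "done"},
    {"block": "ddr", "milestone": "timing",     "status": "done"},
]

def rollup(records, dimension):
    """Count records along one dimension, as an OLAP cube slice would."""
    return Counter((r[dimension], r["status"]) for r in records)

# Management view: status per milestone. Engineering view: status per block.
print(rollup(facts, "milestone"))
print(rollup(facts, "block"))
```

Getting to this point presumes the hard part is already done: every record in the fact table is clean and uses the same vocabulary, which is exactly where the LEF/DEF version question comes in.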
Getting so-called “clean data” continues right through the manufacturing phase, as well. David Park, vice president of worldwide marketing at Optimal+, said that even at the foundry level the analysis is only as good as the data that analysis is based on.
“You can build something at factory A and factory B, and they are not run the same way so the data is different,” Park said. “If you get consistent data you can do data mining for yield and quality, looking at e-test, wafer sort, wafer test, final system-level test. You can even use that to create a quality index. So maybe there is what we call a good die in a bad neighborhood because the other die are failing but that one is good.”
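Park’s “good die in a bad neighborhood” screen can be sketched as a neighbor check over a wafer map: flag passing die whose immediate neighbors mostly fail. The tiny map and the 50% threshold below are made up for illustration:

```python
# Hypothetical 3x3 wafer map: 1 = pass, 0 = fail.
wafer = [
    [0, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
]

def risky_good_die(wmap, fail_ratio=0.5):
    """Return coordinates of passing die surrounded mostly by failures."""
    rows, cols = len(wmap), len(wmap[0])
    flagged = []
    for r in range(rows):
        for c in range(cols):
            if wmap[r][c] != 1:
                continue
            # Collect the up-to-8 adjacent die that exist on the map.
            neighbors = [wmap[r + dr][c + dc]
                         for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                         if (dr or dc)
                         and 0 <= r + dr < rows and 0 <= c + dc < cols]
            if neighbors.count(0) / len(neighbors) >= fail_ratio:
                flagged.append((r, c))
    return flagged

print(risky_good_die(wafer))
```

A die flagged this way still passed its tests; the point of a quality index is that it might be downgraded or screened further because of where it sits on the wafer, not because of anything measured on the die itself.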
Another approach is to build a database of what goes right and what goes wrong within chipmakers. eSilicon has been doing this for the past five years, Gianfagna said. “We do a tapeout, which is all hands on deck, and we go through unbelievable gyrations where you may have cell library trouble, so you make the chip smaller with another cell library, then go onto the next chip. But you store all these configurations and experiments in a database. With that you can run trial implementations of test circuits, do GDSII layouts, extractions, and store all of that in a database. Then you can use it to optimize a design very quickly and help people come up with the right chip recipe.”
Data overload and rising volumes of data have been discussion topics for at least the past several process nodes. On the manufacturing side, the amount of inspection and measurement data has been ballooning for the past decade. On the design side, it was possible to ignore much of that data and stay focused until 40nm. At 28nm, and particularly in the finFET world where double patterning bridges the gap between foundry, architecture and everything in between, that volume of data no longer can be safely ignored.
Yet solving this problem requires more than just an interface between different data types or a linking of metrics. It requires a fundamental change in how heterogeneous data types are blended and how information is extracted from that data. When it works, the results can be rather surprising.
“There is a recent example of how you can use cell phone data in third-world countries to predict an outbreak of cholera,” said Mentor’s Olen. “You can track movements and travel, which are different when people are ill. You can do the same for crime patterns across the United States, where there are known relationships between certain crimes that spawn other crime.”
Whether those kinds of mash-up results can be drawn from the design world using massive amounts of complex, heterogeneous data remains to be seen. At the very least, the data will have to be mined to figure out how to do more mundane tasks such as predicting and identifying bugs, improving yield, and getting working silicon at advanced nodes out the door on time and on budget. After that, it’s anyone’s guess what this ballooning mass of data can tell us.