Using Data Mining Differently

As the amount of data continues to rise, so does the ability to do something useful with it.


The semiconductor industry generates a tremendous quantity of data, but until very recently engineers had to sort through it on their own to spot patterns, trends and aberrations. That’s beginning to change as chipmakers develop their own solutions or partner with others to effectively mine this data.

Adding some structure and automation around all of this data is long overdue. Data mining has been in widespread use for the better part of this decade for everything from marketing to bitcoin. The initial idea was that keywords, phrases, and even images and shapes can be sifted out of massive quantities of data with pattern recognition. But data mining also can be used to identify outlier data in areas such as manufacturing, or irregularities in designs or use cases, and that has opened a new window for development of chips, tools, and software.

“The new emphasis is on fast mining,” said Jim Hogan, managing partner at Vista Ventures. “So if you look at bitcoin, the early approach used FPGAs, which were more about power than performance. But if you turn that into an ASIC, you get a 100X performance improvement. Then things grow exponentially. So if you think about a hardware algorithm running on an ASIC, you can do 100 times more trades per second. It may cost you $1 million more in NRE, but the return on investment might be less than an hour.”

This also blurs the lines with machine learning and deep learning, particularly in real-time applications where speed is needed for safety reasons. “You’re dealing with big data sets when you apply labels to images, such as images of dogs, or specific breeds of dogs,” said Gordon Cooper, product marketing manager for embedded vision processors at Synopsys. “But what do you do with the data after that? If an autonomous car sees a dog in the road and identifies it as a dog, do you swerve or brake?”

All of this leverages data mining techniques. And increasingly that technology is being applied further back in the design process, where these chips are developed in the first place.

“When you initially start [a project] you think of the kinds of data that you want to look at,” said Anush Mohandass, vice president of marketing and business development at NetSpeed Systems. “Then you focus on how to extract that data. What are the interesting databases and database techniques? You come to realize there’s actually tons and tons of data, so the key is how to convert data into insight.”

As things progress, the task becomes less about how to extract the data, which is the basis of data mining, and more about the insight, he said. “What is the data telling you? You can think about two things. One is visualization. You visualize the data in certain ways, such as trends or coverage. What was the coverage last week versus this week? Or what was the coverage last quarter versus this quarter? Am I making progress? Where am I making progress, and where am I not making progress? The other thing is, what do you care about? As an organization, as a design team, as a super alpha engineering team, you focus on the metrics that you care about. As we all know, the metrics that you display are the metrics that the team is going to focus on, so you have to be really careful in understanding what the metrics are that you care about, and then you focus on that.”
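The "last week versus this week" question can be made concrete with a small comparison of coverage snapshots. The sketch below is a hypothetical illustration, not NetSpeed's tooling: it assumes coverage is already reported per block as a fraction between 0 and 1, and the block names, numbers and the 1% progress threshold are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class CoverageSnapshot:
    label: str                     # e.g. "week 11" (hypothetical naming)
    per_block: Dict[str, float]    # block name -> coverage fraction, 0.0 to 1.0

def coverage_delta(prev: CoverageSnapshot, curr: CoverageSnapshot) -> Dict[str, float]:
    """Per-block change in coverage between two snapshots (missing blocks count as 0)."""
    blocks = sorted(set(prev.per_block) | set(curr.per_block))
    return {b: curr.per_block.get(b, 0.0) - prev.per_block.get(b, 0.0) for b in blocks}

def stalled_blocks(delta: Dict[str, float], min_gain: float = 0.01) -> List[str]:
    """Blocks that gained less than min_gain: 'where am I not making progress?'"""
    return [b for b, d in delta.items() if d < min_gain]

# Invented example data for two consecutive weeks.
last_week = CoverageSnapshot("week 11", {"cpu": 0.70, "noc": 0.55, "ddr": 0.90})
this_week = CoverageSnapshot("week 12", {"cpu": 0.78, "noc": 0.55, "ddr": 0.905})
delta = coverage_delta(last_week, this_week)
print(stalled_blocks(delta))  # ['ddr', 'noc']
```

Choosing the threshold is exactly the "what do you care about" decision in the quote: whatever metric gets displayed is the one the team will chase.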

That leads to further refinement of what really drives action. “From insight you go into the action,” he said. “What is the data, how was the data visualized or presented, and how does that inspire action? Ultimately that’s the real test. How does our design team work? How do engineering teams work on getting actionable things from the data? Once that is resolved, it becomes a virtuous loop, because once you have action, that feeds back into the data and you go through the loop again.”

Getting granular
Granularity is becoming more important as chips grow more complex. All sorts of abstractions are essential, but they have to be paired with the ability to drill down as necessary. Formal verification and debug are one form of this. Data mining is another, and potentially a powerful one.

This is particularly important at the system level, where getting different blocks to work together is increasingly difficult. It requires hardware and software integration, as well as an understanding of how different pieces of software interact.

“We can’t solve that by breaking things down into smaller chunks because the problems only manifest when you’ve got the whole design in one big operation,” said Rupert Baines, CEO of UltraSoC. “And in particular, it only manifests when you’re in a real context, and all of those blocks are working on real stuff. This is where design merges with big data, because we’re talking about a gigahertz clock, and with a typical bus you’ll be generating half a terabit per second of data just from one bus. If you think of the number of processors, and the number of buses, interconnects, peripherals and controllers, you’re talking about many, many terabits per second. That’s where you get into this world of big data, and the challenge of systemic complexity runs full crunch into the challenge of big data.”
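Baines' figures can be reproduced with simple arithmetic. The sketch below assumes, for illustration only, a 512-bit-wide bus captured on every cycle of a 1 GHz clock, and 20 such buses in the SoC; neither number comes from UltraSoC.

```python
# Assumed figures (not from the article): a 512-bit bus sampled every cycle
# at 1 GHz, and 20 comparable buses/interconnects in the whole SoC.
CLOCK_HZ = 1_000_000_000
BUS_WIDTH_BITS = 512
NUM_BUSES = 20

def trace_rate_bps(clock_hz: int, width_bits: int) -> int:
    """Raw bits per second if every bit of the bus is captured on every cycle."""
    return clock_hz * width_bits

one_bus = trace_rate_bps(CLOCK_HZ, BUS_WIDTH_BITS)
whole_soc = one_bus * NUM_BUSES
bytes_per_second = whole_soc // 8

print(f"one bus:   {one_bus / 1e12:.3f} Tb/s")    # 0.512 Tb/s, about half a terabit
print(f"whole SoC: {whole_soc / 1e12:.2f} Tb/s")  # 10.24 Tb/s
print(f"storage:   {bytes_per_second / 1e12:.2f} TB per second of run time")  # 1.28
```

At that rate, a single second of full-speed activity would fill more than a terabyte of storage, which is why raw capture alone does not scale.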

One approach gaining traction is to add smart analytics modules that are designed into the chip and touch each of the blocks in a design. Rather than post-filtering the data, these modules perform the analytics on-chip, which keeps data sets in the megabytes instead of terabytes or petabytes. The results are moved off the chip over USB, PCIe or WiFi. Because the analytics are done by built-in modules, with the chip’s own transistors working where all of the data resides, the insights that come off the chip are pre-filtered and of high information value, Baines added.
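The data reduction described above can be modeled in software. The sketch below is a hypothetical illustration of the idea, not UltraSoC's hardware: it assumes a monitor sees a stream of bus transactions tagged with an initiator name and a latency, keeps only per-initiator latency histograms, and retains full detail just for outliers. The bin edges and the 64-cycle outlier threshold are invented.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, NamedTuple

class Txn(NamedTuple):
    cycle: int
    master: str     # which initiator issued the transaction
    latency: int    # cycles from request to response

# Hypothetical latency histogram bin edges, in cycles.
BIN_EDGES = (8, 16, 32, 64)

def bin_index(latency: int) -> int:
    """Index of the first bin whose upper edge exceeds latency; the last bin is open-ended."""
    for i, edge in enumerate(BIN_EDGES):
        if latency < edge:
            return i
    return len(BIN_EDGES)

def summarize(txns: Iterable[Txn], outlier_latency: int = 64) -> Dict[str, object]:
    """Collapse a raw transaction stream into per-master histograms plus outliers."""
    hist: Dict[str, Counter] = defaultdict(Counter)
    outliers: List[Txn] = []
    for t in txns:
        hist[t.master][bin_index(t.latency)] += 1
        if t.latency >= outlier_latency:
            outliers.append(t)   # keep full detail only for the interesting cases
    return {"histograms": {m: dict(c) for m, c in hist.items()}, "outliers": outliers}

# Synthetic stream: 100,000 transactions, with one slow one every 1,000 cycles.
raw = (Txn(i, "cpu" if i % 2 == 0 else "gpu", 100 if i % 1000 == 0 else 10)
       for i in range(100_000))
summary = summarize(raw)
print(summary["histograms"])     # {'cpu': {4: 100, 1: 49900}, 'gpu': {1: 50000}}
print(len(summary["outliers"]))  # 100
```

A hundred thousand transactions collapse to a handful of counters plus 100 detailed records, which is the pre-filtered, high-information-value output the approach aims for.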

On-chip vs. off-chip
Another way to look at the opportunities for data mining in design and verification today is to divide them into on-chip and off-chip.

“On-chip, a whole bunch of activity is happening with signals and transactions of things that we haven’t been tracking,” said Mark Olen, a product marketing manager at Mentor, a Siemens Business. “Even if we have been tracking it, the amount of data is such that it’s too difficult to be able to manage. As far as the off-chip part, we refer to this as verification management or collaborative verification management. It’s not technically happening on the chip. It’s not signals, it’s not transactions, but it is metrics like coverage data, design changes, design stability, feedback from regression – all those things that are equally important but are off the chip.”

This is an area of interest across much of the semiconductor design ecosystem, from EDA suppliers to companies developing their own homegrown techniques, because no established solution exists today. On the on-chip side, many semiconductor and system designers moved years ago to automated testing using SystemVerilog or UVM, and have taken all kinds of training. They may be running test after test on thousands of computers, sometimes tens of thousands, in a regression farm.

“Using these automated techniques, users have a very challenging time writing the metrics that they’re going to measure, such as coverage in this case,” Olen said. “There was a big change when we went to concurrency with multiple processors on the same chip sharing layers of memory, because the human brain is no longer able to really track and process what’s happening on the chip. That’s where everyone is looking at how to automate all of this stuff.”

According to surveys done by Wilson Research Group, 50% of designs have to be respun after going into the lab. “It doesn’t matter how much verification is done and how smart people are in a SystemVerilog or UVM environment when 50% of the time they have to go back and re-spin it,” he said. “The unfortunate thing is that a lot of people basically have bowed to the economics and just plan for it in their next chip. ‘We’re going to assume it’s going to be a respin,’ and if it is an ASIC, that is $1 million. If it is an FPGA it is less, but it’s more about time and less about mask cost. Also, according to the Wilson research data, typically between 62% and 67% of all designs finish late.”

Despite enormous advances in SystemVerilog, constrained-random testing and solvers, fast simulators and emulators, the number of respins and missed market windows remains a big problem.

“As great as SystemVerilog and UVM have been for automating block-level-design-based semiconductors, they fall short at the system level,” Olen said. “At the system level you actually might have all of the blocks that you’ve assembled or integrated into your SoC. They may function reasonably properly, but when integrated together, configured in just such a way, and a software stack is implemented and integrated, most simulators can’t really handle running a true system-level software testbench to verify the system-level hardware simulation. As a result, engineering teams look into whether they should spend the money to move into an emulator or go straight to an FPGA prototype. All of these things are a challenge. However, we are now able to collect this data that occurs more typically on an emulator because we are running at the system level. But you could just imagine when you are routinely running an operating system against your design on an emulator, loading drivers, maybe even running target application software, there is such a massive amount of activity occurring in ones and zeros across your fabric, on your buses, on your interfaces, that collecting all of this data means terabytes and terabytes, if not more. What do you do with all of that?”

The answer increasingly involves big data techniques.

Analog/mixed-signal and data mining
Data mining is coming into focus for analog/mixed-signal design, as well.

“For some years now, Cadence has been working diligently on this problem and the reason it has become a problem is two-fold,” said Steven Lewis, a marketing director at Cadence. “One, certainly from the custom design space – whether that is custom digital or even analog to some extent – as engineering teams have been going to the super advanced nodes, anything below 16nm, just the tonnage of transistors that they are looking at and the amount of calculation that needs to be done means that databases have been growing exponentially. We’ve gone from megabytes to gigabytes to hundreds of gigabytes of data that design teams start to store. The attitude of designers seems to be, ‘If I can run it, if I can save the data, I’m going to do it,’ and it’s only when they start reaching maximum that an engineer might back off a little bit and try to be more selective on what they are going to save to that database.”

Advanced nodes have exacerbated the problem. Analyzing high-frequency RF signals is also more challenging than analyzing regular analog signals, because faster signals produce more data points that have to be analyzed, Lewis said.

“If you have gigahertz signals that you’re trying to find single hertz noise figures inside of, being able to go through the entire waveform database to find those instances – that’s where data mining comes into play,” he explained. “The traditional way of doing that was to take the data, load it up into a waveform window or data calculation environment and then start to process it. That worked for a while, but we have started to hit the limits on that. The biggest headache that engineers find in the verification phase is when something isn’t going right. Maybe they are coming up against one of their specs pretty hard. Maybe they hadn’t put in quite enough buffering inside of the design to get around that. And as an engineer you are expected to go back in and fix it.”
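A rough calculation shows why fine frequency resolution on a gigahertz signal produces such large waveform databases. A discrete Fourier transform of a record T seconds long has bins spaced 1/T Hz apart, so resolving 1 Hz needs about one second of waveform. The sketch below assumes uniform sampling at 10 points per signal cycle and 8 bytes per sample; real transient simulators use variable time steps, so treat this as an order-of-magnitude estimate, not how Cadence's tools size their databases.

```python
def waveform_points(signal_hz: int, resolution_hz: int, samples_per_cycle: int = 10) -> int:
    """Samples needed to resolve `resolution_hz` on a uniformly sampled waveform.

    A DFT of a record T seconds long has bins spaced 1/T Hz apart, so resolving
    resolution_hz requires T = 1 / resolution_hz seconds of data.
    samples_per_cycle is an assumption (it must exceed 2 to satisfy Nyquist).
    """
    sample_rate = signal_hz * samples_per_cycle   # samples per second
    # points = sample_rate * T = sample_rate / resolution_hz, rounded up
    return -(-sample_rate // resolution_hz)

points = waveform_points(2_000_000_000, 1)       # 2 GHz signal, 1 Hz resolution
print(points)                                     # 20000000000
print(points * 8 / 1e9, "GB at 8 bytes/sample")   # 160.0 GB at 8 bytes/sample
```

Even under these simple assumptions, one signal reaches the "hundreds of gigabytes" range Lewis describes, before any other nets are saved.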

At the verification phase, most of the circuitry is done and a lot of the layout is done. So what can the design engineer do?

“You cannot go in and make wholesale changes because you would be stuck redoing the entire design,” Lewis said. “An engineering team is going to look for the weak spot. They’re going to try to get down to the finest points they can. Is it a few transistors that are the culprits? Is it one small block or a part of a block that is the culprit? Rather than having to upset the entire design, I need to have a view into it. I may use data mining techniques. I may be asking the database to look for this condition and this condition. When both of them are happening, show me what’s happening on this one signal. Show me what’s happening on this block, on this transistor, because sometimes that will help the engineer do a surgical strike during the verification phase and fix something. Realistically, if you don’t want your schedule to go out the window during verification, you need to be looking for the surgical strike. You cannot be doing wholesale block changing at that point. Using more sophisticated data mining techniques allows me actually to save time. Yes, I’ve got a bigger database to work on. Yes, I’ve got to put aside more memory to deal with it. Yes, I’ve got a longer simulation to do. But once I’ve got that data, this lets us slice and dice it, and now I have a realistic way of doing that.”
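The "look for this condition and this condition, then show me this signal" query can be sketched in a few lines. This is a hypothetical in-memory illustration, not a Cadence API: it assumes every signal has already been resampled onto one shared time base, and the signal names (`vdd`, `clk_en`, `out`) and values are invented.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Condition = Tuple[str, Callable[[float], bool]]   # (signal name, predicate on its value)

def when_both(times: Sequence[float], waves: Dict[str, Sequence[float]],
              cond_a: Condition, cond_b: Condition, show: str) -> List[Tuple[float, float]]:
    """Return (time, value of `show`) at every sample where both conditions hold.

    Assumes all signals share one time base, i.e. each list has len(times) samples.
    """
    sig_a, pred_a = cond_a
    sig_b, pred_b = cond_b
    return [(t, waves[show][i]) for i, t in enumerate(times)
            if pred_a(waves[sig_a][i]) and pred_b(waves[sig_b][i])]

times = [0, 1, 2, 3, 4, 5]                            # ns
waves = {
    "vdd":    [1.00, 0.95, 0.82, 0.81, 0.99, 0.80],   # supply with droops
    "clk_en": [0, 1, 1, 0, 1, 1],
    "out":    [0.0, 0.1, 0.4, 0.5, 0.2, 0.6],
}
hits = when_both(times, waves,
                 ("vdd", lambda v: v < 0.85),
                 ("clk_en", lambda v: v == 1),
                 show="out")
print(hits)   # [(2, 0.4), (5, 0.6)]
```

Only the samples where the supply droops while the clock is enabled come back, which narrows the search to the few moments worth a surgical fix.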

Whether it is RF design verification, digital design and verification, network-on-chip architecture, or any of the myriad other SoC design and verification tasks, data mining is emerging as an important tool in system-level design.

This is still an emerging area, but NetSpeed’s Mohandass believes it will become sustainable, and really pick up, once the data and insight can be turned into action. “Ultimately the market for data mining will take off in the same way as the Fitbit and Apple Watch – when they figure out how to inspire action. Once you unlock data insight into action, then things will take off,” Mohandass concluded.

