The Rising Value Of Data

Race begins to figure out what else can be done with data. But not all data is useful, and some of it is faulty.


The volume of data being generated by a spectrum of devices continues to skyrocket. Now the question is what can be done with that data.

By Cisco’s estimates, traffic on the Internet will reach 3.3 zettabytes per year by 2021, up from 1.2 zettabytes in 2016. And if that isn’t enough, the flow of data isn’t consistent. Traffic during the busiest 60-minute period of the day increased 51% in 2016, compared with 32% growth in overall traffic.
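Those two figures imply a steep but easily checkable growth rate. A quick back-of-the-envelope calculation, using only the numbers from the Cisco estimate above:

```python
# Implied compound annual growth rate of Internet traffic,
# going from 1.2 zettabytes (2016) to 3.3 zettabytes (2021).
start_zb, end_zb, years = 1.2, 3.3, 5
cagr = (end_zb / start_zb) ** (1 / years) - 1
print(f"Implied compound annual growth: {cagr:.1%}")  # roughly 22% per year
```

In other words, total traffic nearly triples over five years even though the year-over-year rate sounds modest.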


Fig. 1: Historical and projected growth of data. Source: Cisco Visual Networking Index

That’s only part of the picture, too. No one really knows how much data is being generated, because not all of it ends up on the Internet.

But the real issue isn’t the amount of data. It’s how much of that data is useful or valuable, and so far there are no clear answers. Answering that requires sifting through huge quantities of both digital and analog data, with enough context to understand their true value. It is like panning for gold across millions of riverbeds that have been mostly picked dry. But with enough compute horsepower and massively parallel tools for sifting through that data—as well as a better perspective on how to apply it—this still can create some very lucrative business models.

“A lot of industries have figured out that their business, product, and business models could be impacted by a different utilization of the data that is somehow attached to their devices or their business models,” said Aart de Geus, chairman and co-CEO of Synopsys. “If you can harness that in a way that finds shortcuts and efficiencies, or just completely different ways of going about business, that is high impact.”

It’s also potentially high profits. “You see all the people in the processing world trying to listen very carefully to what the needs will be, or trying to predict themselves what the needs are,” said de Geus. “Or even one step further, they’re trying to be on the path of the data so they’re closer to where the money ultimately is made.”

This is what is driving the stampede of investments for everything from data mining and cloud-based services to machine learning and industrial IoT.

“Those who own the data, the analytics, and the ability to process the data make all the money,” said Wally Rhines, president and CEO of Mentor, a Siemens Business.

It’s uncertain if it’s a winner-take-all game, but there are certainly some big companies vying for leadership in this space—Amazon, Google, Microsoft, Facebook and IBM, to name a handful.

“With IoT data, you collect a lot about performance, behavior and usage of a device,” said Christophe Begue, IBM‘s sales leader for the Americas. “What we do next is give it to Watson to do analytics with the data. But that data really only makes sense in the context of larger amounts of external or context data that you’re not collecting directly with these devices. So from trucks you can determine traffic patterns, and you can use those to understand human and social behavior.”

The big question now is how this data can be monetized, and what people are willing to pay for. There are several issues that need to be addressed to make that work, however. First, companies need to understand the real value of data. Second, they need to be structured in a way that lets them react quickly to changes in data. While stock traders can profit from information that is hundredths of a second ahead of the rest of the market, it might take days or even weeks for big companies to react to shifts. And third, there needs to be a consistent and competitive way to price that data.

IBM is looking at turning data about a global supply chain into a business. “There are two levels of this,” said Begue. “The first is retail and CPG (consumer packaged goods), where sale of drinks or food tomorrow is likely impacted by some local events. You can collect this information based on what is going on in the vicinity of a specific store, such as weather, traffic or sports events, and you can track it with traffic patterns. We do that with Metro Pulse, which takes into account 500 data elements. That can either be a service using machine learning analytics, or they can contract IBM to do that for them. The second area involves supplier risk, which we are now bringing to market. We look at the supply chain risk, which could include weather, political disruptions, and use it to design a more resilient supply chain. If you know 15 things are at risk, then you need to watch them very closely.”

Rather than just analyzing existing data, this service makes recommendations and provides insights. “What we’re doing is collecting public and semi-public data, and some data which may not be available outside of IBM, and we’re building a forecasting model. We realize there is still a gap between plan and reaction, and the way you help people resolve this faster is with the concept of a ‘resolution room.'”

Smarter manufacturing
Still, not all data needs to come from external sources to be useful. Data that is internally generated is particularly valuable for an industrial operation. In fact, the whole concept of smart manufacturing—known in Germany as Industry 4.0, and elsewhere as the industrial IoT (IIoT)—is built on better utilization of internal data.

“IIoT is all about improving the factory,” said David Park, vice president of marketing at Optimal+. “Right now these companies have process analytics and just-in-time manufacturing, but what they need is predictive analytics. That has benefits for the factory, but the main beneficiary there is the brand owner. The brand owner and the factory are not necessarily the same thing.”

The problem is that not all data is good, and decisions based upon bad data can lead to unexpected problems.

“If the data is good, you can improve yield by 2% to 3%, which is significant,” said Park. “You also can collect data from every part that is tested in the supply chain and for any time period you want. So if you have scratched wafers, you can trace back where those scratched wafers came from. You also can see how devices age in the field. If there is preventive maintenance for a fleet of vehicles, you can see how that performs one or two years down the road. This even works in finance, where you get hundreds of thousands of invoices and you can’t correlate that every invoice is correct.”
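The traceability Park describes rests on every tested part carrying its genealogy—lot and wafer IDs—so field failures can be traced back to their source. A minimal sketch of the idea, with entirely hypothetical record fields and values:

```python
from collections import Counter

# Each tested part records where it came from (hypothetical data model).
test_records = [
    {"unit": "U1", "wafer": "W7", "lot": "L3", "field_failure": True},
    {"unit": "U2", "wafer": "W7", "lot": "L3", "field_failure": True},
    {"unit": "U3", "wafer": "W9", "lot": "L3", "field_failure": False},
    {"unit": "U4", "wafer": "W2", "lot": "L1", "field_failure": False},
]

# Count field failures per wafer to spot a bad source, such as a scratched wafer.
failures_by_wafer = Counter(r["wafer"] for r in test_records if r["field_failure"])
print(failures_by_wafer.most_common(1))  # [('W7', 2)]
```

With the genealogy in place, the same query works in the other direction: given a suspect wafer, find every unit built from it that is already in the field.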

This kind of data analysis is particularly important in a complex supply chain. While semiconductor manufacturing itself is quite sophisticated in its use of data, that’s not the case across the rest of the supply chain.

“Using data effectively is one of the big topics for (SEMI’s) Smart Manufacturing Advisory Council,” said Tom Salmon, vice president of collaborative technology platforms at SEMI. “It’s important to have data, and the problem is not that we don’t have enough data because we only use about 10% of that data. The real problem involves what questions we should be asking and how to apply that to goals we’re setting for manufacturing. So there may be a reliability issue, but it isn’t necessarily a process issue.”

Machine learning

That is the basis for machine learning, which seeks to cull critical data and to have machines extrapolate from that data within a set of pre-defined parameters. This approach already is being used in the automotive market, where systems are being created to assist and ultimately take over the driving in real-world conditions. Those decisions need to be put into context based upon multiple possible outcomes.

Machine learning is being used in semiconductor design and manufacturing, as well, as a way of improving quality, reliability and yield.

“If you build your data with the right level of granularity, you can apply it to future designs,” said Mike Gianfagna, vice president of marketing at eSilicon. “It’s all about how to adapt machine learning algorithms in a way that is applicable to a new problem. We’ve been building a knowledge base for the past seven years. We know how to harvest it and mine it. But if you have a massive amount of data, how do you quantize it? If the detail is too fine, you get lost in the data. If it’s too coarse, there isn’t enough value.”
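The granularity trade-off Gianfagna describes can be made concrete by bucketing the same measurements at different resolutions. A small illustrative sketch—the delay values and bin widths here are made up, not eSilicon data:

```python
# The same (illustrative) path-delay data, quantized coarsely vs. finely.
delays_ps = [101.2, 101.9, 102.4, 110.7, 111.1, 130.5]

def quantize(values, bin_width):
    """Map each value to the lower edge of its bin and return the distinct bins."""
    return sorted({int(v // bin_width) * bin_width for v in values})

print(quantize(delays_ps, 1))   # fine bins: many distinct buckets, detail preserved
print(quantize(delays_ps, 50))  # coarse bins: everything collapses into one bucket
```

Too fine, and every part looks unique, so no pattern emerges; too coarse, and a 130ps outlier is indistinguishable from a 101ps nominal part.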

Gianfagna said the goal is being able to monetize data while decreasing risk and increasing the efficiency of operations. “To do that you really need to take a holistic view of big data analytics problems.”

For semiconductor design and test, the amount of data generated is significantly smaller than some of the big data analyses being done by the large cloud operators. However, it may be more complex.

“The current state of the art is to acquire data,” said George Zafiropoulos, vice president of solutions marketing at National Instruments. “The next phase will be to figure out what else data analytics allows you to do. Can you find something useful that you weren’t specifically looking for? What you’re looking for is trends and correlations. Machine learning can be used in every discipline. If software can say that on Thursdays there is lower volume on a production test floor, why is that? There also may be a correlation between temperature at a certain point and voltage at a certain point and its effect on performance.”
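The correlation hunt Zafiropoulos describes—does temperature at one point track performance?—is, at its simplest, a correlation coefficient computed over test data. A sketch with invented numbers:

```python
from statistics import mean

# Hypothetical per-die measurements: temperature vs. maximum clock frequency.
temp_c = [25, 45, 65, 85, 105]
fmax_mhz = [1500, 1480, 1450, 1410, 1360]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

print(f"{pearson(temp_c, fmax_mhz):+.2f}")  # strongly negative: hotter die, slower part
```

The point is not the arithmetic, which is trivial, but running it across thousands of sensor pairs automatically so the unexpected correlations—the Thursday effect on the test floor—surface on their own.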

Zafiropoulos pointed to better chip designs as the likely outcome. “As engineers, we guard band around designs, but if you guard band everything you stack up inefficiencies. If you can reduce the guard banding and still have certainty about reliability and performance, that’s a huge value. A lot of big data analytics are for massive data points. You may have 10,000 sensors in a city, which produce a huge amount of data, or you might be looking at all of Amazon’s transactions. Semiconductor data fits between what a human can tackle and these monumental data sets.”
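Why stacked guard bands pile up inefficiency is a statistical point: independent stages rarely all hit their worst case at once, so margining each one at its extreme is far more pessimistic than margining the sum. A simulation sketch with illustrative numbers (10 stages, each nominally 100 units of delay with a sigma of 5):

```python
import random
from statistics import mean, stdev

random.seed(1)
stages, mu, sigma = 10, 100, 5

# Monte Carlo: total delay of 10 independent stages, 10,000 trials.
samples = [sum(random.gauss(mu, sigma) for _ in range(stages))
           for _ in range(10_000)]

worst_case = stages * (mu + 3 * sigma)            # every stage at +3 sigma: 1150
statistical = mean(samples) + 3 * stdev(samples)  # ~1000 + 3*sqrt(10)*5, about 1047

print(worst_case, round(statistical))
```

The statistical bound covers the same 3-sigma confidence with roughly 100 fewer units of margin, which is exactly the headroom Zafiropoulos argues data analytics can reclaim.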

System data, meanwhile, can be orders of magnitude larger, particularly when it involves multi-physics simulations. “We see 7nm as the first time where we need capacity, speed and machine learning and big data analytics,” said John Lee, general manager and vice president at ANSYS. “You need to do simultaneous thermal analysis. Thermal affects reliability of a system. If you go beyond existing technologies you need to leverage new horsepower. That’s where we’re seeing big data techniques. The latest GPUs have 21 billion transistors, and those are being put into cars. Those chips heat up, and as they do they create stress on the boards and cause warpage. But these things need to last 10 years.”

Conclusion
The semiconductor industry sits squarely in the middle of big data analysis. On one hand, it generates and increasingly analyzes large quantities of data for improving the performance, efficiency and reliability of chips. At the same time, it also develops the technology that makes crunching of all of this data possible.

This provides huge potential for new growth all the way around. According to Lip-Bu Tan, president and CEO of Cadence, the connected car market is expected to hit $37 billion by 2020, up from $24 billion in 2015. Deep learning will reach $10 billion by then (up from $0.6 billion), and cloud and data centers will reach $80 billion (up from $65 billion). “Those will drive our semiconductor opportunity,” he said. “From optimized IoT to the cloud will provide great opportunities for semiconductors.”

The question now is what else can be done with this data and how else it can be applied. That will likely fuel a whole new wave of experimentation and investment, spurring the semiconductor industry to new and unprecedented growth levels.

Related Stories
Grappling With Manufacturing Data
Questions persist about how to deal with an explosion in data, and who has access to it, but changes are on the horizon.
Big Data On Wheels
As the market for chips in cars grows, so does the amount of sensor data that needs to be processed.
Prioritizing Vehicle Data Traffic
Challenges grow for classifying and tagging huge volumes of data from connected cars.
How Good Is Your Data? (Blog)
As machines begin training and talking to other machines, the question takes on new meaning.