The Great Data Flood Ahead

So much data, so little time to process it.


The number of devices connected to the Internet is expected to exceed 1 trillion over the next decade or so. The timeline is a bit fuzzy, in part because no one is actually counting all of these devices, but the implications are pretty clear. A data deluge of biblical proportions is headed our way, and so far no one has any idea what to do with all of it.

From a system-level standpoint, there are three critical issues, and lots of permutations of each. At the forefront is where this data will be processed, and how systems will need to be designed to handle it. The idea that all of the data will be passed through giant pipes to the cloud is unworkable. Alongside this is an understanding that most of the data generated will be useless, so the best thing to do is get rid of it as fast as possible.

This is where things get confusing, though. Data needs to be understood and tagged before it can be partitioned into good, maybe good, and trash. How to do that with streaming video as a car is racing down the highway or a robot is whirring around a factory isn’t clear. So while some of the new architectures under development make it possible to process the data quickly enough, it’s not clear how to pinpoint what’s valuable and what isn’t. In fact, that decision may be application- and user-specific, and the weights attached to that data need to be user-friendly enough to build some resilience and programmability into these systems.
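To make the idea of partitioning by weight concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the `Reading` type, the tag names, the thresholds, and the notion that an on-device tagger has already attached confidence values to each piece of data. It is not drawn from any real architecture. It only shows how user-adjustable weights could sort data into good, maybe good, and trash.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Reading:
    """One tagged piece of data (hypothetical type for this sketch)."""
    source: str
    # tag name -> confidence in [0, 1], assumed to come from an on-device tagger
    tags: Dict[str, float]


def score(reading: Reading, weights: Dict[str, float]) -> float:
    """Weighted sum of tag confidences; tags without a weight count as zero."""
    return sum(weights.get(tag, 0.0) * conf for tag, conf in reading.tags.items())


def triage(reading: Reading, weights: Dict[str, float],
           keep_at: float = 0.7, review_at: float = 0.3) -> str:
    """Return 'good', 'maybe', or 'trash'. The thresholds are assumptions."""
    s = score(reading, weights)
    if s >= keep_at:
        return "good"
    if s >= review_at:
        return "maybe"
    return "trash"


def partition(readings: List[Reading],
              weights: Dict[str, float]) -> Dict[str, List[Reading]]:
    """Group readings into the three buckets; 'trash' can then be deleted locally."""
    buckets: Dict[str, List[Reading]] = {"good": [], "maybe": [], "trash": []}
    for r in readings:
        buckets[triage(r, weights)].append(r)
    return buckets


# Application-specific weights a user might tune (hypothetical values).
car_weights = {"pedestrian": 1.0, "debris": 0.5, "empty_road": 0.0}

frames = [
    Reading("front_camera", {"pedestrian": 0.9}),
    Reading("front_camera", {"debris": 0.8}),
    Reading("front_camera", {"empty_road": 0.99}),
]
result = partition(frames, car_weights)
```

Because the weights live in a plain dictionary, a different application or user could swap in its own values without touching the triage logic, which is the kind of programmability the paragraph above calls for.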

This isn’t how many of these systems are being designed today, though. In fact, most of the designs are focused on moving data through a chip, not figuring out what’s important. And for the most part, little attention is being given to security, data consistency, or what else can be done with the data.

That leads to the second issue, which is how these devices will be connected to other systems. This is more complicated than just new wireless technology or a protocol stack. It’s about how and where to move all of this data, which even after processing by a trillion devices will be an ocean. Without some serious planning on the infrastructure side, this could bring the whole Internet to its knees.

One of the big problems here is that no one is thinking of the Internet as a whole because no one actually owns the Internet. That means it’s no one’s responsibility to make sure that it can handle an increase in data traffic. This is one of the reasons the edge is gaining so much momentum. Moving all of that data is expensive, and managing it is easier and faster locally. In fact, some of it doesn’t even have to move outside of memory using some of the new architectures. It can be weighted there, with the bulk of it just deleted.

This, by default, adds some security into the process, with the opportunity to add even more. While data in the cloud is heavily secured, data on the way to the cloud is not. That means the encryption needs to happen at the chip or device level, and decryption has to happen somewhere else. It’s in that split second between when data is encrypted and when it is decrypted that the data can be hacked, and this is particularly important for the data being sent back to devices because that data has already been cleaned up. It’s also easy to overlook this step as the amount of data increases.

The third issue is where to store all of this data. While data that has been cleaned requires less storage space, the data from a trillion devices still adds up to a flood. Memory is cheap, but it’s not that cheap. And it takes energy to store that data, and to retrieve it when it’s needed. There may not be enough power on the planet to run a trillion devices, which puts an enormous burden on chip designers to keep the energy footprint low enough that the overall needle doesn’t budge. That’s a huge challenge for the semiconductor industry, and that may be just the beginning. What happens when there are 2 trillion devices?



Daniel Payne says:

My favorite IoT device is a GPS-enabled bike computer that automatically uploads my cycling route, speed, heart rate and cadence to a website over a WiFi connection. After each ride I glance at all of this wonderful analytical data, but then it stays in the cloud for a long time, with diminishing value to me.
