Data Vs. Physics

How and where data gets scrubbed will have significant consequences.


The surge of data from nearly ubiquitous arrays of sensors is changing the dynamics of where and how that data is processed. There is simply too much data to send everything to a centralized processing facility in the cloud, and even 5G won’t provide enough bandwidth to handle all of this data.

This has big implications on a much broader scale. Data is valuable. And while clean data is more valuable than dirty data (the mass of raw data collected by sensors), there is hidden value in those large masses of data. They can show broad trends and patterns that are not obvious in clean data, which is the modern equivalent of not seeing the forest for the trees.

In the corporate world, this kind of data can prove to be incredibly important, and there are examples where this has played out time and again. Xerox PARC couldn’t see the value in a graphical user interface, which allowed Apple to democratize computing. Microsoft ignored Internet search, allowing Google to grow into a global giant that ultimately threatened Microsoft’s own base. And Intel missed the mobile phone revolution because it thought a phone was just a phone, allowing Arm to win over the highest-volume chip segment in history. It’s not clear where the next corporate giants will come from, but these broad data trends can help incumbents identify them early enough to prevent their growth.

This hasn’t been lost on the current data giants (Amazon, Google, Facebook, Alibaba, Apple, Microsoft, Baidu), which are striving to control the data center as well as the data collection devices. That data can provide information about everything from driving habits and time spent in cars to what people buy and when, which has unnerved privacy experts. But the real value is in the broader statistics, because they allow companies to guard against blind spots and to maintain their hegemony.

Ironically, this is where technology is beginning to show signs of strain. There simply is too much data to send over existing infrastructure. Google’s initial plan to blanket urban areas with fiber was a recognition of this trend. The company has since altered its plans, betting instead on wireless. But even that may not be enough. Data still needs to be cleaned, and so far it’s not clear how best to clean that data and still retrieve enough trending patterns to be useful.

The problem is partly infrastructure, but it’s also the classic PPA (power, performance, area) equation for end devices. No one wants to buy a handset whose battery lasts only five hours because it’s sending too much data to the cloud, and no one is willing to spend more for a device that can move data faster with a bigger battery unless they have a compelling reason. Given the slowing upgrade cycles on phones and computers, it isn’t obvious what that reason would be.

On top of that, most people are rather choosy about what they will send to the cloud versus keep on local storage. This is particularly true for businesses, which are concerned that data in the cloud can be intercepted in transit or while it is stored offsite in a managed facility. Even Equifax, whose sole mission was to safeguard private financial data, couldn’t keep a secret. Consumers share similar concerns. Almost everyone has been hacked at some point, whether the target was their credit card, their hotel chain information, or their personal identification numbers.

The key question now is who gets to control the information pruning, and just how effective devices like Alexa and Google Home are at balancing consumer demands with their makers’ need to sift through massive amounts of data. Alongside that, there is a growing concern about privacy, particularly in places like the European Union.

But in the end, physics may be the real limiting factor. There is only so much data that can move through a pipe or be economically stored. As more data sources are added, even the largest pipes will get clogged, and storage facilities will require regular purges and servicing, along with expensive power and cooling. There are limits to how much even the largest companies can afford to pay for competitive data, and ultimately that may hinge on the flow of electrons.
