Dark Data

Without a comprehensive data management policy, improvements in technology will be meaningless.


Last month the Natural Resources Defense Council updated its study on data centers. The numbers themselves are rather mind-boggling, and the NRDC has done an exceptional job over the years of ferreting out the biggest, baddest culprits of wasted energy. Witness its 2011 report on set-top boxes, which is finally beginning to yield changes in set-top box designs.

The data center report is equally damning. But as with the set-top box report, where there is waste there also is enormous opportunity—the kind with lots of numbers to the left of the decimal point. What’s more, this opportunity stretches from the server companies all the way across the giant ecosystem for chips, IP, design tools, software, and even out to packaging and manufacturing.

Taking advantage of that opportunity isn't so straightforward, though. What is less obvious is what has to happen around all of this technology to really make it work. This is about more than just technology. It's about how the technology is used: a methodology and hierarchy for processing, storing, and accessing data. As we get inundated with data from the Internet of Things and the Industrial Internet of Things, we have to figure out what to do with all of these ones and zeroes.

Just to put this in perspective, imagine how much data an assembly line of machinery generates over an eight-hour shift, when thousands of sensors may be producing terabytes per hour. The real information that's required, the aberrations, may amount to only a few megabytes. But multiply that by every assembly line, every heartbeat monitor, and the automotive data from 60 million new cars each year, and even the distilled information is a huge number.
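To make the terabytes-to-megabytes distillation concrete, here is a minimal sketch of the idea: keep only the readings that deviate sharply from the norm. The function name, the three-sigma threshold, and the sample values are all invented for illustration, not taken from any real monitoring system.

```python
# Hypothetical illustration: reduce a stream of sensor readings to the
# aberrations worth keeping. Threshold and sample data are invented.
from statistics import mean, stdev

def keep_aberrations(readings, sigma=3.0):
    """Return only readings more than `sigma` standard deviations from the mean."""
    mu = mean(readings)
    sd = stdev(readings)
    return [r for r in readings if abs(r - mu) > sigma * sd]

# A shift's worth of mostly-normal readings, plus a handful of spikes:
normal = [20.0 + 0.1 * (i % 7) for i in range(10_000)]
spikes = [85.0, -40.0, 90.0]
kept = keep_aberrations(normal + spikes)
# Only the spikes survive; the 10,000 routine readings are discarded.
```

In practice the filtering would run at the edge, close to the sensors, so that only the distilled aberrations ever cross the network.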

How much of that data will be accessed, how quickly, and how securely are big questions that need to be openly discussed and answered. The answers will determine how much data will remain dark, much like the dark circuitry in an SoC on a mobile device, and how much needs to stay lit or dimly lit. In effect, we need a data management and storage policy that stretches across industries and political borders, and that can be indexed over time.
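The lit/dim/dark distinction can be sketched as a simple tiering rule keyed to access patterns. The tier names, thresholds, and field names below are assumptions made up for this example; a real policy would also fold in security and retention requirements.

```python
# Hypothetical sketch of a data-tiering policy: a dataset stays "lit"
# (hot storage), goes "dim" (warm), or goes "dark" (cold archive),
# depending on how often and how recently it is accessed.
from dataclasses import dataclass

@dataclass
class DatasetStats:
    accesses_per_day: float
    days_since_last_access: int

def storage_tier(stats: DatasetStats) -> str:
    if stats.accesses_per_day >= 1.0:
        return "lit"    # hot: fast, expensive storage
    if stats.days_since_last_access <= 90:
        return "dim"    # warm: cheaper storage, slower retrieval
    return "dark"       # cold: archived, but indexed so it can be found later

tier = storage_tier(DatasetStats(accesses_per_day=0.01,
                                 days_since_last_access=400))
```

The key point the dark-silicon analogy makes is that "dark" does not mean deleted: archived data still needs an index so it can be re-lit on demand.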

Without that, innovations in technology and cool gadgetry will lead to frustration, which is already the reaction of many consumers of new technology today. Devices don't always work, they frequently don't work together, and they rarely work at speeds that would make them attractive to consumers. And this problem will only get worse as the amount of data exceeds the infrastructure's ability to handle it effectively, quickly, and without causing a global brownout.

It’s certainly important to be able to process data more quickly and efficiently. That alone will save enormous amounts of money and energy. But we also need a way of figuring out what data is important, what data will be accessed again, and how fast any of that needs to be done. Technology will only take us so far in that regard. The rest will require a sophisticated and innovative data management policy, and so far there is none.
