Big Data is off and running. Making it secure is another matter.
The ability to collect massive amounts of data from a variety of sources has become the latest tool for trend spotting, predictive modeling, and forecasting. Information is power, and big data promises to deliver substantial, significant data that businesses at all tiers can use in developing any number of new industrial and commercial strategies.
For retailers it can be feedback on what is hot, when, why, who is buying whatever the buzz is, and many other metrics that will allow them to target customers. For OEMs, B2B, and other tiered vendors, it can be used to paint a picture of what is going on in the industry, where to focus R&D, timing of products, who else is doing what, where and why.
Big Data analytics is the new edge-of-the-envelope data mining. There are a number of approaches to how big data can be analyzed. Hadoop is perhaps the most visible, but other big players with a stake in it include HP, Microsoft, and IBM. Smaller, more niche players include Infobright, Kognitio, ParAccel, and Teradata. And the fact that this is an emerging platform with little consistency among analysis packages opens it up to any number of security vulnerabilities.
Today, data can be processed wherever resources are available: locally, in the cloud, or in the fog. Such platforms create a diverse setting that supports and enables massively parallel computation. That means there is a burgeoning opportunity for exploits, with plenty of attack surfaces available. Such an environment makes it extremely difficult to implement and maintain security consistently across a highly distributed cluster of heterogeneous platforms.
One of the underlying issues is that there are some fundamental architectural security obstacles inherent to big data. While they are not specific to any one big data platform, they are inherent to any big data project. Why? Because the only effective way to process large volumes of data is to use a distributed computing architecture, and most of the distributed computational models being applied run in environments built on simple programming models and open framework architectures.
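To make the "simple programming models" point concrete, here is a minimal sketch of the MapReduce style that frameworks such as Hadoop popularized. This is a local illustration, not Hadoop's actual API: the same small map, shuffle, and reduce functions would run unchanged whether the chunks are slices of a list or blocks spread across a cluster.

```python
from collections import defaultdict

# Map phase: each mapper turns one chunk of text into (word, 1) pairs.
def map_chunk(chunk):
    return [(word.lower(), 1) for word in chunk.split()]

# Shuffle phase: group all intermediate pairs by key (word).
def shuffle(mapped_pairs):
    groups = defaultdict(list)
    for pairs in mapped_pairs:
        for word, count in pairs:
            groups[word].append(count)
    return groups

# Reduce phase: each reducer sums the counts for one word.
def reduce_counts(groups):
    return {word: sum(counts) for word, counts in groups.items()}

# In a real cluster these chunks live on different nodes; here they
# are just list entries, which is the point of the programming model.
chunks = ["big data is big", "data is distributed"]
result = reduce_counts(shuffle(map(map_chunk, chunks)))
print(result)  # {'big': 2, 'data': 2, 'is': 2, 'distributed': 1}
```

The simplicity is the appeal, and also the security concern raised above: the framework, not the programmer, decides where data and computation land.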
In trying to add security to the big data environment, one problem is that security capabilities need to scale with the data. That is a real challenge for “bolt-on” security platforms because most cannot scale sufficiently. Some can scale to address the control points, but the clusters used in big data analysis remain largely unprotected. This is a relatively complex discussion that will be covered in future articles.
Another challenge is that such data is fluid. There are likely to be multiple copies of data active at the same time, and much of it will be shared. That makes it difficult to know precisely where any given piece of data may be at any given time in a massively parallel computing environment. Big data is often replicated at multiple locations and stratified across multiple systems for analysis.
There are other challenges, as well. The dynamic operations that are needed to fully realize the capabilities of big data are difficult to rein in. There are so many new paradigms with big data that we have just begun to realize all the challenges.
It really doesn’t matter who the players are. They all face the same issues. “Overall big data is a challenge, especially with security,” says Steve Grobman, Intel Security CTO.
PFP Cybersecurity CEO Steven Chen agrees: “One of the issues we see as significant is that people assume the big data infrastructure is secure. At this stage of the development it is not a good idea to assume anything is secure.”
By default, the “big” platform of big data presents a variety of security issues at a number of levels. Because there is so much data that needs to be sifted through, this vector represents the biggest problem, both from a management perspective and a security perspective. The sheer volume, and the fact that it is unstructured, make it challenging.
“Using big data effectively means you have to look at it in many different ways,” notes Grobman. “If the data contains sensitive information, the question becomes one of getting a balance between making the data usable and not leaking that sensitive information.”
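One common way to strike that balance is keyed pseudonymization: sensitive identifiers are replaced with salted, keyed hashes so analysts can still join and count records without ever seeing the raw values. A minimal standard-library sketch, with hypothetical field names and records, might look like this:

```python
import hashlib
import hmac
import secrets

# A secret "pepper," held outside the analytics environment. With it,
# identical identifiers map to identical tokens (so joins and counts
# still work), but analysts cannot recover the original values.
PEPPER = secrets.token_bytes(32)

def pseudonymize(value: str) -> str:
    """Replace a sensitive identifier with a keyed hash token."""
    return hmac.new(PEPPER, value.encode(), hashlib.sha256).hexdigest()

# Hypothetical records: strip the identity, keep it joinable.
records = [
    {"customer_id": "alice@example.com", "purchase": "laptop"},
    {"customer_id": "alice@example.com", "purchase": "mouse"},
]
safe = [{**r, "customer_id": pseudonymize(r["customer_id"])} for r in records]

# The same customer still groups together, without exposing the email.
assert safe[0]["customer_id"] == safe[1]["customer_id"]
assert "alice" not in safe[0]["customer_id"]
```

The data stays usable for aggregation and trend analysis, while the sensitive field itself never enters the analytics cluster in the clear.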
That means the security segment of big data has to be approached from a different angle than typical, structured data. But even that has another twist, because there is pressure to analyze data faster, cheaper, with less power. That requires it to be looked at on an architectural level—both from the hardware perspective, and from the perspective of more efficient algorithms.
“There are many ways that people are constantly trying to compromise keys, so that becomes a top priority,” said Emerson Hsiao, senior vice president of sales and technical service for Andes Technology’s North American operations. “We focus on areas such as securing keys.”
Big vs. traditional data
With big data, the basic premise of protecting keys isn’t that much different from protecting regular data. The complexity comes with the algorithms, applications, and IP. Because big data requires a new set of metrics in that vein, new methods of protecting keys have to be developed.
“There are many more opportunities for a hacker to get into the data, with big data,” said Hsiao. “That is because big data has so many connection points.”
Some may be in the cloud, others in the fog, still others on local servers.
“There has been a lot of discussion around how to secure such data when it is resting on a hard drive versus when it is actively being used in memory,” said Steven Woo, vice president of enterprise solutions technology at Rambus.
Somehow, all of these potential vulnerabilities have to be identified and secured. The security techniques applied are often relative to the circumstances. Nevertheless, the hardware that handles the data should always be secured, and keys are a tried and true method for doing that, at any level.
As Cisco CEO John Chambers said, “There are two kinds of companies: Those that have been hacked and those who don’t yet know they have been hacked.”
Big data and the cloud
At the cloud level, companies are taking security very seriously.
“In many ways, big data security can be equated to cloud security,” said Simon Blake-Wilson, of Rambus’ Cryptography Research division. “One of the things that is at the heart of many of the cloud solutions and big data networks is the hardware security module (HSM).”
HSMs are a rather elegant solution for virtual networks that have data scattered all over the cloud. They work well for big data networks, as well, whether those networks are localized or cloud-based, because the models are similar in many circumstances.
“As it turns out, many of the security problems relative to cloud devices really haven’t been considered in many of these back-end infrastructures. That leads to some very strange phenomenon,” said Blake-Wilson.
In the past, the view of the enterprise network has been one of a perimeter-bounded, physically isolated entity that is specific to a particular enterprise. However, that model is changing. Of late, the data segment has come to realize that the vision of a hard perimeter with a “soft” underbelly no longer holds. That is especially true with the evolution of unified communications (UC), and subsets like bring your own device/technology (BYOD/T). Such elements make the securable perimeter a thing of the past.
The cloud, big data, UC, virtual networks, and peripheral platforms have brought to light that security must now be moved to the network core, and remote elements, wherever those are.
One security technology deployed lately to address the issues of fluid data, apps, and services is the HSM. For a while it was thought of as the “golden child” of network security, and it worked extremely well for physical networks. Unfortunately, the blush came off the rose as things went virtual.
HSMs are really just black boxes that contain and protect sensitive keys. They integrate a dedicated crypto processor whose main job is to protect the crypto keys, managing the processing and storage of cryptographic keys inside a hardened, tamper-resistant device. They can take the form of a plug-in card or an add-on black box at the server.
An HSM is managed separately from the server, so any compromise of the server will not allow the HSM to be compromised. HSMs provide services such as secure management of private keys; hardware-based crypto operations (random number generation, digital signatures, key generation); hardware protection of private keys via secure asymmetric cryptographic operations; and offloading of processor-intensive cryptographic operations. They can also provide services such as hashing and message authentication.
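The black-box property can be illustrated with a small software stand-in. This is emphatically not a real HSM, and it uses a keyed MAC in place of asymmetric signatures to keep the sketch self-contained, but it shows the core idea: the key is generated inside the module and is never exposed, while callers get only sign and verify operations.

```python
import hashlib
import hmac
import secrets

class SoftHSM:
    """Toy stand-in for an HSM: the key never leaves the object.

    A real HSM enforces this boundary in tamper-resistant hardware and
    offers true asymmetric operations; an HMAC is used here only so the
    example runs with the standard library.
    """

    def __init__(self):
        # Key generation happens inside the module; there is no getter.
        self._key = secrets.token_bytes(32)

    def sign(self, message: bytes) -> bytes:
        return hmac.new(self._key, message, hashlib.sha256).digest()

    def verify(self, message: bytes, signature: bytes) -> bool:
        # Constant-time comparison avoids timing side channels.
        return hmac.compare_digest(self.sign(message), signature)

hsm = SoftHSM()
sig = hsm.sign(b"analytics job request")
assert hsm.verify(b"analytics job request", sig)
assert not hsm.verify(b"tampered job request", sig)
```

Compromising the host that calls `sign()` yields signatures, but not the key itself, which is exactly the separation the paragraph above describes.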
HSMs are usually protected by a multi-layered hardware platform and generally use software tokens for additional security.
As the cloud has become the prevailing element for data storage, HSMs are evolving as well. There is a move to offer cloud HSM services. Amazon is at the front of that movement, with Microsoft not far behind.
Essentially, cloud HSMs offer the same type of platform as their hardware brethren, and the deployment model isn’t all that new. The user accesses the HSM, wherever it may be, over a VPN or other private portal, and uses that channel to upload and store keys. The service provider does not have access to the data or the keys, and cannot access the tunnel.
That seems like a really good approach, at least for the key angle. We’ll have to wait and see how this will work out, especially since this is just in its infancy.
Big data is an enormous issue. Many organizations trying to use big data efficiently are finding it complex, convoluted, and expensive. On top of that, few, if any, have a grasp on just what it takes to make security a “first-class design parameter,” as Woo puts it. One thing that is true, as Woo notes, is that “connectivity is so pervasive now, that one can connect to just about any place in the world.”
Security, not just with big data, but with all data everywhere has to become a high-priority requirement that is part of the architectural design phase. Big data has some unique challenges, but as time goes on they will be addressed. The questions are how quickly, how effectively, and whether the weakest links can be addressed.