The Deep And Dark Webs

The dark side of the Web has been around for a long time, but it is about to become a much bigger player in Internet of Everything (IoE) security.


From time to time we hear a snippet or two about the “other” Web – the dark side of the Internet and the Web.

For the most part, until something happens that brings the activity within that arena to the surface (such as the recent Silk Road exposure where anything was available for a price), that segment quietly hums along. But that is about to change. Once the IoE evolution gets traction, the integration of the dark side, if nothing more than simply by osmosis, will bring about some serious challenges to Web security.

Those who are familiar with the dark Web and how to navigate it will have exponentially more opportunity to exploit the surface Web. That’s because, as the Internet becomes the IoE, there will be billions more devices talking to billions of other devices, some autonomously. The “Web,” which is really nothing more than an information-sharing overlay built on top of the Internet, will offer orders of magnitude more opportunity to access information across the IoE. That is just simple math. So the time has come for the security segment to start thinking about how to shield the good from the dark.

So let’s take a look at exactly what the elements of the “Web” are. We will drill down into each element, examine its metrics and eventually tie it into the IoE.

Not one, but three
If one peels back the layers of the Web, one finds there are really three different Webs:

• The surface Web. This is the Web we play with on a daily basis. It is easy to access, and is the part of the Web that any search engine can reach.
• The deep Web. The deep Web is the rest of the Web, which search engines cannot reach. It is not necessarily hidden, but it cannot be discovered using typical search engines. However, it can be accessed with certain tools that are fairly common and available.
• The dark Web. The dark Web is the part of the deep Web that is intentionally hidden. It is virtually impossible to navigate this part of the Web without some rather intimate knowledge of what one is doing. This is the most infamous part of the Web, and much nefarious activity goes on here.

The surface
The surface Web, also known as the visible Web, comprises only between 4% and 20% (depending on whom one asks) of the total Web presence. It is the part we use day in and day out to surf. It is what is visible via link-crawling techniques, such as following linked data from a homepage using the well-known hyperlink coding.

Google, Bing, Yahoo and other search engines are typically used to surf this part of the Web without any special tools or algorithms. Overall, the results one obtains from a search are simply a collection of hyperlinks that have been indexed by search engines.
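
The link-following step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular search engine works; the class name, the function name and the sample markup are all made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the absolute target of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return every hyperlink target found in the given markup."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical homepage markup, used only for illustration.
homepage = '<a href="/forms">Forms</a> <a href="https://example.org/help">Help</a>'
print(extract_links("https://example.com/", homepage))
```

A crawler repeats this step on every page it finds, so anything that no page links to never enters the index.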

The deep Web
The opposite of the surface or visible Web is, of course, the deep Web. This is a much more interesting segment of the platform. This part of the Web is made up of pages that are there, but require extra steps to find. Within this deep Web is also the dark Web. The terms deep Web and dark Web are often used interchangeably, but there is a difference. The dark Web, and its ramifications for the evolving IoE, will be discussed in detail.

A simple example of a site that has deep content is the Internal Revenue Service site. If you type in generic search terms such as taxes, or IRS tax forms, the search takes you to the site’s landing page. There a link may be found to forms and publications, for example. But that link only has the most common of documents, so to find something specific, a keyword query is required from a search box entry on that particular site. That content is considered deep, because it isn’t generally indexed by a search engine. It is only available from the site-specific search.

The accessible part of the deep Web is no secret. It is simply layered beneath the typical search engine-tied results that most of the world uses to find information.


The dark Web
The dark Web is a bit more nefarious. Estimates are that there are 400 to 500 times as many domains under the surface as above it.

Dark Web content is deliberately hidden. It either sits in plain sight or resides in a separate, public layer of the Internet. Some of this is innocent. Some of it is used for under-the-radar government spy work. And some of it is used for bad purposes.

Typically, the dark Web relies on techniques such as specific coded keywords that reveal peripheral pages. There are also sub-domain names that are never linked to and cannot be reached by any search algorithm. A typical example is a blog that has been posted but not published, or an item that is published but never referenced, such as “/image/camaro_black.gif.” The item may actually be out there, but because the specific URL isn’t public, it is invisible unless the URL is known.
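
Why an unreferenced item stays invisible can be shown with a small reachability check. The sketch below assumes a made-up site map (the page names and the `SITE_LINKS` table are hypothetical) and walks it the way a link-following crawler would.

```python
from collections import deque

# Hypothetical site: each page maps to the pages it links to.
SITE_LINKS = {
    "/": ["/blog", "/about"],
    "/blog": ["/blog/post-1"],
    "/about": [],
    "/blog/post-1": ["/"],
    "/image/camaro_black.gif": [],  # exists on the server, but nothing links to it
}


def crawl(start, links):
    """Breadth-first walk that only follows links, as a crawler would."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


discovered = crawl("/", SITE_LINKS)
hidden = sorted(set(SITE_LINKS) - discovered)
print(hidden)  # ['/image/camaro_black.gif']
```

The file is served like any other, but because no discovered page points to it, the walk never reaches it.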



Another such example is using headers to change the page. A site can be coded to show different versions of the same page, depending upon the headers that accompany the request and thus upon the access method.
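
As a rough sketch of header-dependent pages, the function below picks a page variant from the request headers. The rules, the `X-Access-Token` header, the secret value and the file names are all hypothetical; a real site could key on any header it likes.

```python
def select_variant(headers):
    """Pick a page variant from request headers (names compared case-insensitively)."""
    normalized = {name.lower(): value for name, value in headers.items()}
    user_agent = normalized.get("user-agent", "").lower()

    # Hypothetical rule: known crawlers are shown a bland page.
    crawler_markers = ("googlebot", "bingbot")
    if any(marker in user_agent for marker in crawler_markers):
        return "crawler_page.html"

    # Hypothetical rule: visitors who send the agreed token see the hidden page.
    if normalized.get("x-access-token") == "expected-secret":
        return "members_page.html"

    return "public_page.html"


print(select_variant({"User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1)"}))
print(select_variant({"X-Access-Token": "expected-secret"}))
print(select_variant({}))
```

The same URL therefore yields three different pages, and a search engine only ever indexes the one it is shown.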

Virtual networks are another example of the dark Web because they require special software to access. This can be for legitimate purposes, such as the common practice of setting up access to a company Web site from a remote location to mimic on-site presence. Or it can be used to drill deep into the hidden bowels of the dark net, with tools like The Onion Router (Tor) [reference 1] and the Invisible Internet Project (I2P) [reference 2]. Tor can be used to touch a whole new world of content hidden from public view.

The darker side
There is a camp that says one should have the ability to traverse the Internet with complete and universal anonymity. That is what Tor, I2P, and similar applications provide. In a perfect society that may work out just fine. But in the real world, networks that use these tools can host activity that is both illegal and immoral. Typical examples are illegal drug and weapons marketplaces, human trafficking and child pornography.

They also can be used to leak sensitive information, launder money, and commit identity theft and credit card fraud. This is one of the more worrisome issues for the IoE.

On the not so ugly side, there are secret parts of the Web and the Internet specifically designed for law enforcement and government to be able to examine questionable Web sites and services without leaving tracks. Unfortunately, these were rapidly compromised by criminals and governments, which use them for exactly the opposite of what they were designed for.

The dark Net also has been used to counter censorship in China, for example, and as a rallying point against extremist governments. But in free world countries, there exists a large enough criminal component to warrant concern, yet there seems to be little of that concern at present.

Last year a search engine called Grams was launched. It is patterned after Google, and uses the Tor anonymizing browser. Its claim to fame is that it simplifies searches for sites selling drugs, guns, stolen credit card numbers, counterfeit cash and fake IDs. These are sites that previously could only be found by surfers who knew the exact site URL. Basically, it makes knowing the specific URL unnecessary and functions like a typical search engine.

So, it is obvious that the dark Web isn’t going away anytime soon. In fact, security experts warn that it will expand. So how is it going to be dealt with? Attacks from the dark Web aren’t any different from those that arrive by any other vector. Malware, viruses, back doors, phishing and denial of service (DoS) attacks are all tools hackers have at their disposal. The danger is that with the IoE the opportunities increase by orders of magnitude, and the invisibility and anonymity of the dark Web will make protecting against and neutralizing its attacks much more difficult.

Defeating dark Web malevolence
The first and most obvious step in keeping the dark Web at bay is to understand it. That requires thinking like a dark Web user. Perpetrators using the dark Web become invisible because of its cloaking capabilities. If the entity being attacked doesn’t understand that the attack is coming from the dark Web, any measures implemented to derail the attack can be misdirected or ineffective.

Some of what the dark Web holds that can be used for attacks includes:

• Hackers for hire;
• Stolen designs, intellectual property, and counterfeits;
• Vulnerabilities (accounts that have been hacked);
• General and specific cyber campaigns (Twitter and Facebook attacks, malvertising, etc.);
• Hacktivist (and other) targeting forums.

There are, obviously, more, but these are the more common ones and what security teams should be aware of at a minimum.

Once a good understanding of the dark Web is acquired, there are three general processes that will establish a relatively decent firewall of protection and understanding of the dark Web.

A good first defense is to set up a dark Web mining environment with Tor, I2P or some other similar dark Web application, with private browsing implemented on air-gapped [reference 3] terminals. These should be on clusters of sequestered virtual machines (VMs). This is a mature technique and generally well understood in the cybersecurity arena.

Next, monitor the major areas that are likely to pose threats from the dark Web. The six major areas, according to the Global Commission on Internet Governance, are:

1. Mapping the Hidden Services Directory. Both Tor and I2P use a distributed hash table system to hide database information. Strategically deployed nodes could monitor and map this network.
2. Customer Data Monitoring. There will be no monitoring of consumers themselves, but rather destination requests to track down top-level rogue domains.
3. Social Site Monitoring. This includes watching over popular sites such as Pastebin to find hidden services.
4. Hidden Service Monitoring. Agencies must “snapshot” new services and sites as they appear for later analysis, since they disappear quickly.
5. Semantic Analysis. A shared database of hidden site activities and history should be built.
6. Marketplace Profiling. Sellers, buyers and intermediary agents committing illegal acts should be tracked.
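
To make the first item more concrete, here is a minimal sketch of how a distributed hash table can assign a record to a few nodes. It is a generic hash-ring lookup written for illustration, and it is not claimed to be the actual algorithm of Tor or I2P; the node names, the service name and the replica count are assumptions.

```python
import hashlib
from bisect import bisect_right


def ring_position(label):
    """Map any string to an integer position on the hash ring."""
    return int.from_bytes(hashlib.sha256(label.encode("utf-8")).digest(), "big")


def responsible_nodes(key, node_names, replicas=3):
    """Return the `replicas` nodes that follow the key's position on the ring."""
    ring = sorted((ring_position(name), name) for name in node_names)
    positions = [pos for pos, _ in ring]
    start = bisect_right(positions, ring_position(key))
    count = min(replicas, len(ring))
    # Wrap around the end of the ring with the modulo.
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


# Hypothetical node and service names.
nodes = [f"node-{i}" for i in range(10)]
holders = responsible_nodes("example-hidden-service", nodes)
print(holders)
```

In a scheme like this, only the few nodes next to a key on the ring hold its record, which is roughly why the list above suggests that strategically placed nodes could observe and map the directory.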

The data that has been mined from the dark Web should then be analyzed in much the same way as standard data, and stored in a long-term repository for ongoing analysis. The fact is that the data that comes from the dark Web isn’t any different from that of the visible Web. It is just where and how it is used and configured that changes the metric.
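
A long-term repository of this kind can be as simple as a table of dated snapshots. The sketch below uses an in-memory SQLite database; the schema, the function names, the addresses and the dates are all hypothetical and serve only to illustrate the idea of keeping fingerprints of captured pages for later comparison.

```python
import hashlib
import sqlite3


def open_repository(path=":memory:"):
    """Create (or open) a small snapshot store; the schema is a hypothetical example."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS snapshots (
               address TEXT NOT NULL,
               seen_at TEXT NOT NULL,
               content_sha256 TEXT NOT NULL,
               PRIMARY KEY (address, seen_at)
           )"""
    )
    return conn


def record_snapshot(conn, address, seen_at, content):
    """Store a fingerprint of the captured bytes rather than the content itself."""
    digest = hashlib.sha256(content).hexdigest()
    conn.execute(
        "INSERT INTO snapshots (address, seen_at, content_sha256) VALUES (?, ?, ?)",
        (address, seen_at, digest),
    )
    conn.commit()
    return digest


def changed_addresses(conn):
    """Addresses whose content fingerprint differs between captures."""
    rows = conn.execute(
        """SELECT address FROM snapshots
           GROUP BY address
           HAVING COUNT(DISTINCT content_sha256) > 1
           ORDER BY address"""
    )
    return [row[0] for row in rows]


repo = open_repository()
record_snapshot(repo, "example-a.onion", "2015-01-01T00:00:00Z", b"version 1")
record_snapshot(repo, "example-a.onion", "2015-01-02T00:00:00Z", b"version 2")
record_snapshot(repo, "example-b.onion", "2015-01-01T00:00:00Z", b"static")
print(changed_addresses(repo))  # ['example-a.onion']
```

Storing a hash rather than the page keeps the example small; a real repository would likely keep the captured content as well, so that it can be analyzed after the site disappears.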

Finally, there are lots of oblique and sub-areas that are part of all of this; too many to cover in this article. But it is hoped that the information presented here will give the reader some guidance and direction in dealing with the dark Web.

The fact is that the dark Web is always going to be a threat. As the IoE evolves, dark Web-ers will find that it can be used to attack many of the new IoE devices that will have little or no security on board. Simply put, there will just be an elevation of what we see now, both on the surface and under it.

But the threat from the dark side is much greater, mainly because much of the Web community, businesses, and consumers aren’t aware of it. And there is increasing evidence that the dark Web will grow at an alarming rate over the next few years and become a predominant tool for hacking and other nefarious activities.

Defending against attacks from the dark Web isn’t any more difficult than with the surface Web. It just requires a new set of tools…and a new mindset.

Reference 1. The .onion domains are not part of the ICANN registry, and will not resolve unless you are running Tor. Because of the way Tor routing works, both the host serving out Web pages and the requesting client are obscured and are not easily identifiable. The combined effect leaves this form of Internet far beyond any kind of government control or regulation. All that has to be done is to click the ‘new identity’ button, and Tor will pick a new node to make requests through, which will seem to give you a new IP address and a completely new identity, even for regular Internet searches.

Reference 2. I2P is a computer network layer that allows applications to send messages to each other pseudonymously and securely. Uses include anonymous Web surfing, chatting, blogging and file transfers. The software that implements this layer is called an I2P router and a computer running I2P is called an I2P node.

Reference 3. Air gap is the technique of isolating a network from the Internet. While never being connected to the Internet is virtually impossible, the technique is to have the network online as little as possible and to use secure media to move files, when necessary, if they have been received from the Internet.