Heartbleed And The Internet Of Things

If you think Internet security is complicated, just wait until the IoT gets rolling.

Heartbleed is not a country and western song, but many wish it were. It’s a programming glitch with the potential to cause disastrous and widespread compromises of seemingly secure data.

By some estimates, the flaw in OpenSSL’s heartbeat code has allowed hackers to collect personal data, including passwords, undetected, for as long as two years. Exactly how much data has been breached, and what the total damage will be, is still under assessment, but the media hype suggests it is substantial. Moreover, one has to wonder if this glitch may be connected to the recent data compromises at Target and other organizations. Fortunately, the fix is out, but it may take a while for everyone to apply it to their systems.

What makes this “bug” so dangerous is that it is not some super-complex, self-morphing, Mensa-level, mega virus. In fact, it is not a virus at all. It is a somewhat overlooked programming mistake in the “heartbeat” part of certain versions of OpenSSL, and attackers simply exploit it.

In this case the code vulnerability allows anyone on the Internet to read portions of the memory of systems running vulnerable versions of the OpenSSL software, memory that could not normally be read from outside. The fix, according to Dmitry Bestuzhev, head of the research center at Kaspersky Lab Latin America, is quite simple and is included in OpenSSL version 1.0.1g.

“However,” says Bestuzhev, “if some enterprises cannot upgrade to the patched version of OpenSSL, system administrators can recompile libraries and binaries of compromised versions of OpenSSL using the key -DOPENSSL_NO_HEARTBEATS. Either method will fix the problem for now.”
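To illustrate why a compile-time definition can serve as a stopgap, here is a minimal C sketch of conditional compilation. OPENSSL_NO_HEARTBEATS is the macro named in the quote above; handle_heartbeat() and handle_record() are hypothetical stand-ins for illustration, not OpenSSL’s actual functions.

```c
#include <assert.h>
#include <stdio.h>

#define RECORD_HEARTBEAT 24   /* TLS content type assigned to heartbeat messages */

#ifndef OPENSSL_NO_HEARTBEATS
/* Hypothetical handler; it exists only in builds that keep the feature. */
static int handle_heartbeat(void)
{
    puts("heartbeat processed");
    return 1;
}
#endif

/* Dispatch one incoming record. Returns 1 if it was handled, 0 if ignored. */
int handle_record(int type)
{
    assert(type >= 0 && type <= 255);   /* a content type is a single byte */
#ifndef OPENSSL_NO_HEARTBEATS
    if (type == RECORD_HEARTBEAT)
        return handle_heartbeat();
#endif
    return 0;   /* heartbeat code compiled out, or some other record type */
}
```

Built normally, handle_record(RECORD_HEARTBEAT) returns 1. Built with the compiler option -DOPENSSL_NO_HEARTBEATS, the handler is never compiled, so the same call returns 0 and the vulnerable path simply does not exist in the binary.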

Extrapolating this to future intelligent objects, which will use the same Internet protocols and platforms as today’s hardware, means the same vulnerabilities will exist for them as well. Because the IoT will have orders of magnitude more objects, with vastly varying levels of intelligence, coding mistakes that allow access to memory locations, or permit alteration of read/write memory, are particularly dangerous.

Unfortunately, because no programming is perfect, they are always likely to be there, in one form or another, according to Bestuzhev. And there is really no specific neutralization strategy, such as the ones that can be developed against viruses, for example. Generally, these types of coding anomalies show up in the field after the code, program, OS, whatever, has gone public, because it isn’t the type of error or errant code that raises a flag during compiling or testing.

There are methods to mitigate these during development, however. When asked how such coding errors can be minimized, especially when countless new objects inherit some level of intelligence, Bestuzhev replied, “The key is in pentesting and auditing. However, these processes take time and human resources, and they always have a cost associated with them, and that plays into bottom line…It’s not enough to count on good will. Resources need to be allocated to funds and the necessary staff to get the job done. When this is realized, and implemented, we may have fewer problems like Heartbleed. However, it is important to note ‘fewer’ doesn’t mean ‘not at all.’”

Drilling Down a Bit – OpenSSL 101
SSL stands for Secure Sockets Layer. SSL and its successor, TLS, are the protocols that handle encryption and authentication for data traveling between clients and servers, including every site served over HTTPS. OpenSSL is a library that implements those protocols. The “open” part refers to its freely available, unrestricted-access source code. That code is written in C and can be compiled for any operating system, although it is most commonly used on the UNIX servers on which the majority of servers around the world run. Open source is very common in the UNIX world, where almost all code and projects are freely available for anyone to see, programmers and hackers alike. Therefore, it is easy for someone who can read the code to see what it does.

OpenSSL is an enormously popular method of keeping personal information private as it travels across the Internet. Millions of Web sites use OpenSSL to protect your username, password, credit card information, and other private data in transit. However, tests have shown that, through this flaw, one can access such data in server memory completely anonymously, with no sign it was ever accessed. Somewhere along the line that should have been a wakeup call, but obviously, it just slipped by, under the radar, until it was exploited.

The term heartbeat, in OpenSSL, is the moniker for a keep-alive exchange, described in RFC 6520, that takes place after two computers have established a secure connection. It is the verification process that the two computers use to make sure they are still online with each other. (“Heartbleed” is the ironic name given to the exploit.) The process for verification goes something like this:

Periodically, while a connection between the host and client is open, one side sends a heartbeat request containing a small payload, and the other side echoes that payload back. If the echo is returned, both sides know the connection is still alive. However, if one of the computers gets shut down, blows up, an earthquake happens, or some other crisis causes the connection to be interrupted, the heartbeat exchange fails, and the other computer closes the connection. This is to prevent open connections from staying online, vulnerable, when the other side has gone away. In reality, the process is quite simple, and keep-alive exchanges like it have been an accepted practice for years, millions of times a day across millions of computers worldwide.
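For reference, a heartbeat message as defined in RFC 6520 carries a one-byte type (1 for a request, 2 for a response), a two-byte payload length, the payload itself, and at least 16 bytes of padding. The following is a minimal sketch of that layout in C; hb_build() is a hypothetical helper for illustration, not an OpenSSL function, and it zero-fills the padding for simplicity where the RFC calls for random bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { HB_REQUEST = 1, HB_RESPONSE = 2 };  /* message types from RFC 6520 */
#define HB_PADDING 16                      /* minimum padding length       */

/*
 * Serialize a heartbeat message into 'out'.
 * Returns the number of bytes written, or 0 if 'out' is too small.
 */
size_t hb_build(uint8_t type, const uint8_t *payload, uint16_t len,
                uint8_t *out, size_t cap)
{
    size_t total = 1 + 2 + (size_t)len + HB_PADDING;

    assert(out != NULL && (payload != NULL || len == 0));
    if (total > cap)
        return 0;

    out[0] = type;
    out[1] = (uint8_t)(len >> 8);          /* length, high byte first */
    out[2] = (uint8_t)(len & 0xff);
    if (len > 0)
        memcpy(out + 3, payload, len);
    memset(out + 3 + len, 0, HB_PADDING);  /* simplified padding */
    return total;
}
```

A request carrying the four-byte payload “ping” therefore occupies 1 + 2 + 4 + 16 = 23 bytes, and an honest peer answers with a response that echoes the same four bytes.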

Code Talk
The actual OpenSSL code that handles the heartbeat, pulls the data, transmits it and verifies the exchange, is not necessary for this quick discussion, but if you are interested in it, go to this URL:
http://blog.existentialize.com/diagnosis-of-the-openssl-heartbleed-bug.html/

As it turns out, the code that is at the root of all of this is short and simple. It is: memcpy(bp, pl, payload); The line of code is legitimate and, given honest input, does exactly what it is supposed to do, and accurately. The problem is that the code can be “fooled” by a request whose stated payload length is larger than the data actually sent. The copy then runs past the end of the received data into whatever sits next to it in the server’s memory, and that may well be sensitive data.

Defining the variables

  • “memcpy” is an instruction that tells the computer to copy data from one memory location to another.
  • “bp” is a location on the serving computer where the client data is going to be copied (the destination).
  • “pl” is the data from the client computer that is sent as part of the heartbeat exchange (the source).
  • “payload” is a number, supplied by the client, that is supposed to define the size of pl.
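To show where these three variables come from, here is a minimal C sketch of the echo step, using bp for the destination as in the memcpy line quoted above. It assumes the message layout of RFC 6520 (one type byte, two length bytes, then the payload); echo_payload() is a hypothetical name, and the code is a simplification rather than OpenSSL’s actual source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * 'request' points at a received heartbeat message. 'response' is the
 * caller's reply buffer and must be large enough for 3 + payload bytes.
 * Returns the length of the response that was built.
 */
size_t echo_payload(const uint8_t *request, uint8_t *response)
{
    assert(request != NULL && response != NULL);

    /* The size the client CLAIMS its payload has. */
    uint16_t payload = (uint16_t)((request[1] << 8) | request[2]);
    const uint8_t *pl = request + 3;   /* source: payload bytes from the client */
    uint8_t *bp = response + 3;        /* destination: payload area of the reply */

    response[0] = 2;                   /* message type: heartbeat response */
    response[1] = request[1];          /* echo the same length field       */
    response[2] = request[2];

    /* Nothing here compares 'payload' with the number of bytes that
       actually arrived. That missing comparison is the whole flaw. */
    memcpy(bp, pl, payload);
    return 3 + (size_t)payload;
}
```

With an honest request the function behaves exactly as intended: a request carrying “ping” and a length field of 4 yields a 7-byte response containing “ping”.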

The general concept is that unused computer memory is empty. In reality, it generally isn’t. Once the computer is up and running, memory reads and writes are ongoing events, and, generally, the memory is always full of data. It may be old data, such as personal data from a previous transaction. It also can be partial, unintelligible data because some of it has been overwritten, or just some random data that is totally disconnected. But it is data, nonetheless. It works this way because it is not efficient for computers to constantly wipe memory when they are done with it. Rather, they simply set a flag that tells the OS that what is currently in this memory location is old data, and it is okay to overwrite. But until the computer reuses that memory, whatever is in there stays. And, in this case, it becomes the target data of hackers.
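The point that released memory keeps its old contents can be sketched with a toy allocator. Everything here is hypothetical and deliberately simplified: toy_alloc() and toy_reset() manage one small fixed pool, which keeps the demonstration well defined in C, whereas a real allocator is far more elaborate.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy allocator: one fixed pool, handed out front to back. */
static unsigned char pool[64];
static size_t used = 0;

/* Reserve n bytes from the pool; returns NULL when the pool is exhausted. */
void *toy_alloc(size_t n)
{
    assert(used <= sizeof pool);
    if (n > sizeof pool - used)
        return NULL;
    void *p = pool + used;
    used += n;
    return p;           /* note: the contents are NOT cleared */
}

/* "Free" everything by marking the pool reusable; the bytes stay in place. */
void toy_reset(void)
{
    used = 0;
}
```

If one transaction stores “card=4111” in a block from toy_alloc(16) and the pool is then reset, the next toy_alloc(16) returns the same address with “card=4111” still sitting in it. Marking memory as reusable is cheap; erasing it is an extra step that code usually skips.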

That being said, once a heartbeat exchange is set up, the data the client sent (pl) is copied into the reply the host is building (bp). Payload says that the data block is XX bytes, supposedly the size of the pl data block, and the host reserves a reply buffer of that size. As the code executes, it copies payload bytes, starting at pl, into bp. Then it returns that data to the client as the heartbeat response.

This is where the hacker gets access to sensitive data. The hacker initiates a connection with a host computer. It may well be a legitimate login, or it may be a hacked login. Either way, the results are the same. For example, the hacker may send little or no pl data and set the payload number to, say, 64 KB (it can be any value up to that limit). In a legitimate exchange, the pl block and the payload number match in size. Here they do not, and vulnerable versions never check. The copy starts at pl and keeps reading well past the end of what the hacker actually sent, so the returned data is whatever happened to be in the host’s memory next to the request.
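The attack, and the kind of length check that stops it, can be sketched in C against a simulated block of server memory. The names memory, load_memory() and respond() are hypothetical. The request and the leftover data live in one array so that the over-read in this demonstration stays inside a valid object; in a real server the read would stray into unrelated heap memory. The check mirrors the idea of the 1.0.1g fix, which discards a message whose claimed size exceeds what was received, but this is a simplified sketch, not the patch itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PADDING 16   /* minimum padding required by RFC 6520 */

/* Simulated server memory: request first, leftover data right behind it. */
static uint8_t memory[128];

/* Place an incoming request at offset 0 and stale data directly after it. */
void load_memory(const uint8_t *req, size_t req_len, const char *stale)
{
    assert(req_len + strlen(stale) <= sizeof memory);
    memset(memory, 0, sizeof memory);
    memcpy(memory, req, req_len);
    memcpy(memory + req_len, stale, strlen(stale));
}

/*
 * Build a response for the request of req_len bytes held at memory[0].
 * check == 0: trust the claimed length (the flaw).
 * check != 0: drop a request that claims more than actually arrived.
 * Returns the response length, or 0 if the request was discarded.
 */
size_t respond(size_t req_len, int check, uint8_t *out, size_t out_cap)
{
    const uint8_t *pl = memory + 3;                                /* payload start */
    uint16_t payload = (uint16_t)((memory[1] << 8) | memory[2]);   /* claimed size  */

    if (check && (size_t)1 + 2 + payload + PADDING > req_len)
        return 0;                        /* inconsistent: silently discard */
    if (3 + (size_t)payload > sizeof memory || 3 + (size_t)payload > out_cap)
        return 0;                        /* keep the simulation in bounds  */

    out[0] = 2;                          /* heartbeat response */
    out[1] = memory[1];
    out[2] = memory[2];
    uint8_t *bp = out + 3;
    memcpy(bp, pl, payload);             /* the line at the root of it all */
    return 3 + (size_t)payload;
}
```

Suppose the attacker sends only the three header bytes {1, 0, 40}, claiming a 40-byte payload, and the bytes “password=hunter2” happen to follow in memory. With check set to 0, respond() returns a 43-byte reply whose payload begins with that password. With check set to 1, the same request is discarded, while an honest 23-byte request carrying “ping” is still answered normally.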

In some cases, as was discussed earlier, it may just be garbage. In other cases it might be the previous user’s data, including things like passwords or credit card data. So if an organized attack is devised, the transgressor could mine millions of bits of leftover data, and even though much of it might be gibberish, some of it will be valuable.

Therefore, by extrapolation, these and similar types of flaws can be passed to Internet of Things (IoT) object coding as well. To avert this as the Internet evolves, the next generation of Internet objects will need both much tighter coding awareness and a higher level of autonomous firewalls.

On to the Internet of Things
Taking a look at how coding can affect intelligent objects of the emerging IoT presents some interesting challenges. The main difference between objects on the Internet of information vs. the Internet of things is that most objects today are human-interactive devices. Managing them, in whatever fashion, is done via human control – some is constant, some is periodic, but the point is that today, most devices are monitored by humans, most of the time. We make them do what we want, and if there is a security breach, we deal with it with human intelligence. That is not to say that these security breaches can’t get away from us, but sooner or later we are going to find them.

The Internet of things is envisioned as a network of interconnected objects. Everything from office supplies to private jets will have an online presence. Some will simply report and respond on small cell networks (picocells in the home, for example). Others will have complex, two-way reciprocal communications via the Internet.

The level of sophistication of these objects will vary widely. The simpler ones, such as door and window NC/NO alarm contacts, may simply report a state and require nothing more than a simple low-bit controller. On the other hand, as the level of device sophistication increases, complex objects, such as automobiles, will have sophisticated MCUs or MPUs that rival those of powerful multicore high-end smart devices and computer processors. With this extremely wide range of objects, and the equally wide range of applications, managing their security will present what seems like an almost insurmountable set of challenges.

Conclusion
Bestuzhev had the same perspective on future directions as many Internet security experts have. He said that going forward “You can’t trust anything, even on trusted connections, since everything is potentially vulnerable.” That speaks volumes about the challenges that the industry faces as the Internet morphs.

He goes on to say that “all code, even open source, must be audited.”

“Sometimes the cost of an attack may be relatively very low, yet the impact very high, such as in heartbleed. Even though the end points are the weakest stage, one has to address all of the layers that have the potential to be exploited, and data compromised – on any platform,” Bestuzhev said.

Eventually, as more objects begin to integrate intelligence, and the future IoT starts to take shape, this and the countless other existing and potential vulnerabilities will create a security model that is vastly more complex than that of the existing Internet of information. And this is just the beginning.



4 comments

Robert Cragie says:

Some comments:

“In this case the code vulnerability allows anyone on the Internet to read the memory of the systems running vulnerable versions of the OpenSSL software.”

It is not quite as simple as being able to “read the memory of the systems”. It would be better to say it allows unauthorized reading of part of system memory which could not normally be read. You do sort of explain this later.

“Extrapolating this to future intelligent objects, which will use the same Internet protocols and platform as today’s hardware, means the same vulnerabilities will exist for them as well. Because the IoT will be have orders of magnitude more objects and vastly varying levels of intelligence, coding mistakes that allow access to memory locations and permit alteration of read/write memory locations code are particularly dangerous.”

They may use the same Internet protocols and platforms but more constrained devices are unlikely to use exactly the same platforms simply because the code is too big. Also, many systems lock down which cipher suites can be used and will simply refuse to negotiate early, deprecated cipher suites. The code to support them is simply not there, thus reducing the amount of overall code. This is not really an option on the Web today, although even that is changing as systems migrate to more secure cipher suites and deprecate old ones.

“SSL stands for Secure Socket Layer, which is the layer that handles encryption and authentication for the servers that run UNIX.”

SSL/TLS is the protocol for all servers which serve HTTPS data (the green padlock). OpenSSL is the most common library used to provide this service for UNIX servers but it can be used on any type of server. Other SSL/TLS libraries are of course available, both proprietary and open source. Using a proprietary library does not guarantee that the code is any less likely to have bugs. Indeed, one could argue that as there are more eyes on open source, bugs get found quicker. Anyone is at liberty to analyze open source code and it could be argued that if people have real concern for security, they should either put a team on analyzing the code or source alternatives from a provider who will guarantee the code, which would (if it exists at all) be correspondingly expensive to purchase. This is the conclusion reached by Bestuzhev.

“The “open” part refers to freely available, unrestricted access UNIX source code.”

There is no such thing as “UNIX source code”. The code is written in C, which can be compiled for any operating system.

“This is something that is very common in the UNIX world, where almost all code and projects are freely available for anyone to see — programmers and hackers alike. Therefore, it is easy for someone with an understanding of UNIX OSes to see what the code does.”

This is a double-edged sword as stated above; it also allows bugs to be picked up earlier.

“OpenSSL is an enormously popular method of keeping personal information private on the Internet. Millions, of Web sites use OpenSSL to protect your username, password, credit card information, and other private data.”

SSL/TLS is only used to secure the data transferred between client and server. It is NOT used to secure the data on the server. This should be made clear. The heartbleed bug did enable access to parts of memory which transiently could have contained this information but that is not the same as saying OpenSSL secures the data.

“The term heartbleed, in OpenSSL is the moniker for the handshake that occurs when two servers prepare to make a secure connection (one could say that it seems a bit of a conundrum, at this juncture). It is also the verification process that the two computers use to make sure they are still online with each other.”

The term is “heartbeat” and is described in RFC 6520. “Heartbleed” is the ironic name given to the exploit! The initial handshake in the Hello exchange is capability exchange and not an actual heartbeat exchange. This only occurs after authentication has taken place. So no conundrum.

“Every time data is exchanged between the host and client, a “heartbleed” routine is set up. Part of it is that, prior to the transmission, a verification check is made to make sure the server is listening, and the client is valid. If the verification is returned, the heartbleed data is sent. This process repeats until all of the transaction data is sent; the transaction is complete, and the connection terminated. However, if, during a transaction, one of the computers gets shut down, blows up, an earthquake happens, or some other crisis causes the transmission to be interrupted, the heartbleed goes out of sync, and the other computer is instructed to terminate the transaction. This is to prevent open connections from staying online, vulnerable, when the transaction fails. In reality, the process is quite simple and has been an accepted practice for years, millions of times a day across millions of computers worldwide.”

The heartbeat is simply a keep-alive exchange performed periodically. If either side goes away, the heartbeat exchange fails and the connection is closed down.

“The problem is that the code can be “fooled” by altering the payload data.”

The fooling is actually done by setting the length set in the packet inconsistently (i.e. much larger, up to 64K) with the actual length of the buffer of data to be echoed back. The bug was that this inconsistency was not checked. If it had been checked, the heartbeat would have worked as expected and the patch indeed does that check and discards the packet if it is inconsistent. This leads to the code blindly accepting that length as the “payload” parameter in your analysis and thus reading beyond the buffer sent. This is how the data leaks – a good ol’ fashioned buffer overflow. As you rightly say, the additional data read back may be just garbage but of course numerous transactions will reveal potentially more and more data and keys can normally be easily spotted as they are by definition random data.

“He goes on to say that “all code, even open source, must be audited.””

Absolutely – this is the only proactive way of dealing with this. And yes, it will be expensive. In the IoT, the key is KISS. Don’t put in code you don’t need. Craft the code to be small. Then it comes down to a number-of-lines-of-code issue. The probability of a bug in 1000 lines of code will be 50% of that in 2000 lines of code. However, I am talking about “functional” lines of code here; sometimes it is necessary to increase code for defensive programming, exactly what was left out of the OpenSSL heartbeat code. Coding in patterns helps here as secure techniques and frameworks can be applied around functional code. The key is still to reduce the amount of functional code.

Robert Cragie says:

I noticed a possible misinterpretation when reading back:

The fooling is actually done by setting the length set in the packet inconsistently (i.e. much larger, up to 64K) with the actual length of the buffer of data to be echoed back.

What I should have said was:

The fooling is actually done by setting the length in the packet inconsistent (i.e. much larger, up to 64K) with the actual length of the buffer of data to be echoed back.

Apologies for hasty posting.

Houssam says:

I believe the routine is called “heartbeat”, not “heartbleed”. See “Why is it called the heartbleed bug” at http://heartbleed.com/ (and Bestuzhev’s fix of setting -DOPENSSL_NO_HEARTBEATS). Heartbleed is specifically the vulnerability’s name.
