Security impact of the Rizzo/Duong CBC "BEAST" attack

If you're familiar with network security and haven't been living under a rock, you've probably seen the recent coverage of Rizzo and Duong's attack on SSL/TLS implementations. Now that they've demoed the attack and information is starting to trickle out (the news articles above were written prior to release), we can begin to evaluate the impact of this work. (See AGL's post on this.) Unfortunately, there's no paper publicly available, and the live chat during the talk raised more questions than it answered. [In large part due to the inadequacies of trying to ask questions over WebEx chat -- EKR 9/24]

First, the bottom line: Don't Panic. Yes, this is interesting work; no, SSL/TLS is not completely broken. In particular, your communications with your bank are quite likely to be fine, and AGL suggests that Chrome is fine.

Background: CBC Encryption

In order to understand what's going on here, you need some background. SSL/TLS can encrypt data with two kinds of ciphers: block ciphers like AES and DES, and stream ciphers like RC4. You don't need to worry about stream ciphers for now; this attack only applies to block ciphers. A block cipher is a keyed mapping from plaintext blocks (typically 128 bits) onto ciphertext blocks of the same size. So, it's like having a huge table with 2^128 entries showing each plaintext block M and its corresponding ciphertext block C. Each key represents a different table. We represent encryption as a function C = E(Key, M), meaning that we compute the encryption function on Key and M and the result is the ciphertext.
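To make the "keyed table" picture concrete, here is a minimal sketch of a toy block cipher in Python. It is a 4-round Feistel construction built on HMAC-SHA256, chosen only because it needs nothing outside the standard library; it is an assumption for illustration, not AES, and not secure. The names `E`, `D`, and `BLOCK` are hypothetical.

```python
import hashlib
import hmac

BLOCK = 16  # bytes per block, i.e. 128 bits

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _f(key: bytes, rnd: int, half: bytes) -> bytes:
    # Round function: a keyed hash truncated to half a block.
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[:BLOCK // 2]

def E(key: bytes, m: bytes) -> bytes:
    """Toy Feistel 'block cipher': illustration only, NOT secure."""
    assert len(m) == BLOCK
    left, right = m[:BLOCK // 2], m[BLOCK // 2:]
    for rnd in range(4):
        left, right = right, xor(left, _f(key, rnd, right))
    return left + right

def D(key: bytes, c: bytes) -> bytes:
    """Inverse of E: run the same rounds backwards."""
    assert len(c) == BLOCK
    left, right = c[:BLOCK // 2], c[BLOCK // 2:]
    for rnd in reversed(range(4)):
        left, right = xor(right, _f(key, rnd, left)), left
    return left + right
```

For a fixed key, `E` always maps the same plaintext block to the same ciphertext block, `D` undoes it, and a different key gives a different "table". That determinism is exactly what the attacks discussed later rely on.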

The obvious way to use a block cipher is to break up the plaintext into 128-bit blocks and encrypt each block separately (this is called electronic codebook (ECB) mode). It should be obvious that if you have two blocks that are the same in the plaintext they are also the same in the ciphertext, and so patterns in the plaintext get reflected in the ciphertext. This is bad. This Wikipedia article has a good visual comparison of just how bad it can be. In order to prevent this, other cipher modes have been developed that break up those patterns. The one used by SSL/TLS (at least prior to TLS 1.2) is called cipher-block chaining (CBC) mode. The way that CBC works is that when you encrypt block i you first XOR in the encryption of block i-1. More formally:

C_i = E(Key, C_{i-1} ⊕ M_i)

Obviously, when you go to encrypt the first block, there is no previous block to XOR in, so the standard practice is to generate a random initialization vector (IV) and use that as if it were the encryption of the previous block. The effect of all this is to break up patterns: consider the first block M0. To encrypt it you compute:

C0 = E(Key, IV ⊕ M0).

And then to encrypt M1 we do:

C1 = E(Key, C0 ⊕ M1).

Now, unless C0 happens to be the same as IV (which is very unlikely), then even if M0 = M1 the input to the two encryption functions will be different and so C0 ≠ C1, thus breaking up the pattern.
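The difference between ECB and CBC can be sketched in a few lines of Python. The block cipher `E` here is a toy Feistel construction over HMAC-SHA256 that stands in for AES (an assumption for illustration, not secure), and `ecb_encrypt` and `cbc_encrypt` are hypothetical helper names.

```python
import hashlib
import hmac

BLOCK = 16  # bytes per block (128 bits)

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def E(key: bytes, m: bytes) -> bytes:
    """Toy 4-round Feistel permutation standing in for AES. NOT secure."""
    left, right = m[:8], m[8:]
    for rnd in range(4):
        f = hmac.new(key, bytes([rnd]) + right, hashlib.sha256).digest()[:8]
        left, right = right, xor(left, f)
    return left + right

def ecb_encrypt(key: bytes, blocks: list) -> list:
    # Each block is encrypted independently, so equal inputs give equal outputs.
    return [E(key, m) for m in blocks]

def cbc_encrypt(key: bytes, iv: bytes, blocks: list) -> list:
    # C_i = E(Key, C_{i-1} XOR M_i), with the IV playing the role of C_{-1}.
    out, prev = [], iv
    for m in blocks:
        prev = E(key, xor(prev, m))
        out.append(prev)
    return out
```

Encrypting two identical plaintext blocks with `ecb_encrypt` yields two identical ciphertext blocks, while `cbc_encrypt` yields two different ones because the second block is XORed with the first ciphertext block before encryption.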

How CBC is used in SSL/TLS
The way I've described CBC above is as if you're just encrypting a single data blob (e.g., a file) consisting of a number of blocks. However, SSL/TLS is a channel encryption protocol and so it wants to encrypt not a single file but a series of records. For instance, you might use a single SSL/TLS connection for a series of HTTP requests, each of which is broken up into one or more records which might be sent over the course of seconds to minutes. All the records (in each direction) are encrypted with the same traffic key.

There are two basic ways to use CBC in this kind of environment:

  • Treat each record as if it were independent; generate a new IV and encrypt the record as described above.
  • Treat the records as if they were concatenated into a single large object and just continue the CBC state between records. This means that the IV for record n is the last block (the CBC residue) for record n-1.

SSLv3 and TLS 1.0 chose the second of these options. This seems to have been a mistake, for two reasons. First (and more trivially) it makes it hard to use TLS over any kind of datagram transport (hence DTLS) and second, it turns out that there is a security issue.
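The two options can be sketched side by side. This is a simplified model that ignores record headers, MACs, and padding; `ChainedIV` and `ExplicitIV` are hypothetical class names, and `E` is a toy stand-in for the real block cipher (not secure).

```python
import hashlib
import hmac
import os

BLOCK = 16  # bytes per block (128 bits)

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def E(key: bytes, m: bytes) -> bytes:
    """Toy 4-round Feistel permutation standing in for AES. NOT secure."""
    left, right = m[:8], m[8:]
    for rnd in range(4):
        f = hmac.new(key, bytes([rnd]) + right, hashlib.sha256).digest()[:8]
        left, right = right, xor(left, f)
    return left + right

def cbc_encrypt(key: bytes, iv: bytes, blocks: list) -> list:
    out, prev = [], iv
    for m in blocks:
        prev = E(key, xor(prev, m))
        out.append(prev)
    return out

class ChainedIV:
    """Second option (SSLv3/TLS 1.0): the IV of record n is the last
    ciphertext block of record n-1."""
    def __init__(self, key: bytes, iv: bytes):
        self.key, self.iv = key, iv

    def send(self, blocks: list) -> list:
        ct = cbc_encrypt(self.key, self.iv, blocks)
        self.iv = ct[-1]  # CBC residue: already visible on the wire
        return ct

class ExplicitIV:
    """First option: a fresh random IV per record, sent ahead of the ciphertext."""
    def __init__(self, key: bytes):
        self.key = key

    def send(self, blocks: list) -> list:
        iv = os.urandom(BLOCK)
        return [iv] + cbc_encrypt(self.key, iv, blocks)
```

With `ChainedIV`, two records sent back to back are byte-for-byte what you would get from encrypting their concatenation in one go, which is why anyone watching the wire knows the next IV in advance. With `ExplicitIV`, the IV does not exist until the record is being sent.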

The Original Predictable IV Issue
Back in 2004, Moeller [*] observed that it was possible to exploit this behavior under certain circumstances. (The original observation of this style of attack seems to be due to Rogaway [*] and was then extended to SSH by Wei Dai.) Imagine that you're an attacker who can convince one side of the SSL/TLS connection to encrypt some data of your choice. This allows you to learn about other parts of the plaintext, even if you wouldn't ordinarily be allowed to see that plaintext.

Consider the case where we have a connection between Alice and Bob. You observe a record which you know contains Alice's password in block i, i.e., M_i is Alice's password. Say you have a guess for Alice's password: you think it might be P. Now, if you know that the next record will be encrypted with IV X, and you can inject a chosen record, you inject:

X ⊕ C_{i-1} ⊕ P

When this gets encrypted, X gets XORed in, with the result that the plaintext block fed to the encryption algorithm is:

C_{i-1} ⊕ P

If P == M_i, then the new ciphertext block will be the same as C_i, which reveals that your guess is right.

The question then becomes how the attacker would know the next IV to be used. However, because the IV for record j is the CBC residue of record j-1 all the attacker needs to do is observe the traffic on the wire and then make sure that the data they inject is encrypted as the next record, using the previous record's CBC residue as the IV.
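Here is a minimal sketch of that guess-checking step against a chained-IV channel. It models only the CBC layer (no record headers, MAC, or padding); `Tls10Channel` and `check_guess` are hypothetical names, and `E` is a toy stand-in for the real block cipher (not secure).

```python
import hashlib
import hmac

BLOCK = 16  # bytes per block (128 bits)

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def E(key: bytes, m: bytes) -> bytes:
    """Toy 4-round Feistel permutation standing in for AES. NOT secure."""
    left, right = m[:8], m[8:]
    for rnd in range(4):
        f = hmac.new(key, bytes([rnd]) + right, hashlib.sha256).digest()[:8]
        left, right = right, xor(left, f)
    return left + right

class Tls10Channel:
    """Hypothetical sender that chains the CBC state across records."""
    def __init__(self, key: bytes, iv: bytes):
        self.key, self.iv = key, iv

    def send(self, blocks: list) -> list:
        out = []
        for m in blocks:
            self.iv = E(self.key, xor(self.iv, m))
            out.append(self.iv)
        return out  # what an eavesdropper sees on the wire

def check_guess(channel: Tls10Channel, observed: list, i: int, guess: bytes) -> bool:
    """Test whether plaintext block i (i >= 1) of the observed traffic equals guess."""
    x = observed[-1]        # last block on the wire = IV of the next record
    prev = observed[i - 1]  # C_{i-1}
    injected = xor(xor(x, prev), guess)  # X xor C_{i-1} xor P
    c = channel.send([injected])         # attacker-chosen record gets encrypted
    observed.extend(c)                   # keep the attacker's view up to date
    return c[0] == observed[i]           # equals C_i only if guess == M_i
```

The attacker never touches the key: `check_guess` uses only ciphertext blocks seen on the wire plus the ability to have one chosen block encrypted as the next record.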

While troubling, this isn't that great an attack. First, the attacker needs to be able to somehow mix traffic they control with traffic they don't control and can't see, all over the same SSL/TLS connection. This isn't impossible; for instance it might happen over an SSL-VPN, but it's also not that common. Second, it only lets you guess a whole plaintext block at a time, so even if you're guessing a very low entropy value, it takes a lot of guesses to search the space.

Still, this is a serious enough issue that the IETF felt like it was worth fixing, and the TLS Working Group duly developed TLS 1.1, which changed to the first strategy (called "explicit IV" in the standard). [Technical note: the required defense is actually slightly more complicated because you need to make the TLS-using application commit to the entire plaintext block prior to revealing the IV.] TLS 1.1 was developed in 2006, but deployment has been pretty limited ([*]). We don't know why for sure, but I think the general feeling in the security community is that the threat didn't seem serious enough to motivate upgrading.

The Rizzo/Duong Attack
Rizzo and Duong's paper improves on this attack in two ways:

  1. They have developed a more efficient attack which allows the attacker to guess a single byte at a time rather than having to guess a whole block.
  2. They observe that a specific use of Web technologies (cross-origin requests and in particular Web Sockets) allows the attacker to mix traffic in the fashion described above.

Shifting the Boundary
The improvement in the attack is easy to understand. Imagine that the attacker has some control over the way that data is fitted into blocks. So, consider the case where we want to guess Alice's password, which (without loss of generality) we know is 8 characters long. If the attacker can arrange for the password to be split across cipher blocks so that the first character is in one block with otherwise predictable contents and the next 7 characters are in the next block, then the attacker only needs to guess the first character.

For instance, suppose the username/password protocol works by sending the string user: alice password: ********, where ******** is the password itself. If the attacker can arrange that this is broken up as lice password: * | *******........., then they can guess the first character of the password in isolation. Furthermore, once they know the first character, they can shift the block boundary by one byte and guess the next character. The way this attack plays out in practice is that the attacker exhaustively searches the first character, then fixes the first character and searches the second, and so on. Each character costs at most 256 guesses, so an 8-character password takes at most 8 × 256 = 2048 guesses rather than up to 256^8.
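The byte-at-a-time search can be sketched as follows. This is a simplified model under stated assumptions, not Rizzo and Duong's actual code: the hypothetical `Victim` sends attacker-sized padding followed by the secret, encrypts attacker-chosen blocks on the same chained-IV connection, and the attacker knows the secret's length. `E` is a toy stand-in for the real block cipher (not secure).

```python
import hashlib
import hmac

BLOCK = 16  # bytes per block (128 bits)

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def E(key: bytes, m: bytes) -> bytes:
    """Toy 4-round Feistel permutation standing in for AES. NOT secure."""
    left, right = m[:8], m[8:]
    for rnd in range(4):
        f = hmac.new(key, bytes([rnd]) + right, hashlib.sha256).digest()[:8]
        left, right = right, xor(left, f)
    return left + right

class Victim:
    """Hypothetical victim using one chained-IV (TLS 1.0 style) connection."""
    def __init__(self, key: bytes, iv: bytes, secret: bytes):
        self.key, self.iv, self.secret = key, iv, secret

    def _send(self, blocks: list) -> list:
        out = []
        for m in blocks:
            self.iv = E(self.key, xor(self.iv, m))
            out.append(self.iv)
        return out

    def send_request(self, pad_len: int) -> list:
        data = b"A" * pad_len + self.secret
        data += bytes(-len(data) % BLOCK)  # zero-fill the final block
        return self._send([data[i:i + BLOCK] for i in range(0, len(data), BLOCK)])

    def send_chosen(self, block: bytes) -> bytes:
        return self._send([block])[0]

def recover_secret(victim: Victim, secret_len: int) -> bytes:
    known = b""
    for n in range(secret_len):
        # Pad so secret byte n is the LAST byte of a block; the extra BLOCK
        # keeps the target away from the record's first block.
        pad_len = BLOCK + (BLOCK - 1 - n) % BLOCK
        record = victim.send_request(pad_len)
        idx = (pad_len + n) // BLOCK
        target, prev = record[idx], record[idx - 1]
        next_iv = record[-1]
        prefix = (b"A" * pad_len + known)[-(BLOCK - 1):]  # 15 known bytes
        for g in range(256):
            c = victim.send_chosen(xor(xor(next_iv, prev), prefix + bytes([g])))
            next_iv = c  # the injected record's residue is the next IV
            if c == target:
                known += bytes([g])
                break
        else:
            raise RuntimeError("no guess matched")
    return known
```

Each pass through the outer loop shifts the boundary by one byte, so the block under attack always consists of 15 bytes the attacker already knows plus one unknown byte.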

Exploiting WebSockets
The previous best attacks involved VPNs, but Rizzo and Duong suggest a different vehicle. The basic idea is that the Web is an inherently multi-site environment and it's very common for JavaScript coming from Site A to talk to Site B (for instance, for mashups). To give just one example, if you embed an image on your Web page that comes from www.example.com, the browser makes a request to www.example.com. Importantly, this request includes any cookies you might have for www.example.com. This capability is the basis for a variety of attacks, including cross-site request forgery (CSRF), and cross-origin requests (i.e., those made by scripts from site A going to site B) are strictly limited in order to limit those attacks.

These restrictions, however, are inconvenient, and so many newer Web technologies are moving to a security model of origin-based consent. The idea here is that when a cross-origin request is made to site B from site A, the browser asks site B whether requests from that site are OK, thus allowing site B to selectively allow access only to safe resources. One such technology is Web Sockets, which is designed to allow a client/server pair to start with an HTTP transaction and upgrade it to a transparent (non-HTTP) channel that allows the transmission of arbitrary messages that aren't framed as HTTP messages. The way that WebSockets works is that there is an initial HTTP handshake (including cookies) that allows the client to verify that the server is prepared to do WebSockets. The handshake looks something like this:

Client->Server:
        GET /chat HTTP/1.1
        Host: server.example.com
        Upgrade: websocket
        Connection: Upgrade
        Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
        Origin: http://example.com
        Cookie: 0123456789abcdef
        Sec-WebSocket-Protocol: chat, superchat
        Sec-WebSocket-Version: 13

Server->Client:
        HTTP/1.1 101 Switching Protocols
        Upgrade: websocket
        Connection: Upgrade
        Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
        Sec-WebSocket-Protocol: chat

After the handshake, client-side JavaScript is allowed to send arbitrary data to the server, though it is wrapped in some framing data.

It should be obvious at this point how one might use WebSockets as a vehicle for Rizzo and Duong's attack. Say the attacker wants to recover the cookie for https://www.google.com/. He stands up a page with any origin he controls (e.g., http://www.attacker.com/). This page hosts JS that initiates a WebSockets connection to https://www.google.com/. Because WebSockets allows cross-origin requests, he can initiate an HTTPS connection to the target server if the target server allows it (e.g., because it wants to allow mash-ups). Because the URL (/chat above) is provided by the attacker, he can make it arbitrarily long and therefore put the Cookie wherever he wants it with respect to the CBC block boundary. Once he has captured the encrypted block with the cookie, he can then send arbitrary new packets via WebSockets with his appropriately constructed plaintext blocks as described above. There are a few small obstacles to do with the framing, but Rizzo and Duong claim that these can be overcome and those claims seem plausible.
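As a rough sketch of the alignment step, the following Python computes how long the attacker-chosen path must be so that a given cookie byte lands on the last byte of a 16-byte block. The request layout in `request_prefix` is a simplified assumption (a real browser sends more headers, in its own order, and TLS framing is ignored), and all function names are hypothetical.

```python
BLOCK = 16  # AES block size in bytes

def request_prefix(path: str) -> bytes:
    # Everything before the cookie value; assumed fully known to the attacker.
    return (f"GET {path} HTTP/1.1\r\n"
            "Host: server.example.com\r\n"
            "Cookie: ").encode("ascii")

def build_request(path: str, cookie: str) -> bytes:
    return request_prefix(path) + cookie.encode("ascii") + b"\r\n\r\n"

def path_for_byte(n: int) -> str:
    """Path whose length puts cookie byte n at the end of a cipher block."""
    base = len(request_prefix("/"))
    extra = (BLOCK - 1 - (base + n)) % BLOCK
    return "/" + "a" * extra
```

For example, `build_request(path_for_byte(0), cookie)` places the first cookie byte at an offset congruent to 15 modulo 16, so the block containing it is otherwise made up of bytes the attacker already knows. Lengthening the path by one byte less moves the next cookie byte into that position.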

The Impact of Masking
That's the idea anyway. Fortunately, I've omitted one detail: what I've just described is WebSockets draft -76. This version of WebSockets was shipped in some browsers and then largely disabled (for instance here) because of a vulnerability published by David Huang, Eric Chen, Adam Barth, Collin Jackson, and myself. The version of WebSockets which the IETF is standardizing incorporates a feature called masking in which the browser generates a random 32-bit mask that gets XORed with the content of the packet prior to transmission (and hence prior to SSL/TLS encryption). The impact of this change is that if an attacker wants to use WebSockets they only have a 2^-32 chance of being able to generate the right input to the encryption algorithm to mount the attack. Obviously, this isn't as good as random IVs (which increase the difficulty by a factor of 2^128 for AES), but it's a pretty significant barrier nonetheless.
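A minimal sketch of masking, assuming the scheme in the WebSocket specification where a 4-byte key is repeated across the payload; `mask_payload` and `send_frame` are hypothetical names, and frame headers are omitted.

```python
import os

def mask_payload(payload: bytes, mask_key: bytes) -> bytes:
    """XOR the payload with the 4-byte mask, repeating the mask as needed."""
    assert len(mask_key) == 4
    return bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))

def send_frame(payload: bytes) -> tuple:
    # The mask is picked by the browser for each frame; the script that
    # supplied the payload neither chooses it nor sees it in advance.
    mask_key = os.urandom(4)
    return mask_key, mask_payload(payload, mask_key)
```

The bytes that reach the CBC layer are `mask_payload(chosen, mask_key)` rather than the attacker's `chosen` block, so a carefully constructed block survives only if the attacker correctly predicts all 32 bits of `mask_key`. Masking is its own inverse, which is how the server recovers the payload.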

Note that I'm not saying that my co-authors and I knew about this attack or that we pushed for it as a countermeasure. Rather, we were concerned about a different class of attacks in which an attacker was able to control bits on the wire, and masking was intended to deny the attacker that kind of control. However, since similar levels of control are required in order to mount this attack, masking seems to be an effective countermeasure here as well.

As should be clear based on the above discussion I don't think that this is an issue with newer versions of WebSockets (which means recent versions of browsers other than Safari) and of course older browsers don't implement WebSockets at all. And even if you have a browser which is vulnerable, you need to be talking to a target site which actually accepts cross-origin WebSockets requests, which as far as I know is very rare for high-value sites such as financial sites.

Exploiting Java
The demo that Duong and Rizzo showed today used Java to provide the vector for the attack. As I understood their presentation (and note I don't have a copy of their paper with full details on how they're using Java, but the version that is floating around says URLConnection [-- updated 9/24]), they say they don't need any heightened Java privileges. What's a little confusing here is exactly how they are getting past same-origin issues. In particular, it was really unclear which Java APIs they are using, and whether this was expected/known Java behavior with respect to the same-origin policy (SOP) or whether they had found a way around SOP. That's important to know, in part because it dictates the right response and also because it tells us whether they've found a threat that extends past HTTPS. In particular, if there is a clear SOP violation (as for instance in this exploit) then you have a serious problem regardless of any crypto magic.

Requirements for a Successful Attack
This post is really long, but the last thing I want to cover is what conditions would be required to mount a successful attack using this type of technique. As far as I can tell, we need to have a target domain which allows cross-origin requests that:

  1. Contain some user secret (e.g., a cookie) in a predictable location.
  2. Allow scripts from the attacker's origin to tightly control additional data on the same HTTPS connection as the user secret.

It's this mixing of data under control of the attacker and data which should be kept secret from the attacker that constitutes the threat. This is a very natural thing to do in the Web context; mashing up data from one site with another is something that happens all the time. The Web security model is designed to protect you from that, but the lesson here (once again) is that actually getting that right is somewhat tricky.

I'm actively trying to get more details on how this attack works, so more as I get them. At the moment, my advice would be to disable Java—that would be my advice in any case—and otherwise probably don't get too excited.

Next Up: Countermeasures other than upgrading to TLS 1.1

3 Comments

Please correct: Java has absolutely nothing to do with JavaScript. In the story, JavaScript is involved, not Java.

I don't know about the story, but as I understand it, the demo (and the sample code that has leaked) is in Java.

All right, thanks.
