If you're familiar with network security and haven't been living under
a rock, you've probably seen the recent coverage
of Rizzo and Duong's attack on SSL/TLS implementations.
Now that they've demoed the attack and information is starting to trickle out
(the news articles above were written
prior to release), we can begin to evaluate the impact of this work.
(See AGL's post on this.) Unfortunately, there's no paper publicly available and
the live chat during the talk raised more questions than it
answered
[in large part due to the inadequacies of trying to
ask questions over WebEx chat -- EKR 9/24].
First, the bottom line: Don't Panic. Yes, this is interesting
work; no, SSL/TLS is not completely broken. In particular, your
communications with your bank are quite likely to be fine,
and AGL suggests that Chrome is fine.
Background: CBC Encryption
In order to understand what's going on here, you need some background.
SSL/TLS can encrypt data with two kinds of ciphers: block ciphers like AES and DES
and stream ciphers
like RC4. You don't need to worry about stream ciphers for now. This
attack only applies to block ciphers. The way that a block cipher
works is that it's a keyed mapping from plaintext blocks (typically
128 bits) onto ciphertext blocks of the same size. So, it's like
having a huge table with 2^128 entries showing each
plaintext block M and its corresponding ciphertext block C.
Each key represents a different table. So, we represent encryption
as a function C = E(Key, M) meaning that we compute the
encryption function on Key and M and the result is
the ciphertext.
The obvious way to use a block cipher is to break up
plaintext into 128-bit blocks and encrypt each block separately
(this is called electronic codebook (ECB) mode). It
should be obvious that if you have two blocks that are the
same in the plaintext they are also the same in the ciphertext
and so patterns in the plaintext get reflected in the
ciphertext. This is bad. This Wikipedia article has a good visual comparison of
just how bad it can be. In order to prevent this, other
cipher modes have been developed that break up those patterns.
The one used by SSL/TLS (at least prior to TLS 1.2) is called
cipher-block chaining (CBC) mode. The way that
CBC works is that when you encrypt block i you first
XOR in the encryption of block i-1. More formally:
C_i = E(Key, C_{i-1} ⊕ M_i)
Obviously, when you go to encrypt the first block, there is
no previous block to XOR in, so the standard practice is to
generate a random initialization vector (IV) and use
that as if it were the encryption of the previous block.
The effect of all this is to break up patterns: consider
the first block M_0. To encrypt it you
compute:
C_0 = E(Key, IV ⊕ M_0).
And then to encrypt M_1 we do:
C_1 = E(Key, C_0 ⊕ M_1).
Now, unless C_0 happens to be the
same as IV (which is very unlikely), then
even if M_0 = M_1
the input to the two encryption functions will be different
and so C_0 ≠ C_1, thus
breaking up the pattern.
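To make the difference concrete, here is a minimal Python sketch that encrypts two identical plaintext blocks under ECB and under CBC. Note the assumptions: E below is a toy Feistel construction built on HMAC-SHA256, used only as a stand-in for a real block cipher such as AES (it is not secure), and the names xor, ecb_encrypt, and cbc_encrypt are mine, not from any library.

```python
import hashlib
import hmac
import os

BLOCK = 16  # block size in bytes (128 bits, as for AES)


def xor(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def E(key, block):
    """Toy 4-round Feistel cipher (stand-in for AES; illustration only)."""
    left, right = block[:8], block[8:]
    for rnd in range(4):
        f = hmac.new(key, bytes([rnd]) + right, hashlib.sha256).digest()[:8]
        left, right = right, xor(left, f)
    return left + right


def ecb_encrypt(key, blocks):
    # Each block is encrypted independently.
    return [E(key, m) for m in blocks]


def cbc_encrypt(key, iv, blocks):
    out, prev = [], iv
    for m in blocks:
        prev = E(key, xor(prev, m))  # C_i = E(Key, C_{i-1} XOR M_i)
        out.append(prev)
    return out


key = os.urandom(16)
iv = os.urandom(BLOCK)
blocks = [b"ATTACK AT DAWN!!"] * 2  # M_0 == M_1

ecb = ecb_encrypt(key, blocks)
cbc = cbc_encrypt(key, iv, blocks)
print(ecb[0] == ecb[1])  # True: the repeated block shows through
print(cbc[0] == cbc[1])  # False, except with negligible probability
```

The only way the CBC blocks could match is if C_0 happened to equal the IV, which is the "very unlikely" case mentioned above.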
How CBC is used in SSL/TLS
The way I've described CBC above is as if you're just encrypting
a single data blob (e.g., a file) consisting of a number of blocks.
However, SSL/TLS is a channel encryption protocol and so
it wants to encrypt not a single file but a series of records.
For instance, you might use a single SSL/TLS connection for
a series of HTTP requests, each of which is broken up into
one or more records which might be sent over the course of
seconds to minutes. All the records (in each direction) are
encrypted with the same traffic key.
There are two basic ways to use CBC in this kind of environment:
- Treat each record as if it were independent; generate a
new IV and encrypt the record as described above.
- Treat the records as if they were concatenated into a
single large object and just continue the CBC state between
records. This means that the IV for record n is the
last block (the CBC residue) for record n-1.
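The two strategies can be sketched side by side. This is an illustration under assumptions, not real SSL/TLS record processing: it ignores the MAC and padding, E is a toy stand-in for a block cipher (not secure), and RecordLayer with its chained flag is a hypothetical name of my own.

```python
import hashlib
import hmac
import os

BLOCK = 16  # block size in bytes


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def E(key, block):
    """Toy 4-round Feistel cipher (stand-in for AES; illustration only)."""
    left, right = block[:8], block[8:]
    for rnd in range(4):
        f = hmac.new(key, bytes([rnd]) + right, hashlib.sha256).digest()[:8]
        left, right = right, xor(left, f)
    return left + right


class RecordLayer:
    """Hypothetical record encryptor supporting both IV strategies."""

    def __init__(self, key, chained):
        self.key = key
        self.chained = chained            # True: carry CBC state across records
        self.residue = os.urandom(BLOCK)  # IV for the very first record

    def encrypt_record(self, plaintext):
        """Return (iv_used, ciphertext_blocks); plaintext must be block-aligned."""
        assert len(plaintext) % BLOCK == 0
        iv = self.residue if self.chained else os.urandom(BLOCK)
        prev, out = iv, []
        for i in range(0, len(plaintext), BLOCK):
            prev = E(self.key, xor(prev, plaintext[i:i + BLOCK]))
            out.append(prev)
        self.residue = prev  # last ciphertext block of this record
        return iv, out


key = os.urandom(16)
for chained in (False, True):
    layer = RecordLayer(key, chained)
    _, rec1 = layer.encrypt_record(b"A" * 32)
    iv2, _ = layer.encrypt_record(b"B" * 16)
    # With chaining, anyone watching the wire already knows the next IV.
    print(chained, iv2 == rec1[-1])
```

With chaining enabled, the IV of the second record is exactly the last ciphertext block of the first, so it is visible to an observer before the second record is encrypted.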
SSLv3 and TLS 1.0 chose the second of these options. This seems
to have been a mistake, for two reasons. First (and more trivially)
it makes it hard to use TLS over any kind of datagram
transport (hence DTLS) and second, it turns
out that there is a security issue.
The Original Predictable IV Issue
Back in 2004, Moeller [*] observed that it was
possible to exploit this behavior under certain circumstances.
(The original observation of this style of attack seems to be due
to Rogaway [*] and was then extended to SSH by Wei Dai.)
Imagine that you're an attacker who can convince one side
of the SSL/TLS implementation to encrypt some data of your
choice. This allows you to learn about other parts of the
plaintext, even if you wouldn't ordinarily be allowed to
see that plaintext.
Consider the case where we have a connection between Alice
and Bob. You observe a record which you know contains
Alice's password in block i, i.e., M_i
is Alice's password. Say you have a guess for Alice's password:
you think it might be P. Now, if you know that the next record
will be encrypted with IV X, and you can inject
a chosen record, you inject:
X ⊕ C_{i-1} ⊕ P
When this gets encrypted, X gets XORed in, with the
result that the plaintext block fed to the encryption algorithm
is:
C_{i-1} ⊕ P
If P == M_i, then the new ciphertext block will be
the same as C_i, which reveals that your guess is
right.
The question then becomes how the attacker would know the next
IV to be used. However, because the IV for record j
is the CBC residue of record j-1 all the attacker
needs to do is observe the traffic on the wire and then
make sure that the data they inject is encrypted as the
next record, using the previous record's CBC residue as
the IV.
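Here is a rough Python sketch of the guess-confirmation step just described. It is a simulation under stated assumptions: E is a toy stand-in for a block cipher (not secure), ChainedCBC models only the chained-IV behavior (no MAC, no padding), and check_guess and the wire list are hypothetical names for what the attacker does and sees.

```python
import hashlib
import hmac
import os

BLOCK = 16  # block size in bytes


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def E(key, block):
    """Toy 4-round Feistel cipher (stand-in for AES; illustration only)."""
    left, right = block[:8], block[8:]
    for rnd in range(4):
        f = hmac.new(key, bytes([rnd]) + right, hashlib.sha256).digest()[:8]
        left, right = right, xor(left, f)
    return left + right


class ChainedCBC:
    """Sender that carries CBC state across records (SSLv3/TLS 1.0 style)."""

    def __init__(self, key):
        self.key = key
        self.prev = os.urandom(BLOCK)

    def send(self, plaintext):
        out = []
        for i in range(0, len(plaintext), BLOCK):
            self.prev = E(self.key, xor(self.prev, plaintext[i:i + BLOCK]))
            out.append(self.prev)
        return out


def check_guess(conn, wire, i, guess):
    """Test whether plaintext block i equals guess, using only ciphertext."""
    x = wire[-1]                                # next IV = last block on the wire
    injected = xor(xor(x, wire[i - 1]), guess)  # X xor C_{i-1} xor P
    new = conn.send(injected)                   # the sender XORs X back out
    wire.extend(new)
    return new[0] == wire[i]                    # match means P == M_i


conn = ChainedCBC(os.urandom(16))
wire = []  # every ciphertext block the attacker has observed
wire += conn.send(b"user: alice pw: " + b"s3cret-passw0rd!")

print(check_guess(conn, wire, 1, b"wrong-password!!"))  # False
print(check_guess(conn, wire, 1, b"s3cret-passw0rd!"))  # True
```

The attacker never touches the key; everything check_guess uses is either observed on the wire or chosen by the attacker.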
While troubling, this isn't that great an attack. First, the
attacker needs to be able to somehow mix traffic they control
with traffic they don't control and can't see, all over the
same SSL/TLS connection. This isn't impossible; for instance
it might happen over an SSL-VPN, but it's also not that common.
Second, it only lets you guess a whole plaintext block at
a time, so even if you're guessing a very low entropy value,
it takes a lot of guesses to search the space.
Still, this is a serious enough issue that the IETF felt like
it was worth fixing, and the TLS Working Group duly developed
TLS 1.1, which changed to the first strategy (called
"explicit IV" in the standard).
[Technical note: the required defense is actually slightly
more complicated because you need to make the TLS-using
application commit to the entire plaintext block prior
to revealing the IV.] TLS 1.1 was developed in
2006, but deployment has been pretty limited ([*]).
We don't know why for sure, but I think the general feeling
in the security community is that the threat didn't seem
serious enough to motivate upgrading.
The Rizzo/Duong Attack
Rizzo and Duong's paper improves on this attack in two ways:
- They have developed a more efficient attack which allows
the attacker to guess a single byte at a time rather than
having to guess a whole block.
- They observe that a specific use of Web technologies
(cross-origin requests and in particular Web Sockets)
allows the attacker to mix traffic in the fashion
described above.
Shifting the Boundary
The improvement in the attack is easy to understand. Imagine
that the attacker has some control over the way that data
is fitted into blocks. So, consider the case where we want
to guess Alice's password, which (without loss of generality)
we know is 8 characters long. If the attacker can arrange
for the password to be split across a block boundary so that
the first character is in one block with otherwise predictable
contents and the next 7 characters are in the next block,
then the attacker only needs to guess the first character.
For instance, suppose the username/password protocol works by
sending the string user: alice password: ********
where ******** is the password itself. If the attacker
can arrange that this is broken up
as lice password: * | *******.........,
then they can guess the first character of the password in isolation.
Furthermore, once they know the first character, they can shift
the block boundary by one byte and then guess the next character.
The way this attack plays out in practice is that the attacker
exhaustively searches the first character, then fixes
the first character, and searches the second, and so on.
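The byte-at-a-time search can be simulated in a few lines of Python. This is a sketch of the idea as I understand it, not Rizzo and Duong's actual code. Assumptions: E is a toy stand-in for a block cipher (not secure); Victim, send_request, and send_raw are hypothetical; the victim zero-pads and omits the MAC; and the attacker can choose how many filler bytes precede the secret.

```python
import hashlib
import hmac
import os

BLOCK = 16  # block size in bytes


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def E(key, block):
    """Toy 4-round Feistel cipher (stand-in for AES; illustration only)."""
    left, right = block[:8], block[8:]
    for rnd in range(4):
        f = hmac.new(key, bytes([rnd]) + right, hashlib.sha256).digest()[:8]
        left, right = right, xor(left, f)
    return left + right


class Victim:
    """Hypothetical sender using chained-IV CBC on one connection."""
    HEADER = b"password: "

    def __init__(self, key, secret):
        self.key, self.secret = key, secret
        self.prev = os.urandom(BLOCK)

    def _encrypt(self, data):
        data += bytes(-len(data) % BLOCK)  # zero-pad (simplification)
        out = []
        for i in range(0, len(data), BLOCK):
            self.prev = E(self.key, xor(self.prev, data[i:i + BLOCK]))
            out.append(self.prev)
        return out

    def send_request(self, pad_len):
        # Attacker controls only the amount of filler before the secret.
        return self._encrypt(b"A" * pad_len + self.HEADER + self.secret)

    def send_raw(self, block):
        # Attacker-chosen plaintext encrypted as the next record.
        return self._encrypt(block)


def recover_secret(victim, secret_len):
    header, recovered = Victim.HEADER, b""
    for n in range(secret_len):
        # Pad so that secret byte n is the LAST byte of a block (index >= 1).
        pad = BLOCK + (-(len(header) + n + 1)) % BLOCK
        ct = victim.send_request(pad)
        i = (pad + len(header) + n + 1) // BLOCK - 1
        c_prev, c_target = ct[i - 1], ct[i]
        known = (b"A" * pad + header + recovered)[-(BLOCK - 1):]
        x = ct[-1]  # predictable IV for the next record
        for g in range(256):
            x = victim.send_raw(xor(xor(x, c_prev), known + bytes([g])))[0]
            if x == c_target:  # guess g matched the unknown byte
                recovered += bytes([g])
                break
    return recovered


victim = Victim(os.urandom(16), b"hunter2!")
print(recover_secret(victim, 8))  # b'hunter2!'
```

Each byte costs at most 256 guesses, so an 8-character secret takes at most 2048 injected blocks in this model, compared with up to 2^64 whole-block guesses if all 8 bytes had to be guessed at once.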
Exploiting WebSockets
The previous best attacks
involved VPNs, but Rizzo and Duong suggest a different vehicle.
The basic idea is that the Web is an inherently multi-site
environment and it's very common for JavaScript coming
from Site A to talk to Site B (for instance, for mashups).
To give just one example, if you embed an image on your
Web page that comes from www.example.com,
the browser makes a request to www.example.com.
Importantly, this request includes any cookies you
might have for www.example.com. This capability
is the basis for a variety of attacks, including
cross-site request forgery (CSRF),
and cross-origin requests (i.e., those made by scripts
from site A going to site B) are strictly limited in order
to limit those attacks.
These restrictions, however, are inconvenient, and so many
newer Web technologies are moving to a security model
of origin-based consent. The idea here is that when a
cross-origin request is made to site B from site A,
the browser asks site B whether it's OK from that
site, thus allowing site B to selectively allow access
only to safe resources. One such technology is Web Sockets,
which is designed to allow a client/server pair to start
with an HTTP transaction and upgrade it to a transparent
(non-HTTP) channel that allows the transmission of arbitrary
messages that aren't framed as HTTP messages. The way
that WebSockets works is that there is an initial HTTP handshake
(including cookies) that allows the client to verify that
the server is prepared to do WebSockets. The handshake
looks something like this:
Client->Server:
GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Origin: http://example.com
Cookie: 0123456789abcdef
Sec-WebSocket-Protocol: chat, superchat
Sec-WebSocket-Version: 13
Server->Client:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
Sec-WebSocket-Protocol: chat
After the handshake, client-side JavaScript is allowed to
send arbitrary data to the server, though it is wrapped in
some framing data.
It should be obvious at this point how one might use WebSockets
as a vehicle for Rizzo and Duong's attack. Say the attacker
wants to recover the cookie for https://www.google.com/.
He stands up a page with any origin he controls (e.g.,
http://www.attacker.com/). This page hosts JS that
initiates
a WebSockets connection to https://www.google.com/.
Because WebSockets
allows cross-origin requests, he can initiate an HTTPS
connection to the target server if the target server allows it
(e.g., because it wants to allow mash-ups).
Because the URL (/chat above) is provided by the
attacker, he can make it arbitrarily long and therefore
put the Cookie wherever he wants it
with respect to the CBC block boundary. Once he has
captured the encrypted block with the cookie, he can then
send arbitrary new packets via WebSockets with his appropriately
constructed plaintext blocks as described above.
There are a few small obstacles to do with the framing, but
Rizzo and Duong claim that these can be overcome and
those claims seem plausible.
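The alignment step is just arithmetic on the request length. As a rough illustration, the sketch below pads the URL until the first byte of the cookie value lands on the last byte of a 16-byte block. The request template is a cut-down, hypothetical version of the handshake above (real browsers send more headers), build_request and align_cookie are names I made up, and I am assuming the request starts at a block boundary.

```python
BLOCK = 16  # CBC block size in bytes


def build_request(path):
    """Hypothetical handshake request; real browsers add more headers."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        "Host: server.example.com\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        "Cookie: 0123456789abcdef\r\n"
        "\r\n"
    ).encode("ascii")


def align_cookie(base="/chat?"):
    """Lengthen the path until the first cookie byte ends a block."""
    marker = b"Cookie: "
    for extra in range(BLOCK):
        path = base + "x" * extra
        request = build_request(path)
        offset = request.index(marker) + len(marker)  # first cookie byte
        if offset % BLOCK == BLOCK - 1:
            return path, offset
    raise AssertionError("unreachable: one of 16 consecutive offsets aligns")


path, offset = align_cookie()
request = build_request(path)
# The target block: 15 bytes the attacker can predict plus 1 unknown byte.
block = request[offset - (BLOCK - 1): offset + 1]
print(path)
print(block)
```

Adding one more filler character to the path shifts the boundary by one byte, which is what lets the attacker move on to the next cookie byte.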
The Impact of Masking
That's the idea anyway. Fortunately, I've omitted one detail:
what I've just described is WebSockets draft -76. This version
of WebSockets was shipped in some browsers and then largely disabled
(for instance here)
because of a vulnerability published by David Huang, Eric Chen, Adam Barth, Collin Jackson,
and myself. The version of WebSockets which the IETF is
standardizing incorporates a feature called masking
in which the browser generates a random 32-bit mask
that gets XORed with the content of the packet prior to
transmission (and hence prior to SSL/TLS encryption).
The impact of this change is that if an attacker wants to
use WebSockets they only have a 2^-32 chance
of being able to generate the right input to the encryption
algorithm to mount the attack. Obviously, this isn't
as good as random IVs (which increase the difficulty
by a factor of 2^128 for AES), but it's a pretty significant
barrier nonetheless.
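To see why masking gets in the way, here is a small Python sketch. The mask_payload function follows the masking rule in the WebSockets spec (byte i is XORed with byte i mod 4 of the mask); everything else is an assumption for illustration: the survives helper is hypothetical, frame headers are ignored, and the attacker is modeled as pre-compensating for a guessed mask.

```python
import os


def mask_payload(payload, mask_key):
    """WebSocket client masking: byte i is XORed with mask_key[i % 4]."""
    return bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))


def survives(wanted, guess, actual):
    """Does the attacker's chosen block reach the encryption layer intact?"""
    payload = mask_payload(wanted, guess)    # attacker pre-compensates
    on_wire = mask_payload(payload, actual)  # browser applies the real mask
    return on_wire == wanted


wanted = bytes(range(16))  # the block the attacker wants TLS to encrypt
guess = os.urandom(4)      # attacker's guess at the 32-bit mask
actual = os.urandom(4)     # chosen by the browser, unknown in advance

print(survives(wanted, guess, guess))   # True: a correct guess cancels out
print(survives(wanted, guess, actual))  # True only if guess == actual
```

Since the mask is 32 random bits, the second check succeeds with probability 2^-32 per attempt, which is where the figure above comes from.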
Note that I'm not saying that my co-authors and I knew about this attack
or that we pushed for it as a countermeasure. Rather, we were
concerned about a different class of attacks in which an
attacker was able to control bits on the wire, and masking
was intended to deny the attacker that kind of control.
However, since similar levels of control are required in order
to mount this attack, masking seems to be an effective countermeasure
here as well.
As should be clear based on the above discussion I don't think
that this is an issue with newer versions of WebSockets
(which means recent versions of browsers other than Safari)
and of course older browsers don't implement WebSockets at
all. And even if you have a browser which is vulnerable,
you need to be talking to a target site which actually
accepts cross-origin WebSockets requests, which as far
as I know is very rare for high-value sites such as
financial sites.
Exploiting Java
The demo that Duong and Rizzo showed today used Java to provide the
vector for the attack. As I understood their presentation (and note I
don't have a copy of their paper with
full details on how they're using Java, but the version that is
floating around says URLConnection [-- updated 9/24]), they say they don't need any heightened Java
privileges. What's a little confusing here is exactly how they
are getting past same-origin issues. In particular, which
Java APIs they are using and whether this was expected/known
Java behavior with respect to SOP or whether they had found
a way around SOP was really unclear. That's important to
know, in part because it dictates the right response and also
because it tells us whether they've found a threat that extends
past HTTPS. In particular, if there is a clear SOP violation
(as for instance in this exploit)
then you have a serious problem regardless of any crypto magic.
Requirements for a Successful Attack
This post is really long, but the last thing I want to cover is
what conditions would be required to mount a successful attack
using this type of technique. As far as I can tell, we need
to have a target domain which allows cross-origin requests that:
- Contain some user secret (e.g., a cookie) in a predictable
location.
- Allow scripts from the attacker's origin to tightly control
additional data on the same HTTPS connection as the user secret.
It's this mixing of data under control of the attacker and
data which should be kept secret from the attacker that
constitutes the threat. This is a very natural thing to do in the Web context; mashing up
data from one site with another is something that happens
all the time. The Web security model is designed to protect
you from that, but the lesson here (once again)
is that actually getting that right is somewhat tricky.
I'm actively trying to get more details on how this attack works,
so more as I get them. At the moment, my advice would be to
disable Java—that would be my advice in any case—and
otherwise probably don't get too excited.
Next Up: Countermeasures other than upgrading to TLS 1.1