Understanding the Sotirov et al. MD5 attack

Ever since the original Wang attacks on MD5 in 2005 it's been clear that certificates were the most attractive target. Today, Sotirov, Stevens, Appelbaum, Lenstra, Molnar, Osvik, and de Weger report (slides, writeup) on an attack against a real CA, in this case RapidSSL.

In order to understand what's going on, we first need to recall some basic facts about how certificates work. A certificate is a digitally signed assertion of the binding between a name and a public key. The data to be signed is as follows (I'm simplifying a bit):

  • version: The version number (2)
  • serialNumber: The unique certificate serial number
  • issuer: The name of the CA issuing the certificate
  • validity: The time period when the certificate is valid
  • subject: The name of the entity to which the certificate was issued
  • subjectPublicKeyInfo: The entity's public key
  • extensions: Arbitrary extensions

In order to make a certificate, this data gets serialized using an annoying encoding and then the entire mess is hashed and then the resulting hash is digitally signed by the CA. The problem we have here is that the hash, in this case MD5, is weak. More precisely, it's possible to generate a collision: two inputs that hash to the same output. (See here for more background on attacks on hash functions.) We've known for years how to exploit this kind of attack. The basic idea is that the attacker prepares two documents, one "good" and one "bad" that hash to the same value. He then gets the signer to sign the "good" variant and then cuts and pastes the signature onto the "bad" variant, thus producing a valid signature on the bad document.
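To make the cut-and-paste trick concrete, here's a toy sketch in Python. Everything in it is an assumption for illustration: the "broken" hash is just MD5 truncated to 24 bits so that a brute-force birthday search finds a collision in a moment (real MD5 collisions need the cryptanalytic techniques discussed here, not brute force), the RSA key is a tiny textbook key with no padding, and the documents and function names are made up.

```python
import hashlib
from itertools import count

# Toy RSA key built from two small Mersenne primes. Far too small for real use.
P, Q = 2**31 - 1, 2**61 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)

def weak_hash(data: bytes) -> int:
    """Stand-in for a broken hash: only the first 3 bytes (24 bits) of MD5."""
    return int.from_bytes(hashlib.md5(data).digest()[:3], "big")

def sign(data: bytes) -> int:
    """Hash-then-sign: the signer only ever operates on the digest."""
    return pow(weak_hash(data), D, N)

def verify(data: bytes, signature: int) -> bool:
    return pow(signature, E, N) == weak_hash(data)

def find_colliding_pair(good_prefix: bytes, bad_prefix: bytes):
    """Birthday search over padded variants of the two documents."""
    good_by_digest = {}
    for i in range(5000):
        doc = good_prefix + b" #" + str(i).encode()
        good_by_digest[weak_hash(doc)] = doc
    for j in count():
        doc = bad_prefix + b" #" + str(j).encode()
        match = good_by_digest.get(weak_hash(doc))
        if match is not None:
            return match, doc

good, bad = find_colliding_pair(b"Pay Alice 10 dollars", b"Pay Mallory 10000 dollars")
signature = sign(good)           # the signer only ever sees the "good" document
print(verify(good, signature))   # True
print(verify(bad, signature))    # True: same digest, so the same signature verifies
```

The point is that `verify` never looks at the document itself, only at its digest, so any two documents with the same digest are interchangeable as far as the signature is concerned.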

So, the way you would use this to attack certificates is that you would generate a "good" certificate signing request that would result in a certificate that had the same hash as a "bad" certificate you had generated locally. You get the CA to sign the request and then substitute the bad certificate. Until now there were two major obstacles to using this technique to attack certificates:

  • It wasn't clear that the serialNumber field was predictable.
  • The techniques for generating collisions weren't very good: they weren't that controllable (they generated a lot of random-appearing data) and were slow; or rather there were techniques for generating fast collisions but they weren't at all controllable.

The relevance of the serialNumber is this: unlike the name and the public key, the serialNumber and validity are generated by the CA. So, you need to know in advance what they will be in order to generate the appropriate colliding "bad" certificate. The validity is typically just generated as something like a year or two from the time of issue, so it's relatively predictable. The CA has a lot of freedom in how to generate the serial number. If it's truly a sequence number, it's quite predictable. However, if it's randomly generated, then it can be made arbitrarily unpredictable, which effectively blocks this kind of collision attack. When MD5 collisions were first discovered, the two standard recommendations were (1) stop using MD5 and (2) generate random serial numbers.
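Here's a minimal sketch of why the serial number policy matters. Both CA classes and the `attacker_guess` helper are hypothetical; the 64-bit width of the random serial is just an assumption picked for the example.

```python
import secrets
from itertools import count

class SequentialCA:
    """Hypothetical CA that hands out serial numbers from a counter."""
    def __init__(self, start: int):
        self._counter = count(start)

    def next_serial(self) -> int:
        return next(self._counter)

class RandomizingCA:
    """Hypothetical CA that draws every serial number from a CSPRNG."""
    def next_serial(self) -> int:
        return secrets.randbits(64)

def attacker_guess(last_seen_serial: int) -> int:
    """The obvious outside strategy: assume the next serial is 'last + 1'."""
    return last_seen_serial + 1

seq_ca = SequentialCA(start=1000)
observed = seq_ca.next_serial()   # attacker buys one certificate and looks at it
print(attacker_guess(observed) == seq_ca.next_serial())   # True

rnd_ca = RandomizingCA()
observed = rnd_ca.next_serial()
# Against random serials the same guess is right with probability about 2**-64.
print(attacker_guess(observed) == rnd_ca.next_serial())
```

Since the colliding "bad" certificate has to be computed before the CA signs anything, a wrong guess about even one CA-chosen field means the hashes won't match and the work is wasted.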

This Attack
Which brings us to this new work, which involves two main contributions. First, the authors improved their collision finding techniques so they need a lot less random-appearing data. The second is that they found a CA which still uses MD5 and doesn't randomize the serial number. Taken together, this allowed them to convince the CA to sign a certificate which was in itself valid but which collided with a certificate that the CA would never have signed, in this case a certificate for a new, subordinate CA. (It could just as well have been a certificate for a specific target web site, but that's less flashy than a CA certificate.) Once in possession of this new CA certificate, it's possible for the authors to sign arbitrary new certificates which will be trusted by anyone who trusted the original CA [subject to some technical limitations which I won't go into here.] Effectively, the authors have made themselves a CA.

There are some interesting technical hacks needed to make this work: although the serial number is somewhat predictable, it's not completely so, and in order to mount the attack they had to guess the serial number in advance. This guess isn't totally accurate, but they were then able to issue their own CSRs to increment the serial number to where they needed it to be.
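The serial number steering can be sketched roughly like this. `CounterCA` and `steer_to_target` are hypothetical names, and the model is deliberately simplified: it assumes the serial is a plain shared counter and ignores the timing side of the problem.

```python
class CounterCA:
    """Hypothetical CA whose serial number is a shared, incrementing counter."""
    def __init__(self, start: int):
        self.next_serial = start

    def issue(self, requester: str) -> int:
        serial = self.next_serial
        self.next_serial += 1
        return serial

def steer_to_target(ca: CounterCA, last_seen: int, target: int) -> int:
    """Submit filler requests until the next serial issued will be `target`.

    Returns the number of filler certificates that had to be requested.
    """
    fillers = 0
    while last_seen < target - 1:
        # Each filler request both advances the counter and reveals its value.
        last_seen = ca.issue("attacker-filler")
        fillers += 1
    if last_seen != target - 1:
        # Someone else's requests pushed the counter past the guess.
        raise RuntimeError("overshot the target serial; the collision must be recomputed")
    return fillers

ca = CounterCA(start=640)
probe = ca.issue("attacker-probe")   # see where the counter currently is
target = probe + 5                   # serial baked into the precomputed collision
fillers = steer_to_target(ca, probe, target)
real = ca.issue("attacker-real-csr")
print(fillers, real == target)       # 4 True
```

In practice other customers are hitting the same counter, which is why the guess can miss and why the `RuntimeError` branch exists in the sketch.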

The impact of this is that the authors could in principle mount man-in-the-middle or other impersonation attacks on any Web server provided that the client trusted this particular CA (most do). The existence of this certificate doesn't allow anybody else to mount impersonation attacks, since ordinary attackers won't have the corresponding private key (unless they break into the authors' machine and recover it, of course). The authors have taken some steps to make the particular certificate they issued less useful for this purpose. In particular, it has an expiration time way in the past, so unless your clock is way off, you should notice this attack. That's not to say that there's no risk here, since you might not notice the expiration date issue.

Of course, it's possible that an attacker could independently use the same technique to acquire their own CA certificate. In fact, we don't know for a fact that nobody already has. The only real obstacle is that the crypto needed here is fairly involved and the experts on it are mostly respected academics, many of whom are on this paper. So, the sooner that CAs adopt the mitigations mentioned above, the better.

I should mention that this isn't the only way to get a bogus certificate: many CAs don't do a particularly good job of user verification in any case (I'll be posting about one particular exceptional such case shortly). In particular, it's common to use "email confirmation" for identity verification, where the CA sends email to the administrator of the relevant machine to verify the certificate request. There are probably a number of cases in which it's easier to attack that than to build up a whole certificate collision infrastructure.

There are really two questions about how to contain this vulnerability:

  • What should we do about this specific certificate?
  • What should be done about the class of vulnerability?

The two basic options for this certificate are to ignore it (assume we trust the researchers, especially since the certificate is expired) or to blacklist it. The way that the blacklist would work is that the browser manufacturers would just issue a security update with a patch to the certificate validation code telling it not to trust this specific certificate, just as they would patch any other security vulnerability. For perspective, we can think of this as a vulnerability with an exploit that is known only to the researchers: even though we have the CA cert, we can't use it productively, and it's not likely to be reproducible. If I were in charge of a browser, which I'm not, I would probably issue a patch with a blacklist for this certificate. Others' opinions may vary; as far as I know, the browser manufacturers didn't issue mandatory security updates blacklisting all the Debian OpenSSL keys, so that may be a cue to their general attitude.
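A blacklist check of this kind could look something like the sketch below. This is not how any particular browser does it; the function names are made up, and `ROGUE_CA_DER` is a placeholder byte string standing in for the real certificate's DER encoding. Note that the fingerprint uses SHA-256 over the whole certificate, so it doesn't inherit MD5's weakness.

```python
import hashlib
from typing import Iterable

def fingerprint(cert_der: bytes) -> str:
    """SHA-256 fingerprint of a certificate's full DER encoding."""
    return hashlib.sha256(cert_der).hexdigest()

# Placeholder bytes standing in for the DER encoding of the rogue CA certificate.
ROGUE_CA_DER = b"placeholder: DER bytes of the rogue CA certificate"

# Shipped with the security update; checked before normal chain validation.
BLACKLIST = frozenset({fingerprint(ROGUE_CA_DER)})

def chain_is_acceptable(chain_der: Iterable[bytes]) -> bool:
    """Reject any chain that contains a blacklisted certificate."""
    return all(fingerprint(cert) not in BLACKLIST for cert in chain_der)

print(chain_is_acceptable([b"some leaf cert", ROGUE_CA_DER]))         # False
print(chain_is_acceptable([b"some leaf cert", b"some legitimate CA"]))  # True
```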

The second question is what to do about this class of vulnerability. Because this attack can only be mounted against a live CA, not against an old certificate, it's very important that the affected CAs either stop using MD5, use randomized serial numbers, or both. Presumably, the news coverage will act as an inducement for them to do so. I've also heard suggestions that the browser manufacturers should disable MD5. There are probably still enough MD5-using servers out there that this would be problematic, though it's something to consider for the future.
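For what "disabling MD5" would mean on the client side, here's a rough sketch of a policy check on the signature algorithm of each certificate in a chain. The OIDs are the standard PKCS #1 identifiers; the function name and the idea of passing the chain as a list of OID strings are assumptions made to keep the example self-contained (a real validator would pull these out of the parsed certificates).

```python
from typing import Iterable, Optional

# Dotted OIDs as they appear in a certificate's signatureAlgorithm field.
MD5_WITH_RSA = "1.2.840.113549.1.1.4"      # md5WithRSAEncryption
SHA1_WITH_RSA = "1.2.840.113549.1.1.5"     # sha1WithRSAEncryption
SHA256_WITH_RSA = "1.2.840.113549.1.1.11"  # sha256WithRSAEncryption

DISABLED_SIGNATURE_ALGORITHMS = frozenset({MD5_WITH_RSA})

def first_disabled_algorithm(chain_signature_oids: Iterable[str]) -> Optional[str]:
    """Return the first disabled signature algorithm in the chain, or None."""
    for oid in chain_signature_oids:
        if oid in DISABLED_SIGNATURE_ALGORITHMS:
            return oid
    return None

print(first_disabled_algorithm([SHA1_WITH_RSA, MD5_WITH_RSA]))     # 1.2.840.113549.1.1.4
print(first_disabled_algorithm([SHA256_WITH_RSA, SHA1_WITH_RSA]))  # None
```

The catch mentioned above shows up immediately: every site whose chain contains an MD5-signed certificate would fail this check, whether or not anything is wrong with it.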

Bottom Line
As usual, don't panic. In its current state, this is more of a demonstration of a hole than a serious hole. Countermeasures are readily available to the CAs and if the remaining CAs fix their practices fast enough, then it's unlikely that there will be any more bad certificates issued (it takes some time to spin up your infrastructure for this attack). Even if one or two such certificates are issued (even to bad guys), it's not the end of the world. Once they're detected they can be blacklisted. This takes a long time with the current patching rate, but it's not conceptually any worse than a remotely exploitable problem with your browser, or a bug in certificate validation logic, both of which have been known to happen. That said, it is very important that the CAs do fix their practices, since this has the potential to become serious if the capability to mount the attack becomes widespread and convenient.

UPDATE: Some minor corrections due to Hovav Shacham (only controllable MD5 collisions were slow)


The attack vector against RapidSSL has been eliminated by VeriSign as of about 11:30 this morning. More information here: https://blogs.verisign.com/ssl-blog/2008/12/on_md5_vulnerabilities_and_mit.php

Am I correct in assuming that simply configuring your browser not to use MD5 SSL ciphersuites will _not_ solve this problem? IIRC, the digest algorithm in the ciphersuite only applies to the MAC, not the certificate. Am I remembering this right?

What about using OCSP?
