Notes on the ECRYPT Hash Function Workshop

| Comments (2) | COMSEC
A few impressions from the EFHW, but first some tutorial material.

For those of you who don't know how hash functions are constructed, the basic idea is iteration. The current hash functions all use a construction called Merkle-Damgard (MD). You start with some compression function F which takes a block M of size m and turns it into a block S of size s where m > s. For SHA-1, these are 512 bits and 160 bits respectively. F also takes as input a state variable of length m. This allows you to chain by taking the output value and feeding into F as the next state.

So, the idea here is that we start with state IS and then compute S0 = F(IS,M[0]). That's the first block. We compute the second block as S1 = F(S0, M[1]) and so on. Once we've processed every block, the final S value is our hash output. [Yes, I know this only works on even block-length messages. There's a padding technique to fix this.] There are other ways to connect up compression functions but given that all the compression functions anyone is seriously proposing can only process block sizes of limited length, pretty much all the hash functions need some chaining mode, whether it's M-D or something else.

So, in order to define a new hash function you need to specify:

  • The compression function.
  • The chaining mode.

As I said earlier, all the major hash functions (MD5, SHA-1, SHA-256, ...) use M-D, but they differ in the compression function. Why use M-D? It's simple and you can prove that if the compression function is collision resistant than so is the final hash construction. Of course, as we've seen with MD5 and SHA-1, if the compression function isn't collision resistant, than the game changes.

So, with that in mind, the workshop:

  • Everybody wants to have a hash function that's provably secure. Unfortunately, it's not something we actually know how to do. We actually have a pretty good idea how to prove stuff about the chaining modes but the problem is the compression function. To a first order, the compression functions fall into two categories: bit-shuffling algorithms like all the current hash functions (and like block ciphers, incidentally) and algebraic functions like VSH. We know how to prove the security of the algebraic functions (or at least nobody is willing to propose one they can't prove security for) but the performance is too slow and as far as I can tell nobody has any practical ideas for how to make it much faster. Nobody knows how to prove anything about the bit shuffling algorithms.
  • The security model for hash functions is turning out to be a lot harder to define than people would like. In particular, the properties aren't limited to the classic preimage resistance and collision resistance. People would also like it to act "like" a random oracle. Why? Because the proofs of security of other constructions (E.g., OAEP, PSS) depend on random oracles. Note that current hashes definitely aren't random oracles because given a hash value H and the fact that it corresponds to some message M we can compute the hash of M || X without knowing M (this is called extension and is a basic property of the simple M-D construction). This isn't allowed of a random oracle.
  • To make matters worse, hash functions are used in a whole bunch of ways that we don't even have analyzed security models for. We have no idea what kind of properties those functions demand, so it's pretty hard to know what properties the hash function has to have not to break them. It seems likely that a new hash in the style of the ones we have now would be ok—or at least not obviously bad—but that narrows the design space a lot.
  • Not only do we not have a good theory of how to build a provable hash function, we don't really have a good theory of how to break them. We just have a bunch of techniques which people are still improving. Nobody seemed to feel confident that they could design a bit shuffling hash function that would resist a concerted attack. There's a fair amount of skepticism about SHA-256, too, mostly on the principle that it was designed before these new attacks and that it was designed by the same organization which produced SHA-0 (since badly broken) and SHA-1 (significantly weakened).
  • NIST is planning to have a competition to define what I think John Kelsey called the Advanced Hash Standard (AHS) [which as Hovav Shacham points out would presumably be pronounced "ass"]. Slides not on the Web yet, unfortunately. Summary: They don't really have any idea how many entries they're going to get, how they're going to prune them down, or what the final design criteria are going to be. There seems to be a real chance that they're going to end up with a bunch of functions none of which they know how to break but without any good way to decide between them. SHA-256 will not be part of the competition. This actually seems to me to be not that great an idea, since it's going to be kind of weird if the new functions aren't measurably better than SHA-256 but NIST is telling us to replace it, especially since by the time a decision is reached we'll have just gotten SHA-256 wedged into everything. I joked to John Kelsey that I was going to submit SHA-256 myself, along with all the papers devoted to attacking it as my submission.


Have there been any useful papers devoted to attacking SHA-256?


Although I suppose if it's pronounced the other way, the papers will have funnier titles. ("Kicking AHS with Differential Cryptanalysis")

Leave a comment