Tell me the one where you're going to block child porn again

News.com reports that AOL, EarthLink, Microsoft, United Online and Yahoo are going to cooperate to block child porn.
The system works like this: When AOL employees become aware of a child pornography image included as an e-mail attachment, they forward the attachment and information about the sender's geographic location to the National Center for Missing & Exploited Children, which in turn sends it to the appropriate law enforcement agency. AOL also generates a digital fingerprint of the image so it can be automatically flagged if it flows through the company's network in the future.

...

Another possibility, Schoen said, is that child pornographers who know how the system works would simply make a tiny tweak to photographs to avoid detection--rendering the hash detection system useless. Internet providers could counter-attack using a "locality sensitive hash" function that's designed to detect similar files, but even that in turn could be foiled if image files are encrypted.

Schoen is quite right, of course. Cryptographic hashes are particularly poorly suited to this problem because the whole point is for them to be sensitive to even single-bit changes. There certainly are fuzzier hashes, but pretty much any hash function you use is going to be easy to defeat with even simple non-cryptographic transformations of the data, let alone with encryption.

There are two challenges to making an encryption scheme work here. The first is that because ciphertext has such high entropy, it's generally fairly easy to detect. The good news is that compressed images are in general very high entropy anyway, so ciphertext is fairly easy to hide there using standard steganographic techniques. There are techniques for detecting steganography, but they're not superfast, and the job can be made arbitrarily difficult if you're willing to accept a suitably low coding rate.

The second challenge is to design a scheme that doesn't require the communicating parties to exchange cryptographic keys. You could of course just carry the key in the message. This would still produce a basically random ciphertext, which would defeat simple hash matching, at least until the ISPs reverse-engineer your data format and figure out how to extract the encryption keys. However, there's a simple workaround: use a short encryption key (say 24 bits or so) but don't put it in the message. The recipient just exhaustively searches the key space, which doesn't take long. Obviously, the ISPs could do the same thing, but they're at a very severe disadvantage: they have to scan every candidate message, which is basically impractical, whereas you just need to scan the ones sent to you.