Nearly all symmetric encryption algorithms use unstructured keys,
i.e., bit strings of a certain length.
If we
model an encryption algorithm as a function
E
C = E(K, M)
Where
C is ciphertext,
M is the plaintext, and
K is the key, then the way you generate
K
is simply to choose a random
bit string of the appropriate length. The interesting thing
is that some encryption algorithms have what are called
"weak keys" which have special (bad) properties.
For example, DES has four weak keys, which have the property that encryption
and decryption are symmetrical. Thus, if have a weak key K_w
then
D(K_w, M) = E(K_w, M)
DES also has 6 pairs of semi-weak keys: pairs of keys K_w1 and K_w2
such that:
D(K_w1, M) = E(K_w2, M)
Back in the old days, it was widely believed that you should make
an attempt not to use weak keys (i.e., you would check generated
keys and reject them if they were weak) but it seems to me that
recently opinion has turned towards ignoring the issue. Why?
First, it's not clear how bad choosing a
weak key is for a real algorithm.
In the case of DES, for example, the weak keys still encrypt the
data, they just do it in such a way that if you ever encrypt
it again with the same it decrypts it instead. This is a problem
in theory, but in practice it's hard to imagine a common
scenario in which this would happen, especially when you're
using the algorithm in CBC mode. Remember that encrypting
the same data twice in CTR
mode always decrypts it,
so if you want to build a secure system that's algorithm-agile
you have to be careful about this anyway.
Second, the chance of randomly1 choosing a weak key in DES is very low.
Even if you count semi-weak keys as weak, the chance of choosing
one is about 1/252. What do you figure the chance that
something will go wrong with your hardware or software and you'll fail to
encrypt the data entirely is?
But let's ignore these issues and consider a case where weak
keys were really weak, like they turn the encryption algorithm
into the identity function. Now consider things from the perspective
of an attacker. They're given some "ciphertext" X and want to
try to decode it. To mount a brute force known plaintext attack,
we do:
- For each potential key K_i
- Compute M'=D(K_i,X)
- If M' looks like a plausible message then exit.
- else try the next key.
If the cost to do a decryption is d, the cost
to see if a message is plausible is p, and there
are k key, then the average running time of this
algorithm is kdp/2.
Now, if there are w weak keys, then the attacker does a
somewhat different algorithm:
- Check to see if the message is plausible as-is.
- For each potential key K_i
- Compute M'=D(K_i,X)
- If M' looks like a plausible message then exit.
- else try the next key.
The running time of this algorithm is
p+(k-w)pd/2, so
the advantage to the attacker is (approximately)
wpd/2,
so it starts to look like you should avoid weak keys. But
consider that if you do that--and we assume that the attacker
knows how your implementation behaves--then he can run
the original algorithm just skipping the weak keys, at cost:
(k-w)pd/2, or slightly less than the cost if the
message sender just ignores the weak keys.
The problem here is that it's the
existence of weak
keys, not their use by message senders, that reduces the
key space of the algorithm.
Now, in the particular case where the weak keys turn encryption
into a no-op, you do have to worry about automatic traffic
sniffers designed to data mine ordinary unencrypted traffic.
Such packet sniffers would of course automatically pick up
any "encrypted" data processed with weak keys. However,
as a matter of practice, this doesn't happen with real algorithms.
First, as previously noted DES weak keys aren't the identity function.
Second, the identity property would only apply in ECB, not with CBC
or CTR. So, some processing would be required (even if
it's much faster than trying a new key) prior
to checking whether the plaintext were plausible, which brings us
back to the analysis before, with a new term for the "fast" processing.
Once again discarding weak keys makes the attacker's job very
slightly easier.
I should emphasize that this issue is mostly theoretical. No popular
algorithm has a significant enough number of really weak keys
to give you a significant chance of generating one by accident.
But that's all the more reason not to incur the implementation
complexity of bothering to reject them.2
1. Though note that all zeroes and all ones
are both weak keys. One might imagine that if your PRNG failed you
would get one of these. So, as a special case you might want
to check for these to make sure your system is still functioning.
On the other hand, there's always the chance that the check
itself will screw something up...
2. The analysis here is about encryption.
As this Wikipedia article points out, the situation is different in MAC modes, but standard
practice these days is to use hash-based MACs rather than
encryption algorithm based MACs--though of course those might
have weak keys too.