COMSEC: April 2008 Archives

 

April 20, 2008

Voltage (full disclosure: I have a number of friends there and I'm on their TAB) have released a technology they call Format-Preserving Encryption (FPE). The basic technology here is fairly old and is described in a paper by Black and Rogaway, but as far as I know, this is the first attempt to try to put it together in a single commercial package. Below I attempt to describe some of the relevant technical issues, which are sort of interesting.

Why FPE?
The use case for FPE is simple: say you have a database that contains information with multiple levels of sensitivity. So, for instance, if you're Amazon you might have a customer database that any employee can access but you'd like the credit card numbers to be accessible only to employees that really need it.

The classic approach here would be to use database access controls. This works well as long as you trust the DB server, but if, for instance, you want to send a copy of the DB to someone else, then you may not be able to trust their server, so you need to redact the database, which can be a pain. Another problem here is that sometimes sensitive information like CCNs is used for customer identification, which means you can't just redact the CCN. Rather, you need to replace it with something that's unique but doesn't leak the CCN itself. And of course, if someone compromises your database server, then all bets are off.

The problem with simple encryption
The natural alternative is to use encryption. Encrypting the whole database doesn't help, because you want users to have access to most of the database, just not to the sensitive fields. So, what you need to do is encrypt just the sensitive fields. This turns out to be trickier than it looks.

For example let's say we want to encrypt the social security number 123 45 6789 using AES-ECB. So, we might do:

  • Encode into ASCII to give 31 32 33 34 35 36 37 38 39.
  • Pad with 00 to give 31 32 33 34 35 36 37 38 39 00 00 00 00 00 00 00.
  • Encrypt with AES to give 77 6e 2c a5 02 17 7a 5b 19 e4 28 65 26 f3 7e 14

This kind of sucks. Not only have we managed to start with a 9 digit string and end up with a 128-bit random-appearing value, none of the bytes of the output are ASCII digits. So, if our database or database software is expecting to have values for this field that look like SSNs, we've just broken that invariant.

The source of the problem, of course, is that we're using a block cipher in ECB mode, and most block ciphers come in a small number of sizes (64, 128, and 256 bits are the standard ones). A block cipher just randomly maps the input space onto the output space, so ECB mode encryption effectively selects a random b-bit value (where b is the block size). The smaller the fraction of the possible values that are valid, the higher the probability that the output will be invalid. To take the specific case of SSNs, there are approximately 2^{30} valid values (if we think of the trailing zeros as not counting), so the chance of producing a valid value by random chance is vanishingly small (order 2^{-98}).

One thing you might think would make sense would be to use a different mode than ECB, say counter. The problem with counter mode in this case is that you need to use a different section of keystream (or a different key) to encrypt each value to avoid easy cryptanalytic attacks. So, you need some per-value distinguisher that gets carried along with the ciphertext, which expands the amount of storage you need for the encrypted values, even as it keeps the ciphertext small.

Luby-Rackoff
As noted above, our big problem is our block size is too large. As noted above, even though SSNs are 9 digits long, they are sparsely packed (for instance letters aren't allowed), so there are approximately 2^{30} valid SSNs, as long as we use a better mapping than straight 1-1 digit correspondence. For instance, think of the 9 digit SSN as a value from 1 to 999,999,999 (not all 9-digit numbers are valid SSNs, but for simplicity, let's pretend they are.) We can represent that in binary as a 30 bit quantity. If we had a 32 bit block cipher, we could encrypt this value with less than 10% expansion, which might be OK under some circumstances (we'll describe how to do better below).

Ordinary block ciphers have blocks much larger than this, of course, but it turns out that there's a generic technique for making block ciphers of arbitrary size (actually, even values only), called Luby-Rackoff (L-R) . The nice thing about L-R is that it's a general construction based on a pseudorandom function (PRF), which we know how to build with standard cryptographic techniques.

Cycle Walking
We can use L-R to build a block cipher with a block size of any number of bits we want, but this still means that our function produces 2^b possible values where b is the block size, but this generally won't line up perfectly with the set of values we want to encipher. To return to our SSN example, we have 10^9 possible values, which means we need a block size of 30 bits, which implies a set size of 2^{30} = 1073741824. So, for any given input value, there's about a 7% chance that it will encrypt to an invalid SSN (greater than 10^9). If the database (or software) is really aggressive about validity checking, then you'll have an unacceptable rejection rate.

To deal with this issue, Black and Rogaway describe a technique they call "cycle-walking". The idea is that we start with an initially valid value (1-999,999,999) and then encrypt it. If the ciphertext is also valid, we stop and emit it. If it's invalid (greater than 999,999,999), we encrypt again, and repeat until we have a valid value. This gives us an encryption procedure that is guaranteed to produce an in-range output. Decryption is done in the same way.

Bottom Line
So, why can't we just use cycle-walking? Because it only works well if the block size is approximately right—if the size of the valid set is a lot smaller than the block size of the cipher, then you have to do a lot of iterations in order to get an in-range result. So, you can't use a 64-bit block cipher in order to encrypt an SSN because you end up having to do a prohibitive number if iterations; you need to use L-R to construct a block cipher of approximately the right size and then use cycle-walking to shave off the last few values.

UPDATE: Paul Hoffman pointed out to me privately that it's not clear how this all relates to FPE. Basically, FPE means the combination of L-R plus cycle walking. This lets you do one-to-one and onto encryption for most set sizes. If the set size is really small, there's another technique (also due to Black and Rogaway): you encrypt all possible input values and then sort the ciphertexts. You then use the index of the ciphertext in the sorted list as the encrypted value. This is obviously prohibitively expensive unless the number of possible values is small because it requires encrypting all possible values and then keeping a very large mapping table.

 

April 12, 2008

Schneier expresses concern about the tire pressure monitoring system (TPMS). The way TPMS works is that each wheel contains a pressure sensor and a radio transmitter which transmits pressure data to a receiver in the car, which somehow alerts you if the pressure is too low. The alleged problem is that in order to allow distinguishing wheels from each other (and from those in adjoining cars), each wheel has a unique identifier, raising the possibility that one could build a radio receiver which would listen for these transmissions and track your car.

Obviously, this isn't that attractive a feature, as Hexview observes:

What problems exactly does the TPMS introduce? If you live in the United States, chances are, you have heard about the "traffic-improving" ideas where transportation authorities looked for the possibility to track all vehicles in nearly real time in order to issue speeding tickets or impose mileage-adjusted taxes. Those ideas caused a flood of privacy debates, but fortunately, it turned out that it was not technically of financially feasible to implement such a system within the next 5-10 years, so the hype quickly died out.

Guess what? With minor limitations, TPMS can be used for the very purpose of tracking your vehicle in real time with no substantial investments! TPMS can also be used to measure the speed of your vehicle. Similarly to highway/freeway speed sensors that measure traffic speed, TPMS readers can be installed in pairs to measure how quick your vehicle goes over a predefined distance. Technically, it is even plausible to use existing speed sensors to read TPMS data!

...

As every other tracking technology, the TPMS was introduced as a safety feature "for your protection". One might wonder why NTHSA (a government agency) would care so much about a small number of accidents related to under-pressurized tires. And why would it choose to mandate TPMS and not run-flat technology? Are we being tracked already? I hope not.

It's absolutely true that NHTSA required TPMS. It doesn't look to me, however, like NHTSA required this particular implementation, or any particular implementation. They just required that the car be able to detect that the car be able to detect loss of pressure by more than 25%. As Hexview observes, there is a simple implementation that dramatically reduces the privacy problem: encrypt the sensor readings, and as far as I can tell this would be quite compatible with the NHTSA requirements (this doesn't totally reduce the problem because of radio fingerprinting, but this is harder than just reading the ID out of the air). The good news is that since there's no need for my car to be able to read your car's tire pressure, it's quite possible for manufacturers to do the right thing without any kind of new standard.

Hexview implies that NHTSA may have required TPMS in order to enable them to monitor your whereabouts, but I find that somewhat unlikely. Certainly, when I was involved with DSRC/WAVE, privacy was foremost on everyone's minds, so it would be strange of NHTSA were to deliberately attempt to violate driver privacy. That said, the manufacturers were also pretty concerned about privacy, so if they have rolled out a system that enables tracking, that's a little surprising.