# Guaranteeing deletion

I had occasion to wipe my disk the other day prior to bringing the machine into Apple (this never materialized, but that's another story), and Allan Schiffman asked why I trusted Disk Utility if I didn't trust Apple's techs. Now, I do have a story for this, but it suggests an interesting question: say you want to wipe the disk on a computer you don't trust; how can you be sure the wipe was successful. The key problem is that the computer can lie, so just telling it to write zeros (or anything else) to the disk doesn't help. It can just pretend to do the write and then "forget". So you need some way to verify the data was actually written.

The natural thing to do is to read it back. So, for instance, we could write block i and then read it back immediately. Unfortunately, once we've done that there's nothing stopping a malicious machine from writing block i+1 over block i, and keeping the real block i+1 around. In the limit, then, the attacker just needs to erase one disk block, the one it uses as temporary storage. In order to demonstrate complete erasure, then, you need to force the target machine to fill the entire disk with data of your choice, thus leaving no room for the original data (cf. pigeonhole principle).

So, here's a first cut at an approach:

1. Mount the disk on the target machine remotely on a trusted system.
2. Write B blocks of data to the target system where B is the capacity of the drive.
3. Read back all B bytes. If they match, declare success. Otherwise, fail.

Now, as an optimization we can do the job with very little state on the trusted system, as long as we use a little crypto. The basic idea is to write predictable pseudorandom data to the target machine. It needs to be predictable to us but not the target machine, and it needs to be pseudorandom to avoid the target machine compressing it1 and using some of the remaining space to save the original data. The natural approach here is just to choose a random cryptographic key K and for block i write F(K, i) where F is a function that produces a pseudorandom block from a key and an index (i.e., counter mode). The only state that this requires on the trusted system is the key itself, which is tiny.

Similarly, we can avoid reading all the data by forcing the target machine to compute a function that requires all the data. For instance, we give it a random key K_2 and force it to return MAC(K_2, data). As long as K_2 is not known during the storage phase, the target needs to know all the data to compute this function. [I'm oversimplifying a little here, but this is true for realistic MAC functions.] This technique requires a slightly richer interface on the target machine, since MAC computation isn't a standard feature of remote disks, but we can get this functionality by installing a small agent, and that agent need not be trusted.

Unfortunately, as Allan pointed out to me, this assumes that you know exactly how much storage is available to the target machine. In practice, however, you don't. First, the machine has some memory, and it can use that as buffer storage. You might be able to get around this with a reboot from CDROM, however. Second, a really malicious manufacture might lie to you about the disk capacity. Indeed, real disks don't have exactly their rated capacity because some space is used for bad sector concealment. In general, if the machine has storage capacity S and you think it has capacity S' then it can always retain S - S' worth of data that can't be determined via the mechanism described above. If you're really serious about removing all the data on your drive and for some reason you absolutely don't trust the drive or the computer, physical destruction is the way to go.

1. Readers may notice a connection with proofs of retrievability (see, for instance [SW08]), but this problem differs in that we want to impose a stricter condition on the target, not just that it be able to retrieve B blocks but that it actually consume that much storage.

Acknowledgement: Thanks to Hovav Shacham for talking this over with me.
UPDATE: Minor editorial.

If you are assuming trusted boot (which you mentioned), the only part you now need to trust is the BIOS and the disk itself.

And if you can't trust the BIOS or the disk itself, you could never trust the computre anyway, it could have happily exfiltrated all the data quite happily.

I'm not assuming trusted boot. The reason to boot from CDROM is that you want to wipe out all the data on the disk and this includes the operating system, so you can't reboot without a cdrom or netboot or something like that.