Historic recordings

I'm trying to puzzle out what this NYT article is about. I get the problem: you've got digital records and you want some way to establish their provenance and contents for posterity. What I don't get is what the claimed solution is:
Designing digital systems that can preserve information for many generations is one of the most vexing engineering challenges. The researchers' solution is to create a publicly available digital fingerprint, known as a cryptographic hash mark, that will make it possible for anyone to determine that the documents are authentic and have not been tampered with. The concept of a digital hash was pioneered at I.B.M. by Hans Peter Luhn in the early 1950s. The University of Washington researchers are the first to try to simplify the application for nontechnical users and to try to offer a complete system that would preserve information across generations.

...

After capturing five gigabytes of video in 49 interviews, the group began to work on a system that would make it possible for viewers to prove for themselves that the videos had not been tampered with or altered even if they did not have access to powerful computing equipment or a high-speed Internet connection.

Despite the fact that there are commercial applications that make it possible to prove the time at which a document was created and verify that it has not been altered, the researchers wanted to develop a system that was freely available and would stand a chance of surviving repeated technology shifts.

At the heart of the system is an algorithm that is used to compute a 128-character number known as a cryptographic hash from the digital information in a particular document. Even the smallest change in the original document will result in a new hash value.

It doesn't help here that the Times doesn't actually link to the relevant site. They have links all right, but they go to the Times's explanations of the relevant terms, not to the site you want. So the above is based purely on the Times article. If someone has a pointer to the project site, I'd be interested.

Anyway, using cryptographic hashes for document integrity like this is a pretty standard technique, and there's plenty of free software for it; it's built in on most machines. The difficult problem isn't actually establishing integrity, though; it's establishing authentication and (more importantly in this case) time of creation. To see why, consider the threat model. Someone gives you a recording that they claim represents a historical event observed by someone long since dead. Classic public key cryptography isn't a complete solution here: how do you validate the original creator's public key, and even if you had it, how would you know that they didn't tamper with the data themselves years afterwards? What's more useful is to know that the recording existed at (or before) the time it was allegedly made. Hashes can help here, but you need some independent channel that carries the hash along with some sort of evidence of when it was made. So, for instance, you might print a hash of your recording in the newspaper classified section, and then anybody who could lay hands on the paper could independently verify the recording. [Technical note: there are some cooler techniques using chains of hash functions, but this is the basic principle.]
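To make the newspaper idea concrete, here is a minimal Python sketch, assuming SHA-256 as the hash (the article doesn't say which algorithm the project uses). `stream_sha256` computes the digest you would publish, reading the file in chunks so a multi-gigabyte video doesn't have to fit in memory. `build_chain` is a deliberately simplified, hypothetical version of the hash-chain idea from the technical note, in the spirit of linked timestamping: each link commits to the previous link and to the next document's digest, so publishing only the latest link also commits to every earlier recording.

```python
import hashlib
import io
from typing import BinaryIO, List

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time

def stream_sha256(stream: BinaryIO) -> str:
    """Hex SHA-256 digest of a binary stream, computed incrementally."""
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(CHUNK_SIZE), b""):
        h.update(chunk)
    return h.hexdigest()

GENESIS = "0" * 64  # arbitrary starting value for the chain (an assumption)

def build_chain(doc_digests: List[str]) -> List[str]:
    """Hypothetical linking rule: link_i = SHA-256(link_{i-1} || digest_i)."""
    links = []
    prev = GENESIS
    for digest in doc_digests:
        prev = hashlib.sha256((prev + digest).encode("ascii")).hexdigest()
        links.append(prev)
    return links

# Three stand-in "recordings"; real ones would be opened with open(path, "rb").
recordings = [b"interview 1 bytes", b"interview 2 bytes", b"interview 3 bytes"]
digests = [stream_sha256(io.BytesIO(r)) for r in recordings]
chain = build_chain(digests)
print("publish:", chain[-1])  # only this value needs to go in the newspaper

# Altering the first recording changes every later link, including the published one.
tampered = [stream_sha256(io.BytesIO(b"interview 1 EDITED"))] + digests[1:]
print(build_chain(tampered)[-1] != chain[-1])  # True
```

A real timestamping service would also have to record when each link was made and get it witnessed (the newspaper's role); the sketch only shows why tampering with an early entry becomes detectable once a later link has been published.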

Note that it doesn't help to have the hash just attached to the document without some other form of cryptographic protection (e.g., a digital signature). This doesn't buy you any protection against attackers, because they can change the document and then change the hash to match. The way to think about this is that hashing is a technique for bootstrapping your confidence in a small data value (the hash) into confidence in the entire data object that was hashed. But you still need a secure channel for the hash.
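Here is a small sketch of that failure mode, again assuming SHA-256; the dictionary "package" and the function names are hypothetical. Checking the data against the hash shipped alongside it passes for both the genuine and the forged package, while checking against a value obtained over an independent channel (say, the one printed in the newspaper) catches the forgery.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# The recording with its hash simply attached, and no signature.
original = b"interview recording bytes"
package = {"data": original, "hash": sha256_hex(original)}

# An attacker who can modify the package just recomputes the attached hash.
forged_data = b"altered recording bytes"
forged = {"data": forged_data, "hash": sha256_hex(forged_data)}

def self_consistent(pkg: dict) -> bool:
    """Checks the data against the hash shipped in the same package."""
    return sha256_hex(pkg["data"]) == pkg["hash"]

def matches_published(pkg: dict, published_hex: str) -> bool:
    """Checks the data against a hash obtained over an independent channel."""
    return sha256_hex(pkg["data"]) == published_hex

published = sha256_hex(original)  # e.g., the value printed in the newspaper

print(self_consistent(package), self_consistent(forged))  # True True
print(matches_published(package, published), matches_published(forged, published))  # True False
```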

With that in mind, I don't really understand how the live CD thing works either. Just as you need the hash to be carried independently, you also need an independent code base to do your own hash computation.
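One partial safeguard, sketched here in Python under the assumption that the hash is SHA-256, is to check whatever hashing tool you are handed against known-answer test vectors before trusting it. The two digests below are widely published (the "abc" one appears in NIST's SHA-256 examples); the function names and the deliberately broken tool are hypothetical. Note the limits: a maliciously written tool can pass these tests and still lie about one particular recording, so this catches crude or broken implementations but is no substitute for code you obtained independently.

```python
import hashlib
from typing import Callable

# Known-answer test vectors for SHA-256 (message -> hex digest).
KNOWN_ANSWERS = {
    b"": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    b"abc": "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
}

def implementation_passes(hash_fn: Callable[[bytes], str]) -> bool:
    """Runs a candidate SHA-256 implementation against the known answers."""
    return all(hash_fn(msg) == expected for msg, expected in KNOWN_ANSWERS.items())

def stdlib_sha256(msg: bytes) -> str:
    return hashlib.sha256(msg).hexdigest()

def suspicious_sha256(msg: bytes) -> str:
    # Hypothetical tampered tool that always reports the same digest.
    return KNOWN_ANSWERS[b"abc"]

print(implementation_passes(stdlib_sha256))      # True
print(implementation_passes(suspicious_sha256))  # False
```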

1 Comment

It sounds like the alleged big deal is that a cryptographic hash requires identical binary input, so you can't change the codec used by the video without breaking the hash, but codecs for obsolete video formats are often hard to find. So they're publishing the video and the source code for the codec together.

Then there are layers of feel-good BS to make this simple concept sound really important.
