EKR: May 2010 Archives


May 30, 2010

LaTeX is, of course, the standard document production system for computer science documents (with a tiny minority using {t,n}roff). It's also a good example of the standard CS approach of solving problems by inventing a new programming language. Consider that designing a modern Web page involves using three separate languages, HTML, CSS, and JavaScript (of these only JavaScript is obviously Turing complete). As another example, when you print documents, you generate PDF or PostScript, which are just programming languages (PostScript is Turing complete, not sure about PDF)... Anyway, LaTeX is a bit too complete, it turns out.

Steve Checkoway, Hovav Shacham, and I have a paper at LEET describing how a malicious LaTeX file can compromise your computer:

We show that malicious TEX, BIBTEX, and METAPOST files can lead to arbitrary code execution, viral infection, denial of service, and data exfiltration, through the file I/O capabilities exposed by TEX's Turing-complete macro language. This calls into doubt the conventional wisdom view that text-only data formats that do not access the network are likely safe. We build a TEX virus that spreads between documents on the MiKTEX distribution on Windows XP; we demonstrate data exfiltration attacks on web-based LATEX previewer services.

This isn't just an issue of LaTeX files. While people do sometimes run LaTeX files prepared by others, generally those are only files you get from people you know, i.e., your collaborators. But it turns out you can also embed malicious code in BibTeX files, which people routinely copy and paste from totally untrusted sources (the BibTeX entry for this paper is here) in order to simplify reference management. The other major case is LaTeX class files, which people download for conference submission.

The good news is that the main threat is on Windows because LaTeX on UNIX is more restrictive about where you can write files. The bad news is that it's also an issue if you run Emacs (look, another embedded language!) with AucTeX (the best way to edit LaTeX files): AucTeX writes executable cache files in the local directory, so you're at risk.

Happy editing!


May 29, 2010

Ever since I got my VFFs, people have been asking me whether I ran in them and I'd always give the same answer: I haven't been brave enough. After I ran into Phil Stark, though, who does ultras in his huaraches, I figured I'd give it a try.

Rather than slowly transition, I decided to just switch over to Vibrams completely (this was Phil's advice and I had injured my shoulder, so couldn't do too much mileage anyway). That was about 6 weeks ago and I'm now at the point where I can comfortably go up to 7-8 miles, either on trails or road, and I feel like I have a long enough baseline to report back.

The Transition
It wasn't that hard for me to transition. I started really short, with a mile or so, and then worked my way up over the course of a month or so. If you're a heel striker you need to completely alter your stride so you land either mid or forefoot (this is pretty much the point of going barefoot). I started out mostly on asphalt, which you would think would be pretty hard without any cushioning, but it really forces you to concentrate on your stride: one or two (incredibly unpleasant) heel landings on asphalt with no cushioning teach you real fast to adjust your stride. Anyway, once your stride adjusts and you learn to land softly, I at least didn't find that there was much trauma to my foot. At around week 3 or 4, I started to get some pain in the metatarsals of my right foot, but that mostly went away after a few more weeks.

Instead of the foot, the primary adjustment was in the calf. Because you land on the forefoot, and seem to push off more by extending your foot, it seems like you put a lot more stress on your gastrocnemius. For the first month or so my calf and achilles tendon would be sore after each run, and at least once I had my right calf completely lock up and I was limping for a few days. This has mostly gone away by now, however, and I feel pretty comfortable up to reasonable distances.

Surface and Terrain
I've now run in VFFs on a whole bunch of different surfaces. Dirt trails are the best, then grass, then asphalt, and then gravel. Basically this is an issue of cushioning: with VFFs you're much more sensitive to how hard the surface is and grass and dirt are just nicely comfortable and springy. (Note: I prefer dirt even with a real shoe). Asphalt gives you a harder landing and so is less comfortable, but basically fine as long as you are actually landing OK. The problem with gravel is that as the size of the rocks starts to get bigger you start to have to really watch your landing: coming down hard on a sharp rock the size of a golf ball can be quite painful.

Climbing hills is good: you would naturally tend to land on the ball of your foot anyway, so it doesn't require much of an adjustment in your stride. By contrast, going down is bad because you would naturally tend to heel strike so you need to really overcompensate to avoid that. And of course since you tend to strike relatively hard going downhill anyway, this is doubly bad. Even now I tend to come down harder than I would like.

Road Hazards
The biggest problem with running in VFFs as opposed to shoes isn't the routine pounding but rather pebbles, rocks, acorns, etc. The soles are just too thin and flexible to protect you from this kind of impact. You can't always avoid stepping on rocks, but when you're running on a basically flat surface you can mostly see them in advance and when you do accidentally step on one, you usually notice before you've put your full weight on it and can just pull your foot back before you've done any real damage. I've only really managed to hurt myself twice: once a week ago, when I stepped on a small pinecone but landed on the side of my foot rather than the ball and wasn't able to correct, and then yesterday, when I went running on the baylands trail and there were just so many rocks that I couldn't avoid all of them and so landed pretty hard on a few.

Even in those two cases, I didn't do any permanent damage, just hurt a lot immediately and then ached for the next 5-10 minutes. It feels fine now, though and I don't see any bruising.

Other Issues
People often ask me about running with shoes with so little support: I have incredibly flat feet and I've never really found that having a lot of support did much for me; I find it more comfortable to just let my feet pronate completely the way they want to, even in normal running shoes. I don't know what VFFs would be like for someone with normal arches.

While I wear socks with regular shoes, I don't wear them with VFFs (you can wear Injinjis), and this hasn't been a problem for me. I have one friend who tends to get a lot of blisters with VFFs, but this hasn't been a problem for me at all (and I have gotten blisters with other shoes, so it's not like my feet are especially tough). I suspect this is primarily an issue of fit. For longer runs, it seems like I might be getting a few hotspots and I've been trying to slather on some Hydropel as a precaution.

You need to be a bit careful about stubbing your toe. There's not much protection and if you scrape the top of your toe, you can tear through the thin nylon at the top or peel the rubber sole away. I've got a small tear above my big toe. So far it's not expanding but I've ordered a new pair just in case.

Bottom Line
I suspect I'd be able to run much longer in VFFs (and I'll try a 10 this weekend), but given how much trouble I had when I ran on gravel of the wrong size, I'm not sure I would want to do something like an ultra, where I couldn't turn around and didn't know that the surface would be good. In view of that, I'll probably start mixing it up more to make sure I still can run in shoes if I want to.


May 9, 2010

Henry Farrell over at Crooked Timber reports on having his laptop lost and then recovered. He then goes on to recommend a variety of precautions for future incidents:
Also - in the spirit of locking the barn door after the horse has gone but to your very great surprise been returned later through the benevolence of strangers - recommendations for minimizing the pain of stolen machines.

(1) Back Up Everything Important somewhere external. This is the one measure I did take - and the pain would have been far, far greater had I lost my work along with the machine. I use Sugarsync which keeps the work documents on my various machines in sync with each other as well as giving me an online back up - others swear by DropBox, SpiderOak and other services.

(2) Make sure that your account is password protected. I didn't do this - remarkably stupidly - but appear to have gotten away without loss of personal information. You shouldn't take this risk. I won't again.

(3) Set up a firmware password if you have a recently made Mac. Makes it much harder to wipe the OS.

(4) Consider buying anti-theftware like Undercover. Depending on your tolerance for risk, this may be too expensive for the benefits provided (me: my risk tolerance has decreased substantially since this happened to me).

(1) is of course good advice. Backups are good practice for a variety of threat models, including just plain hardware failure. I personally run backups and also keep most of my important stuff in revision control (originally CVS but I'm moving over gradually to SVN).

Recommendation (2) is nowhere near strong enough. Passwords (barely) protect you against someone who has ephemeral physical access, but if you don't encrypt the hard drive, then a dedicated attacker can either boot up in repair mode (the firmware password (#3) makes this more difficult) and read your data off or just pull the hard drive out. What you need here is disk encryption. Luckily, the Mac comes with FileVault: a quite serviceable (if a hair slow) disk encryption system.

Recommendation (4) makes some sense, though I doubt I would bother myself. I've never lost a laptop and when we multiply out the chance of loss times the chance of recovery and factor in the likelihood that your laptop will be covered by homeowner's insurance, I'm not sure that the $50 for Undercover is a good bet.


May 8, 2010

As a kid I discovered wool clothes were super-scratchy (and was actually diagnosed with an allergy to wool, but more recent tests don't seem to bear this out). So, when Eu-Jin Goh told me that I should try some of the new wool athletic gear (Smartwool, Ibex, etc.) I was pretty skeptical. But then I read this Backpacking Light article comparing wool and synthetics which reports that wool has comparable performance to the best synthetics and started thinking maybe I should give wool another try. A month or so ago, I ordered a Smartwool NTS top (microweight) and it really is pretty good. I've since ordered two more of the NTS tops in lightweight.

Traditionally, people have three main complaints about wool: comfort (it's scratchy), care (you can't machine wash it), and price. Modern athletic wool garments address two of these issues. The main advancement in the comfort front is the use of Merino wool. Merino has a much finer fiber than ordinary wool, and (in general) the finer the fiber the softer the fabric you make out of it. In addition, the new Merino yarns are much thinner, which also makes for a much softer fabric. My Smartwool tops still aren't quite as soft and smooth as a comparable synthetic such as Capilene or typical Coolmax/Lycra blends, but they're far less scratchy than I remember wool being. This doesn't mean not scratchy at all (especially before the initial wash), but it's comfortable enough for everyday wear, and the compensation is that there's rather less of the feeling that you're wearing a Tyvek envelope that you tend to get with synthetic fabrics. [The above relies heavily on the Backpacking Light article, which is behind a paywall, albeit one worth shelling out for if you are interested in this sort of thing.]

The other major advance is washability. As everyone knows, machine washing wool garments ruins them. The problem here is that the fabric felts (actually, Wikipedia claims that they've been fulled, but Mrs. G says that everyone just says "felted"). Basically, what happens is that wool (like all hair) has a scaly external structure and the heat and the agitation cause the scales to interlock, so you just get a single shrunken mass. In the 1970s (thanks, Wikipedia!), however, superwash wool was introduced: superwash has been treated either to remove the scales or by coating them with a polymer that prevents the interlocking. In either case superwash wool is wool you can wash in your washing machine. You can also tumble dry it if you're moderately careful; Smartwool recommends the low setting but I've used my dryer's permanent press setting with no sign of real shrinkage after the first washing (warning: there is some built-in shrinkage the first time so pay attention when you buy). This is mostly as convenient as most of my synthetics, and better than my polypropylene SportHill gear, which I've actually damaged in the dryer.

The price issue, however, remains. For comparison, the GoLite BL-1 lists for $42. The comparable Smartwool NTS is $60. I've also seen complaints that wool doesn't hold up to extended wear as well as the synthetics do, which makes the price issue more serious. Regardless, I'm now considering wool a serious option; I've tried it for a few short runs, but I'll report back after my next serious test, either a long run or an extended backpacking trip.

Acknowledgement: This post relies heavily on discussions with resident wool expert Mrs. Guesswork.


May 7, 2010

As I previously mentioned, my friend Terence bought this, uh, artwork, which sells itself on eBay. Well, Terence has hit the big time with an article in the NYT magazine.
Spies, who is the chief technology officer at Voltage Security in Palo Alto, Calif., describes himself as a collector of "baffling contemporary art." (He mentions the almost monochrome panels of Anne Appleby and Molly Springfield's meticulous drawings of photocopies.) He says another collector once advised him to buy art that "people have a reaction to - good or bad." And "A Tool to Deceive and Slaughter" has elicited reactions ranging from "You're really crazy" to "You're slightly crazy." He's O.K. with that. It "sets people off," he continues, "because it's not even clear what you own."


The new opening minimum bid is calculated to cover shipping and other overhead, so the seller won't lose money, but this setup also limits how much the seller can make should the piece appreciate in value over time. And of course it's possible Spies can own the piece indefinitely - if it fails to become more valuable. "It was totally not an investment," he says, cheerfully. That's good, because as of this writing, "A Tool to Deceive and Slaughter," priced at $6,858, has attracted no bidders.

For reference, a full-page ad in the Times Magazine runs $90k.


May 2, 2010

Yesterday I criticized the proposed requirement of a biometric identifier for work authorization. But just because it's a bad idea doesn't mean it's not interesting to design one. Let's start with the observation I made in the previous post: smart cards are unnecessary here; what you want is a cryptographically protected object that contains the relevant data. E.g.,:

  • Name
  • Identification number
  • Biometric (photo, fingerprint, etc.)
  • Meta-data, e.g., whether they're allowed to work or not, expiration date, etc.

Note that the actual information we're carrying here is mostly irrelevant from the perspective of security design. The cryptography protects whatever opaque data we happen to want to cram onto the card.

Physically, we have a huge amount of latitude in how we design the card; because all the data is cryptographically protected, we don't need any physical tamperproofing features, just a format that will carry enough data (say 1-2k or so, depending on the size of the biometric)1. The data could be encoded on a mag stripe, smartcard (memory type, not necessarily active), or even a 2-d bar code such as a QR code (though we're probably not too far from the limit here).

Digital Signatures
The natural approach is simply to use digital signatures. The federal government would generate a single central signing key and use it to sign everyone's cards. The public key would just be published somewhere (e.g., in the Federal Register) and so anyone could verify people's identification. (You could also use a multitier model where the central signing key identifies subsidiary keys, etc.). In any case, if we're going to use digitally signed data, we have to contend with the problem of compromise of the central signing key(s?). Someone who stole such a key would be able to generate as many fake IDs as they wanted. Naturally, you'd want to store the signing key in a hardware security module, which substantially reduces the window of vulnerability, since you would have to steal the module, not just the key. Still, that could happen.

Obviously, from the moment when the key is stolen, you can't trust any signatures which are generated with that key. On the other hand, the signatures generated before the key is stolen are just fine. The problem is distinguishing the two. It's natural to just publish the compromise and state that "any signatures after this date are bogus", but as Jacob Davies observes, once you control the signing key, you can make it generate any signature you want, including one over dates in the past. [Technical note: if the keys are embedded in an HSM, then you could program the module to always put the correct time in its signatures. However, attackers could potentially extract the key, so you can't really trust this level of guarantee.]

Dealing With Key Compromise
Key compromise is a standard problem with digital signatures and has a standard solution: you have some timestamping service which records when signatures were produced, thus differentiating valid from invalid signatures. The time stamping service can just be a separate signing system or (more securely) a hash chaining system such as that used by Haber-Stornetta timestamping [*].

In a system like this, each signature (i.e., identification card) is linked together in a chain of hashes. When document i is signed, producing S_i, we compute H_i = Hash(H_{i-1} || S_i). Because the hashes are one-way, once we know any value H_j, we can verify that any prior value H_i is correct, provided that we know the elements i..j. The way such a system is used in practice is that the timestamping service (in this case the federal government) periodically commits to given hash values, for instance by publishing them in the Federal Register. From that point onward, it's not possible to create signatures with timestamps prior to the published value, even if the key is compromised, since they don't appear in the chain of hashes implied by the published hash value.
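To make the chaining concrete, here is a minimal sketch in Python. It assumes SHA-256 as the hash and an all-zero starting value H_0; neither choice comes from the Haber-Stornetta scheme itself, and the function names are made up for illustration.

```python
import hashlib

def chain(prev: bytes, sig: bytes) -> bytes:
    """One chain step: H_i = Hash(H_{i-1} || S_i). SHA-256 is an assumption."""
    return hashlib.sha256(prev + sig).digest()

def build_chain(h0: bytes, sigs: list[bytes]) -> list[bytes]:
    """Return [H_0, H_1, ..., H_n] for signatures S_1..S_n."""
    values = [h0]
    for s in sigs:
        values.append(chain(values[-1], s))
    return values

def verify_link(h_i: bytes, later_sigs: list[bytes], published: bytes) -> bool:
    """Check that H_i leads to the published H_j via S_{i+1}..S_j."""
    h = h_i
    for s in later_sigs:
        h = chain(h, s)
    return h == published

H0 = bytes(32)  # hypothetical starting value
sigs = [b"card-1", b"card-2", b"card-3", b"card-4"]  # stand-ins for S_1..S_4
hs = build_chain(H0, sigs)
published = hs[4]  # the value committed to, e.g., in the Federal Register

# H_2 checks out against the published H_4 given S_3 and S_4.
ok = verify_link(hs[2], sigs[2:], published)

# Swapping a backdated card into position 3 no longer reaches H_4.
forged = verify_link(hs[2], [b"backdated-card"] + sigs[3:], published)
```

Here `ok` comes out true and `forged` false: once H_4 is public, nothing can be slipped into the chain before it without changing every later value.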

Eliminating Signatures
It should be obvious at this point that it's possible to dispense with signatures entirely: simply use the published hash chain values to verify each document directly. In order for this to work well, however, the verifier needs to have access to the hash chain value for every signed document, which is excessively expensive in terms of data storage on the card. Conveniently, there are techniques for providing a more compact representation.

The most natural approach is to use a Merkle Hash Tree. Hash trees work kind of like a hash chain but they allow us to compress the verification information for a large number of entries into log(N) values. Say, for instance, the tree covers 1000 documents: we would need to store only about 10 entries (the sibling nodes on the path from each entry to the root of the tree) on each card. The idea here would be that we would form a hash tree from all the identity cards produced in a given day and then use the hash tree root as the input into the hash chaining process. This would give us a mechanism for verifying any given card without any digital signatures at all, provided that we can obtain each hash chain value. Note that you just need a trusted path for the most recent hash chain value: intermediate values are verifiable from the trusted value.
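A rough sketch of the tree and the per-card path, again in Python. The details are my own assumptions rather than any standard: SHA-256, a one-byte prefix to keep leaf hashes distinct from interior hashes, and duplicating the last node on odd-sized levels (a simplification a real design would want to think harder about). It also assumes at least one card per batch.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Return every level of the tree, leaf hashes first, [root] last."""
    level = [h(b"\x00" + leaf) for leaf in leaves]
    levels = []
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # pad odd levels by repeating the last node
        levels.append(level)
        level = [h(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    levels.append(level)
    return levels

def prove(levels: list[list[bytes]], index: int) -> list[bytes]:
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])  # the sibling differs only in the low bit
        index //= 2
    return path

def verify(leaf: bytes, index: int, path: list[bytes], root: bytes) -> bool:
    """Recompute the root from one leaf plus its sibling path."""
    node = h(b"\x00" + leaf)
    for sibling in path:
        if index % 2:
            node = h(b"\x01" + sibling + node)
        else:
            node = h(b"\x01" + node + sibling)
        index //= 2
    return node == root

cards = [f"card-{i}".encode() for i in range(1000)]  # one day's hypothetical cards
levels = build_levels(cards)
root = levels[-1][0]

path = prove(levels, 123)  # what card 123 would carry around
valid = verify(cards[123], 123, path, root)
tampered = verify(b"forged-card", 123, path, root)
```

For 1000 cards the path has 10 entries, so at 32 bytes per hash each card carries about 320 bytes of verification data.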

Hybrid Systems
While this system is very secure against history rewriting due to key compromise, it's not very timely. You can't verify any identifier till you've seen a published hash value that post-dates it. This means that there is a direct tradeoff between the amount of data you need to ship around and how long it takes before an identifier becomes verifiable.

We can get past this problem with a hybrid system: digitally sign the root of each hash tree and then use hash chaining to link them together. Any identifier which claims to be older than the latest known hash chain value is directly verifiable without the signature. Any identifier which claims to be newer is verifiable by checking the signature. This means that once you discover a compromised key, it's only usable during the time period (weeks? months?) before the next hash chain value is published. Note that this isn't the time period during which an attacker can sign with the key but rather the time period during which one can verify signatures made with it. As soon as a relying party has a copy of the next hash chain value, they will see that the signature in question isn't incorporated into it (this also provides a tripwire for surreptitiously compromised keys).
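The relying party's decision rule can be sketched as follows. This is only an illustration under loud assumptions: it works at the level of daily tree roots, the hash is SHA-256, and, because the Python standard library has no public-key signatures, the demo uses an HMAC as a stand-in for the real signature check. A real system would plug in an actual signature verifier.

```python
import hashlib
import hmac
from typing import Callable

def chain_value(h0: bytes, roots: list[bytes]) -> bytes:
    """Fold the daily tree roots into a single hash chain value."""
    value = h0
    for root in roots:
        value = hashlib.sha256(value + root).digest()
    return value

def root_is_acceptable(day: int, root: bytes, signature: bytes,
                       known_roots: list[bytes], trusted_value: bytes,
                       h0: bytes,
                       verify_signature: Callable[[bytes, bytes], bool]) -> bool:
    """Accept a tree root claimed for `day` (1-based)."""
    if day <= len(known_roots):
        # Covered by the published chain: the signature is irrelevant.
        if chain_value(h0, known_roots) != trusted_value:
            return False  # our copy of the roots doesn't match the trusted value
        return hmac.compare_digest(known_roots[day - 1], root)
    # Newer than the last published value: all we have is the signature.
    return verify_signature(root, signature)

# Stand-in "signature": an HMAC under a shared key. NOT a real public-key
# signature; it only lets the sketch run with the standard library.
KEY = b"stand-in signing key"

def sign(root: bytes) -> bytes:
    return hmac.new(KEY, root, hashlib.sha256).digest()

def verify_signature(root: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(sign(root), signature)

H0 = bytes(32)
roots = [hashlib.sha256(f"day-{d}".encode()).digest() for d in (1, 2, 3)]
trusted = chain_value(H0, roots)  # the published value, covering days 1-3

# An attacker holding the key signs a forged batch.
forged = hashlib.sha256(b"forged batch").digest()

honest_old = root_is_acceptable(2, roots[1], sign(roots[1]), roots, trusted, H0, verify_signature)
backdated = root_is_acceptable(2, forged, sign(forged), roots, trusted, H0, verify_signature)
in_window = root_is_acceptable(4, forged, sign(forged), roots, trusted, H0, verify_signature)
```

The honest day-2 root is accepted and the backdated forgery is rejected despite its valid signature. The forged day-4 root is accepted for now, which is exactly the window that closes when the next chain value is published.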

Obviously this isn't the only design, but it's one that's reasonably well suited to the environment with a single issuer who issues a really large number of credentials and relying parties who only need to verify fairly infrequently.

1. Note that if we want the cards to be usable without a scanner, then we need ordinary physical authentication and tamperproofing features of the kind used for driver's licenses, passports, etc., but this is orthogonal to the digital security features. Really, you have two separate identification devices in the same form factor, with the digital one being much stronger; if you can read the digital identifier all the security features on the piece of plastic are redundant.


May 1, 2010

Part of the Democratic immigration plan is to require every American worker to have some kind of biometric identification [*].
The proposal is one of the biggest differences between the newest immigration reform proposal and legislation crafted by late Sen. Edward Kennedy (D-Mass.) and Sen. John McCain (R-Ariz.).

The national ID program would be titled the Believe System, an acronym for Biometric Enrollment, Locally stored Information and Electronic Verification of Employment.

It would require all workers across the nation to carry a card with a digital encryption key that would have to match work authorization databases.

"The cardholder's identity will be verified by matching the biometric identifier stored within the microprocessing chip on the card to the identifier provided by the cardholder that shall be read by the scanner used by the employer," states the Democratic legislative proposal.


"The biometric identification card is a critical element here," Durbin said. "For a long time it was resisted by many groups, but now we live in a world where we take off our shoes at the airport and pull out our identification.

The usual privacy groups are upset about this. I'm not sure how thrilled I am about it either, but that angle seems played out. Right now, I'm more interested in the security issues.

First, the American system of personal identification is far weaker than would be required to really support this strong a system of identification. The only real personal identification most US citizens have is a birth certificate (if they can find it; I don't know where mine is) and a driver's license. This is reflected in the proof of right to work requirements: All that's required to get a US passport (sufficient proof of the right to work in the US) is a birth certificate and some form of personal identification (e.g., a driver's license). And at least in California all you need to get a driver's license is a birth certificate, so basically all you need is a birth certificate. Similarly, a social security card (trivially forgeable) and a driver's license are sufficient to establish the right to work. Any new identification system like the one proposed here would need to be based on the same shaky foundation. It's not clear that there's a lot of point in requiring this strong a piece of identification (fingerprints, etc.) when we have this weak a notion of people's identity to start with.

Second, the system as described seems incredibly inconvenient, given that it effectively mandates that every employer in the country have some new scanner that can be used to verify the user's fingerprint.1 That seems like it's going to have a huge scalability problem. It's not clear how this is going to work in practice, either: is the scanner going to actually check the user's fingerprint (lots of opportunities for false rejects here), display the fingerprint and require employers to check it (you've gotta be kidding me, right?), or send the fingerprint back to Washington where it can be checked centrally? This last seems like the most practical option.

Regardless, I have two simpler approaches: the first preserves the personal identity check but with much less infrastructure. We replace social security cards and SSNs with a new, longer, identifier (18 digits should be plenty long).2 These numbers are effectively unguessable, but the US government maintains a central database that matches them to pictures (this database can be generated the same way as we were planning to establish the system described above). When you go to hire a new employee, you ask for their card (actually the number is enough) and type their number and your own TIN into https://www.identitycheck.gov/. The site shows their picture and you just compare it against the person in front of you. This creates a record in the database of the check, which establishes that you have done the check and provides a secure mechanism for delivering the (customary) biometric without the need for any new technical infrastructure at the vast majority of employers.

Really, though, we could probably dispense with the biometric entirely. As long as we have an entry in some national database of everyone who is allegedly working [keyed by SSN] and what job they hold (including when they started and stopped and some limited information about what hours they work) it should be possible to data mine the database for SSNs that show up in multiple jobs at once and catch most cases of people not authorized to work, since each will have to present some legitimate number, and most numbers which can be used already are in use by other people working other jobs.
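As a toy illustration of that kind of mining, here is a sketch in Python. The record layout, the 80-hour threshold, and the (deliberately invalid) SSNs are all hypothetical; plenty of people legitimately hold two jobs, so a real query would need to be smarter than this.

```python
from collections import defaultdict

def suspicious_ssns(jobs: list[tuple[str, str, int]],
                    max_weekly_hours: int = 80) -> list[str]:
    """Flag SSNs used at more than one employer for an implausible total
    of weekly hours. Each job is (ssn, employer, weekly_hours)."""
    hours: dict[str, int] = defaultdict(int)
    employers: dict[str, set[str]] = defaultdict(set)
    for ssn, employer, weekly_hours in jobs:
        hours[ssn] += weekly_hours
        employers[ssn].add(employer)
    return sorted(ssn for ssn in hours
                  if len(employers[ssn]) > 1 and hours[ssn] > max_weekly_hours)

# Hypothetical currently-active jobs.
jobs = [
    ("000-00-0001", "Acme Corp", 40),
    ("000-00-0002", "Bay Diner", 40),
    ("000-00-0002", "Coast Farms", 45),
    ("000-00-0002", "Delta Roofing", 40),
    ("000-00-0003", "Echo Books", 20),
    ("000-00-0003", "Foxtrot Cafe", 15),
]

flagged = suspicious_ssns(jobs)
```

Only the number claiming 125 hours a week across three employers gets flagged; the part-timer with two jobs totaling 35 hours does not.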

So, I'm not sure why this seems like a good idea to Durbin et al. Rather, it just seems like the more general misplaced faith that people seem to have in positive identification as a panacea.

1. And what's with the microchip? All you need here is a digital signature, which doesn't require any kind of chipcard. If Congress wants a system like this, they should probably let professionals design it rather than trying to specify every detail.
2. The idea behind the longer identifier is to make it prohibitive to try random identifiers and get people's actual data. We just need a long enough identifier that most random values are invalid. If we capture the requester's TIN, then any significant number of bogus identifier requests points directly to this kind of fishing expedition. Really 18 digits is probably too long, but since 9 digit SSNs are already too small, we might as well buy ourselves some room.
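The arithmetic behind footnote 2 is easy to check. The sketch below assumes, purely for illustration, that about 300 million identifiers are in use (roughly the US population); that figure is my assumption, not an official count.

```python
from fractions import Fraction

ASSUMED_ISSUED = 300_000_000  # assumption: identifiers actually in use

def valid_fraction(digits: int, issued: int = ASSUMED_ISSUED) -> Fraction:
    """Chance that a uniformly random identifier of this length is in use."""
    return Fraction(issued, 10 ** digits)

nine = valid_fraction(9)        # today's 9-digit SSNs
eighteen = valid_fraction(18)   # the proposed 18-digit identifier

# Average number of random guesses needed to hit one valid 18-digit identifier.
expected_guesses = 1 / eighteen

print(f"9 digits:  {float(nine):.0%} of random guesses are valid")
print(f"18 digits: {float(eighteen):.1e} of random guesses are valid")
print(f"about {float(expected_guesses):.2e} guesses per hit at 18 digits")
```

Under that assumption, about 3 in 10 random 9-digit values are valid, against 3 in 10 billion at 18 digits, so anyone fishing would need billions of requests per hit and their TIN would stand out long before then.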