EKR: August 2011 Archives


August 31, 2011

Today I had occasion to ask Web Security expert Adam Barth about avoiding XSS vulnerabilities. He was kind enough to help me out and (since he's presumably tired of answering this kind of question) also to write up some guidance for the rest of the world. His answer is below:

Folks often ask me how to build web applications without cross-site scripting (XSS) vulnerabilities, but I haven't really been able to find a reference that I'd be happy recommending. There are, of course, many different approaches you can use to build an XSS-free web application. This guide recommends a simplistic approach that works well for "single-document" web applications (like Gmail and Twitter) that use a single HTML document for the lifetime of the application.

Because complexity is the enemy of security, we approach the problem of eliminating XSS by simplifying how we handle untrusted data (by which we mean any information we retrieve from the network or from the DOM):

  1. Untrusted data MUST NOT be transmitted in the same HTTP responses as HTML or JavaScript. In particular, the main HTML document SHOULD be static (and therefore cacheable for a long time).

  2. When transmitted from the server to the client, untrusted data MUST be properly encoded in JSON format and the HTTP response MUST have a Content-Type of application/json.

  3. When introduced into the DOM, untrusted data MUST be introduced using one of the following APIs:
    • Node.textContent
    • document.createTextNode
    • Element.setAttribute (second parameter only)

That's it. If you follow those three rules, you stand a good chance of avoiding XSS. These rules are conservative. You can certainly build a secure web site that violates one or more of these rules. Conversely, these rules don't guarantee success. For example, they don't stop you from doing some dumb things:

// The textContent of a <script> element is active content.
var scriptElement = document.createElement('script');
scriptElement.textContent = userData.firstName;  // XSS!

// Some attributes (mostly event handlers) are active content:
var imageElement = document.createElement('img');
imageElement.setAttribute('onload', userData.lastName);  // XSS!
imageElement.src = 'http://example.com/logo.png';

However, common sense should help you avoid those situations.
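
As a concrete sketch of rule 2 (mine, not part of Adam's write-up; encodeForClient is an invented name), JSON encoding keeps a hostile string inert on the wire, after which the client can parse it and introduce it into the DOM with the rule 3 APIs:

```javascript
// Invented helper illustrating rule 2. JSON.stringify escapes quotes and
// backslashes, so untrusted data can't break out of the JSON string
// containing it. (The real HTTP response must also carry
// Content-Type: application/json.)
function encodeForClient(untrusted) {
  return JSON.stringify({ firstName: untrusted });
}

var body = encodeForClient('"><script>alert(1)</script>');

// The client gets the attacker's string back as inert data, suitable for
// insertion via textContent or document.createTextNode:
var parsed = JSON.parse(body);
console.log(parsed.firstName === '"><script>alert(1)</script>');  // true
```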


August 30, 2011

Mrs. G and I watched Watchmen last night. I don't think this is a film I would have really wanted to see without having read the graphic novel, but it's interesting to see how they've adapted it. Like many of the recent films based on graphic novel "classics" (and indeed director Zack Snyder's previous 300) it is visually and thematically a reasonably faithful adaptation. Mostly, it works less well in this format: elements that in the novel are like woah heavy feel merely heavy-handed when transferred to film. The one big exception here is Rorschach's journal: the same inner dialogue that feels gritty when printed on the page comes off totally differently when turned into a pompous Blade Runner-style voiceover that really hammers home what a complete psycho Rorschach is.

Spoilers below. Though, really, if you haven't read Watchmen yet, you're probably not going to and so they're not much in the way of spoilers.


August 29, 2011

One of the "poster child" applications for research into privacy-preserving cryptography has been electronic tolling (i.e., for highways and bridges). Tolling is an attractive application for a number of reasons, including:
  • There are some really serious privacy implications to knowing where your car is, or at least people think there are. (See for instance the IETF's Geographic Location Privacy WG).
  • The kinds of infrastructure you would need (transponders, receivers at the toll plaza, etc.) are already in place. You would of course need to upgrade them, but it's not like people would find it hard to understand "we're sending you a new E-ZPass transponder. Stick it to your windshield".
  • You can hide most of the complexity in the transponder, so it's not like the users need to know how to execute blind signatures or cut-and-choose protocols.
  • A lot of money changes hands.
Most importantly, the existing situation is stunningly bad, both from a privacy and a security perspective. To take an example, Fastrak transponders are simple RFID tags, so it's trivial to clone someone's transponder as well as to place your own Fastrak readers to track people at locations of your choice. It's clear you could do a lot better than this, even without getting into any fancy cryptography, and with some cleverness, you can do much better. There has been a huge amount of research on privacy-preserving tolling over the years, but the basic idea is always to ensure that people pay tolls when appropriate while also preventing anyone—including the toll authority—from determining through the protocol when a given car passed a given location. How achievable this goal is depends on the model: it's pretty well understood how to do this when the toll plazas are in fixed locations; it's rather harder when you have large expanses of toll roads and you want people to pay by the mile.

Against this background, last Thursday's NYT article on E-ZPass toll fraud makes sobering reading. Briefly, E-ZPass can operate in a "gateless" mode where you drive through the plaza but there's nothing to stop you if you don't pay. Instead, there are license plate cameras, so if someone doesn't pay you can send them a ticket in the mail. (Note that the transponders aren't that reliable, so in some cases the authority keeps a record of which license plates are registered to a transponder and, when a registered plate comes through but the transponder doesn't read, just bills the owner as if the transponder had worked.) In any case, according to this article roughly 2% of people don't pay and the enforcement procedures are proving to be quite problematic. The problem isn't identifying the offender, it's billing them:

The process for trying to catch toll cheats begins with a photograph, automatically taken at the toll plaza, that is used to identify the offending vehicle's license plate number. Using motor vehicle records, the Port Authority then tracks down the vehicle owner and sends a letter indicating that a toll and possibly a fine are due.


"That is one of the great untold secrets for any given agency," Neil Gray, the director of government affairs for the Washington-based International Bridge, Tunnel and Turnpike Association, said about toll cheats. "You'll probably spend more time and money chasing the toll than you will get for the toll."


Then it comes down to how much time and what resources states want to invest to chase down these funds. In Maine, officials are able to suspend the registration of vehicles with unpaid E-ZPass bills; in Delaware, drivers with outstanding toll violations cannot renew their registrations. Jennifer Cohan, director of Delaware's Division of Motor Vehicles, acknowledged that there were harsher measures that could be employed.

"We technically could arrest these folks," Ms. Cohan said, suggesting that it was possible to have a police officer at every toll booth. "But our law enforcement officers are extremely busy."

What lessons does this have for more complicated cryptographic tolling mechanisms? First, the fact that the rate at which non-subscribers go through the toll plaza without paying is so high and that it needs to be enforced with cameras sort of negates concerns about the privacy of the tolling system itself. It doesn't really help to have a privacy-preserving cryptographic tolling protocol if you need cameras everywhere to detect fraud. And since toll plazas tend to be placed at choke points like bridges, tunnels, etc., there's a lot of information leakage. There's been a fair amount of work on tolling that doesn't use fixed toll plazas (e.g., where you pay by mile of road driven) and then uses secret cameras for auditing (see, for instance, Meiklejohn, Mowery, Checkoway, and Shacham's The Phantom Tollbooth in this year's USENIX Security), but it's not clear how useful these models are. First, you still need a fair amount of surveillance (and hence information leakage) in order to enforce compliance. Second, tolls get collected at choke points not only because it's easy but also because those are the limited resource you want to control access to, so just charging people for miles driven on a large number of roads isn't an adequate substitute. (And of course in the case where you want to charge people for all miles driven, it's easier to just install mileage meters and/or charge a gas tax scaled to your car's expected MPG.)

Second, a level of fraud this high suggests that concerns about the technical security of the system are premature. If you have photographic proof of 2% of people passing through the toll plaza and just outright not paying and you can't even manage to punish them and/or collect money from them, then you've got bigger things to worry about than fancy technical attacks. So, for instance, Meiklejohn et al. describe an attack on previous systems in which drivers collude to discover the locations of secret cameras and use that to defraud the tolling authority. It's a clever attack but kind of pointless if it's easier to just not pay entirely and figure you won't get caught.

More generally, I think this represents an argument against a broad variety of privacy-preserving cryptographic mechanisms based on this style of "voluntary" compliance enforced by auditing and punishment. The argument for this strategy goes that it allows you to (mostly) preserve people's privacy because the vast majority of transactions go unexamined while ensuring compliance because people are afraid of being caught if they cheat. The first half of this argument is fine as long as you can design an auditing mechanism which itself isn't too invasive. However, it's the second half of the argument that seems really problematic: if the value to me of cheating is V and the chance of getting caught is α, then the punishment P must be ≥ V/α or I'd be better off cheating and taking the occasional punishment. If either P or α is too small, then the system won't work. So, here we have an instance where this has actually been tried, and the state has the capacity to inflict quite high punishments (including putting you in jail), and it's not working very well.
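
To put some entirely made-up numbers on that inequality (none of these figures come from the article):

```javascript
// Hypothetical figures, purely to illustrate the P >= V/alpha condition.
var V = 8;         // value of evading one $8 toll
var alpha = 0.25;  // probability an evasion is caught and actually punished
var P = 20;        // fine if caught

// Expected gain per evasion attempt; positive means cheating pays on average.
var expectedGain = V - alpha * P;
console.log(expectedGain);    // 3: a $20 fine enforced a quarter of the time doesn't deter
console.log(P >= V / alpha);  // false: deterrence requires P >= 8 / 0.25 = 32
```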

Many of the settings that people talk about using privacy-preserving cryptography in (e.g., digital payments) have weaker enforcement mechanisms and much more ambiguous evidence of cheating. For instance, the transaction itself might not be that well tied to your real-world identity, making punishment difficult. Moreover, often these protocols are complicated and involve a lot of fancy cryptography, so even if you do get caught you can argue that it was an inadvertent error and so you shouldn't receive the whole punishment. If we can't even make this stuff work in the current simple setting, it seems pretty questionable that it will work in more complicated cases.


August 28, 2011

The (arguably counterintuitive) syntax for interpolating templates into other templates using jqtpl and Express is:

  {{partial(<variables>) "<inner-template>" }}

Concretely, if we have template X and we want to interpolate template Y with variable foo = "bar", then template X contains

  {{partial({foo:"bar"}) "Y" }}

This is probably not of interest to you, unless you just spent 10 minutes trying to work this out from the sparse (and I think in this case wrong; bug reported) documentation.


August 23, 2011

On most of my recent flights I've noticed that the TSA isn't even using the whole body imagers—they just have them roped off and send people through standard magnetometers. This weekend, however, I flew back from Kauai through Lihue Airport (LIH) and they actually had their Rapiscan imagers active. As before, they had both a magnetometer line and a whole body scanner line, and I was able to just select the magnetometer line. I was a little worried that because that was the longer line I would get redirected to the Rapiscan, but that never happened.

Of course, after that I had to get my bag secondary-checked because I had left a 6+ oz bottle of sunscreen in it. But the check still didn't include any groping.


August 22, 2011

The process of turning raw wool into fabric by hand is extremely time consuming. Prior to the Industrial Revolution, the production process operated as a pyramid, with a large number of carders supporting a smaller number of spinners, who in turn supported an even smaller number of weavers [Note: weaving is much faster than the other two major techniques for turning yarn into cloth: knitting and crocheting]. I've heard varying numbers, but Wikipedia claims that the ratio was around 9:3:1.

Isn't it interesting, then, that when you look at the list of common American surnames, which are often associated with occupations, "Weaver" appears at position 190 (.05% of the population) but "Spinner" appears at 1/50th the rate, at position 7393 (.001%)? Carder is at 4255 (.003%); Carter is, I would assume, a different profession. [The first 10 names, btw, are: Smith, Johnson, Williams, Jones, Brown, Davis, Miller, Wilson, Moore, Taylor].

I'm not attempting to claim that there's some direct relationship between last name frequency and historical occupation rates, but it's still entertaining to speculate on the cause. My initial suggestion was that carding and spinning were more likely to be women's work and of course in the West women's surnames don't get propagated. Mrs. Guesswork suggests that spinning and carding weren't professionalized the way that weaving was [prior to the invention of the spinning wheel, spinning technology was extremely low-tech], so you might spin or card in your spare time, but weaving requires enough capital equipment that you would expect it to be done professionally and thus be more likely to get a surname attached to it.

Equally likely, of course, is that it's just coincidence, but what fun would that be?


August 21, 2011

I recently upgraded to Lion. Probably it would have been wiser to wait until 10.7.1 but I wanted the PDF "signing" feature so I went ahead anyway. Overall, things went really smoothly. Notes below.

  • As everyone now knows, by default Lion is an app store only purchase and doesn't come with any kind of media. Apple charges $50 extra for a USB stick, but as I imagine most everyone knows, if you dig around in the install package, there is a .dmg file and you can burn that to DVD. [*] This seems advisable but takes some time.
  • The actual install was pretty fast, say about 30 minutes or so on my Macbook Air.
  • However, Lion comes with a new version of Filevault which encrypts the whole drive. Turning this on takes a while to encrypt the disk, but that happens in the background, so it's not too bad as long as you're happy to leave the machine on while it works, or you're willing to have it happen over a few days. Apple offers to let you store the recovery key with them. I declined this offer.
  • New Filevault works just fine with old Filevault, so if you've been using old Filevault you can transition easily. Every so often you get asked if you'd like to unencrypt your old partition, but Lion doesn't make you.
  • However, if you want Time Machine to work well rather than badly—which is how it has historically worked with Filevault—you need to move to new Filevault, which means encrypted backups. There is a setting in Time Machine to encrypt your backup disk. This takes a very very long time if you have an existing non-encrypted backup disk.
  • Installing Lion blows away your existing—or at least, my existing—copy of Xcode 3. Xcode 4 is now free, but this means that you will be without debugger, compiler, etc. until you download another 3GB worth of Xcode, so something to keep in mind. There's probably some way to patch in the old Xcode but this seemed inadvisable.
  • I'm mixed on the new gesture support. Obviously, I want the old scrolling behavior, not the "natural" scrolling behavior (where the scrolling goes in the direction of your finger like with the iPhone), but that's easily turned off. Mission control seems like it should be really cool, but other than periodically swiping to see it happen, I haven't figured out what it's for. (Incidentally, just this gesture stopped working on my magic trackpad, but not my built-in trackpad, necessitating a call to Apple support. Strangely, changing it from three fingers to two and back again solved the problem.)
  • The UI changes are all pretty subtle. I can take or leave the auto-scrollbars and the rounded dialogs, buttons, etc. seem fine. Probably the most noticeable change is the way that apps keep their state. I'm used to using Command-Q to quit the app, but now this means that if I quit Preview, and then restart it I end up with all the same documents I had before, so I'm not sure this is that great. I guess I just need to learn to use Option-Command-Q or reset the defaults using the command line.

Anyway, this was all pretty smooth for a major OS upgrade (I really appreciate not having to run mergemaster). And the feature where you can take a photo of your signature and embed it into documents really is pretty cool. I may never need to print-sign-scan-email again.


August 9, 2011

My EVT 2011 rump session talk, on the future of Internet Voting, is now available here. And in response to the people who ask about my cat's political leanings? She's in favor of Proposition C legalizing medical catnip.

UPDATE: Temporary subversion glitch makes file unavailable. Will have it back online soon.

UPDATE: Fixed.


August 3, 2011

Recently I had the dubious pleasure of working simultaneously in C++, Python and JavaScript. I'm not saying that there is anything wrong with any of these languages, but if you rapidly switch back and forth between them (as, for instance, when you're developing a Web 2.0 application with the server in Django and the front end in JavaScript), things can get pretty confused. The P90x guys claim that muscle confusion leads to increased strength, but in my experience, programming language confusion mostly leads to problems.

The basic problem is that these languages have fairly similar syntaxes and so it's pretty easy to inadvertently use the syntax of language A with language B. Here's a sampling of some common tasks in these languages:

Task                                     C++                                Python        JavaScript
Statement separation                     ; terminated                       new line      semicolon separated (optional)
Length of an array                       v.size() (for STL vectors)         len(v)        v.length
Append to an array                       v.push_back(x)                     v.append(x)   v.push(x)
Iterate through an array                 for(size_t i=0; i<v.size(); i++)   for x in v    for(var i=0; i<v.length; i++)
Is an element in an associative array?   if (a.count(k))                    if k in a     if (a[k])
Creating a new object                    new ClassName()                    ClassName()   new Constructor()

To make matters worse, sometimes what you would do in one language is syntactically valid, but undesirable, in another language and invalid in a third. For instance, in Python you don't use semicolons to terminate statements at all, but it doesn't choke if you use them. In JavaScript, you mostly need them and JS will "insert" them as needed [*]. In C++, semicolons are required and if you don't add them, the compiler will throw an error. So, if you switch back and forth, you're constantly adding spurious semicolons in Python code (which makes Python people sad) and omitting them in C++.
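
Here's a quick sketch (mine) of the JavaScript semicolon-insertion rule actually biting:

```javascript
// ASI inserts a semicolon after the bare `return`, so the object literal
// below it is parsed as an unreachable block, not a return value.
function broken() {
  return
    { ok: true };
}

function fixed() {
  return { ok: true };  // brace on the same line as `return`
}

console.log(broken());    // undefined
console.log(fixed().ok);  // true
```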

To take a more irritating example, consider asking whether an element is in an associative array. This is actually a lot more painful: if you do if (a[k]) on a C++ map, as is natural in JavaScript, C++ will automatically instantiate a new copy of whatever's in the array using the default constructor and place it at location k. This can have really undesirable consequences, since you've just modified the structure you meant to be examining. Python, on the other hand, throws a KeyError if you dereference a nonexistent location, like so:

>>> a = {}
>>> a['b']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'b'

Unfortunately, this is a runtime error, so it's easy not to notice this mistake until you get fairly far into the program, especially if you're just checking for existence as a corner case where the key is usually present.
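
For comparison (my sketch, not from the original text), JavaScript is a third variant: reading a missing key neither instantiates it (as C++ does) nor throws (as Python does), it just yields undefined. That is also why the if (a[k]) test from the table above is only an approximation:

```javascript
// A key that exists but maps to a falsy value looks absent under if (a[k]);
// the `in` operator tests presence directly.
var counts = { apples: 0 };

console.log(Boolean(counts['apples']));  // false, even though the key exists
console.log('apples' in counts);         // true
console.log(counts['pears']);            // undefined: no KeyError...
console.log('pears' in counts);          // false: ...and no C++-style autovivification
```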

Iterating through an array is another case where it's easy to mess up. In JavaScript, like C++, you iterate through an array by using an index from 0 to array length - 1. In Python, however, you can iterate through it more easily just by using for x in v where v is the array. Unfortunately, a similar construct is syntactically legal in JavaScript, but the result isn't what you want:

> a = ['a', 'b', 'c']
["a", "b", "c"]
> for (x in a) { console.log(x); }
0
1
2

Instead of iterating through the array elements, we're iterating through the array indices. Again, this is syntactically valid code; it just doesn't do what you want (though perhaps you might want to use it as an alternative to the familiar for (i=0; i<a.length; i++) idiom). Unfortunately, if you don't test your code carefully, you might not notice the mistake. To make matters more confusing, the for (x in a) idiom works fine for another common data structure enumeration task: enumerating the keys of a map, in both Python and JavaScript.
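
Side by side (my sketch), the trap and the intended behavior:

```javascript
var a = ['a', 'b', 'c'];

// for-in walks the indices, and as strings at that:
var keys = [];
for (var x in a) { keys.push(x); }
console.log(keys);  // [ '0', '1', '2' ]

// To visit the values, index explicitly:
var values = [];
for (var i = 0; i < a.length; i++) { values.push(a[i]); }
console.log(values);  // [ 'a', 'b', 'c' ]
```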

None of this is intended to be a criticism of the syntax of any language; the problem is the conflict between the syntaxes of the languages. This is particularly troublesome for Web applications because the client side more or less must be written in JavaScript but the most popular Web frameworks are written in either Python (Django) or Ruby (RoR), so it's common to have to work in two languages at once. I don't even want to think about what happens if you have to work in a framework that uses Java, a language which, despite its name, is not really related to JavaScript, though they have confusingly similar syntax. This is one advantage of Node, a framework for developing Web (or any other server) applications in JavaScript. Since the client side being written in JavaScript is a fixed point, Node allows you to write your entire system in JavaScript. Of course, depending on your opinion of JavaScript, that may seem like a distinctly mixed blessing.


August 1, 2011

For the third year in a row, this year's EVT/WOTE includes a Rump Session. The session is now scheduled at the later time of 5:30 PM on August 8, which gives you more time to think of your proposal.

This session will no doubt include results more important, and simultaneously more hilarious, than any to be presented at the main workshop, which is full of the usual solid (i.e., boring) academic research. However, this can only happen if you (yes, this means you!) submit.

Acceptable topics include:

  • Work in progress
  • Work which you haven't had time to start
  • Work which you will do if you ever get some free time
  • Work which should not be started at all

No topic is too big. No topic is too small. No work is too stupid to be presented at the EVT Rump Session.

Each presenter will have between 4 and 7 minutes, depending on the number of submissions, the program chair's expectation of hilarity, and an as-yet-undetermined and most likely arbitrary evaluation formula.

Submissions should be directed to the Rump Session Chair, Eric Rescorla (ekr@rtfm.com). Please provide a talk title, name of the presenter, and an estimate of how much time you would like.