COMSEC: May 2011 Archives


May 25, 2011

As I wrote a while back, NSTIC feels like requirements written with a solution already in mind, specifically what's called a "federated identity" system. The basic idea behind a federated identity system is that a person who wants to do stuff on the Internet (i.e., you) will have a relationship with some identity provider (IdP). At minimum, you'll share some set of authentication credentials with the IdP, so that you're able to establish to them that you're sitting at a given computer. Then, when you go to an arbitrary site on the Internet and want to authenticate, that site (called the relying party) can verify your identity via the IdP. What's nice (at least theoretically) about a system like this is that you can just authenticate once to the IdP and then all other authentication transactions are handled more or less seamlessly via the IdP. (This is often called single sign-on (SSO)). What makes the system federated is that there is expected to be more than one IdP and so I might be able to get my identity through the USPS and you get yours through Wells Fargo, but Amazon trusts both so we're both able to get our $0.59 stoneware bowls via Amazon Prime.

This isn't by any means a new idea. In the US your driver's license is issued by your state, but it's accepted as identification by a whole variety of relying parties, not just the state and its agents. It's not just your driver's license, either; you likely have a relationship with multiple IdPs. For instance, I have a driver's license (issued by California) and a passport (issued by the US State Department), plus a variety of more specialized credentials which are arguably used for authentication/authorization, such as credit cards, library cards, etc. That's not really the state of play for Internet transactions, though, with the partial exception of Facebook Connect (discussed below).

Simple SSO
In the simplest case, all the IdP does is attest to the fact that you have an account with them and maybe some weak indication of your identity, such as your claimed display name and an account identifier. There are a variety of systems of this type, such as OpenID, but by far the most popular is Facebook Connect. The way that Facebook Connect works is that you log into Facebook and then when you visit a Facebook Connect relying party, they are able to get your account information from Facebook, though that mostly consists of information you've given Facebook, so there's no real guarantee that for instance the name that Facebook delivers the relying party is your real name.
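
To make the shape of that attestation concrete, here is a minimal sketch. It is not how Facebook Connect or OpenID actually work on the wire; the key, field names, and function names are all made up for illustration, and an HMAC under a key shared by the IdP and the relying party stands in for the public-key signature a real system would more likely use.

```python
import hashlib
import hmac
import json

# Hypothetical key shared between the IdP and one relying party.
# (Assumption for this sketch; real systems typically use signatures.)
IDP_KEY = b"demo-key-shared-by-idp-and-relying-party"


def issue_assertion(account_id: str, display_name: str, audience: str) -> dict:
    """IdP side: attest that the logged-in user holds this account."""
    claims = {
        "account_id": account_id,
        "display_name": display_name,  # self-asserted, not verified
        "audience": audience,          # which relying party this is for
    }
    payload = json.dumps(claims, sort_keys=True).encode()
    tag = hmac.new(IDP_KEY, payload, hashlib.sha256).hexdigest()
    return {"claims": claims, "tag": tag}


def verify_assertion(assertion: dict, expected_audience: str) -> bool:
    """Relying-party side: check the IdP's tag and the intended audience."""
    payload = json.dumps(assertion["claims"], sort_keys=True).encode()
    expected = hmac.new(IDP_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, assertion["tag"]):
        return False
    return assertion["claims"]["audience"] == expected_audience
```

Note that the relying party learns only what the IdP chose to put in the assertion, and the display name is whatever the user told the IdP.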

Even a simple system like this presents technical challenges, especially when it's implemented in the extremely limited Web environment (hopefully more on this in a separate post). First, there's the question of privacy: just because I have an account with IdP X and site Y is a relying party for IdP X doesn't mean that when I visit site Y I want them to be able to identify me. The more widely accepted an IdP is, the more serious this problem becomes; if every site in the world accepts my IdP, then potentially I can be tracked everywhere I visit, both by the IdP and by the relying parties. If we want to avoid creating a universal tracking mechanism (people often call this a supercookie), we need to somehow let users decide whether to authenticate themselves to relying parties via their IdPs. This creates obvious UI challenges, especially because there's plenty of evidence that users get fixated on whatever task they are trying to accomplish and tend to just click through whatever dialogs are required to achieve that objective.

The second challenge is dealing with multiple IdPs: in an environment where there are a lot of different IdPs and different relying parties accept different IdPs, we need some mechanism to let relying parties discover which IdP (or IdPs) I have an account with. This is actually a little trickier than it sounds to do in a privacy-preserving way because the profile of which IdPs I support can itself be used as a user-specific fingerprint even if none of the IdPs directly discloses my identity to the relying party. Moreover, when I have a pile of different IdPs, I need to somehow select which IdP to authenticate with, which means more UI.
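
The fingerprinting point is easy to see with a toy example. The users and IdP names below are invented purely for illustration; the sketch just counts how many users share each set of IdPs.

```python
from collections import Counter

# Hypothetical users and the (made-up) IdPs each has an account with.
profiles = {
    "user_a": frozenset({"idp1", "idp2"}),
    "user_b": frozenset({"idp1", "idp2"}),
    "user_c": frozenset({"idp1", "idp3"}),
    "user_d": frozenset({"idp2", "idp3", "idp4"}),
}


def uniquely_identified(profiles):
    """Return the users whose set of IdPs is shared with nobody else."""
    counts = Counter(profiles.values())
    return sorted(user for user, idps in profiles.items() if counts[idps] == 1)


def max_distinct_profiles(num_idps):
    """Each IdP is a yes/no bit, so n IdPs allow up to 2**n profiles."""
    return 2 ** num_idps
```

In this toy population, two of the four users are singled out by their IdP profile alone, without any IdP revealing who they are.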

Real-Life Attributes and Minimal Disclosure
Once you get past a system that just carries an identifier—especially one that's unique but not really tied to any form of verifiable identity—life gets more complicated. Consider your driver's license, which typically has your name, address, age, picture, and driver's license number. When I go to a bar to buy a drink, all I need is to demonstrate that I'm over 18, but there's no reason for the bar to know my real name, let alone my address. In general, the more attributes an identity system can prove, the more useful it is, but also the more of a privacy threat it potentially is if any relying party just learns everything about me.

There has been a lot of work on cryptographic systems designed to allow people to prove individual properties to relying parties without revealing information that the relying party doesn't need to see (the term here is "minimal disclosure"), such as Microsoft's U-Prove. The idea here is that you will establish a whole bunch of attributes about yourself to one or more "claims providers". You can individually prove these claims to relying parties without revealing information about other claims. In the best case, even claims providers/IdPs don't get to know which relying parties you are proving claims to or which claims you are proving (e.g., the state wouldn't get to find out that you were proving your age to a bar.)

Making this work requires a lot of cryptography, though at this point that's pretty well understood. However, it also requires user interaction to allow users to determine which claims are to be proven to which relying parties. So, for instance, you would visit a site and it would somehow tell your Web browser which claims it wanted you to prove; your browser would produce some UI to let you agree; once you've agreed, your browser would cooperate with the claims provider to prove those claims to the relying party.[1]
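
To give a flavor of proving one claim without revealing the others, here is a deliberately simplified sketch. It is NOT U-Prove: it uses salted hash commitments, so the same commitments and tag are shown on every presentation (which makes presentations linkable), and an HMAC under a hypothetical key that the relying party would also have to hold stands in for a real signature. All names are assumptions made for illustration.

```python
import hashlib
import hmac
import json
import secrets

# Hypothetical claims-provider key; HMAC stands in for a real signature.
PROVIDER_KEY = b"demo-claims-provider-key"


def _commit(name: str, value: str, salt: str) -> str:
    """Salted hash commitment to a single attribute."""
    return hashlib.sha256(json.dumps([salt, name, value]).encode()).hexdigest()


def issue_credential(attributes: dict) -> dict:
    """Provider side: commit to each attribute separately, then tag the set."""
    salts = {name: secrets.token_hex(16) for name in attributes}
    commitments = sorted(_commit(n, v, salts[n]) for n, v in attributes.items())
    tag = hmac.new(PROVIDER_KEY, json.dumps(commitments).encode(),
                   hashlib.sha256).hexdigest()
    # The user keeps salts and values; relying parties see commitments and tag.
    return {"commitments": commitments, "tag": tag,
            "salts": salts, "attributes": attributes}


def present(credential: dict, names: list) -> dict:
    """User side: reveal only the chosen attributes, with their salts."""
    disclosed = {n: (credential["attributes"][n], credential["salts"][n])
                 for n in names}
    return {"commitments": credential["commitments"],
            "tag": credential["tag"], "disclosed": disclosed}


def verify(presentation: dict) -> dict:
    """Relying-party side: return the verified attributes or raise ValueError."""
    payload = json.dumps(presentation["commitments"]).encode()
    expected = hmac.new(PROVIDER_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, presentation["tag"]):
        raise ValueError("bad provider tag")
    verified = {}
    for name, (value, salt) in presentation["disclosed"].items():
        if _commit(name, value, salt) not in presentation["commitments"]:
            raise ValueError("attribute does not match any commitment")
        verified[name] = value
    return verified
```

A bar would be handed `present(cred, ["age_over_18"])` and could check that one claim while seeing only opaque hashes for the name and address.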

Putting it Together
Putting the pieces together here, what NSTIC seems to envision is a federated identity system of users, IdPs, claims providers, and relying parties. As a user you'd be able to select your IdP/claims providers and as a relying party you'd be able to decide which of these you trust. The whole system would be glued together by privacy-preserving cryptographic protocols. In the next post, I'll try to explain some of the challenges of actually building a system like this in the Web environment.

1. It's worth noting that if you don't mind your IdP/claims provider learning who you authenticate to and which claims you prove, then you don't need any crypto magic. This is basically the kind of system Facebook Connect already is.


May 4, 2011

If everyone loved passwords, then we wouldn't be having an extended discussion about how to get rid of them (incidentally, I was at IIW this week, where the suckiness of passwords is a basic assumption.) So, what's there not to like?

The biggest problem with passwords as currently deployed is that they are replayable: in order for Alice to authenticate to her bank, she must provide her bank with her password. The unfortunate consequence of that is that once Alice has authenticated to her bank, then the bank can impersonate her in the future. This doesn't sound so bad, since it's not that useful for the bank to impersonate me to itself, but it has two very bad implications:

  • Phishing: If some attacker can convince me that they are my bank, and I give them my password, then they can impersonate me indefinitely, including to my bank. This sort of fraud is a huge issue for banks.
  • Unsafe Password Reuse: I'm not (overly) worried about my bank impersonating me to itself, but if I use the same password with two banks, then evil bank A might impersonate me to good bank B. More generally, any time I use the same password at two different sites, then I have to worry about whether I trust both those sites. This is what motivates the advice people usually get to use a different password at each site.

It turns out that there are technical mechanisms for alleviating these issues. The basic principle is to arrange that the site never gets to see a replayable password. The technology is complicated and there are a bunch of different mechanisms, but the basic idea is that when Alice establishes her account she gives the site some numeric verifier (V) derived from her password. Then when she comes back, she types her password into her browser, which can then prove to the server that she knows the password behind V without ever giving the server a copy of the password itself. PwdHash is one example of such a system, as are PAKE-based systems.
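
As a rough illustration of the password-diversification flavor of this idea, here is a sketch in the spirit of PwdHash. It is not PwdHash's actual algorithm; the function name and output length are assumptions. The browser derives a per-site value from the master password and the site's domain, so each site (or a phishing site on a different domain) only ever sees a value that is useless elsewhere. The site still receives something replayable against itself, which is where PAKE-style systems go further.

```python
import hashlib
import hmac


def site_password(master_password: str, site_domain: str, length: int = 16) -> str:
    """Derive a site-specific password from a master password and a domain."""
    digest = hmac.new(master_password.encode(), site_domain.encode(),
                      hashlib.sha256).hexdigest()
    return digest[:length]
```

With this, evil bank A learns only the value for its own domain, which does not match what good bank B expects.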

Password Proliferation
As Constant observes, it's probably not necessary to worry about Slashdot stealing your password and using it to impersonate you to Kayak, but even people without a lot of commercial relationships tend to have a fair number of accounts that they probably don't treat interchangeably. This is particularly difficult when those accounts span a spectrum of security. Consider the following accounts ranked somewhat in increasing order of sensitivity:

  • Slashdot
  • Twitter
  • Gmail
  • Amazon
  • Bank of America
  • Morgan Stanley

I think there is a pretty fair argument that each of these represents a distinct level of security. I don't much care whether people post as me on Slashdot, but I probably do on Twitter. Unlike Twitter, Gmail holds actual private information. There's actual money involved at Amazon, but less than what's on the table for my bank account, which in turn is perhaps less than what I have in my investment portfolio at Morgan Stanley. (Note: these providers do not necessarily represent my actual accounts.) Since these exist at different levels of security, they should have different passwords. Moreover, at the highest levels, I most likely want to use different credentials for each site. The end result of this is that I need to have (and likely remember, see below) a whole pile of passwords. This is not something that people like.

Backward Compatibility
Although we know how to build password-based systems that don't reveal the user's password to the relying party, we don't really know how to deploy them securely. The basic problem is that users are already prepared to type their passwords into Web forms that give the password to the server. It's not at all clear how to construct a UI that the user can be sure is safe and thus can type their password into and that also can't be imitated by a malicious Web site. [Technical note: it's easy to build UI that can't be imitated precisely, but the test is whether users will be fooled by bad imitations.] (More about this issue can be found here.)

Low Entropy Space
A well-known problem with passwords is that they generally have a very low entropy level, which is to say that your average user draws their password from a relatively small number of passwords. This means that if I have some oracle which will tell me whether a given candidate password is valid (e.g., a list of encrypted passwords, a server which I can try to log into, etc.) it doesn't take as many attempts as one would like to try the most probable candidate passwords. This is generally called a dictionary attack.
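
Here is a minimal sketch of what that looks like when the oracle is a stolen password hash. The candidate list and the salted SHA-256 scheme are assumptions for illustration (a real server should be using a deliberately slow hash, and real attackers use far larger lists ordered by popularity).

```python
import hashlib

# Hypothetical list of popular passwords, most probable first.
COMMON_PASSWORDS = ["123456", "password", "monkey10", "letmein"]


def hash_password(password: str, salt: str) -> str:
    """Toy stored form of a password: salted SHA-256 (assumed scheme)."""
    return hashlib.sha256((salt + password).encode()).hexdigest()


def dictionary_attack(stored_hash: str, salt: str, candidates: list):
    """Try candidates in order; return (password, attempts) or (None, attempts)."""
    for attempts, guess in enumerate(candidates, start=1):
        if hash_password(guess, salt) == stored_hash:
            return guess, attempts
    return None, len(candidates)
```

The attacker's cost is the number of candidates tried, not the size of the space of all possible strings, which is why a small effective password space hurts so much.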

The low entropy of the passwords isn't actually quite as bad as it sounds: even though users generally choose terrible passwords, in order to check a candidate password you generally need to try to log into the site in question, which affords the site the ability to do velocity checks and/or limited-try capabilities, such as locking your account after some fixed number of login failures. However, even then you need to use a password with a certain minimum level of security; if my username is "ekr" and I use "ekr" as my password too, then this is going to be a lot of attackers' first guess, so I need to get far enough up the entropy curve to make this kind of attack infeasible. That said, low entropy passwords significantly weaken the guarantees of using password diversification technologies like PwdHash, since it's comparatively easy for an attacker who has a verifier to extract the original password.
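
The limited-try defense is simple enough to sketch. The class name and the threshold of five failures are assumptions; a real site would also need unlock and velocity-check policies that are omitted here.

```python
from collections import defaultdict


class LoginLimiter:
    """Lock an account after a fixed number of consecutive failed logins."""

    def __init__(self, max_failures: int = 5):
        self.max_failures = max_failures
        self.failures = defaultdict(int)  # username -> consecutive failures

    def is_locked(self, username: str) -> bool:
        return self.failures[username] >= self.max_failures

    def record_attempt(self, username: str, password_ok: bool) -> bool:
        """Return True if the login succeeds; locked accounts always fail."""
        if self.is_locked(username):
            return False
        if password_ok:
            self.failures[username] = 0
            return True
        self.failures[username] += 1
        return False
```

This caps an online attacker at a handful of guesses per account, which only helps if the password isn't among those first few guesses.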

This brings us back to the memorability problem. Every new password is something else to remember, and (loosely) the higher the entropy of the passwords, the harder they are to remember. In the limit, if I have a randomly generated password for each site, I'm pretty much going to need some password manager to remember them (either that or a big pile of post-it notes).[1]
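
For a sense of scale, here is a small sketch of generating a random per-site password and estimating its entropy. The alphabet and default length are my own choices for illustration.

```python
import math
import secrets
import string

# Assumed alphabet: 62 letters and digits.
ALPHABET = string.ascii_letters + string.digits


def random_password(length: int = 16) -> str:
    """Pick each character uniformly at random from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy of a uniformly random string: length * log2(alphabet size)."""
    return length * math.log2(alphabet_size)
```

Sixteen characters from a 62-symbol alphabet comes to roughly 95 bits, which is far beyond dictionary-attack range and also far beyond what most people will memorize for more than a couple of sites.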

Aside from the drawbacks listed above, passwords are inherently a 1-1 mechanism. If I have relationships with five different banks, I need to have established a password—even if it's the same one—with each of them. This sort of entry barrier is a pain for users, but especially for new sites, which have trouble converting visitors to users because they first need to drive them through an annoying registration experience. So, passwords don't really permit any notion of delegating trust to someone else. This is also true in the inverse sense, where I can't easily give you permission to look at my bank balance without giving you permission to make funds transfers, at least not without the bank going to a lot of effort.

A related concern is that passwords don't really have any mechanism for establishing stuff about users outside the system. For instance, when I want to sign up for a credit card, the issuer really wants to know that it's me, but the best they can do is use the (not-really) secrecy of my social security number, address, etc. as a weak password. Once I've signed up with them, a password may be fine, but it's not a useful entry point into the relationship. Similar arguments apply for proving that I'm over 21 or that I live in a given state.

Next Up: What kind of architecture is NSTIC contemplating?
Hopefully the above gives you a sense of the sort of concerns that are motivating something like NSTIC. While formally NSTIC is written as a set of requirements, to my eyes it's more like one of those documents whose authors start with a given solution in mind and write the requirements around that. In the next post in this series I'll try to talk a little bit about that implicit architecture.

1. Some people use a sort of lame mental hash function to generate related but distinct passwords for each site, but this seems to involve a fair amount of mental overhead.


May 1, 2011

As I said earlier, a lot of the use cases used to motivate NSTIC are about the inadequacy of existing 1-1 authentication mechanisms. As you can imagine, a huge amount of research has gone into trying to figure out how to build a set of systems which don't have these drawbacks, but the results haven't been entirely satisfactory. The following is an attempt to briefly survey the space and why it's proven so difficult. To be honest, I'm getting a little tired of writing this kind of thing (an older attempt to do this in longer form for the non-Web context can be found here), but it has to get done if you're to make sense of the rest.

It's probably easiest to get a sense of what the problem is by looking at deficiencies in the existing password-type systems on the Web. As you all know, what happens now is that you go to some site (which, if you're lucky, uses HTTPS) and it gives you a form (i.e., text fields on the page) to enter your username and password. This user interface is completely under control of the Web site, and to a first order just looks like any other Web form to the browser.[1] You type that stuff in, hit return or click on the submit button, and the browser sends the username and password to the server, which verifies them and either lets you log in or not. [Technical note: each page you fetch/link you click on a site is sort of an independent transaction. The site uses web cookies to string the transactions together so you don't have to type your username/password on each page.] There's plenty to hate here, but before we talk about that, it's worth talking about the stuff that's good.
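
The server side of that flow can be sketched in a few lines. This is an in-memory toy, not any particular site's implementation: the dictionaries, function names, and the PBKDF2 parameters are all assumptions, and the HTTP layer is left out entirely.

```python
import hashlib
import hmac
import secrets

USERS = {}     # username -> (salt, password hash)
SESSIONS = {}  # session cookie value -> username


def _hash(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def register(username: str, password: str) -> None:
    salt = secrets.token_bytes(16)
    USERS[username] = (salt, _hash(password, salt))


def login(username: str, password: str):
    """Handle the form submission: return a cookie value, or None on failure."""
    record = USERS.get(username)
    if record is None:
        return None
    salt, stored = record
    # Note that the server sees the plaintext password here.
    if not hmac.compare_digest(stored, _hash(password, salt)):
        return None
    cookie = secrets.token_urlsafe(32)
    SESSIONS[cookie] = username
    return cookie


def whoami(cookie: str):
    """On later requests, the cookie alone identifies the user."""
    return SESSIONS.get(cookie)
```

After a successful `login`, every subsequent page fetch carries the cookie, so the password only has to be typed once per session.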

From the user's perspective one of the most important properties is portability. Say I buy a new machine or I want to use a kiosk somewhere: as long as I remember my password (this is a lot easier if I use the password "monkey10" everywhere than if I generate a random 16-character password for each site), then I just sit down, type it in, and I'm good to go. Even if I have a really long password, I can write it down on a piece of paper which will survive the failure of any particular device. This sounds simple, but it's actually a feature that many of the proposed fixes for this problem don't have. To give you just one example, pretty much all the systems that involve you having a long-term client-side public key then require some way to haul that key around. There have been a lot of proposed answers to this (USB tokens, smartphones, etc.) but none of them have come close to taking off.

Backward Compatibility
Say you've just invented a really good remote authentication technique. What now? Well, if it involves modifying the client, then you've got a real problem since Web browsers turn over comparatively slowly (10% of the net is still running IE 6). So, even if you manage to convince all the browser manufacturers to put your system in, you're looking at years before you can count on everyone being able to authenticate with it, and hence before you can use it exclusively. Similar reasoning applies when you need to modify the server, since any new mechanism on the client is useless without server support. The only lowest common denominator mechanism is passwords through Web forms, which is why so many new authentication systems have been structured as enhancements to that basic mechanism, either on the server side (e.g., if you don't recognize this image, don't proceed) or on the client side (e.g., PwdHash) so that they can be deployed unilaterally.

Site Control of Look and Feel
Passwords in web forms aren't the only authentication mechanism that was intended when the Web was first designed. Indeed, HTTP supports not one but two password-based authentication mechanisms, "Basic" (i.e., passwords in the clear in the HTTP header) and "Digest" (i.e., challenge response in the HTTP header). Neither of these sees much usage, most likely due to the hideous user interface they typically have, which involves the browser bringing up a modal or semi-modal dialog as you first go to the page (generally before you see anything on the page). It turns out that this is not what site operators want, which is instead to control the UI experience, including offering you first-time registration without an annoying dialog box, password recovery, branding, etc. In other words, they want to brand it, and aren't really interested in authentication mechanisms which don't offer that ability.
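
For reference, here is a sketch of what the browser computes for each of the two mechanisms, following the formulas in RFC 2617 (Digest is shown with qop=auth). The function names are mine; only the header construction is shown, not the full challenge/response exchange.

```python
import base64
import hashlib


def basic_authorization(username: str, password: str) -> str:
    """Basic: the password travels (base64-encoded, not encrypted) in the header."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return f"Basic {token}"


def _md5_hex(text: str) -> str:
    return hashlib.md5(text.encode()).hexdigest()


def digest_response(username: str, realm: str, password: str, method: str,
                    uri: str, nonce: str, nc: str, cnonce: str,
                    qop: str = "auth") -> str:
    """Digest: prove knowledge of the password by hashing it with the server's nonce."""
    ha1 = _md5_hex(f"{username}:{realm}:{password}")
    ha2 = _md5_hex(f"{method}:{uri}")
    return _md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
```

Either way, the browser rather than the page is what prompts for the credentials, which is exactly the UI control that sites don't want to give up.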

Next Up: What's bad about passwords?
While passwords have some useful features, if they were really great then we wouldn't be having this discussion. Next, I'll talk about some of the obvious drawbacks, but as you're reading that you should remember that, while annoying, none of them has been severe enough to push us over the edge into actually discarding passwords for most applications.

1. The exception here is that there is a special indicator that tells the browser that the password field should display dots or stars or whatever instead of your real password.