EKR: September 2010 Archives


September 30, 2010

If you've ever attended a meeting at your local Silicon Valley company, you've no doubt had the opportunity to sign in in the lobby. At many such companies—Google, for one—you are given the opportunity to accept or decline an NDA. In either case, you get a machine-printed name label, but if you declined the NDA, it has some distinctive mark indicating that you're not to be exposed to anything confidential.

As usual, it's worth asking what the threat model is here. There seem to be two major cases:

  • To prevent employees from accidentally treating non-NDA visitors as if they were NDAed.
  • To prevent non-NDA visitors from impersonating visitors who have signed the NDA.

With respect to the first case, I should start by mentioning that in my experience these badges are almost universally ignored. I, of course, decline the NDA, and yet (without naming any names) colleagues routinely take me into sensitive areas or just let me walk around on my own without any kind of supervision. Even if employees did pay attention to the status of a visitor's badge, the scope of the visitor NDA is so broad (and remember that when companies are really serious about confidentiality they have you sign a paper NDA) that it's hard to imagine your average employee wanting to reveal anything really confidential based on something you typed onto a console in the reception area. So, even in a non-malicious environment, it's not clear that this sort of labelling is of much use in distinguishing people who have signed the NDA from those who haven't.

Now let's turn to the malicious case. These badges are just ordinary sticky name labels like you could buy at Office Depot. It's trivial for me to get my own label maker and produce any label I want, including one that indicates I've signed the NDA. The only trick is knowing what a valid label looks like, but seeing as any reasonable-sized company mints hundreds of these badges a day, a little dumpster diving around campus is likely to yield a valid label. Alternately, you can just visit campus with a group of other people, decline the NDA, and hope that someone else doesn't, so you can get a good look at their badge. In either case, you don't need to do a very good job, since, as I mentioned above, employees don't seem to do a very good job of checking. [I should note at this point that there's a very similar but a bit more sophisticated set of objections to the ubiquitous RFID employee badges, but that's a topic for another post.]

Finally, consider the threat from the visitor's perspective. To the extent that you're bound by an NDA you "signed" in the lobby (and since signing means you clicked on some check-box or hit return in response to a dialog box, it's unclear to what extent that is), and given that you've presumably thrown away your badge, what stops the company from retroactively claiming that you executed the NDA even if you actually didn't? It's not clear whether this would really hold up in court; it's easy to claim that you weren't really paying attention and accidentally "signed", but that argument cuts against the value of the NDA to the company as well.


September 27, 2010

First, sorry about the light posting. Nothing's wrong, just an unusually bad case of work being busy plus writer's block.

Like all paranoid people, I run my Mac with screen locking. The other night I decided to lock the screen (I can't remember whether I used the hot corners or closed the lid) and then quickly changed my mind. The result was the state shown below:

Sorry the picture is a bit blurry, but basically it's the password dialog popup but with my desktop behind it rather than the expected black mask of the screen saver.

Obviously, this is a little distressing, so I tried hitting cancel and closing the lid to put the display to sleep, but nothing would restore it to the usual locked condition (i.e., a black screen with just the password dialog). The best part, though, is that the screen itself wasn't totally locked: I was able to use the mouse to manipulate the windows hiding behind the modal dialog. I wish I had thought to see if the keyboard input was locked into the dialog, but I forgot to check.

Needless to say, I just powered down the machine at this point (paranoid, remember?)


September 12, 2010

Everyone knows how to decide fairly simple questions with a coin flip: you flip, I call it in the air.1 Unfortunately, this simple technique doesn't work when we are trying to draw one winner out of a group of three (or one loser, the concept is the same).

A non-deterministic process
It's relatively easy to convince yourself that there's no process that's both fair and guaranteed to require a finite number of flips. (Note: we're assuming here that the flips are i.i.d. with a .5 probability of heads or tails. If not, we need a different analysis.) Consider any number of flips n: the total number of possible outcomes is 2^n, which is never evenly divisible by 3 (since 2 ≡ -1 mod 3, 2^n alternates between 2 and 1 mod 3). Thus, any assignment of outcomes to winners must either be unfair (give more outcomes to some winner) or have some set of outcomes which is invalid, in the sense that the process fails and must be run again.

At Least One Solution
The above should tell you the general shape of the solution: we start by flipping a coin n times. For some subset of the possible outcomes, of size m, we declare the process finished and declare a winner. For the remaining 2^n - m outcomes we flip some additional number of times, and we repeat until we have declared a winner. I don't have an optimal solution for the general case of any number of players, but for three there is an obvious solution that I also believe is optimal:

  1. Assign the players the numbers 0, 1, 2.
  2. Flip two coins.
  3. Interpret the results of the flip as a binary number with the first coin being the high digit. E.g., HT is binary 10 (i.e., 2).
  4. If the result is 3, go to step 2.
  5. The winner is the matching player. Exit.
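Here's a quick Python sketch of the procedure above (the `flip` argument and function names are my own stand-ins, not anything standard):

```python
import random

def pick_winner_of_three(flip=lambda: random.randint(0, 1)):
    """Select one of players 0, 1, 2 uniformly using fair coin flips.

    `flip` returns 1 for heads, 0 for tails; it defaults to a simulated
    fair coin but could be any source of fair bits.
    """
    while True:
        # Two flips, first flip as the high bit: TT=0, TH=1, HT=2, HH=3.
        value = 2 * flip() + flip()
        if value != 3:      # 3 is the invalid outcome; go back to step 2
            return value
```

Each pass through the loop costs two flips and succeeds with probability 3/4, which is where the 2/(3/4) = 2.667 expected flips comes from.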

This algorithm has a .75 chance of terminating after two flips and on average requires 2.667 flips. There are other equivalent two-flip algorithms, but obviously no algorithm that starts with three flips can be more efficient. I don't have an algorithm that sometimes terminates after a single flip, but my intuition is that there isn't one (I'll make an argument for this shortly).

Generalizing to N > 3
Obviously N = 4 is easy: we can just flip a coin twice. N = 5 is more complicated, as is any N that isn't a power of 2. This feels like a coding theory problem at this point and my coding theory is rather rusty, but let's consider at least one avenue. Clearly, we need at least 3 bits to cover all possible winners, so we could just say that we use 3 bits, with values 0-4 corresponding to winners and values 5-7 discarded. This algorithm has a .625 chance of terminating after 3 flips and on average requires 4.8 flips. This is obviously suboptimal since (unlike the N=3 case) we're discarding information contained in the difference between the three discarded values. We can get a more efficient packing by always doing 4 flips, using the first 15 values (3 per player) and discarding the 16th. Now we're not throwing away information: we have a .9375 probability of terminating after one round, and the scheme on average requires 4.267 flips.
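The expected-flip figures above can be checked directly: a reject-and-retry scheme that costs f flips per round and terminates with probability p needs f/p flips on average (the number of rounds is geometric with mean 1/p).

```python
def expected_flips(flips_per_round, p_terminate):
    """Expected total flips for a reject-and-retry scheme: each round
    costs `flips_per_round` flips and terminates with probability
    `p_terminate`, so the expected number of rounds is 1/p_terminate."""
    return flips_per_round / p_terminate

# 3-flip scheme for N=5: use values 0-4 of 8, discard 5-7.
naive = expected_flips(3, 5 / 8)       # 4.8
# 4-flip scheme: use 15 of 16 values (3 per player), discard 1.
packed = expected_flips(4, 15 / 16)    # 4.266...
```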

There's a better solution, though. Build a prefix-free code, where we encode as many values as possible in the first three flips (values 0-4) but then, instead of discarding values 5-7, we use them in combination with another bit. I.e., the following bit patterns:

Pattern   Winner
000            1 
001            2
010            3
011            4
100            5
1010           1
1011           2
1100           3
1101           4
1110           5
1111           Continue

This is really just the 4-coin mapping (with 16 values) but arranged in such a way that we ignore the final bit (coin) for some of the prefixes because both suffixes produce the same value, so we don't need to read the next bit.

This algorithm has a .625 probability of terminating after 3 coin flips and (same as before) a .9375 probability of terminating after 4 coin flips. Each round takes on average 3 × (5/8) + 4 × (3/8) = 3.375 flips, so unless I'm screwing up somewhere (and this is certainly possible) this algorithm uses an average of 3.375/(15/16) = 3.6 flips, which is far better.
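The table translates into code fairly directly (a sketch; I've renumbered the players 0-4 to keep the arithmetic clean):

```python
import random

def pick_winner_of_five(flip=lambda: random.randint(0, 1)):
    """Select one of players 0-4 uniformly via the prefix-free code
    above: five 3-bit patterns (000-100), five 4-bit patterns
    (1010-1110), and 1111 meaning start over."""
    while True:
        # First three flips, most significant bit first.
        bits = 4 * flip() + 2 * flip() + flip()
        if bits <= 4:               # 000-100: winner known after 3 flips
            return bits
        bits = 2 * bits + flip()    # prefixes 101/110/111: read one more
        if bits != 15:              # 1111 (15) means flip again
            return bits - 10        # 1010-1110 (10-14) map to players 0-4
```

Note that the fourth flip only happens for the three prefixes 101, 110, and 111, which is exactly the "ignore the final coin for some prefixes" trick described above.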

It seems like one ought to be able to use a generalization of this strategy for any number of players that isn't a power of two: construct a minimal-length prefix-free code whose number of codewords is some multiple of the number of players. But my coding theory isn't really up to giving a more general algorithm.

1. It's interesting to note that this isn't actually fair. Say that coins are in general biased in favor of heads; then the person calling will always call heads and gain an advantage. The "call it in the air" technique is designed to deal only with the threat that the flipper is able to bias the result. To deal with biased coins you need to flip multiple times. But if the coins are flipped sequentially you now need to deal with the possibility that the flipper can bias the coins. You could both flip one, in which case it's a little like rock/paper/scissors in terms of psychology.
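The classic multiple-flip fix the footnote gestures at is von Neumann's trick: flip the same coin twice, output heads on HT, tails on TH, and retry on HH or TT. A sketch (my own naming), which assumes the flips are independent with a fixed bias, so it doesn't help against a flipper who can bias individual flips:

```python
import random

def von_neumann_fair_flip(biased_flip):
    """Extract a fair bit from a biased but independent coin.
    P(HT) = P(TH) = p*(1-p) for any bias p, so conditioning on the
    two flips differing yields an unbiased result."""
    while True:
        a, b = biased_flip(), biased_flip()
        if a != b:
            return a   # 1 for the HT outcome, 0 for TH

# Example: a coin that comes up heads 70% of the time.
biased = lambda: 1 if random.random() < 0.7 else 0
```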


September 11, 2010

David Gelernter writes
Maybe most important, you need a cloud for security. More and more of people's lives is going online. For security and privacy, I need the same sort of serious protection my information gets that my money gets in a bank. If I have money, I'm not going to shove it in a drawer under my bed and protect it with a shotgun or something like that. I'm just going to assume that there are institutions that I can trust, reasonably trustworthy to take care of the money for me. By the same token, I don't want to worry about the issues particularly with machines that are always on, that are always connected to the network, easy to break into. I don't want to manage the security on my machine. I don't want to worry about encryption; I don't want to worry about other techniques to frustrate thieves and spies. If my information is out on the cloud, not only can somebody else worry about encryption and coding it, not only can somebody else worry about barriers and logon protections, but going back to Linda and the idea of parallelism and a network server existing not on one machine, but being spread out on many, I'd like each line of text that I have to be spread out over a thousand computers, let's say, or over a million.

So, if I'm a hacker and I break into one computer, I may be able to read a vertical strip of a document or a photograph, which is meaningless in itself, and I have to break into another 999,999 computers to get the other strips.

This doesn't make a lot of sense to me.

First, most of what you see described now as "cloud"-type services, e.g., EC2 or Box.net, are really just big server farms operated by a single vendor. There's a reasonable debate about whether these are more or less secure than services you operate yourself. With something like Box.net, you don't need to do any of the admin work on the server, so you can't screw it up and leave your system insecure. On the minus side, you don't really know what the operator is doing, so maybe they're administering it more insecurely than you would yourself. Moreover, there's a certain level of risk from the fact that other people (maybe your enemies) are accessing the same computers as you and may be trying to steal your data. What this kind of cloud service mostly offers is convenience: managing your own systems is a huge pain in the ass, and so while in the best case you might manage them more securely than Amazon or Box would, in practice you probably won't.1

What this doesn't do, however, is remove a single point of failure. In fact, there are at least two:

  • Your data is stored at a small number of machines at the service provider site. Compromise of one of those machines will lead to compromise of your data, as will of course compromise of any of their management machines.
  • If the machine on your desk which you use to access the data is compromised, then your data will also be compromised.

You can, of course, remove the risk of compromise from the service provider side by encrypting all your data before storing it. In that case, you're left with the risk of compromise of your own machines, but you now have to, as Gelernter says, "worry about encryption". There's no real way to completely remove the risk of compromise of your own machines: after all, you need some way to view the data, and that means that your machines need to be able to access it. At most you can minimize the risk by appropriate security measures.

It's clear, however, from Gelernter's discussion of having your data spread out over a million machines that he's talking about something different: a peer-to-peer system like a Distributed Hash Table (DHT), where your data is sharded over a large number of machines operated by different people. In the limit, you could have a worldwide system where anyone could add their machine to the overlay network and just pick up a share of the data being stored by other people. In principle, this sounds like it removes the single point of failure, since an attacker would need to compromise all the machines in question. In practice, it's not anywhere near so good, for two reasons. First, you're trusting a whole pile of other people who you don't know not to reveal or misuse whatever part of your data they're storing. That's not very comforting if the data in question is your social security number, and it's worse still if you're unlucky enough to have part of your data stored by your enemies. Second, DHTs are designed to dynamically rebalance their load as machines join and leave the overlay. This means that it may be possible for an attacker to arrange that his hosts are the ones which get to store your data, which would increase the risk of compromise. Even in DHTs which don't dynamically rebalance, it's generally not practical to manage a distributed access control system across such an open network; instead, it's generally assumed that if you want your data to be confidential you will encrypt it.
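To see why placement matters, here's a toy sketch of hash-based placement (my own simplification; real DHTs like Chord or Kademlia use ring or XOR distance metrics, but the point is the same):

```python
import hashlib

def node_for(key, node_ids):
    """Toy DHT placement: a key lives on the node whose numeric ID is
    closest to the key's hash, here taken as the first 4 bytes of
    SHA-256 interpreted as an integer. Placement is determined purely
    by hashes, not by who you trust."""
    h = int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")
    return min(node_ids, key=lambda n: abs(n - h))
```

The problem is visible immediately: an attacker who can choose his own node IDs just registers one right next to the hash of your key and becomes the node that stores your shard.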

This brings us to the suggestion that the data will be sharded in some way that makes each individual piece useless. This seems kind of pointless. First, it's not necessarily easy to have a generic function which breaks a data object into subsets each of which is useless. Gelernter gives the example of a vertical strip of a photo, but consider that a horizontal strip of an image of a document (or a vertical strip of a document in landscape mode) leaks a huge amount of information. I can imagine security arguments for other sharding mechanisms (every Nth byte, for instance), but there are also cases where they're not secure. Second, if you're encrypting the data anyway, then it doesn't matter how you break it up, since any subset is as useful (or useless) as any other.

The bottom line, then, is that cloud storage doesn't necessarily make things as much more secure or as much simpler as you would like: you still need to deal with encryption and with protecting your own computer. What cloud storage does is remove the need for you to operate and protect your own server. This adds a lot of flexibility (the ability to have your data available on whatever machine you're using) without too much additional effort, but it's not much more secure than just carrying the data around on a laptop or USB stick.

One more thing: you don't really want to just shard the data. Say that you break each file up into 100 pieces and the node storing piece #57 crashes and loses your data. What happens? If your file is plain text, it might be recoverable, but with lots of file formats (e.g., XML), this kind of damage can render the entire file unusable without heroic recovery efforts. There are well-known techniques for addressing this situation (see forward error correction), but it's not just a simple matter of splitting the file into multiple parts.
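The simplest forward-error-correction scheme is a single XOR parity shard, RAID-style. A sketch (my own; real systems use Reed-Solomon-style codes so they can tolerate more than one lost shard):

```python
def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def make_shards(data, n):
    """Split `data` into n equal-size shards plus one XOR parity shard.
    Any single lost shard can be rebuilt from the remaining n."""
    data += bytes((-len(data)) % n)        # zero-pad to a multiple of n
    size = len(data) // n
    shards = [data[i * size:(i + 1) * size] for i in range(n)]
    parity = shards[0]
    for s in shards[1:]:
        parity = xor_bytes(parity, s)
    return shards + [parity]

def recover_shard(remaining):
    """XOR together every surviving shard (parity included) to
    reconstruct the single missing one."""
    out = remaining[0]
    for s in remaining[1:]:
        out = xor_bytes(out, s)
    return out
```

So if the node holding piece #57 crashes, you XOR the survivors together and get it back, at the cost of storing one extra shard's worth of parity.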

1. Technical note: I'm talking mostly about full services like Box or Amazon S3. Outsourced virtual machine services like EC2 of course require you to manage them, and so you can screw them up just as badly as you could screw up a machine in your own rack.


September 2, 2010

As everyone knows, cats hate productivity and when I work from home my cat likes to curl up somewhere next to my keyboard half obscuring the mousepad. This of course means that you can't really use your mouse. After repeatedly kicking her off my desk, I finally realized who was in charge and decided to go for a technical fix: the Apple Magic Trackpad. I generally like trackpads and it's tall enough that the cat doesn't want to sit on top of it. She just curls up right in front of it.

Everything was going fine until one day it started totally flipping out. The Magic Trackpad is multitouch with no separate button; to get a double click I use two fingers to control the mouse and click the trackpad with my thumb. Suddenly, though, I started getting double clicks all the time. I was all ready to reboot or call support, and then I realized the problem: one finger and one paw counts as two fingers. A small adjustment later and everything was back to normal.