Will the cloud be more secure?

David Gelernter writes:
Maybe most important, you need a cloud for security. More and more of people's lives is going online. For security and privacy, I need the same sort of serious protection my information gets that my money gets in a bank. If I have money, I'm not going to shove it in a drawer under my bed and protect it with a shotgun or something like that. I'm just going to assume that there are institutions that I can trust, reasonably trustworthy to take care of the money for me. By the same token, I don't want to worry about the issues particularly with machines that are always on, that are always connected to the network, easy to break into. I don't want to manage the security on my machine. I don't want to worry about encryption; I don't want to worry about other techniques to frustrate thieves and spies. If my information is out on the cloud, not only can somebody else worry about encryption and coding it, not only can somebody else worry about barriers and logon protections, but going back to Linda and the idea of parallelism and a network server existing not on one machine, but being spread out on many, I'd like each line of text that I have to be spread out over a thousand computers, let's say, or over a million.

So, if I'm a hacker and I break into one computer, I may be able to read a vertical strip of a document or a photograph, which is meaningless in itself, and I have to break into another 999,999 computers to get the other strips.

This doesn't make a lot of sense to me.

First, most of what you see described now as "cloud"-type services, e.g., EC2 or Box.net, are really just big server farms operated by a single vendor. There's a reasonable debate about whether these are more or less secure than services you operate yourself. With something like Box.net, you don't need to do any of the admin work on the server, so you can't screw it up and leave your system insecure. On the minus side, you don't really know what the operator is doing, so maybe they're administering it more insecurely than you would yourself. Moreover, there's a certain level of risk from the fact that other people (maybe your enemies) are accessing the same computers as you and may be trying to steal your data. What this kind of cloud service is, mostly, is more convenient: managing your own systems is a huge pain in the ass, and so while in the best case you might manage them more securely than Amazon or Box would, in practice you probably won't [1].

What this doesn't do, however, is remove a single point of failure. In fact, there are at least two:

  • Your data is stored on a small number of machines at the service provider's site. Compromise of one of those machines will lead to compromise of your data, as will, of course, compromise of any of their management machines.
  • If the machine on your desk which you use to access the data is compromised, then your data will also be compromised.

You can, of course, remove the risk of compromise from the service provider side by encrypting all your data before storing it. In that case, you're left with the risk of compromise of your own machines, but you now have to, as Gelernter says, "worry about encryption". There's no real way to completely remove the risk of compromise of your own machines: after all, you need some way to view the data and that means that your machines need to be able to access it. At most you can minimize the risk by appropriate security measures.

It's clear, however, from Gelernter's discussion of having your data spread out over a million machines that he's talking about something different: a peer-to-peer system like a Distributed Hash Table (DHT) where your data is sharded over a large number of machines operated by different people. In the limit, you could have a worldwide system where anyone could add their machine to the overlay network and just pick up a share of the data being stored by other people. In principle, this sounds like it removes the risk of a single point of failure, since you would need to compromise all the machines in question. In practice, it's not anywhere near so good, for two reasons. First, you're trusting a whole pile of other people whom you don't know not to reveal or misuse whatever part of your data they're storing. That's not very comforting if the data in question is your social security number: if you're unlucky enough to have part of your data stored by your enemies, that's not good. Second, DHTs are designed to dynamically rebalance their load as machines join and leave the overlay. This means that it may be possible for an attacker to arrange that his hosts are the ones which get to store your data, which would increase the risk of compromise. Even in DHTs which don't dynamically rebalance, it's generally not practical to manage a distributed access control system across such an open network; instead it's just generally assumed that if you want your data to be confidential you will encrypt it.
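To make the rebalancing concern concrete, here is a minimal sketch of a toy ring-style DHT in which a key is stored on the first node at or after the key's position. Everything here is an assumption for illustration: the 32-bit ring, the `ring_id` and `owner` helpers, the node names, and above all the idea that a joining node's position is just a hash of a name the attacker gets to pick. Real DHTs differ in how node IDs are assigned and may constrain this, so treat it as a picture of the problem, not a description of any particular system.

```python
import hashlib
from bisect import bisect_left
from typing import List

RING_BITS = 32  # tiny ring so the demo runs quickly (assumption; real rings are far larger)


def ring_id(name: str) -> int:
    """Map a node name or data key to a position on the ring."""
    digest = hashlib.sha256(name.encode()).digest()
    return int.from_bytes(digest[:RING_BITS // 8], "big")


def owner(key: str, nodes: List[str]) -> str:
    """Return the node whose ID is the first at or after the key's ID, wrapping around."""
    ids = sorted((ring_id(n), n) for n in nodes)
    pos = bisect_left(ids, (ring_id(key), ""))
    return ids[pos % len(ids)][1]


def find_hostile_name(key: str, nodes: List[str], prefix: str = "evil-") -> str:
    """Try candidate node names until one would become the owner of `key`."""
    i = 0
    while True:
        candidate = f"{prefix}{i}"
        if owner(key, nodes + [candidate]) == candidate:
            return candidate
        i += 1


honest = [f"node-{i}" for i in range(50)]  # hypothetical honest participants
key = "alice/ssn.txt"                      # hypothetical data key

before = owner(key, honest)
attacker = find_hostile_name(key, honest)
after = owner(key, honest + [attacker])
print(f"{key}: stored on {before} before the join, on {after} after it")
```

With 50 honest nodes the attacker needs only on the order of 50 guesses, each of which costs one hash. The search gets more expensive as the network grows, but it never requires breaking into anything.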

This brings us to the suggestion that the data will be sharded in some way that makes each individual piece useless. This seems kind of pointless. First, it's not necessarily easy to have a generic function which breaks a data object into subsets each of which is useless. Gelernter gives the example of a vertical strip of a photo, but consider that a horizontal strip of an image of a document (or a vertical strip of a document in landscape mode) leaks a huge amount of information. I can imagine security arguments for other sharding mechanisms (every Nth byte, for instance), but there are also cases where they're not secure. Second, if you're encrypting the data anyway, then it doesn't matter how you break it up, since any subset is as useful (or useless) as any other.
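Here is a small sketch of one case where every-Nth-byte sharding fails. The `stride_shards` and `reassemble` helpers and the sample string are my own illustration, not anything Gelernter proposes. The trick is that plain ASCII text stored as UTF-16 alternates a character byte with a zero byte, so a two-way split hands the entire text to whoever holds the first shard.

```python
from typing import List


def stride_shards(data: bytes, n: int) -> List[bytes]:
    """Shard i holds bytes i, i+n, i+2n, ... of the input."""
    return [data[i::n] for i in range(n)]


def reassemble(shards: List[bytes]) -> bytes:
    """Interleave the shards back into the original byte string."""
    n = len(shards)
    out = bytearray(sum(len(s) for s in shards))
    for i, shard in enumerate(shards):
        out[i::n] = shard
    return bytes(out)


secret = "my SSN is 000-00-0000"         # placeholder value, not a real number
encoded = secret.encode("utf-16-le")     # each ASCII character becomes (char, 0x00)
shards = stride_shards(encoded, 2)

print(shards[0])                         # the whole text, readable from one shard
print(shards[1])                         # nothing but zero bytes
```

The same thing happens, less dramatically, with any format that has regular structure at a period that divides N. A sharding function that is safe for one encoding can be useless for another.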

The bottom line, then, is that cloud storage doesn't necessarily make things as secure or as simple as you would like: you still need to deal with encryption and with protecting your own computer. What cloud storage does is remove the need for you to operate and protect your own server. This adds a lot of flexibility (the ability to have your data available from whatever machine you're using) without too much additional effort, but it's not much more secure than just carrying the data around on a laptop or USB stick.

One more thing: you don't really want to just shard the data. Say that you break each file up into 100 pieces and the node storing piece #57 crashes and loses your data. What happens? If your file is plain text, it might be recoverable, but with lots of file formats (e.g., XML), this kind of damage can render the entire file unusable without heroic recovery efforts. There are well-known techniques for addressing this situation (see forward error correction), but it's not just a simple matter of splitting the file into multiple parts.
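As a rough illustration of the simplest form of that idea, here is a sketch that splits a file into 100 pieces plus one extra XOR parity piece, then rebuilds piece 57 after losing it. This is single-parity erasure coding (the same trick RAID uses), which I'm using as a stand-in because it fits in a few lines; it survives exactly one lost piece, and real systems would use stronger codes. The helper names and the sample document are made up for the example.

```python
from typing import List, Optional


def split_with_parity(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-length pieces and append one XOR parity piece."""
    size = -(-len(data) // k)                    # ceiling division
    padded = data.ljust(size * k, b"\x00")       # zero-pad so all pieces match in length
    pieces = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = bytearray(size)
    for piece in pieces:
        for j, b in enumerate(piece):
            parity[j] ^= b
    return pieces + [bytes(parity)]


def recover(pieces: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild a single missing piece (marked None) by XORing all the others."""
    missing = [i for i, p in enumerate(pieces) if p is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can repair only one lost piece")
    result = list(pieces)
    if missing:
        size = len(next(p for p in pieces if p is not None))
        rebuilt = bytearray(size)
        for p in pieces:
            if p is not None:
                for j, b in enumerate(p):
                    rebuilt[j] ^= b
        result[missing[0]] = bytes(rebuilt)
    return result


doc = b"<entry><name>example</name></entry>\n" * 50   # hypothetical XML-ish file
pieces = split_with_parity(doc, 100)

damaged: List[Optional[bytes]] = list(pieces)
damaged[57] = None                                    # the node holding this piece crashed

restored = recover(damaged)
recovered_doc = b"".join(restored[:100])[:len(doc)]   # drop parity piece and padding
print(recovered_doc == doc)
```

Note that the caller has to remember the original length in order to strip the padding, and that losing two pieces is unrecoverable here. Both are part of why this is more than a simple matter of splitting the file.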

1. Technical note: I'm talking mostly about full services like Box or Amazon S3. Outsourced virtual machine services like EC2 of course require you to manage them and so you can screw them up just as badly as you could screw up a machine in your own rack.

1 Comment

What, no mention of threshold secret-sharing schemes?
