Recently in SYSSEC Category


August 31, 2011

Today I had occasion to ask Web Security expert Adam Barth about avoiding XSS vulnerabilities. He was kind enough to help me out and (since he's presumably tired of answering this kind of question) also to write up some guidance for the rest of the world. His answer is below:

Folks often ask me how to build web applications without cross-site scripting (XSS) vulnerabilities, but I haven't really been able to find a reference that I'd be happy recommending. There are, of course, many different approaches you can use to build an XSS-free web application. This guide recommends a simplistic approach that works well for "single-document" web applications (like Gmail and Twitter) that use a single HTML document for the lifetime of the application.

Because complexity is the enemy of security, we approach the problem of eliminating XSS by simplifying how we handle untrusted data (by which we mean any information we retrieve from the network or from the DOM):

  1. Untrusted data MUST NOT be transmitted in the same HTTP responses as HTML or JavaScript. In particular, the main HTML document SHOULD be static (and therefore cacheable for a long time).

  2. When transmitted from the server to the client, untrusted data MUST be properly encoded in JSON format and the HTTP response MUST have a Content-Type of application/json.

  3. When introduced into the DOM, untrusted data MUST be introduced using one of the following APIs:
    • Node.textContent
    • document.createTextNode
    • Element.setAttribute (second parameter only)

That's it. If you follow those three rules, you stand a good chance of avoiding XSS. These rules are conservative. You can certainly build a secure web site that violates one or more of these rules. Conversely, these rules don't guarantee success. For example, they don't stop you from doing some dumb things:

// The textContent of a <script> element is active content.
var scriptElement = document.createElement('script');
scriptElement.textContent = userData.firstName;  // XSS!

// Some attributes (mostly event handlers) are active content:
var imageElement = document.createElement('img');
imageElement.setAttribute('onload', userData.lastName);  // XSS!
imageElement.src = '';

However, common sense should help you avoid those situations.
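For rule 2, here's a minimal sketch of the server side (the `jsonResponse` helper is my own illustration, not from Barth's write-up): untrusted data leaves the server only as JSON with the correct Content-Type, never interpolated into an HTML response.

```javascript
// Hypothetical server-side helper illustrating rule 2: untrusted data
// is serialized as JSON and labeled application/json, never mixed into
// an HTML response.
function jsonResponse(untrustedData) {
  const body = JSON.stringify(untrustedData)
    // Defense in depth: escape angle brackets so the payload stays inert
    // even if it is ever (wrongly) embedded in an HTML context.
    .replace(/</g, '\\u003c')
    .replace(/>/g, '\\u003e');
  return {
    headers: { 'Content-Type': 'application/json' },
    body,
  };
}
```

On the client side, the response is consumed with `JSON.parse` and introduced into the DOM only via the APIs listed in rule 3.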


September 27, 2010

First, sorry about the light posting. Nothing's wrong, just an unusually bad case of work being busy plus writer's block.

Like all paranoid people, I run my Mac with screen locking. The other night I decided to lock the screen (I can't remember whether I used the hot corners or closed the lid) and then quickly changed my mind. The result was the state shown below:

Sorry the picture is a bit blurry, but basically it's the password dialog popup but with my desktop behind it rather than the expected black mask of the screen saver.

Obviously, this is a little distressing, so I tried hitting cancel and closing the lid to put the display to sleep, but nothing would restore it to the usual locked condition (i.e., black screen with just the password dialog). The best part, though, is that the screen itself wasn't totally locked—I was able to use the mouse to manipulate the windows hiding behind the modal dialog. I wish I had thought to see if the keyboard input was locked into the dialog, but I forgot to check.

Needless to say, I just powered down the machine at this point (paranoid, remember?)


September 11, 2010

David Gelernter writes
Maybe most important, you need a cloud for security. More and more of people's lives is going online. For security and privacy, I need the same sort of serious protection my information gets that my money gets in a bank. If I have money, I'm not going to shove it in a drawer under my bed and protect it with a shotgun or something like that. I'm just going to assume that there are institutions that I can trust, reasonably trustworthy to take care of the money for me. By the same token, I don't want to worry about the issues particularly with machines that are always on, that are always connected to the network, easy to break into. I don't want to manage the security on my machine. I don't want to worry about encryption; I don't want to worry about other techniques to frustrate thieves and spies. If my information is out on the cloud, not only can somebody else worry about encryption and coding it, not only can somebody else worry about barriers and logon protections, but going back to Linda and the idea of parallelism and a network server existing not on one machine, but being spread out on many, I'd like each line of text that I have to be spread out over a thousand computers, let's say, or over a million.

So, if I'm a hacker and I break into one computer, I may be able to read a vertical strip of a document or a photograph, which is meaningless in itself, and I have to break into another 999,999 computers to get the other strips.

This doesn't make a lot of sense to me.

First, most of what you see described now as "cloud"-type services, e.g., EC2, are really just big server farms operated by a single vendor. There's a reasonable debate about whether these are more or less secure than services you operate yourself. With a hosted service, you don't need to do any of the admin work on the server, so you can't screw it up and leave your system insecure. On the minus side, you don't really know what the operator is doing, so maybe they're administering it more insecurely than you would yourself. Moreover, there's a certain level of risk from the fact that other people—maybe your enemies—are accessing the same computers as you and may be trying to steal your data. What this kind of cloud service is, mostly, is more convenient: managing your own systems is a huge pain in the ass, and so while in the best case you might manage them more securely than Amazon or Box would, in practice you probably won't1.

What this doesn't do, however, is remove a single point of failure. In fact, there are at least two:

  • Your data is stored at a small number of machines at the service provider site. Compromise of one of those machines will lead to compromise of your data, as will of course compromise of any of their management machines.
  • If the machine on your desk which you use to access the data is compromised, then your data will also be compromised.

You can, of course, remove the risk of compromise from the service provider side by encrypting all your data before storing it. In that case, you're left with the risk of compromise of your own machines, but you now have to, as Gelernter says, "worry about encryption". There's no real way to completely remove the risk of compromise of your own machines: after all, you need some way to view the data and that means that your machines need to be able to access it. At most you can minimize the risk by appropriate security measures.

It's clear, however, from Gelernter's discussion of having your data spread out over a million machines that he's talking about something different: a peer-to-peer system built on a distributed hash table (DHT), where your data is sharded over a large number of machines operated by different people. In the limit, you could have a worldwide system where anyone could add their machine to the overlay network and just pick up a share of the data being stored by other people. In principle, this sounds like it removes the risk of a single point of failure, since you would need to compromise all the machines in question. In practice, it's nowhere near so good, for two reasons. First, you're trusting a whole pile of other people who you don't know not to reveal/misuse whatever part of your data they're storing. That's not very comforting if the data in question is your social security number, and if you're unlucky enough to have part of your data stored by your enemies, that's not good. Second, DHTs are designed to dynamically rebalance their load as machines join and leave the overlay. This means that it may be possible for an attacker to arrange that his hosts are the ones which get to store your data, which would increase the risk of compromise. Even in DHTs which don't dynamically rebalance, it's generally not practical to manage a distributed access control system across such an open network; instead it's just generally assumed that if you want your data to be confidential you will encrypt it.

This brings us to the suggestion that the data will be sharded in some way that makes each individual piece useless. This seems kind of pointless. First, it's not necessarily easy to have a generic function which breaks a data object into subsets each of which is useless. Gelernter gives the example of a vertical strip of a photo, but consider that a horizontal strip of an image of a document (or a vertical strip of a document in landscape mode) leaks a huge amount of information. I can imagine security arguments for other sharding mechanisms (every Nth byte, for instance), but there are also cases where they're not secure. Second, if you're encrypting the data anyway, then it doesn't matter how you break it up, since any subset is as useful (or useless) as any other.

The bottom line, then, is that cloud storage isn't as much more secure, or as much simpler, as you would like: You still need to deal with encryption and with protecting your own computer. What cloud storage does is remove the need for you to operate and protect your own server. This adds a lot of flexibility (the ability to have your data available whatever machine you're using) without too much additional effort, but it's not much more secure than just carrying the data around on a laptop or USB stick.

One more thing: you don't really want to just shard the data. Say that you break each file up into 100 pieces and the node storing piece #57 crashes and loses your data. What happens? If your file is plain text, it might be recoverable, but with lots of file formats (e.g., XML), this kind of damage can render the entire file unusable without heroic recovery efforts. There are well-known techniques for addressing this situation (see forward error correction), but it's not just a simple matter of splitting the file into multiple parts.
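A minimal illustration of why you want coding rather than bare splitting: RAID-5-style XOR parity, the simplest form of forward error correction, lets you lose any one shard and rebuild it from the rest. This sketch is mine, not from the post; real systems use something stronger like Reed-Solomon, which tolerates multiple losses.

```javascript
// Split data into k equal shards plus one XOR parity shard; any single
// lost shard can be rebuilt by XORing the parity with the survivors.
function xorBufs(a, b) {
  const out = Buffer.alloc(a.length);
  for (let i = 0; i < a.length; i++) out[i] = a[i] ^ b[i];
  return out;
}

function encodeShards(data, k) {
  const size = Math.ceil(data.length / k);
  const shards = [];
  for (let i = 0; i < k; i++) {
    const s = Buffer.alloc(size); // final shard is zero-padded
    data.copy(s, 0, i * size, Math.min((i + 1) * size, data.length));
    shards.push(s);
  }
  return { shards, parity: shards.reduce(xorBufs) };
}

function rebuildShard(shards, parity, lostIndex) {
  // parity XOR (all surviving shards) = the lost shard
  let s = parity;
  shards.forEach((sh, i) => {
    if (i !== lostIndex) s = xorBufs(s, sh);
  });
  return s;
}
```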

1. Technical note: I'm talking mostly about full services like Box or Amazon S3. Outsourced virtual machine services like EC2 of course require you to manage them and so you can screw them up just as badly as you could screw up a machine in your own rack.


August 29, 2010

The second major attack described by Prasad et al. is a clip-on memory manipulator. The device in question is a small microcontroller attached to a clip-on connector. You open the control unit, clip the device onto the memory storage chip, rewrite the votes, and disconnect it. There are a number of countermeasures we could consider here.

Physical Countermeasures
We've already discussed seals, but one obvious countermeasure is to encase the entire memory chip in plastic/epoxy. This would make it dramatically harder to access the memory chip. One concern I have about this is heat: were these chips designed to operate without any cooling? That seems like a question that could be experimentally answered. I think you'd want to use transparent epoxy here, to prevent an attacker from drilling in, accessing the memory chip, and covering the hole over, maybe with a small piece of plastic to permit future access. I also had an anonymous correspondent suggest encasing the entire unit in epoxy, but at most this would be the circuit board, since the device has buttons and the like; this would of course make the heat problem worse.

Cryptographic Countermeasures
Another possibility would be to extend the cryptographic checksum technique I suggested to deal with the dishonest display. At the end of the election, when the totals are recorded, the CPU writes a MAC into memory over all the votes (however recorded) as well as writing a MAC over the totals. It then erases the per-election key from memory (by overwriting it with zeros). This makes post-election rewriting attacks much harder—the attacker would need to also know the per-election key (which requires either insider information or access to the machine between setup and the election) and the per-machine key, which requires extensive physical access. I think it's plausible to argue that the machine can be secured at least during the election and potentially before it. Note that this system could be made much more secure by having a dedicated memory built into the CPU for storage of the per-unit key, but that would involve a lot more reengineering than I'm suggesting here.


August 27, 2010

In their paper on Indian EVMs, Prasad et al. demonstrate that you can easily pry off the LED segment display module and replace it with a malicious display. At a high level, it's important to realize that no computer system can be made secure if the attacker is able to replace arbitrary components, since in the limit he can just swap everything out with lookalike components.

The two basic defenses here are to use anti-counterfeiting techniques and to use cryptography with hardware security modules. Most of the proposals for fixing this problem (and the memory overwriting problem) are of the anti-counterfeiting variety; you seal everything up with tamper-evident seals and make it very hard to get/make the seals. Then any attacker who wants to swap components needs to break the seals, which is in theory obvious. Unfortunately, it's very hard to make seals that resist a dedicated attacker. In addition, sealing requires good seal procedures in terms of placing and checking them; with this many machines in the field, it's going to be quite hard to actually do that in a substantially better way than we are doing now.

The other main defense is to use cryptography. The idea is that you embed all your sensitive stuff in a hardware security module (one per device). That module has an embedded cryptographic key and is designed so that if anyone tampers with the module it erases the key. When you want to make sure that a device is legitimate, you challenge the module to prove it knows the key. That way, even if an attacker creates a lookalike module, it can't generate the appropriate proof and so the substitution doesn't work. Obviously, this means that anything you need to trust needs to be cryptographically verified (i.e., signed) as well. Periodically one does see the suggestion of rearchitecting DRE-style voting machines to be HSM-based, but this seems like a pretty big change for India, both in terms of hardware and in terms of procedures for managing the keying material, verifying the signatures, etc.

However, there is an intermediate approach which would make a Prasad-style attack substantially harder without anywhere near as much effort. The idea is that each machine would be programmed by the Election Commission of India with a unique cryptographic key. This could be done at the same time as it was programmed for the election, to minimize logistical hassle. Then at the same time that the vote totals are read out, the machine also reads out a MAC (checksum) of the results computed using that key. That MAC is reported along with the totals, and if it doesn't verify, that machine is investigated. Even though the malicious display can show anything the attacker wants, the attacker cannot compute the MAC and therefore can't generate a valid report of vote totals. The MAC can be quite short: even 4 decimal digits reduces the chance of a successful attack on a machine to 1/10,000.

This approach is significantly less secure than a real HSM, since an attacker who recovers the key for a given machine can program a display for that machine. But it means that the window of opportunity for that attack is much shorter; if the key is reprogrammed for each election then you need to remount the attack between programming time and election time, instead of attacking the machine once and leaving the display in place indefinitely. It's also worth asking if we could make it harder to recover the key; if it's just in the machine memory, then it's not going to be hard to read out using the same technique that Prasad et al. demonstrate for rewriting vote totals. However, you could make the key harder to read by, for instance, having two keys, one of which is burned into the machine at manufacture time in the unreadable (hard to read) firmware which is already a part of each machine and another which is reprogrammed at each election. The MAC would be computed using both keys. This would require the attacker to attack both the firmware on the machine (once) and the main memory (before each election).

Clearly, this isn't an ideal solution but as I said at the beginning of this series, the idea is to improve things without doing too much violence to the existing design. Other approaches welcome.


May 30, 2010

LaTeX is, of course, the standard document production system for computer science documents (with a tiny minority using {t,n}roff). It's also a good example of the standard CS approach of solving problems by inventing a new programming language. Consider that designing a modern Web page involves using three separate languages, HTML, CSS, and JavaScript (of these, only JavaScript is obviously Turing complete). As another example, when you print documents, you generate PDF or PostScript, which are just programming languages (PostScript is Turing complete, not sure about PDF)... Anyway, LaTeX is a bit too complete, it turns out.

Steve Checkoway, Hovav Shacham, and I have a paper at LEET describing how a malicious LaTeX file can compromise your computer:

We show that malicious TeX, BibTeX, and METAPOST files can lead to arbitrary code execution, viral infection, denial of service, and data exfiltration, through the file I/O capabilities exposed by TeX's Turing-complete macro language. This calls into doubt the conventional wisdom view that text-only data formats that do not access the network are likely safe. We build a TeX virus that spreads between documents on the MiKTeX distribution on Windows XP; we demonstrate data exfiltration attacks on web-based LaTeX previewer services.

This isn't just an issue of LaTeX files. While people do sometimes run LaTeX files prepared by others, generally those are only files you get from people you know, i.e., your collaborators. But it turns out you can also embed malicious code in BibTeX files, which people routinely copy and paste from totally untrusted sources (the BibTeX entry for this paper is here) in order to simplify reference management. The other major case is LaTeX class files, which people download for conference submission.

The good news is that the main threat is on Windows, because LaTeX on UNIX is more restrictive about where you can write files. The bad news is that it's also an issue if you run Emacs (look, another embedded language!) with AucTeX (the best way to edit LaTeX files): AucTeX writes executable cache files in the local directory, so you're at risk.

Happy editing!


May 9, 2010

Henry Farrell over at Crooked Timber reports on having his laptop lost and then recovered. He then goes on to recommend a variety of precautions for future incidents:
Also - in the spirit of locking the barn door after the horse has gone but to your very great surprise been returned later through the benevolence of strangers - recommendations for minimizing the pain of stolen machines.

(1) Back Up Everything Important somewhere external. This is the one measure I did take - and the pain would have been far, far greater had I lost my work along with the machine. I use Sugarsync which keeps the work documents on my various machines in sync with each other as well as giving me an online back up - others swear by DropBox, SpiderOak and other services.

(2) Make sure that your account is password protected. I didn't do this - remarkably stupidly - but appear to have gotten away without loss of personal information. You shouldn't take this risk. I won't again.

(3) Set up a firmware password if you have a recently made Mac. Makes it much harder to wipe the OS.

(4) Consider buying anti-theftware like Undercover. Depending on your tolerance for risk, this may be too expensive for the benefits provided (me: my risk tolerance has decreased substantially since this happened to me).

(1) is of course good advice. Backups are good practice for a variety of threat models, including just plain hardware failure. I personally run backups and also keep most of my important stuff in revision control (originally CVS, but I'm gradually moving over to SVN).

Recommendation (2) is nowhere near strong enough. Passwords (barely) protect you against someone who has ephemeral physical access, but if you don't encrypt the hard drive, then a dedicated attacker can either boot up in repair mode (the firmware password (#3) makes this more difficult) and read your data off, or just pull the hard drive out. What you need here is disk encryption. Luckily, the Mac comes with FileVault: a quite serviceable (if a hair slow) disk encryption system.

Recommendation (4) makes some sense, though I doubt I would bother myself. I've never lost a laptop and when we multiply out the chance of loss times the chance of recovery and factor in the likelihood that your laptop will be covered by homeowner's insurance, I'm not sure that the $50 for Undercover is a good bet.


January 24, 2010

A fair bit has been written about Google's "new approach to China"
Like many other well-known organizations, we face cyber attacks of varying degrees on a regular basis. In mid-December, we detected a highly sophisticated and targeted attack on our corporate infrastructure originating from China that resulted in the theft of intellectual property from Google. However, it soon became clear that what at first appeared to be solely a security incident--albeit a significant one--was something quite different.


Third, as part of this investigation but independent of the attack on Google, we have discovered that the accounts of dozens of U.S.-, China- and Europe-based Gmail users who are advocates of human rights in China appear to have been routinely accessed by third parties. These accounts have not been accessed through any security breach at Google, but most likely via phishing scams or malware placed on the users' computers.


These attacks and the surveillance they have uncovered--combined with the attempts over the past year to further limit free speech on the web--have led us to conclude that we should review the feasibility of our business operations in China. We have decided we are no longer willing to continue censoring our results on Google.cn, and so over the next few weeks we will be discussing with the Chinese government the basis on which we could operate an unfiltered search engine within the law, if at all. We recognize that this may well mean having to shut down Google.cn, and potentially our offices in China.

I don't really see the connection between this incident and Google's decision to stop offering filtered access to search queries in China, at least in terms of protecting Google from future attacks. Let's say for the sake of argument that not only did the attacks originate in China but also that (and as far as I know, this is unproven) they were directly sponsored by the Chinese government. How does refusing to offer filtered searches help? It's not like the hackers (allegedly) used some vulnerability in the filtering software as their attack vector. Similarly, even if Google were to pull out of China, or even cut off all access to Chinese IP addresses, Chinese hackers aren't restricted to using IP addresses in Chinese address ranges; they can perfectly well use machines which are located in the US, either by using legitimately purchased accounts as stepping stones, or by using compromised American hosts, of which there are plenty.

I don't have any inside information, but it seems to me like a more plausible story (see this Slate article for an alternate view) is that Google thinks the Chinese government is behind these incidents and this is a way of retaliating against China, under the assumption that China would prefer to have some Google than none. I have no idea whether or not this is something China cares about, however. [Mrs. Guesswork observes that another theory is that Google was previously cooperating with China's surveillance efforts and feels like China overstepped their agreement.]

On a different note, it has been fairly widely reported that an IE 0-day was used in the attack, but Bruce Schneier claims that the hackers exploited a Google-created backdoor intended for lawful intercept (though he doesn't provide any sources):

(CNN) -- Google made headlines when it went public with the fact that Chinese hackers had penetrated some of its services, such as Gmail, in a politically motivated attempt at intelligence gathering. The news here isn't that Chinese hackers engage in these activities or that their attempts are technically sophisticated -- we knew that already -- it's that the U.S. government inadvertently aided the hackers.

In order to comply with government search warrants on user data, Google created a backdoor access system into Gmail accounts. This feature is what the Chinese hackers exploited to gain access.

Of course, both of these can be true. Even if Google built a surveillance tool for the purpose of lawful intercept, presumably it wasn't something you could just connect to without authorization, so I would imagine that you would need to do some hacking to get access to it (unless, of course, the password is "1234").


November 20, 2009

Sequoia Voting Systems recently announced that it will be publishing the source code to their Frontier opscan voting system. Reaction in the security community seems generally positive. Here's Ed Felten:
The trend toward publishing election system source code has been building over the last few years. Security experts have long argued that public scrutiny tends to increase security, and is one of the best ways to justify public trust in a system. Independent studies of major voting vendors' source code have found code quality to be disappointing at best, and vendors' all-out resistance to any disclosure has eroded confidence further. Add to this an increasing number of independent open-source voting systems, and secret voting technologies start to look less and less viable, as the public starts insisting that longstanding principles of election transparency be extended to election technology. In short, the time had come for this step.

I'm less sanguine. I'm not saying this is a bad thing necessarily, but I'm not sure it's a good thing either. As always, it's important to consider what threats we're trying to defend against. We need to consider two kinds of vulnerabilities that might be present in the code:

  • Backdoors intentionally introduced by Sequoia or their engineers.
  • Design and or implementation errors accidentally introduced by Sequoia engineers.

A lot of the discussion of open voting systems has focused on the first kind of threat (corporations are stealing your votes, etc.) I think there's certainly a credible argument to be made that having to publish the source code does make this form of attack somewhat harder. If people are looking at your code, then you probably can't put a naked backdoor ("if someone types 1111, give them operator control") into the code because that might get caught in a review. On the other hand, it would be a pretty risky move to put that kind of backdoor into a piece of software anyway, since even closed voting source code does get reviewed, both as part of the system certification process and in private reviews like those conducted by California and Ohio. More likely, you'd want to hide your back door so it looked like an accidentally introduced vulnerability, both to make it harder to find and to give you plausible deniability.

This brings us to the second form of vulnerability: those introduced as errors/misfeatures in Sequoia's development process. These aren't necessarily a sign of incompetence; as Steve Bellovin says, "all software has bugs and security software has security relevant bugs." Having access to the source code makes it easier to find those vulnerabilities (though as Checkoway et al. have shown it's quite possible to find exploitable vulnerabilities in voting systems without access to the source code). This of course applies both to attackers and defenders. There's an active debate about whether or not on balance this makes open source inherently more or less secure. I'm not aware of any data which settles the question definitively, but I don't think that anyone in the security community believes that a previously insecure piece of software will suddenly become substantially more secure just because the source is disclosed; there are just too many security vulnerabilities for the sort of low-level uncoordinated review that you get in practice to stamp out. On the other hand, it does provide a pretty clear near-term benefit to attackers, who, after all, just need to find one vulnerability.

Now, that's not what Sequoia is doing. According to their press release, Frontier is an entirely new system which they say has been "developed from the ground up with the full intention of releasing all of the source code to any member of the public who wishes to download it - from computer scientists and election officials to students, security experts and the voting public". This syncs up better with another argument about why openness is important, which is more about incentives: if vendors know their code will be open to scrutiny they will be motivated to be more careful with their designs and implementations. Reviews of Sequoia's previous systems have been pretty negative; it will be interesting to see if the new one is any better. On the other hand, we have the confounding factor that modern standards for what it means for a piece of software to be secure are a lot higher than those which applied when the original SVS system was written, so it will be hard to tell whether it's really openness which provided the value, or just that they started from scratch.

One more thing: suppose that the source code is published and the code is full of problems. What then?


September 28, 2009

My comments can be found here. You may also be interested in ACCURATE's comments which can be found here.