Voting: June 2009 Archives

 

June 28, 2009

The Canadian Press reports that Elections Canada is pushing to move to "Internet Voting". The underlying report is here. More precisely, they want to do online voter registration and explore online voting. This isn't surprising, really. The 2007 Strategic Plan for 2008-2013 includes the following goals:

  • research and monitor technological trials and innovations in other jurisdictions, both in Canada and internationally
  • implement a registration process that allows electors to register in person, by mail, telephone or Internet anytime and anywhere
  • with the prior approval of Parliament, test a secure voting process during a by-election that allows electors to vote by telephone or Internet

The motivation for online registration seems obvious: people expect to be able to do a wide variety of tasks online, and it seems at least plausible that one could build a secure online registration system. (see here for previous comments on this topic.)

The general motivation seems to be the same as in the US—absentee and especially overseas voters have a lousy election experience, with the requirement to get your ballot, fill it out, and then return it all within the usual fairly short window between ballot finalization and the election (based on the report, a 36 day window is common here). Here's the bottom line:

In view of the number of Canadians who are interested in accessing electoral services on-line, our efforts to put e-registration in place and to test e-voting are well aligned to their needs. At the same time, we are aware that many Canadians, and candidates in particular, are still uncertain about electoral services over the Internet, especially when it comes to on-line voting. We will continue our consultations as we move forward with these services, and will ensure that future voter services meet the high standards of integrity and security that Canadians have come to expect from their electoral processes. We will also return to Parliament with recommendations for legislative change that would allow us to fully implement on-line registration.

I don't know how to read Canadian bureaucrat-speak, but this sure looks like Elections Canada thinks that Internet voting is something they should do and that in principle one can get it right, perhaps after some trial and error. I wonder whether they've consulted with any security experts.

 

June 21, 2009

I haven't been closely tracking the discussions over whether the Iranian elections were subject to fraud or not, but I am able to comment on one issue I saw raised: the speed of the results. For instance, here's Karim Sadjadpour (þ "Mrs. Polly" on Obsidian Wings):
Another Iran expert Karim Sadjadpour agreed, saying he believed this was "a stolen election." Watch Sadjadpour explain why election was "stolen"

"There are a lot of signs there were major improprieties. First of all there were 40 million votes cast and just two hours after the polls had closed they announced Ahmadinejad's victory: and these votes are hand counted in Iran...

It's easy to run the numbers here. Hand counting paper ballots via the sort-and-stack method takes about 6 seconds per ballot/contest pair. This means a team (1-3 people, typically) can do about 10 ballots/min or 600 ballots/hour. If we have two hours, then each team can count 1200 ballots, so we need about 33,000 teams. This is a lot of people but isn't totally outside the realm of of possibility. If you started earlier, such as by having ballots returned in mid-day, then you could obviously get away with fewer teams.

Moreover, you don't need to count all the ballots in order to have a high level of confidence in the result—elections are routinely called with only partial counts available. Loosely speaking the more random your sampling strategy, the fewer ballots you need to count in order to have a high level of accuracy: if you just pulled random individual ballots you could probably get away with only a few thousand in an election with a reasonable margin of victory. If you're working through boxes one at a time but choosing boxes randomly, then you need more samples, and if the boxes are being counted in order then you may need to count a lot more. I have no idea what strategy Iran used, but if I wanted to run a central count system and have accurate estimates of who had won in short order it wouldn't be that difficult.

 

June 20, 2009

William Kelleher has been publicizing a paper entitled "Internet Voting: The Great Security Scare". Here's his thesis:
This paper will present a social science paradigm for critically evaluating the security concerns most often expressed by opponents of Internet voting. In 2003, these concerns were so effectively expressed that they resulted in the US government ceasing all efforts to even experiment with voting from overseas via the new technology. However, when examined within a context of social scientific reasoning, the arguments that stopped the progress of Internet voting in the US appear as mere appeals to fear, bereft of rationality.

First, the problem of how to think about e-crime in general will be discussed. Secondly, the framework that emerges from that discussion will be applied to the arguments against Internet voting. The conclusion will suggest that Internet voting can be conducted with a degree of security similar to an online purchase, a million dollar bank transfer, or a secret military communication. As shown in the essay, the technology already exists, and has been honed over many years of use. While there are differences between the military uses of the Internet, e-commerce, and Internet voting, this paper will argue that the degree of security for each need not vary significantly.

Ordinarily, I don't bother to engage with this weak an argument, but Dr. Kelleher is starting to get some publicity and so I thought it was worth giving it a read. I didn't find it very convincing, but rather than make a point by point rebuttal, I want to focus on what I think is generally agreed to be the most serious obstacle to any Internet voting system: the security of voter's computers. As I've mentioned before, malware on the user's computer has the opportunity to totally compromise his vote, and writing that kind of malware isn't particularly difficult.

Kelleher's discussion of these issues is mostly framed as a rebuttal to the SERVE report, which discussed the security of a particular proposed Internet voting system. The SERVE authors were quite concerned about this sort of malware and discussed a number of vectors by which it might get on a user's computer, including backdoors on installed system software, viruses/worms, and booby-trapped websites. Kelleher's paper takes each of these on in sequence. Let's take a look at just one example, the material about backdoors (the dismissive tone is fairly typical).

This statement suggests that a company favoring Republicans could program all the computers with its software to vote Republican, even if the voter thinks he or she is voting for a Democrat or Independent. A Socialist-favoring company could jigger the vote its way, etc.

Here, the Super Sleuths think they have uncovered the potential for a massive conspiracy by software makers to control US elections by selling people loaded software. But, what would be necessary to carry out such a scheme, and what risks would the perpetrators be taking? First, such a scheme could only affect an Internet-based election in our country if the company sells tens of thousands of loaded product. But, the more they sell, the greater the risk of being caught. Someone who is wary of just such a scheme, whether a citizen computer scientist, or a law enforcement official, is going to examine the code in every type of voting-related product and discover the trick.

Once caught, the executives, and all who conspired with them, risk having to pay huge fines, being sentenced to prison, and losing their livelihood. After they do their time, no software company would hire them, because customers would become suspicious of the company's product. These convicted felons would be lucky to find jobs as taxi drivers, or doormen. How many people who are intelligent enough to run a software manufacturing business are going to be stupid enough to risk these consequences in the forlorn hope of changing some votes to favor their own political party or candidate?

First, this seriously misrepresents the SERVE report, which doesn't at all contemplate that the company would insert the backdoor. In fact, it quite clearly implies the contrary: "Today's computers come loaded with software developed by many different entities; any employee at any of those companies could conceivably leave a backdoor that attacks SERVE." Modern software is generally developed by large teams and controls on the code that authors check in is generally fairly lax. In many environments it wouldn't be at all difficult for an attacker to inject arbitrary code without being detected, especailly if they made some effort to hide it. (More on this in a bit.)

Second, the suggestion that outsiders would actually review all the code in your average computer and detect this kind of attack is, let's say, extremely problematic. As the SERVE authors correctly observe, any software on your machine could have a backdoor in it, not just the voting software. This means that on your average machine you need to audit all of Windows, Office, IE, Firefox, etc. Estimates I've heard of the number of lines of code in Windows alone run into the tens of millions, and the cooldifficult thing about a backdoor is that it can be anywhere in the code base. Auditing this sort of system is a massive (read: totally impractical) project. When we did the Top-To-Bottom Review, we had our hands full just looking for unintentional vulnerabilities in a code base about 1% as large, we didn't even try to look for backdoors. And of course, we had the advantage of the source code, which vendors generally regard as pretty secret. Your average reviewer is not going to have the source code for Windows.

Moreover, there's no reason to inject something as obvious as a program to change votes. All you need is a remotely exploitable vulnerability known only to you, and this can be as simple as a missing array bounds check, integer overflow check, etc. Then when you're ready to install your malware you exploit the vulnerability and there doesn't need to be any voting specific code to find at all. This has two advantages: (1) it's hard to find because it's a small error and (2) even if you get caught you can plead incompetence. Given the number of vulnerabilities found in your average program, it's extremely improbable that you would suffer any consequences—certainly none of the existing voting vendors have been arrested for vulnerabilities found in their systems. [To be totally fair, the SERVE report doesn't make this point quite as clearly as one might like]

I don't propose to go through the rest of the paper point by point. Suffice to say that overall it betrays a fairly shallow understanding of the state of computer security and mostly depends on the ever-popular "argument from incredulity". In particular, Kelleher is incredulous that there could ever be widely deployed malware that infects a large number of computers. As it happens, however, not only is this possible, we already have several worked examples in the form of large botnets. It's not hard to envision repurposing that sort of software to mount attacks on voting systems. Actually it's in some respects easier because you don't need any command and control, you just deliver the attack payload; it waits till election day and then activates. Far from being incredible, then, this attack seems fairly practical.

 

June 15, 2009

Experience with recent contested close elections, in particular the US Presidential Election of 2000 and the Minnesota Senatorial Election of 2008, suggests that the US system doesn't do that great a job of handling elections where the margin of victory is very small. There are at least two problems here:
  • It's not clear that we really have efficient techniques for counting millions of ballots to four nines of accuracy (the difference between Coleman and Franken currently stands at 312 votes out of 2.8 million). [*]).
  • Even if we assume we could count an agreed upon set of ballots accurately, there's significant enough agreement about which ballots should be counted that this can become the deciding factor in close races.

The bottom line, then, is that it's very difficult to get a consensus count in a close race, especially as the apparently losing side has a real incentive to muddy the water as much as possible. What we're left with, then, is a lot of fighting about individual votes and no agreed upon count that reflects a specific winner, allowing partisans of each side to insist that if only the votes were counted correctly their candidate would have won.

Coin-Flipping
This isn't a new observation and I've more than once heard suggestions that since there's no way to uncontroversially choose a winner in close races based purely on the vote count, we should resolve such races using an element of randomness, for instance by flipping a coin. (Charles Seife's op-ed in the NYT is one example, as is Michael Pitts's SSRN paper). [Note that some states, and Minnesota in particular, already have a provision in election law for coin flips to resolve exact ties.] However, it should be immediately obvious that this just moves the problem: whatever margin of error you choose to trigger the election now becomes the litigation point. Pitt suggests a solution to this (require any challenges to show there is a real chance of prevailing in the entire election, not just pushing you over the threshold), but as Nathan Cemenska observes, you can just contest even earlier to prevent the preliminary count from creating facts on the ground. This seems like a problem with any scheme with a sharp threshold.

A second problem with proposals like Pitts's is the use of an unbiased coin. Say we have an election where there were 10,000,000 votes and Jefferson appears to have beaten Burr by 3,000 (.03%). Now, this may not be enough to convince you that Jefferson won, but it's certainly evidence that Jefferson is more likely to have won. To give you a feel for the situation, if we treat this as a sampling problem with a sample size of 10,000,000, the 90% confidence interval around the estimate is +/- .025%--so we're outside the margin of error. Any random selection procedure should somehow reflect this information, not just act like we don't know anything about the vote counts.

The bottom line, then, is that I don't think that the coin idea holds up.

A better procedure?
Instead of flipping a coin, the natural thing to do is to treat the nominal election results as an estimator of the true results and then do a random sample that is appropriately biased in favor of the nominal winner. This potentially solves both problems at once:

  • It favors the nominal winner so that it takes into account available information about the vote count.
  • It doesn't have a sharp threshold effect so there is less incentive to litigate at the margin.

However this depends on having an appropriate randomized algorithm.

One natural approach is to treat this is a statistical inference problem. I.e., we treat the election as a very large poll with the the nominal election results being a random sample out of a universe of many more voters (or many more elections). What we are interested in is the probability R that the "true" number of voters who prefer the nominal winner is really greater than those who prefer the nominal loser (this will always be greater than .5, since otherwise the nominal winner would be the nominal loser). This is much the same as computing the margin of error of a poll, for which we have good statistical techniques. So, we just compute R and then flip a biased coin (really you'd use some better technique) which has chance R of coming up for the nominal winner and chance 1-R of coming up for the nominal loser.

Technical note: the way to think of this is that we're sampling out of a binomial process with unknown probability P, which corresponds to the fraction of voters who prefer the nominal winner. We draw a sample of size N (the number of voters) and get pN votes for the nominal winner and (1-p)N for the nominal loser. (The treatment can be readily extended to third party candidates and voting errors). So, p is our estimate of P and the distribution of results is just the binomial distribution. With a large enough N (and a hundred or so is plenty large), we can use the normal approximation with the variance of P being the familiar Np(1-p). It's easy enough to determine what fraction of the distribution of lies to the left of .5, and hence the probability (speaking loosely here) that our election results are wrong.

This statistical inference treatment is fairly unsatisfactory philosophically: we're not sampling out of some infinite universe of elections and the number of voters isn't very large compared to our sample size. On the contrary, we have a single election that captures almost everyone in the relevant region but has some error around the margin. We could try to salvage this by inverting the question and asking "if we treat these results as the true proportions" but ran a new election, what's the probability that the result would be different? This is just more binomial distribution and gives us the same formula, however, so we're just cleaning up the philosophical groundwork a bit and it still isn't very satisfactory. It doesn't get at the core problem which is the underlying assumption that whatever process we see that causes some votes to be included as part of our sample and others not to be is random. It's easy to imagine this not being true: what if the ballots in precincts that go Democratic are for some reason harder to vote correctly than those that go Republican? This sort of error is very hard to account for in some clearly fair systematic way.

The figure below should give you a feel for the amount of uncertainty introduced by this method.

It's important to note here that the larger the election the lower the probability that an election with a given apparent margin of victory (represented as a fraction of the number of votes) will be reversed. This is a natural consequence of the statistics of sampling, where the sample mean more tightly approaches the true mean as sample size increases.

How to generate the random numbers?
One question you might want to ask is: if we're not going to use a coin, how do we generate the random numbers in some verifiably fair way. This is actually fairly easy: we can use ping pong ball style lottery machines where we label the appropriate fraction of ballots for each candidate. This can be done arbitrarily or (if the candidates don't trust the assignment), we can alternate ball selection so that each candidate can choose the balls he likes best.

Is this good enough?
As I said above, there are some philosophical problems here. The statistical model we're using only imperfectly (at best) reflects the situation. We know our data wasn't derived from a random sampling process and worse yet we're discarding all sorts of information (polling, the disposition so far of challenged ballots, etc.) that could tell us something if we knew how to incorporate it in a fair way. However, what we're getting for that is a straightforward method that is easy to apply in a systematic fashion.

We also need to ask whether this method reduces the threshold effect I mentioned above. As a concrete example, let's take the Minnesota 2008 Senate Race. If we pretend that Coleman and Franken were the only candidates in the Minnesota election, we would give Coleman about a 42% chance of prevailing. If Coleman could remove 100 of Franken's votes, this would up his chances to 44.6%. Is that enough to make it worth litigating those 100 votes? I don't know. Obviously, we could make the distribution wider by some ad hoc factor (though I don't know of any principled way to do so), but that seems unfair to the nominal winner, since we're basically choosing to ignore evidence in his favor.

The bottom line, then, is that this really isn't that great a method: I just don't have a better one.

UPDATE: Minor grammatical fixes due to Steve Checkoway.

 

June 6, 2009

In the comments section, Dan Weber points out that remote voting systems are in general fairly susceptible to coercion and vote buying attacks. For concreteness, let's say we're dealing with one of the systems in which you fill out your (anonymous) ballot and then stuff it in an envelope with your name and signature on it. In order to sell (No way am I going to write "or appease your coercer" after every instance of sell. Just take it as read.) your vote you wait until you have received your ballot, then sign the envelope and give both the ballot and the envelope to the person buying your vote. They fill it out as they like and mail it in at their convenience. [Note that one could make this attack very slightly harder by requiring people to sign over the envelope flap so that the envelope had to be sealed before you signed, but even then the attacker could just fill in the ballot and then have you sign the sealed envelope.]

This is sort of a generic problem with all remote voting schemes, at least the non-cryptographic ones. The only real defense I know of is to permit voters to cast multiple ballots with only the last one counting. The idea here is that I would give the attacker my vote by mail ballot and then cast a "real" vote later, either in person or remotely [note that this would require treating ballots as uncontrolled items, which, as I said, isn't always true.] This doesn't seem like an entirely satisfactory defense for a number of reasons. First, it depends on a large fraction of people selling their votes deliberately trying to cheat the buyer. In practice, it seems likely that many won't, in which case the attacker will have a pretty high success rate. Second, if the attacker has any way of knowing whether people voted multiple times—even without knowing how they voted—then they have an opportunity to punish defectors (or at least not pay them). It probably only takes a small probability of that happening to deter defection.

Now, it could be that the convenience of vote-by-mail is worth enabling this kind of attack. That's a policy cost/benefit type question, but from a technical perspective I'm not sure how to remove this attack with conventional (i.e., non-cryptographic) systems, so it's something that needs to be considered when deciding how much VBM to have.

 

June 4, 2009

Kevin Poulsen writes about the Arizona Internet overseas voting system:
"It's run over a secured system using industry standard encryption," said state CIO Craig Stender. "We had many users from over 50 countries using the system in that election."

...

In the Arizona system, voters could request an early ballot through a Secretary of State website, and receive it though snail mail. If there's no time for the postal service, though, the voter gets a PDF of the ballot in e-mail.

This is where it gets a little clunky. You can't fill out the ballot on your computer - you have to print it out, then use your scanner to scan the completed and signed ballot back onto your PC. Then you upload the scanned ballot to the aforementioned "secured system" (it uses SSL).

From there, county election officials can log on and retrieve the ballot through a pretty nifty backend system. They print it out in your home county, and treat the printout like any other absentee ballot. The whole system allows an overseas voter to request a ballot and vote as late as 7:00 p.m. on election day, without planning ahead, and the state credits it for an unspecified increase in overseas voter participation in 2008 (of course, participation increased across the board in 2008).

Poulsen talks a bit about the security issues here, but I thought I'd elaborate some.

The first thing to realize here is that this is actually the combination of two distinct systems, each with it's own security issues. First, we have an Internet-based blank ballot distribution system. Second, we have an Internet-based completed ballot return system. Just because you're distributing ballots over the Internet doesn't mean that you need to have them returned that way. You could just as well have people mail or fax them back. Conversely, you could mail out ballots and then have people return them over the Internet. It's easiest to analyze these systems separately.

Internet Ballot Distribution
Delivering blank ballots over the Internet seems like a natural optimization: ballot distribution by paper mail is slow and replacing that with Internet distribution improves round trip time by a factor of two, which seems valuable—as it is, the ballots already need to be designed and mailed long in advance of the election. Moreover, the security problems with distributing blank ballots are (comparatively) simple: we just need to ensure that each voter gets the correct ballot, but they're not secret so we don't need to worry about confidentiality, etc.

That isn't to say that there are no security issues, though. First, we need to ensure that an attacker can't substitute a fake ballot. In the simplest case, the attacker could just produce a superficially valid ballot that wasn't readable by the central optical scanners. That might be handled properly by the county, but it might just be discarded. More interestingly, the attacker might generate a ballot which would scan correctly, but where the text on the ballot doesn't match the votes that are recorded by the scanner for the corresponding regions (e.g., switch the labels for Bush and Gore). One might be able to program the optical scanner to detect this kind of attack by comparing the returned ballot to the expected ballot format, but I suspect that many scanners just look for the voting regions and ignore the text. After all, this is the simple way to build the system. An attack of this type could be mounted by attacking the distribution server, intercepting/modifying the ballot in transit, or attacking the user's PC once the ballot is delivered.

Finally, treating a ballot printed on ordinary printer paper as a valid ballot is a major shift from ordinary practice. In Santa Clara County, the ballots are printed on special paper and the pollworkers are expected to take action to control the number of ballots in circulation. For instance, when an absentee voter shows up and wants to vote in the polling place, they need to surrender their ballots—to prove they didn't vote by mail already—or they have to vote a provisional ballot. Similarly, election officials and/or pollworkers do reconciliation of the number of ballots voted against the number of voters. Both of these depend on controlling the total number of ballots in circulation. By contrast, if you send people PDFs they can print as many ballots as they want, and suddenly they're not a controlled item. I realize that this isn't a great form of security, since it's not really that hard to get your own ballots printed, but nevertheless it's part of the security model for the election.

Internet Ballot Return
The second piece of the system is returning the completed ballots. We have integrity issues here as well: as Poulsen suggests (and quotes Rubin as suggesting), there are a number of ways for things to go wrong here: an attacker could subvert your computer and have it modify the ballots before sending them; you could get phished and the phisher could modify your ballot appropriately before passing it on to the central site. Finally, the attacker could subvert the central server and modify the ballots before they are printed out. Poulsen quotes an election official arguing that this sort of modification is difficult:

"It's not true internet voting, so we don't feel that we have the same security issues that true internet voting would have," said Bjelland. She adds that Arizona has some 5,000 different ballot layouts for different voting jurisdictions, which would make automated tampering a challenge.

This actually doesn't seem like that hard a problem for the attacker: the ballot styles are fairly stereotyped and it's just a matter of OCRing the ballot enough to figure out which markable region corresponds to which label—and this assumes that you don't have a copy of each ballot style and can't just write semi-custom software.

Another issue is that this changes the semantics of the absentee ballot: in many jurisdictions, you fill out the ballot and place it an envelope and put your personal information and signature on the envelope. This allows the election officials to determine whether the ballot is valid without knowing its contents, providing a check against bias. It also provides for a measure of voter privacy because the voter's identification isn't tied to the ballot once the envelope's provenance has been verified. If voters sign the ballots rather than the envelopes, then both of these properties are removed.

It should be clear at this point that this sort of system isn't totally without risk. I'd be interested to hear what security issues Bjelland feels "true internet voting" has that this system does not.

 

June 2, 2009

The Electronic Voting Technology/Workshop on Trustworthy Elections 2009 (August 10-11 2009) workshop program is up.

There are a bunch of interesting papers (I was on the PC so I've already read quite a few of them) and this year I have three--you can form your own opinion about whether they're interesting:

  • Understanding the Security Properties of Ballot-Based Verification Techniques (draft).
  • Weight, Weight, Don't Tell Me: Using Scales to Select Ballots for Auditing (with Cynthia Sturton and David Wagner)
  • On the Security of Election Audits with Low Entropy Randomness (draft).

I'm also coordinating a rump session on Monday night, loosely modelled on the famous CRYPTO rump sessions. Acceptable topics include:

  • Reports on work in progress.
  • Less serious reports (i.e., humor)
  • SHORT Announcements
  • Other stuff the community needs to hear about

Talks will typically be between 3-7 minutes (at the discretion of the chair in order to fit the time constraints). I'll be posting some sort of submission thingamajiggy soonish, but for now if you want to talk you can email me at ekr@rtfm.com. Alternately, just start thinking about what you want to talk about and hold it till something more formal is available.