EKR: July 2008 Archives


July 31, 2008

If you haven't seen this pair of videos by a law professor and a police officer on why, if you're under suspicion for some crime, you shouldn't talk to the police, you should check them out. The basic concept is that even if you're innocent (and it's likely you're guilty of something), what you say, even if it doesn't directly incriminate you, is likely to be interpreted as incriminating, or to contain inconsistencies which could seem incriminating. The police officer also gives some useful background on the various techniques he uses to get suspects to talk. Practice saying "I want my lawyer."

July 30, 2008

You can find my slides from EVT 2008 here.

July 29, 2008

One problem that seems to come up a lot in computer security is remotely verifying what software is running on a given computer. There are a number of contexts in which this is important, for instance:

  • Verifying that voting machines are running the correct software.
  • Allowing content providers to verify that customers are running software with the right DRM.
  • Preventing the use of cheat software in MMORPGs [*].

Obviously, this is easy if you're not worried that the software is malicious, but if you have to worry about malice, it's well known in the computer science community that, outside of some special cases (basically, having trusted hardware on the target machine), this problem is insoluble. The reasons why are interesting and involve some fundamental principles of computer science, but they often aren't obvious to laypeople.

Let's start with a simple and obviously wrong approach—though one that would work fine if you're not worried about malice—have a protocol where the verifier can ask the target what software it's running. So, the verifier would say "What software are you running?" and the target would say "FooSoft 1.1." Clearly, if the target is running malicious software, that software can easily lie and return any value of its choice, including the correct value, so this test doesn't really work.
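This failure mode can be sketched in a few lines (names and values here are purely illustrative, not any real protocol):

```python
# Toy model of the "ask the target what it's running" check.

class HonestTarget:
    def query_version(self):
        return "FooSoft 1.1"

class MaliciousTarget:
    # The malware controls the response path, so it can return
    # whatever answer the verifier expects.
    def query_version(self):
        return "FooSoft 1.1"   # lie: report the "correct" value

def verify(target, expected="FooSoft 1.1"):
    return target.query_version() == expected

# Both targets pass, so the check tells the verifier nothing.
assert verify(HonestTarget()) and verify(MaliciousTarget())
```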

Another approach that people often think of is to ask the target for a copy of the software it's running. The verifier can then compare the returned data to a locally stored reference copy [aside for crypto people: it's sufficient to store a digest of the reference copy] and if they match, conclude that the target is running the correct software. This technique is actually used in some systems; for instance, the Hart voting system (see [IRSW08]; Issue 11). However, it's equally clearly flawed. Just as the malicious target software in the previous approach could return any version string, here it can return a much longer string: a copy of the correct software. So, for instance, a virus which wanted to hide from verification would first make a copy of the real software somewhere else on the disk and then modify the operating system (or whatever) to install itself.
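The digest-comparison check and the stored-clean-copy evasion can be sketched like this (the byte strings are stand-ins for real software images):

```python
import hashlib

# The verifier stores only a digest of the reference copy.
REFERENCE_DIGEST = hashlib.sha256(b"correct software image").hexdigest()

class HonestTarget:
    installed = b"correct software image"
    def fetch_image(self):
        return self.installed

class InfectedTarget:
    # The virus keeps a pristine copy on disk and serves it up
    # whenever the verifier asks, while the running code is malicious.
    installed = b"malicious software image"
    pristine_copy = b"correct software image"
    def fetch_image(self):
        return self.pristine_copy

def verify(target):
    return hashlib.sha256(target.fetch_image()).hexdigest() == REFERENCE_DIGEST

# The infected machine passes the check just as the honest one does.
assert verify(HonestTarget()) and verify(InfectedTarget())
```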

So, clearly malicious software can return any static value, which makes static checks useless for integrity. What about a dynamic check? For instance, the verifier could run a local copy of the software and verify that the remote target reacts the same way the local copy does to a variety of inputs. This would presumably catch some kinds of malware, but a smart piece of malware could emulate the correct software and return the correct answers. Now, you might say that it's really hard to write a piece of malware that perfectly emulates the right software, but you don't have to write one: you can instead just run the real software in an emulator or virtual machine. This depends on a fundamental fact about computers: within some constraints, any computer can emulate any other computer, i.e., given the same program and inputs, it can produce the same outputs. [For technical people: any Universal Turing Machine can emulate any other Universal Turing Machine.]
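A minimal sketch of why the challenge-response check fails (the arithmetic function stands in for the real software's observable behavior; the malicious target simply runs that same behavior, as an emulator would):

```python
def reference_program(x):
    # Stand-in for the real software's input/output behavior.
    return (x * 31 + 7) % 1000

class HonestTarget:
    def respond(self, challenge):
        return reference_program(challenge)

class MaliciousTarget:
    # Rather than re-implementing the real software, the malware just
    # runs a copy of it (here a direct call; in reality, inside an
    # emulator or VM) and forwards the answers.
    def respond(self, challenge):
        return reference_program(challenge)

def verify(target, challenges=range(100)):
    return all(target.respond(c) == reference_program(c) for c in challenges)

# The emulating malware is indistinguishable from the honest target.
assert verify(HonestTarget()) and verify(MaliciousTarget())
```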

Having said that, I'm misrepresenting things a little bit. The systems for which this is literally true are Turing Machines, which are a mathematical model of a computer with an infinite amount of memory. If you're working with a real-world machine, which has fixed resources, then there are situations in which emulation is imperfect. For instance:

  • Virtual machines and emulators aren't as fast as running on the bare metal. So, for instance, if you knew what hardware the target machine had, and you had a very high resolution timer, you might be able to distinguish a real machine from an emulator, VM, or patched version of the target software (see [SPDK04]). In practice, these conditions very rarely apply, and even when they do, the resulting checks are often imperfect. For instance, one might be able to couple the emulator with an optimizer which improved the performance of the program under test so that the net system was as fast as it was on the bare metal. [Technical note 1: [SPDK04] argue that they can optimize the check code to the point where there's no room for further optimization. It's not clear this is true in general, especially on a complex architecture like x86.] [Technical note 2: it might actually be faster, in which case you could use timing loops to get matching timing.]
  • Another potential approach is to exploit the fact that computers have a limited amount of memory. For instance, you could arrange that the target computer had just enough memory (or disk) that it could store the target program. Since the emulator takes up space, running the emulator plus the target won't fit in memory so perhaps the system won't work, or will require the use of virtual memory (causing swapping and timing errors). The problem here for the verifier is that most pieces of software are not this finely tuned to the target hardware—this isn't at all convenient to do, and requires work for each new release of the software and hardware. Moreover, we can again use optimization techniques: the emulator can compress the target software, leaving some memory for itself to run in.
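The timing check in the first bullet reduces to a threshold comparison; here's a sketch (the numbers are made up, and in practice you'd need a very high resolution timer and known hardware):

```python
def looks_genuine(measured_seconds, baseline_seconds, tolerance=0.05):
    """Accept the target only if its checksum routine ran about as fast
    as on known-good bare metal (the baseline)."""
    return measured_seconds <= baseline_seconds * (1 + tolerance)

# A naive emulator is noticeably slower, so it fails the check...
assert looks_genuine(1.00, 1.00)
assert not looks_genuine(1.40, 1.00)

# ...but an emulator paired with an optimizer that recovers the lost
# speed slips under the threshold, which is exactly the loophole
# described above.
assert looks_genuine(1.02, 1.00)
```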

The bottom line, then, is that it's extraordinarily hard to verify the software running on a computer. The only techniques that work reliably require unmediated access to the system hardware, either locally (e.g., by extracting the hard drive) or via some trusted agent attached to the target machine which can cryptographically attest to the verifier about the contents of memory. In settings where neither of these is available, we don't know of any solutions to this problem, and as far as I know, most computer scientists believe none are possible.


July 26, 2008

I flew from SFO to LAX today, and noticed again a phenomenon that has annoyed me before: the doors on the bathroom stalls on both airports open inward (evidence below).

Ordinarily, this isn't a big deal, but in the airport it is. There you are with your bag. You walk into the stall, and then you somehow have to close the door behind you, but since it has to clear your bag you need to cram up against the toilet to let the door pass. If the doors just opened outward, this wouldn't be a problem: in SFO, at least, the aisle is at least six feet wide, and even the LAX aisle is wide enough, though it might be a bit cramped to walk through with the door open if you were really fat. Still, this seems like it would be a simple improvement.

Barry Leiba and Jarrod observe that doctors routinely use dictation plus transcription:

Oh boy "And Doctors." A Forensic Pathologist I know spends at least 15 hours a week outside of the office dictating cases. Especially important ones she types herself, but it takes much, much more of her time than recording a dictation, emailing the .mp3 to the transcriptionist, and then reading and correcting the report for errors before sending it out. Typing speed isn't really a limiting factor either, as she does about 80wpm.

I certainly understand why you would want to dictate material if you were doing something else with your hands at the time (cf. TV shows where you see a coroner dictate while performing an autopsy). But I must say it's not apparent to me why you would dictate instead of typing if you were a good typist. I've tried dictating material and always found it much easier to just type it in myself. Is this just a skill you have to practice? Is there something special about doctors?


July 25, 2008

This Science article reports on a counterintuitive result: extensive fire suppression reduces, rather than increases, the amount of carbon captured by trees.
Lightning-caused fires serve as a natural mechanism within forests. They destroy small trees and underbrush while often allowing large trees to remain standing and flourish. But since roughly 1910, U.S. forest managers have sought to fight as many small forest fires as possible. That policy has allowed more shrubs and small trees to grow than in the past. The increasing quantity of vegetation, scientists calculated recently using tree measurements and other data, sucks 50 million metric tons of carbon dioxide out of the atmosphere each year--roughly 14% of the total amount of carbon pulled in by U.S. forests. However, historical data on tree sizes weren't available to allow scientists to confirm that the forests had absorbed that much carbon over the past century.

To do that, ecologist Michael Goulden of the University of California, Irvine, and a grad student used previously overlooked forest inventory measurements taken in the 1930s on 269 California forest wilderness plots. They then compared these data with measurements taken in the 1990s on 260 plots in the same general vicinity. The number of trees per hectare across all plots rose by 4% in 60 years, an increase the scientists attributed to the federal policy on suppressing fires. Yet the total amount of carbon held by trees declined by 34% over the same period, the researchers report this week in Geophysical Research Letters.

The scientists conclude that the large trees in the plots had to compete with the growing population of small trees, making the big trees more susceptible to drought, wind, and insect attack than they would have been without the crowding. Because the large trees died, they didn't absorb as much carbon dioxide. "It's counterintuitive," says Goulden.

I've heard arguments before that fire suppression is bad policy because it blocks the thinning effect of occasional fires, with the result that wildfires become more serious. [*] [Note that I'm not saying that's the cause of this year's severe wildfires. According to the fireman I talked to on Wednesday, the vegetation is especially dry this year—as dry now as it usually is at the end of the summer.] It's interesting to ask whether there's some optimal, nonzero, amount of fire suppression, or whether it would be best to just let fires burn except where they actually threaten human activity. Unfortunately, this is a topic I know basically nothing about.


July 23, 2008

From Arthur C. Clarke's A Fall of Moondust (1961):
When he had finished dictating, he paused to marshal his ideas, could think of nothing further, and added: "Copy to Chief Administrator, Moon; Chief Engineer, Farside; Supervisor, Traffic Control; Tourist Commissioner; Central Filing. Classify as Confidential."

He pressed the transcription key. Within twenty seconds, all twelve pages of his report, impeccably typed and punctuated, with several grammatical slips corrected, had emerged from the office telefax. He scanned it rapidly, in case the electrosecretary had made mistakes. She did this occasionally (all electrosecs were "she"), especially during rush periods when she might be taking dictation from a dozen sources at once. In any event, no wholly sane machine could cope with all the eccentricities of a language like English, and every wise executive checked his final draft before he sent it out. Many were the hilarious disasters that had overtaken those who had left it all to electronics.

This is one of those predictions that's sort of right and sort of wrong. While practically nobody dictates letters any more, it's certainly true that you can't trust a computer's attempts to interpret ambiguous human input, as anyone who has tried to use a voice recognition system, typed on an iPhone, or carelessly accepted the suggestions of their spell checker can attest. It's not usually an artifact of excess load, though: computer performance doesn't usually degrade that way. Of course, a modern system of this type would most likely run on a local computer, rather than some remote centralized timesharing system that faxes you your output, but this was a common blind spot of science fiction writers prior to the personal computer era.

More off-base is the assumption that dictation plus transcription (whether manual or automatic) is a good way to write. It's true that people wrote letters that way back in 1961, but practically nobody does that now. This isn't because computer voice recognition systems suck (though they do)—plenty of people could afford to have a full-time secretary type their messages—it's just vastly more convenient to use a modern word processing system than it is to dictate, even to a secretary. Pretty much the only people who need to dictate any more are older people who never learned to type or use a computer. I'm skeptical that even a much better voice recognition system would be good enough to displace typed interfaces to word processing systems. Now, maybe if you could use a Cerebrum Communicator... That said, the IBM Selectric was introduced in 1961, so I think Clarke can be forgiven for failing to predict how convenient typewriter-style interfaces would eventually be.


July 21, 2008

Gizmodo covers the fully buzzword compliant CherryPal PC. The general idea is that it's a lightweight home device that relies on "cloud computing" for its backend computational operations. Some surprising security claims are also being made. Here's what's on Gizmodo (in what looks like a press release):
CherryPal is the only company that provides a patent-pending combination of both hardware and software encryption, making it highly secure. The CherryPal also offers a patent-pending single software layer technology. This collapses the operating system and browser into one layer, where there had traditionally been three separate layers. It makes the computer exponentially faster and virtually eliminates any risk of bugs or viruses for the user.

There appear to be two claims here. First, they are using "a combination of both hardware and software encryption", which makes things really secure. Now, it's true that there are settings in which hardware encryption is more secure than software encryption, but it's hard to see why they apply here. The major advantage of hardware encryption is that you can build hardware which makes the keying material inaccessible even to someone who has control of the device. So, for instance, if your device is remotely compromised, the attacker wouldn't be able to steal the keys. As I said, there are situations where this is important, but it's not clear that this is one; if your machine is remotely compromised, you're probably going to want to wipe it completely, and it's not really that hard to replace the crypto keys as well. Moreover, it's not clear from this material that they're even using hardware-based key isolation.

The second claim is this thing about the "patent-pending single software layer". I'm not sure what this means either. I usually think of the operating system and browser as two layers, so I'm not sure what the third layer is. It sounds like the claim is that the browser is running directly on the metal, which isn't impossible, but it's pretty unclear what the advantage is. One of the major features of modern systems is precisely that they separate the OS from the applications; this allows the OS to enforce policies on the application, as well as to contain compromise of the application (though of course you still have to worry about privilege escalation attacks.) I'm not aware of any security theory that indicates that it's more secure to have only one software component. While we're on the topic, since this sort of monolithic design is the way that systems used to work, it's not clear what's patentable here. (A little searching didn't turn up the patents, but if someone points me to them, I'll take a look.)

Oh, this is good too:

CherryPal is also the first company since Apple Computers to use a Power Architecture-based processor in a personal computer by employing the Freescale MPC5121e mobileGT processor. This chip allows for built-in graphics and audio processing, all while consuming only 400 MHz of power.

400 MHz isn't really that usual a unit of power. Am I supposed to multiply by Planck's constant here?


July 18, 2008

Mobile Edge has rolled out their line of "TSA Compliant" carry-ons. The idea here is that you can open them up for the X-Ray machine and they're easy to scan without requiring you remove your laptop. They're also hideous, as you can see for yourself:

The TSA recently announced plans to implement new security procedures that will allow travelers to pass through security checkpoints without having to remove their laptops from their cases. Simultaneously, the TSA issued a request to laptop bag manufacturers to create "Checkpoint Friendly" laptop bags to help speed up security lines, allowing passengers to get to their departure gates in a timely manner. These new cases will help shorten wait times for more than 250 million passengers that travel annually in the U.S.

The design team at Mobile Edge quickly responded to the TSA request and came up with three new innovative case designs. This new ScanFast™ Collection consists of a backpack, a briefcase and a messenger bag, all designed to conveniently open for airport screeners and help speed travelers through the X-Ray screening process.

OK, I'm skeptical of this on several levels. First, it's not clear to me that this is going to improve throughput even with constant levels of input. You start in the queue and then move to the tables where you unpack your bags into the bins. The bins and your bags go onto the conveyer belt and get run through the X-ray machine. On the other end, you repack your bags.

There are two ways in which these bags could potentially increase the throughput of the system: (1) make your bag easier to unpack/pack, or (2) make all your stuff go through the conveyer faster. It does look to me like these bags could make it easier to unpack/pack your bag, since you don't have to take your laptop out of the bag, just unzip it and flip it open. But unpacking is only a bottleneck (to the extent that it is one) because of insufficient parallelism and pipelining. If you just made the tables longer or had two sets of parallel tables, then people would be able to unpack (and repack) their bags arbitrarily fast. Obviously, this would require some rearrangement of the security area, but it's a lot simpler than introducing entirely new bags.

In my experience, however, the bottleneck is the x-ray machine itself, and it's not clear to me why these bags would make the x-ray process any faster. Given that the bags open flat, they presumably take up as much surface area on the belt as the combination of your bag and your laptop, so to the extent that the linear speed of the scanning process is constant, I don't see why this would make things any faster. Now, it's possible that this allows the scanners to run the belt faster, but given that right now your laptop is on the belt uncluttered, while with this bag it will be partly obscured by the bag, this would presumably make the feed rate required to get an equally thorough scan slower, not faster.
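The argument above is really a pipeline argument: in a serial pipeline, steady-state throughput is set by the slowest stage. A sketch, with made-up illustrative rates:

```python
def line_throughput(unpack_rate, xray_rate, repack_rate):
    # Steady-state throughput of a serial pipeline is the minimum of
    # its stage rates (passengers per minute; numbers are invented).
    return min(unpack_rate, xray_rate, repack_rate)

base = line_throughput(unpack_rate=4, xray_rate=3, repack_rate=5)

# Checkpoint-friendly bags speed up unpacking, but the x-ray machine
# is still the bottleneck, so overall throughput doesn't move.
with_bags = line_throughput(unpack_rate=6, xray_rate=3, repack_rate=5)
assert with_bags == base

# Adding tables (parallel unpacking) has the same effect for less money.
more_tables = line_throughput(unpack_rate=8, xray_rate=3, repack_rate=5)
assert more_tables == base
```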

Given the above, it's not clear to me why these bags would make the screening process faster (or at least why it couldn't be made equally fast with less investment). Even if that somehow were the case, it's not clear that that would make the screening process faster overall; we could easily make screening faster by buying more x-ray machines and hiring more screeners, so the current rate reflects some sort of crude cost/benefit analysis. If you could suddenly scan at twice the rate with the same number of screeners, wouldn't you expect the airports to sharply reduce the number of screeners?


July 17, 2008

Earlier I was downloading the AIM App for my iPhone (on the iPhone, not with iTunes) and it kept prompting me for my password, but then kept rejecting it. Turns out that what it wanted was my iTunes Music Store password. Once I got that entered, it downloaded the app and then prompted me for my AIM password and everything worked fine. Just FYI.

July 15, 2008

I'm a man of simple tastes in coffee, but if I were one of those people who liked their drinks a little more frou-frou, I'd need to stay away from Murky Coffee in Arlington, VA. Here's Jeff Simmermon's tale of being denied an iced espresso:
I just ordered my usual summertime pick-me-up: a triple shot of espresso dumped over ice. And the guy at the counter looked me in the eye with a straight face and said "I'm sorry, we can't serve iced espresso here. It's against our policy."

The whole world turned brown and chunky for a second. Flecks of corn floated past my pupils, and it took me a second to blink it all away.

"Okay," I said, "I'll have a triple espresso and a cup of ice, please."

He rolled his eyes and rang it up, took my money, gave me change. I stood there and waited. Then the barista called me over to the bar. I reached for it, and he leaned over and locked his eyes with mine, saying "Hey man. What you're about to do ... that's really, really Not Okay."

I could hear the capital letters in his voice, could see the gravity of the situation in his eyes.

He continued: "This is our store policy, to preserve the integrity of the coffee. It's about the quality of the drink, and diluting the espresso is really not cool with us. So I mean, you're going to do what you're going to do, and I can't stop you, but"

I interrupted. "You're goddamned right you can't stop me," I said. "I happen to have a personal policy that prohibits me from indulging stupid bullshit like this -- and another personal policy of doing what I want with the products I pay for." Then I looked him right in his big wide eyes and poured the espresso onto the ice.

Check out the blog post for a lot of comments, both by supporters and opponents of Simmermon, including a message from the owner of Murky Coffee threatening to "punch you [Simmermon] in your dick." Outstanding!


July 14, 2008

In the comments section, Olle (the proposal author) responds to my comments on IPETEE:
"Like IPsec, IPETEE lives at the IP layer" No, IPSec is an IP protocol, IPETEE is an application layer wrapper totally independent of IP-transport. It could just as well be used over any other network transport.

"one could easily adapt IPsec so that the KMP ran over the application channel" Yes, but it still wouldn't be equivalent to IPETEE because the transport is different (IP protocol 50, etc.). IPETEE doesn't mess with or even care about the underlying transport.

"one could easily deploy something like this with either SSL/TLS or IPsec" Not with IPSec, since it isn't transparent to the underlying network (see above). You could certainly do it with a modified TLS implementation, but why carry all that extra baggage when a slim implementation of the bare essentials will do?

This proposal is, as you point out, still a sketch. Actually it is just a brain-dump of a drinking session. It was found and "leaked" by some blog and revealed to the world before it was ready for prime-time (it hasn't even been proof-read). Fine. I'll deal with that. What it seems to be lacking most is the rationale behind the design choices, so I'll try to add that during this week.

/olle (the proposal author...)


One more thing:

"they pick an odd set of algorithms, in this case Salsa-20 and AES-CBC"

What's so odd about these? They were only chosen because they are the currently most widely recommended stream and block ciphers. If you have alternatives you prefer, please explain why (as I said the proposal has yet to see any technical review).

The "odd" AES mode with implicit IVs and ciphertext-stealing was chosen to avoid changing the size of datagrams when encrypting, btw.



I'm still not sure what layer IPETEE runs at. If you're running HTTP does IPETEE run above or below TCP? This does matter, since if it's the former you can simply use TLS/DTLS, whereas if it's the latter, you need to do something new, though it may be quite modest, like a framing layer for DTLS/TLS. With that in mind, it's not clear to me how one makes a system that does per-flow keying but lives below TCP/UDP, since the concept of flows (in IPv4) is primarily one that exists at the TCP/UDP level.

With respect to the point about "why carry all that extra baggage when a slim implementation of the bare essentials will do?", there are a number of kinds of overhead here. At minimum, there's the cost of design and implementation, code size, CPU, and data size on the wire. It's true that you can reduce the code size and on-the-wire data size to some extent by doing a special-purpose design (though the code size savings are smaller than you'd think, since people tend to use OpenSSL as their crypto implementation, which means that unless you're pretty careful you end up eating the code size anyway), but (1) code size isn't that important in most settings, and (2) this comes at a really high cost in terms of design and implementation. Designing and implementing a good cryptographic protocol is hard, even for experts, so doing one that isn't flawed requires some serious thinking. And of course you can do a far more efficient implementation of SSL/TLS than OpenSSL in terms of code size if that's what you're optimizing for. It's not clear that you can do that much better with a custom protocol.

As far as on-the-wire data size and CPU cost, TLS/DTLS isn't optimal, but there's not that much room for improvement. There are five contributors to TLS/DTLS overhead:

  • The header (5 bytes)
  • The sequence number (DTLS only) (8 bytes)
  • The IV (8-16 bytes)
  • The MAC (10-20 bytes)
  • Padding (CBC mode only, 1-16 bytes)
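Summing the worst case of the list above for a hypothetical AES-CBC/HMAC-SHA1 DTLS ciphersuite (byte counts taken from the list; padding shown at its maximum):

```python
# Rough per-record overhead for DTLS with an AES-CBC/SHA-1 ciphersuite.
overhead = {
    "header": 5,
    "sequence_number": 8,   # DTLS only
    "iv": 16,               # AES block size
    "mac": 20,              # HMAC-SHA1
    "padding": 16,          # CBC worst case
}

# About 65 bytes per record in the worst case, regardless of payload size.
assert sum(overhead.values()) == 65
```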

You can reduce this somewhat without compromising security. I'm not going to work through the details here, but there's a certain minimum amount of overhead you need: a length field, a MAC to provide integrity (encryption without integrity is dangerous business), an IV, and probably a sequence number if you're using datagram transport. The IPETEE proposal claims to use a fixed IV and doesn't mention anything about a MAC or sequence number. This probably isn't safe except under fairly restricted attack models: you need to worry about both integrity attacks and pattern attacks from the fixed IV. (Incidentally, if you're going to use a stream cipher like Salsa20 for datagram transport, you need some method for using different keystream sections for each datagram, or there are really serious integrity problems.) The CPU requirements are similarly fairly constant.
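The pattern/keystream-reuse problem with a fixed IV can be demonstrated in a few lines (the keystream bytes below are a stand-in for any stream cipher output; the point is that with a fixed IV the keystream repeats for every datagram):

```python
def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

# With a fixed IV, a stream cipher produces the same keystream for
# every datagram (any fixed bytes illustrate the effect).
keystream = bytes(range(16))

ct1 = xor(b"attack at dawn!!", keystream)
ct2 = xor(b"retreat at noon!", keystream)

# An eavesdropper who XORs the two ciphertexts recovers the XOR of the
# plaintexts -- no key required -- which leaks patterns immediately.
assert xor(ct1, ct2) == xor(b"attack at dawn!!", b"retreat at noon!")
```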

WRT the question of ciphers: if you want zero data expansion (and, as noted above, you generally do need *some* overhead), the standard procedure with AES is to use counter mode, not AES-CBC with ciphertext stealing. As for Salsa-20, it's not an algorithm that's commonly used in any protocol I'm familiar with. That's not to say there's necessarily anything wrong with it, but it's also not clear to me what the advantage is; standard procedure would be to stick with AES-CTR.
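A sketch of why counter mode gives zero data expansion (the hash-based keystream here is a toy stand-in; a real system would use AES-CTR from a vetted crypto library):

```python
import hashlib

def ctr_keystream(key, nonce, length):
    # Toy counter-mode keystream built from a hash, for illustration
    # only: each block is hash(key || nonce || counter).
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def ctr_encrypt(key, nonce, plaintext):
    ks = ctr_keystream(key, nonce, len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, ks))

pt = b"datagram payload of arbitrary length"
ct = ctr_encrypt(b"k" * 16, b"nonce001", pt)

# Counter mode adds no padding: ciphertext length equals plaintext
# length, and decryption is the same XOR operation.
assert len(ct) == len(pt)
assert ctr_encrypt(b"k" * 16, b"nonce001", ct) == pt
```

Note that the nonce must differ per datagram, which is exactly the per-datagram keystream separation mentioned above.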

So far, I haven't heard any really compelling arguments why something entirely new is needed.


July 13, 2008

The Pirate Bay guys are floating a proposal for "Transparent end-to-end encryption for the internets". The basic idea seems to be IP-level encryption with an opportunistic, unauthenticated, inband key exchange:
The goal is to implement IP-transport encryption in a way that is transparent both to the IP-layer (including nodes in the network path) and to the applications that benefit from the encryption.

The solution inserts a crypto layer between the IP-stack and application. This could be implemented as a filter hook for an operating systems BSD-socket layer or as a network stack filter (Windows TDI, etc.).

Before establishing a "flow", defined as a new stream for stream oriented communications (i.e. TCP) and a new IP/port tuple for datagram oriented communications (i.e. UDP), key negotiation takes place over the data channel to establish a session key. If the key negotiation fails we fall back to unencrypted mode and just pass the application data untouched, otherwise the established session key is used to encrypt traffic before passing it down the stack and decrypt traffic before sending it up to the application.

This description is extremely sketchy, but it's still possible to get some initial impressions. As usual with amateur designs, this has some odd aspects. First, it looks like they're reinventing everything: both key management and packet formats. Second, they pick an odd set of algorithms, in this case Salsa-20 and AES-CBC with a fixed IV and ciphertext stealing.

But ignoring the details, it's interesting to look at the architecture. IPETEE isn't really isomorphic to any existing design.

Like IPsec, IPETEE lives at the IP layer, but unlike IPsec, where the key management protocol is out-of-band on a specific UDP port, the IPETEE key management is in-band, apparently mixed with the application layer protocol. This has the advantage that there's much less of a NAT/firewall traversal problem, since you don't need to worry about punching a hole through the firewall for the key management protocol. However, if I'm understanding correctly, because the key management data is on the same channel as the application data, connecting to a node which isn't IPETEE-aware will most likely cause a protocol error; this doesn't happen with IPsec, where your KMP just times out if the other side doesn't recognize it. Note that one could easily adapt IPsec so that the KMP ran over the application channel instead of on a separate channel.

SSL/TLS, of course, does all its key management (and everything else) in the data channel. So, as with IPETEE, there's no NAT traversal problem. The advantage of IPETEE over SSL/TLS (whether in application or SSL-VPN style applications) is that it will support any protocol that runs over IP, regardless of the transport protocol. It's not clear how big an advantage that is, since pretty much all major applications run over TCP or UDP, and so you can use TLS or DTLS.

The other advantages of IPETEE aren't really architectural, but rather implementation issues. First, unlike SSL/TLS/DTLS-style applications, IPETEE is clearly designed to be transparent and automatic, as opposed to being under application control. However, that's not a protocol issue, just an implementation issue. It's quite possible to do a kernel/driver version of SSL/TLS—this sort of thing was contemplated when SSL/TLS was first designed, but it didn't take off, to a great extent because one of the major advantages of SSL/TLS was that it could be implemented at the application layer and didn't require any messing around in the kernel or driver layers. There's a tradeoff between universality and ease of deployment here.

Second, because IPETEE doesn't bother to authenticate either side of the connection, there's not really any endpoint configuration required beyond installing the software. Again, though, this isn't really an architectural advantage. As many have noted, one could easily deploy something like this with either SSL/TLS or IPsec, and the IETF even has a working group (BTNS) doing something very similar for IPsec. Moreover, any opportunistic system (one where you don't know whether the other side will do security) has downgrade attack issues, where the attacker forces you down to cleartext. This system is actually worse, since with no credential checking an attacker can also man-in-the-middle you undetectably. Also, as Hovav Shacham pointed out to me, if you renegotiate with each connection, there are more downgrade opportunities. This can be dealt with to some extent by caching the other side's capabilities, but that interacts unpleasantly with NATs. These are all architectural issues, not implementation ones; you just need to decide what tradeoffs you want.
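The downgrade problem can be sketched in a few lines (this is purely a toy model of opportunistic fallback, not the IPETEE wire format):

```python
# Toy model of opportunistic encryption with silent cleartext fallback.

def negotiate(peer_supports_crypto, attacker_blocks_negotiation):
    # If the key exchange fails for any reason, fall back to cleartext,
    # as the IPETEE sketch specifies.
    if peer_supports_crypto and not attacker_blocks_negotiation:
        return "encrypted"
    return "cleartext"

# Normal case: both sides support it, traffic is encrypted.
assert negotiate(True, False) == "encrypted"

# Downgrade: an active attacker just drops the negotiation messages and
# both sides silently fall back to cleartext -- indistinguishable, from
# either end, from talking to a peer that never supported encryption.
assert negotiate(True, True) == "cleartext"
assert negotiate(False, False) == "cleartext"
```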

None of this is to say that this system won't take off, of course, but from a technical perspective IPETEE doesn't seem to have any major advantages that couldn't be easily gained by adapting well-understood existing protocols.


July 12, 2008

On my way to gym today I caught a Living On Earth segment about Greenpeace's efforts to get large soy traders not to buy or distribute soy products produced on cleared rainforest land:
GELLERMAN: But a soybean is a soybean is a soybean. I mean once they're in the bag you don't know where they've come from. How do you enforce that?

ALLEN: Well we don't look at the soybean we look at the farm. So the mapping and the monitoring and the land registration all go together and what we do is we want to have maps to a scale where we can, as soon as we see deforestation happen, we note it. We know who owns that land, we know what has been planted there, we know if it's related to cattle grazer, rice production, soy production, and we then essentially blacklist those farmers so that the traders know that they cannot buy from these farmers even if part of the farmer's land has been used legally to grow soy if they've deforested a new area they will be blacklisted and the traders have agreed to this as part of the moratorium.

This procedure probably will work, but I wonder if there's a technological fix here. The basic idea would be to tag areas of the rain forest with chemical signatures, which would transfer to the soy beans grown in those areas, thus allowing you to determine where a given bean was grown.

The most attractive technique in terms of biocompatibility is to use isotope ratios. For instance, sulfur isotope ratios (S-34/S-32) can be precisely measured and are used as natural environmental tracers. [*]. What we want is an element which is taken up from the soil and is relatively geographically immobile (so that you don't get contamination of neighboring regions). It's possible that there is a set of isotopes that is already characteristic of each region, but more likely we'll need to tag each region so we also want an element with a rare, reasonably long-lived isotope, so that we don't need to spray/dump too much onto the soil in order to bias the isotope ratios in some measurable way. Even better if we have several elements since we can independently vary the ratios to give unique combinations—we may also be able to combine tagging with natural variation if we take measurements. I did a little looking about what elements fit this bill. Some possibilities:

  • Sulfur (S-36 has a .02% fraction)
  • Chlorine (Cl-36 has a roughly 300,000-year half-life)
  • Selenium (Se-79 is a long-lived fission product, with a half-life of a few hundred thousand years)
  • Potassium (K-40 has a .012% fraction)

Another alternative is to use tagging chemicals. Lots of compounds get taken up by organisms (think DDT, PCBs, etc.), and it shouldn't be that difficult to use compound ratios (perhaps combined with isotope ratios) to produce distinct signatures. The problem here is to find some set of chemistry that everyone agrees is safe to spray over a somewhat inhabited area (again, think DDT, PCBs). We don't really have that problem with isotope tagging as long as we stick to non-radioactive isotopes or isotopes with very long half-lives (remember that there's some baseline radioactivity in the world anyway).
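The combinatorics of the multi-element idea are straightforward: if each tagging element's isotope ratio can be biased to one of a few measurably distinct levels, each region gets a unique signature as a combination of levels. A minimal sketch, assuming four elements and three distinguishable levels per element (both numbers are made up for illustration):

```python
from itertools import product

# Candidate tagging elements from the list above; the number of
# measurably distinct ratio levels per element is an assumption.
ELEMENTS = ["S", "Cl", "Se", "K"]
LEVELS_PER_ELEMENT = 3

def region_signatures():
    """Enumerate every distinct signature: one ratio level per element."""
    return list(product(range(LEVELS_PER_ELEMENT), repeat=len(ELEMENTS)))

# 3 levels ** 4 elements = 81 distinguishable regions
signatures = region_signatures()
```

The number of distinguishable regions grows exponentially in the number of elements, which is why independently variable ratios are attractive: even coarse measurement precision buys a lot of tagging capacity.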


July 11, 2008

Watched some of The Eiger Sanction today. What with the constant boozing, casual sexism, homosexual stereotypes (check out Jack Cassidy as the amazingly flaming Miles Mellough with a dog named "Faggot"), and prehistoric mountain climbing technology (in the early rock climbing scenes they're not even wearing swami harnesses, just a loop of rope tied around their bodies; later they're using around-the-waist belays) the movie feels incredibly dated. But you know what's really dated? Clint's a professional assassin whose standard fee is $10,000.

July 10, 2008

I just got back from a backpacking trip to Tahoe National Forest, mostly on the Pacific Crest Trail.

Trip Summary: (map) Lola Montez trail to Hole In The Ground Trail to PCT. PCT North to Paradise Lake (almost). PCT South to Tinker Knob. PCT back North to Highway 40. Highway 40 to Donner Lake. Two days of hiking, about 45 miles, 12 kft. Note: the profile and map aren't quite right because Topo USA doesn't have the Mt. Judah Loop trail which I took on the way back[1]. Add about 1.4 miles and maybe 500-1000 feet of climbing.

Some notes:

  • Try not to camp where there are a ton of mosquitos. It's really hard to set up your tent and get inside without getting a big swarm of mosquitos in with you. Then you have to kill them all by hand. This happens every time you open the tent.
  • There were a lot of other backpackers out—more than on most trips I take. A lot of them tell you that they're hiking to "Canada". They're usually the ones covered in dirt.
  • Doing 20 mile days on the PCT (the standard recommendation for through-hikers) is a lot of work. My longest day was about 18, and while I could have done 20, it definitely would not be easy to do day after day.
  • This was the first time I didn't bring a stove. Eating cold meals isn't actually that bad (I'm not that hungry at 7000+ feet anyway) and it does save some weight and space in the pack.
  • Even a trail as well-travelled as the PCT can be hard to follow in places. As the map shows, there's a section that people often bypass on the road where I could see a single trail marker but no trail and finally had to take the road myself. It's especially hard when you're headed the opposite direction from the South-North direction people typically use on the PCT.
  • You can't trust anything people tell you about where there is water. Both I and others I met were told that there wouldn't be any water in various locations. It kind of sucks to carry 4 liters of water up a large hill and then come across a perfectly good stream. On the other hand, it would really be bad to have not had any water, what with high temperatures of 85-90.
  • You can really notice the effect of the fires. Even from high points, everything is kind of hazy and on Wednesday morning I could see this huge column of smoke way off in the distance. Some other hikers I met tell me that Tuesday night they could see this mushroom cloud backlit by the sunset. As I recall, Ben, the classic hippie-looking Jesus dude who told me about this described it as "It's crazy, man." When we drove through Sacramento on the way back the smoke was really intense—even objects only a kilometer or so away were significantly obscured.
  • Important tip: when something is described as "a .2 mile scramble" do not attempt to do it with your pack on. I made this mistake at Donner Peak.
  • Hiking on road sucks. I thought it would be OK to hike into town on Old Highway 40. Bad idea; tiring and tiresome.

[1] The GPS I'm using (Garmin Vista C) seems not to be able to lock on when it's vertical in your pocket. The result is that the downloaded track is all over the place. I had to recreate it using the maps in TopoUSA and tracing over the roads to make them routable. This is not the most convenient hardware/software combination I've ever seen.


July 4, 2008

As you've no doubt heard, the District Court has ruled that YouTube has to hand over their entire database of who has watched each video. A little web searching didn't turn up the original motion, but the ruling is here:
Plaintiffs seek all data from the Logging database concerning each time a YouTube video has been viewed on the YouTube website or through embedding on a third-party website. Pls.' Mot. 19.

They need the data to compare the attractiveness of allegedly infringing videos with that of non-infringing videos. A markedly higher proportion of infringing-video watching may bear on plaintiffs' vicarious liability claim,3 and defendants' substantial non-infringing use defense.4

As others have noted, the claim that the IP address or login name aren't personally identifying isn't very credible. [Though, did you notice what the judge cites as evidence that they're not personally identifying? A post by Alma Whitten on Google's policy blog.]

Ignoring that, though, Viacom certainly doesn't need access to the entire database to answer this question; a small statistical sample would be plenty. Moreover, with the question as phrased above, you don't need the identities of the people downloading the videos at all: you just need to know the number of times each video was downloaded in any given time period. If you're truly worried about this being distorted by multiple downloads by the same viewer (which seems unlikely), you can assign identifiers in a unique sequence for each video. E.g., the first person who downloads video A gets identifier 1, the second identifier 2; the first person to download B gets 1, etc. [you can use random identifiers too, for better privacy]. If all you want to do is compare popularity, there's no need to link viewers between videos.
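The per-video identifier scheme can be sketched in a few lines. This is a hypothetical illustration (the class and method names are invented): each video gets its own secret key, so repeat views by one user are linkable within a video, but viewers can't be linked across videos without the keys.

```python
import hashlib
import secrets

class PerVideoPseudonymizer:
    """Maps (video, user) pairs to pseudonyms using an independent
    random key per video, so cross-video linkage is impossible
    without the keys."""

    def __init__(self):
        self._keys = {}  # video_id -> secret per-video key

    def pseudonym(self, video_id, user_id):
        # Lazily generate a fresh 128-bit key for each new video.
        key = self._keys.setdefault(video_id, secrets.token_bytes(16))
        # Keyed hash of the user ID; stable within a video.
        return hashlib.sha256(key + user_id.encode()).hexdigest()[:16]
```

The same user viewing the same video twice gets the same pseudonym (so duplicate-view distortion is still measurable), while the same user's pseudonyms on two different videos are unrelated, which is all the popularity comparison requires.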

On the other hand, what this database would be useful for is identifying and pursuing the users who uploaded and downloaded videos Viacom claims infringe. For that, you would want both the identities of users (so you know who to go after) and the whole database (so you can identify everyone).


July 3, 2008

Even now that public opinion has started to shift towards more concern about climate change, political inertia—and especially the inherent conservatism of the American system—make a dramatic change like a nationwide carbon tax or cap-and-trade system incredibly difficult to implement. It's just too easy for a small group of vocal opponents to block legislation, or more likely dilute it to the point where it doesn't do anything. [When I say "too easy" I'm not taking a normative position; I just mean that the system isn't designed to make change easy.]

But consider our current situation: in the past year the price of gasoline has gone from about $3.00 to about $4.00/gallon. In effect, from the perspective of 2007, we've imposed a $1.00/gallon carbon tax. For comparison, even proponents of carbon taxes are looking at more like $.10/gallon. This isn't ideal for a number of reasons:

  • The price of gasoline depends to a great extent on the price of oil and the price could go down at some point. On the other hand, there's no reason to believe it will go down and people's behavior has already started to change.
  • Oil burning isn't the only carbon emitter and coal and natural gas prices aren't going up as smoothly, though a little searching suggests they may be going up too, which is what you'd expect.

The good news, though, is that the inertia of the political system works to keep prices high. Even if there were something effective the government could do to bring prices down—which seems unlikely in any case—all that carbon control proponents need to do is block that legislation, which is a lot easier than getting their own legislation passed.


July 1, 2008

After the California Top-to-Bottom Review, Alex Halderman, Hovav Shacham, David Wagner, and I got together and asked ourselves whether there was some way to make good use of the existing voting systems. The result was:
You Go to Elections with the Voting System You Have: Stop-Gap Mitigations for Deployed Voting Systems

J. Alex Halderman, Eric Rescorla, Hovav Shacham, David Wagner

In light of the systemic vulnerabilities uncovered by recent reviews of deployed e-voting systems, the surest way to secure the voting process would be to scrap the existing systems and design new ones. Unfortunately, engineering new systems will take years, and many jurisdictions are unlikely to be able to afford new equipment in the near future. In this paper we ask how jurisdictions can make the best use of the equipment they already own until they can replace it. Starting from current practice, we propose defenses that involve new but realistic procedures, modest changes to existing software, and no changes to existing hardware. Our techniques achieve greatly improved protection against outsider attacks: they provide containment of viral spread, improve the integrity of vote tabulation, and offer some detection of individual compromised devices. They do not provide security against insiders with access to election management systems, which appears to require significantly greater changes to the existing systems.

The paper will appear at EVT '08. (PDF.)