July 2006 Archives

 

July 31, 2006

Today's Louisville, Kentucky Courier-Journal has an article on how the RIAA is suing a bunch of local old people for downloading music. As usual, the reporting isn't very good:
It's not OK with recording companies, who lose money with each illegal download and have filed more than 18,200 civil suits since September 2003.

Look, I know this is the recording industry party line, but it's simply not true. If I download a song that I never had any intention of buying, it doesn't cost the recording company anything. A better reporter might have pointed this out.

There's also an oh-so-helpful sidebar on what's legal and what's not:

What are examples of violating copyright law?
  • Sharing music files on the Internet, via peer-to-peer networks.
  • Burning copies of a CD for friends.
  • Transferring a music file through an instant messaging service.
  • Lending a friend a CD so he or she can copy it.

This, too, is from the RIAA playbook, but it's misleading at best. First, it's clearly not illegal to use any of these methods to share music you made yourself or music that is freely available. There's lots of such music about. Second, it's not at all clear that burning a copy of a CD for a friend--much less letting them borrow your CD to copy--is a copyright infringement. The Audio Home Recording Act specifically exempts non-commercial copying. These protections have been substantially modified by the NET Act, but as I understand the situation, it's not at all a slam dunk that private copying in small quantities is a violation.

 
Bruce Schneier observes that iPod theft is a natural consequence of iPods:
"Rise in crime blamed on iPods", yells the front page of London's Metro. "Muggers targeting iPod users", says ITV. This is the reaction to the government's revelation that robberies across the UK have risen by 8 per cent in the last year, from 90,747 to 98,204. The Home Secretary, John Reid, attributes this to the irresistible lure of "young people carrying expensive goods, such as mobile phones and MP3 players". A separate British Crime Survey, however, suggests robbery has risen by 22 per cent, to 311,000.
This shouldn't come as a surprise, just as it wasn't a surprise in the 1990s when there was a wave of high-priced sneaker thefts. Or that there is also a wave of laptop thefts.

What to do about it? Basically, there's not much you can do except be careful. Muggings have long been a low-risk crime, so it makes sense that we're seeing an increase in them as the value of what people are carrying on their person goes up. And people carrying portable music players have an unmistakable indicator: those ubiquitous ear buds.

I'm not sure I believe that iPods are actually a causal factor in increases in robbery. If you look at the 2004/2005 police reporting data (in Crime in England and Wales 2004/2005) you'll see that there was a 12% drop in robbery between 2003/2004 and 2004/2005.1 Since 2004/2005 was a fine year for iPod sales, I think it's quite a speculative leap to say that iPods are responsible for an increase in robbery. That isn't to say that there aren't more iPod robberies, of course. As iPods get more popular, you'd expect that more robberies would involve them--but that's not the same thing at all.

I also don't agree that there's not much you can do. Indeed, there's a pretty obvious technical fix--the same one that has been applied to car radio theft--make a stolen iPod useless, or at least less useful. To a first order, iPods need to be used with iTunes, which means they're tied (and mostly mated) to a particular computer. Apple could easily arrange that new iPods had to be synched up every so often (two weeks, a month, whatever) or they went catatonic. You'd need to disable refreshing the firmware (remember, the thieves want your iPod, not your copy of Tubular Bells), but that's no problem when you control the hardware platform. You'd also need some kind of "my computer exploded" reset procedure, but again that's fairly straightforward to do in a number of ways (keep that sticker that came with your iPod, people).

And note that if you really hate this feature it needn't affect you: Apple could just as easily have a button in iTunes that unlocked the iPod permanently. As long as most people don't press that button, the expected value of a stolen iPod would start to decline fairly rapidly and it's at least possible that people would lose interest in stealing them. Come to think of it, though, I don't actually know if this kind of technical fix had any impact on car radio thefts. Anecdotally, it seems like I know people who have had their cars broken into and their radios not stolen—on the other hand, their CDs were stolen, so maybe I'm wrong about that whole Tubular Bells thing—but I don't have any actual hard data.
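
Just to make the idea concrete, here's a minimal sketch of what such a check might look like; everything in it (the function names, the 30-day grace period, the unlock flag) is invented for illustration and has nothing to do with how iPods actually work:

    # Purely hypothetical sketch of a "sync or lock" check; the names,
    # the grace period, and the unlock flag are all invented here.
    import time

    GRACE_PERIOD = 30 * 24 * 3600          # say, 30 days between required syncs

    def playback_allowed(last_sync_time, permanently_unlocked=False):
        """Keep working only if the device synced recently or was opted out."""
        if permanently_unlocked:           # the hypothetical iTunes unlock button
            return True
        return (time.time() - last_sync_time) < GRACE_PERIOD

    # A sync with the paired computer would simply record the current time:
    # last_sync_time = time.time()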

1. Note that the numbers in the report I'm citing don't quite match up with the numbers cited above, but they're within 5%, so I imagine we're looking at different measurement methodologies.

 

July 30, 2006

As you've no doubt heard by now, Floyd Landis's post-Stage 17 drug test has shown an unacceptable ratio of testosterone to epitestosterone (T/E). A few notes:

Why use the T/E ratio?
Testosterone levels vary substantially, so just testing for testosterone levels isn't diagnostic for doping. The idea is that the T/E ratio is relatively stable, so an abnormally high ratio is a better indicator of doping than the raw testosterone level.

What now?
When the athlete provides a urine sample, they actually provide two samples, an "A" sample and a "B" sample. The labs test the "A" sample and keep the "B" sample for backup in case of testing errors. Landis has already requested that the "B" sample be tested. As I understand it, if that comes up negative then it's assumed that there was testing error initially.

What if the B sample comes up positive?
If the B sample comes up positive, then there's confirmation that the T/E ratio is high. Unfortunately, there's a lot of variation between people in T/E ratio as well. C-12/C-13 isotope ratio is used to confirm whether the testosterone is exogenous or endogenous (see here for previous discussion of this technique).

It's not entirely clear to me at what point in this process the Tour authorities decide Landis is guilty and strip him of his win.

For readers: how fast does testosterone act to improve performance? Would it really have paid off to take it the day before an important stage?

 

July 28, 2006

In the short story Franchise, Multivac's ability to predict human behavior is so good that it can predict the outcome of an election from interviewing a single voter. Despite that, it communicates with punch cards:
Somehow he had thought Multivac would speak in a sepulchral, superhuman voice, resonant and echoing, but that, after all, was just an idea he had from seeing too many television shows, he now decided. The truth was distressingly undramatic. The questions were slips of a kind of metallic foil patterned with numerous punctures. A second machine converted the pattern into words and Paulson read the words to Norman, then gave him the question and let him read it for himself.

Franchise was published in 1955, back in the era when there were separate punch card readers and writers.

 

July 27, 2006

Was in the library the other day and in a fit of nostalgia I picked up a bunch of old Isaac Asimov "robots" books. Like nearly all science fiction authors of that era, Asimov got computers pretty much all wrong, in at least three major ways.

Some vague spoilers below, though really these books are so old that you don't have much to complain about.

Form Factor
Probably the most excusable miscalculation is size. In Asimov-land, computers come in two varieties: gigantic tube- or relay-based monstrosities like Multivac, and positronic-brained robots (see the relevant Wikipedia entry). A positronic brain is about the size of a football but seems to be basically a solid mass of compute power. By contrast, all but the largest modern microprocessors actually have a tiny volume. This isn't that surprising, of course, since Asimov wrote most of these stories before the invention and popularization of the IC, so he was left to either extrapolate from existing computer technology (resulting in gigantic machines) or invent something fundamentally new and unspecified (resulting in positronic brains).

UI versus AI
As the Wikipedia article points out, the capabilities of Asimov's imagined computers don't match very well with how computers actually work:

Although these stories are well-known, it is hardly ever recognized that Asimov's robots are nothing at all like computers, as the main series of them predated any of the major computer projects. The main stumbling block is that writing a program that would be able to determine whether any of the three laws would be violated is far more difficult than writing one for machine vision, or speech recognition, or even comprehending the activities and motivations in the human world, which is only attempted by determining a vast list of rules to check. Also, the stories' robots never get programming viruses, or require updates. They may, however, have new features installed (like R. Giskard, as we are told in Robots and Empire). Most importantly, they only stop functioning due to a clash between the (hypothetical) subroutines which determine whether one of the laws has been violated, never a crash of a subroutine itself: they are never at a loss as to what is going on, only what to do about it.

Actually, the problem is worse than that. Like many SF authors, Asimov seems to have assumed that AI was mostly an emergent property of assembling enough computational power in one place, while decent UI would be much harder. See, for instance, All The Troubles Of the World, in which Multivac is clearly intelligent but you communicate with it more or less via teletype--not even with something as primitive as a text terminal:

Othman used the instrument on Gulliman's desk. His fingers punched out the question with deft strokes: "Multivac, what do you yourself want more than anything else?"

The moment between question and answer lengthened unbearably, but neither Othman nor Gulliman breathed.

And there was a clicking and a card popped out. It was a small card. On it, in precise letters, was the answer:

"I want to die."

Even in the robot novels, it's assumed that AI-type functionality is easier than better UI. See, for instance, Robbie, which features a clearly intelligent robot that understands speech but can't speak.

Hardware and Software
Probably the most interesting difference between Asimov computers/robots and real computers is how much is done in hardware. Modern computers are general purpose computing engines with most of the behavior controlled via software. Based on the fairly sketchy information in the robot stories, it appears that robots operate on a rather different principle.

Here's Asimov's description of a robot which observed the death of a human (from The Naked Sun):

A robot entered. It limped, one leg dragging. Baley wondered why and then shrugged. Even among the primitive robots on Earth, reactions to injury of the positronic paths were never obvious to the layman. A disrupted circuit might strike a leg's functioning, as here, and the fact would be most significant to a roboticist and completely meaningless to anyone else.

And an example of a robot given contradictory orders from The Robots of Dawn:

What was troubling the robot was what the roboticists called an equipotential of contradiction on the second level. Obedience was the Second Law and R. Geronimo was now suffering from two roughly equal and contradictory orders. Robot-block was what the general population called it or, more frequently, roblock for short.

Slowly the robot turned. Its original order was the stronger, but not by much, so that its voice was slurred. "Master, I was told you might say that. If so I was to say--" It paused, then added hoarsely, "I was to say--if you were alone."

...

For a moment, Baley played irritably with the notion of strengthening his own order and making the roblock more nearly complete, but that would surely cause the kind of damage that would require positronic analysis and reprogramming. The expense of that would be taken out of his salary and it might easily amount to a year's pay.

Asimov is of course never clear about how robots really work, but the above passages suggest two things. First, all this talk of robots' actions being chosen by potentials is supposed to make you think of voltages, and of robots as being sort of analog computers as opposed to modern digitally programmed ones. Second, if the robot gets sufficiently confused its brain can suffer physical damage. At another point in The Naked Sun we see a robot which has witnessed the death of a human rendered totally inoperable, beyond any repair. Runaround is another example of an early story with a similar theme of apparent global malfunction caused by a specific conflict.

By contrast, while modern computers crash all the time, this virtually never causes any hardware damage, and nobody would expect that giving the computer some input it wasn't prepared to handle would somehow physically break it. You can reboot the system and it comes online just fine. Even in the worst case, all you have to do is reinstall the software and everything is back to normal. If the computer crashes and the hardware is broken, the arrow of causation almost certainly goes the other way. The reason, of course, is that most of the interesting stuff in a modern computer happens at the software level, so that's where crashes happen too. From the perspective of the hardware, a malfunctioning computer program looks much like a properly functioning one.

That said, this inaccuracy isn't surprising. First, Asimov wrote these stories long before the age of modern software, so he wouldn't have had the appropriate sense of how computers work. In particular, the earliest robot stories were written in the 1940s, and while Turing completeness was known, it certainly hadn't entered the general consciousness--indeed, ENIAC had yet to be built--so it wouldn't have been obvious that the most flexible approach was to build a general purpose computing platform and then run software on top of it. Second, the electrical circuit metaphor is a lot more evocative, and of course having a crashed robot be physically destroyed is more interesting from a narrative perspective. That probably explains why Asimov continued to use the same basic model long after the pre-eminence of software in real computers had become apparent.

 

July 26, 2006

I've taken a look at the various dnsext key rollover
documents and wanted to provide some comments. This
is the first of two messages, the first up-levelling
on the general rollover problem, and the second providing
specific comments on draft-ietf-dnsext-trustupdate-timers.

Rambling follows....

BACKGROUND
----------
DNSSEC, like nearly all public key authentication systems, assumes
that you have some set of trust anchors configured into your system
and that you use them to transitively verify other keys in the system
[0]. Pretty much by definition, the initial set of trust anchors are
either shipped with the verifying software or are manually configured.
There's then a machine verifiable chain of signatures from some trust
anchor to any given verifiable data item (e.g., certificate).

The question then arises about how to replace the keys for
the trust anchors. Roughly speaking, there are two reasons 
to roll over your signing keys:

1. You're trying to stay ahead of the attacker's cost curve
   (Moore's Law, progress in factoring, etc.)
2. Your key(s) have been compromised.

Reason 1 is fairly straightforward. At time 0, you make a key (K0)
with an expected lifespan of X years. At a reasonable time before X
(say X/2) you generate a new key (K1) with an expected lifespan of Y
years and distribute it somehow. The way this is done in browsers is
that you just get the new key placed in the current revision of the
browser and assume that the new revision will percolate out before key
K0 expires.
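
To make Reason 1 concrete, here's a toy sketch of the overlap rule: introduce K1 partway through K0's lifetime and have verifiers accept either key during the overlap. The lifetimes and structure are invented for illustration, not taken from any of the drafts:

    # Toy model of pre-expiry rollover (Reason 1); the lifetimes are arbitrary.
    X = 4.0                                        # lifespan of K0, in years
    k0 = {"name": "K0", "start": 0.0, "expires": X}
    k1 = {"name": "K1", "start": X / 2, "expires": X / 2 + 4.0}   # issued at X/2

    def trusted_anchors(t, keys=(k0, k1)):
        """Names of the keys a verifier should accept at time t (in years)."""
        return [k["name"] for k in keys if k["start"] <= t < k["expires"]]

    print(trusted_anchors(1.0))    # ['K0']        before the rollover starts
    print(trusted_anchors(3.0))    # ['K0', 'K1']  overlap window
    print(trusted_anchors(5.0))    # ['K1']        after K0 expires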

Reason 2 is more problematic and the browser world has mostly chosen
to punt on it. The theory here is that if a root key is ever known to
be compromised, the browser manufacturers will just release a
security patch without that key--and maybe with the replacement--and
propagate it via the usual mechanisms. Given that there are lots of
security vulnerabilities worse than a compromised key and we let the
ordinary patch process handle them, this doesn't seem crazy.

In other words, the browser world doesn't have any mechanism
for root rollover separate from the normal software update
process.


ISLANDS
-------
In an ideal DNSSEC world, there would only be one trust anchor: the
root signing key, or a small number of mostly static anchors, like the
100 or so that are in your average browser. As a practical matter,
people seem to be assuming that there will be a bunch of islands--signed
subtrees within a generally unsigned DNS tree. The problem that people
appear to be grappling with is:

     (a) How to provision those island keys.
     (b) How to roll them over if they're compromised.

To that end, we have (at least): 

     draft-ietf-dnsext-rollover-requirements-02
     draft-ietf-dnsext-trustupdate-timers-02
     draft-ietf-dnsext-trustupdate-threshold-01
     draft-laurie-dnssec-key-distribution-02

The matter is somewhat complicated by the existence of multiple certificate
formats. So, for instance, one could imagine that you were operating
the island "example.com". You could get an ordinary SSL/TLS
certificate from VeriSign for "example.com" and distribute your DNSSEC
trust anchor definition over SSL/TLS. From a DNSSEC perspective, this
is automatic provisioning of the island key, but from a security
perspective it's a single root of trust rooted in VeriSign and is
isomorphic to ordinary CA operation. A similar situation would obtain
if, e.g., ISC periodically published a new island key list for BIND
users to download. 


PROBLEMS TO SOLVE
-----------------
It seems to me that there are three problems that need to be solved
here: 
      1. How to initially provision island keys.
      2. How to transition between two keys (assuming you have some
	 way to validate the replacement key).
      3. How to validate the replacement key.

Problem 2 is a straightforward protocol engineering problem. I
get the impression that it's not entirely solved based on
S 3.8 of draft-ietf-dnsext-trustupdate-threshold, but it's
not tricky in any abstract sense. Problems 1 and 3 are tricky,
however.


INITIAL PROVISIONING
--------------------
Roughly speaking, there are four classes of initial provisioning
solutions:

- Manual provisioning  -- you manually enter the keys
- Leap of faith	       -- you trust the first key you get 
			  for a given zone
- Central trust anchor -- you trust signatures from some 
			  central trust anchor. This could
			  either be a conventional CA like I
			  indicated in my example above or
			  just your software vendor who ships
			  you the current trust anchor list
- Web of trust	       -- you trust some set of people and they
			  sign keys for other people and you
			  somehow compute a trust metric over
			  the signatures.

Note that a central trust anchor is basically just a single
root and it makes sense to treat it that way, even if the
signatures are done with some non-DNSSEC mechanism. So,
for the purposes of rollover we'll assume you're provisioning
some other way.


ROLLOVER
--------
        
 

July 25, 2006

One of the big problems in endurance racing nutrition is hyponatremia [*] [*]. In hot conditions you can lose over 3 g of sodium per hour. Old-style sports drinks like Gatorade or Cytomax contain something like 50 mg of sodium/8 oz, so you'd need to drink something like 15 quarts of fluid an hour to keep up--clearly not a practical solution. The standard solution is to eat salty foods or take salt tablets, but this makes life a lot more complicated than you'd like when you're trying to output maximum effort. Ideally, you'd like whatever energy drink or gel you're taking to have enough sodium in it.
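
A quick sanity check of that arithmetic (1 quart = 32 oz), using the 50 mg/8 oz figure above and the 200 mg/8 oz figure of the newer drinks discussed below:

    # Sanity check of the sodium arithmetic (1 quart = 32 oz).
    sodium_loss_mg_per_hr = 3000           # ~3 g/hour in hot conditions

    def quarts_per_hour(mg_sodium_per_8oz):
        servings = sodium_loss_mg_per_hr / mg_sodium_per_8oz   # 8 oz servings
        return servings * 8 / 32

    print(quarts_per_hour(50))     # 15.0  -- old-style 50 mg/8 oz drinks
    print(quarts_per_hour(200))    # 3.75  -- the newer 200 mg/8 oz formulas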

The sports nutrition guys have been paying attention and started jacking up the sodium levels of their products. For instance, Gatorade's new Performance series contains 200 mg of sodium/8 oz. Similarly, PowerBar's Endurance energy drink contains 680 mg sodium/liter. I haven't tried the PowerBar product (though I liked their old one) but the Gatorade drink is pretty good.

Not all of these attempts have been successful, though. PowerBar has updated their energy gel to include 4x as much sodium as before (200 mg/gel). I really liked the old gel, but the new one is so salty it's extremely unpalatable. I went for a long run in the heat on Sunday with only the new chocolate gel and was really glad I had water when I tried it, because otherwise I don't think I could have gotten it down at all.

Balancing taste and the requisite sodium content is a perennial problem for nutrition designers. The standard approach seems to be to add sweeteners in an attempt to balance the salt, but this makes getting the simple sugar/complex carbohydrate balance tricky. This raises the interesting question: is there a way to add sodium without making things taste saltier? Obviously, sodium chloride won't do, and the problem is the sodium, so any other sodium salt that ionizes in water won't help either (remember, you have to keep the sodium in a water solution). What you need is a sodium compound that ionizes in the gut but not in the drink or gel. If you had one, you could use it to salt-boost your drink or gel without affecting the taste. So, is there some edible (and tasteless) sodium salt that ionizes at low pH levels (the stomach is around pH 2-4) or under the action of some stomach enzyme?

My inorganic chemistry (and my organic chemistry, for that matter) is pretty rusty. Any chemists out there want to suggest a compound?

 

July 24, 2006

In the comments section, Roland Dobbins writes:
With regards to the CPU impact of IPSEC, it's important to realize that even larger routers in many cases don't process IPSEC (and the requisite GRE tunnels inside the IPSEC) in hardware, but rather in software on the main system route processor - and even on platforms which offer hardware IPSEC and GRE processing, this functionality requires additional modules or blades which a) increase platform cost and b) often occupy a slot which could be used to terminate customer circuits or for other, more directly revenue-generating functions. So, the operational impact of doing this is also prohibitive in many cases.

So, this is an issue about symmetric key processing rather than asymmetric. My understanding is that TCP-MD5 is also done in the route processor, so replacing TCP-MD5 with IPsec should be more or less a wash, modulo the key management, which, as I argue, is a small cost.

One other comment - advocates of encryption for BGP (or any of the various IGPs, for that matter) haven't really made a clear case as to what the actual benefits of doing so really are. After all, if a miscreant is in a position to listen to (and perhaps alter or inject) routing-protocol announcements in the first place, the network operator has bigger problems which simply encrypting the routing-protocol sessions won't address.

I mostly agree with this: BGP information isn't really that sensitive in a confidentiality sense. I was assuming IPsec with an integrity-only mode such as ESP-NULL or AH. That's why I expect the cost to be commensurate with TCP-MD5.

 

July 23, 2006

One of the hot topics in IETF Security right now is how to fix/replace the aging TCP-MD5 (RFC 2385) mechanism for protecting BGP sessions. TCP-MD5 uses manually configured shared symmetric keys on both ends. One obvious approach is to use IPsec, but there's been a lot of resistance to that from the operators and the router vendors. At IETF 66, Brian Weis gave a talk trying to explain some of the concerns. Some of it makes sense, but in other places it represents a number of misunderstandings about cryptography and communications security.

The need for rekeying
Slide #8 argues that rekeying traffic keys is important. There are actually two issues here:

  1. Key exhaustion.
  2. Compromise recovery.

The key exhaustion issue is supposedly that you can't use the same cryptographic key to protect too much data (Weis calls this "overuse"). This was true with old ciphers--lots of known plaintext gave cryptanalysts leverage. It's very slightly true for small-block block ciphers like DES, where you start to get small amounts of leakage at around 2^32 blocks (2^35 bytes) in CBC mode. Resistance to this kind of attack goes up with the block size, so as a practical matter AES is immune (the limit is 2^64 blocks/2^68 bytes). As far as I know, this isn't a practical issue for MAC algorithms like HMAC-MD5.
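
For reference, here's the birthday-bound arithmetic behind those numbers; the 2^(n/2)-block rule of thumb for CBC mode is the standard assumption here:

    # Birthday-bound data limits for CBC mode: leakage starts to appear
    # around 2^(n/2) blocks for an n-bit block cipher.
    def cbc_limit(block_bits):
        blocks = 2 ** (block_bits // 2)
        return blocks, blocks * (block_bits // 8)   # (blocks, bytes)

    print(cbc_limit(64))     # DES: 2^32 blocks, 2^35 bytes (about 32 GB)
    print(cbc_limit(128))    # AES: 2^64 blocks, 2^68 bytes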

The compromise recovery issue is real; whenever people who have access to the authentication keys quit you obviously have to worry about them posing as you. It wasn't entirely clear to me what Weis meant, but this problem can't be solved by rekeying the traffic keys because the attacker can impersonate you when forming a new connection with new traffic keys. You need to replace the authentication keys.

Pre-Shared Public Keys
Slide 9: If you want to use public key cryptography for session key establishment, there are two basic choices:

  1. Put the keys into some kind of PKI.
  2. Pre-shared public keys.

This talk argues that it's not practical to have all the keys for your BGP peers in a single PKI. I'm not convinced this is true, but it's mostly a religious issue and I mostly try to avoid arguing with people who have PKI allergy. I do want to address the pre-shared public key issue.

Weis's argument for why you can't have pre-shared public keys is:

  • Results in twice as many keys being exchanged in the secret key case
    • Routers may have a limited amount of NVRAM or flash available, and public keys are relatively large

The first statement is mostly correct. In order to establish mutual authentication between A and B using pre-shared public keys, you need an out-of-band exchange where A sends B his public key (or fingerprint) and vice versa. It's not clear why this is really a big deal. After all, people do this routinely with SSH public key authentication.

The second statement (about NVRAM) is basically wrong. It's of course true that public keys are generally larger than symmetric keys (actually, with elliptic curve cryptography you can have them only about 2-3 times as large as symmetric keys, or even down to 25% larger if you're willing to have a shared group), but this fact is mostly irrelevant because there's no need to store the public keys in NVRAM. You can just store a digest of the public key, which need be no larger than the symmetric key you would otherwise have stored (as small as 80 bits is secure here under most reasonable assumptions). During the initial crypto handshake the peer provides its public key and you can compute the hash and compare it to the stored value. The only additional storage is your own key pair, which is on the order of 200-500 bytes.
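
Here's a sketch of that digest trick; the hash choice and the 80-bit truncation follow the argument above rather than any particular router implementation:

    # Sketch: store only a fingerprint of each peer's public key and check
    # the key the peer presents in the handshake against it.
    import hashlib

    stored_fingerprints = {}                 # peer name -> digest (in NVRAM/flash)

    def fingerprint(public_key_der: bytes) -> bytes:
        return hashlib.sha256(public_key_der).digest()[:10]    # 80 bits

    def enroll(peer: str, public_key_der: bytes):
        stored_fingerprints[peer] = fingerprint(public_key_der)

    def verify(peer: str, presented_key_der: bytes) -> bool:
        # The peer sends its full public key during the handshake; we only
        # need to compare it against the stored digest.
        return stored_fingerprints.get(peer) == fingerprint(presented_key_der)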

One real advantage of PK-based authentication that doesn't get talked about a lot is routine exposure of the keying material. If you use manual key management to share a symmetric key with a peer, the operator (i.e., the guy with access to the keyboard) pretty much necessarily has to know the symmetric key because he has to type it in. By contrast, if you use PK-based authentication then you never need to see the private key, which lowers the chance that it will get written on a post-it or sent around on a fax, thus lowering the exposure risk when employees leave. Sure, a malicious employee with the root password can usually get the key out of the router, but the router software can be designed to make this difficult, and if necessary you can store the key in a hardware security module. This option isn't available with symmetric-key systems.

Performance of Public Key Cryptography
The other argument this presentation makes against public key is performance:

  • Public key operations are computationally expensive
    • Supporting public key operations from many peers requires customers to buy blades/boxes that they don't otherwise need.

I don't buy this argument. Remember that the underlying assumption here is that you're going to pre-configure your associations with all your peers (that's what you get when you don't use PKI or other central authentication systems). So, as a practical matter this limits you to no more than a thousand or so peers (the benchmark number on this slide is 1000). Any reasonable processor can do on the order of 100 RSA-based key exchanges per second (this not-very-fast laptop will do 160). So, we're talking under 20 seconds of CPU time even under fairly pessimistic assumptions. And this cost only needs to be incurred very occasionally--either when an authentication key changes or after a router reboot, both of which are inherently expensive events even without the public key operations.
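
The arithmetic behind that claim, under the stated (and fairly pessimistic) assumptions:

    # Rough CPU cost of re-keying all peers with RSA, per the assumptions above.
    peers = 1000                           # the slide's benchmark number of peers
    handshakes_per_second = 100            # conservative RSA key-exchange rate
    print(peers / handshakes_per_second)   # 10.0 seconds of CPU, incurred rarely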

So what?
So, what point am I making here? Certainly it's quite possible that IPsec is inappropriate for an application like BGP. In particular, it's known that configuring IPsec can be fairly difficult and the UIs that vendors give you aren't anything to write home about. But I don't think the particular technical objections being raised here are very convincing.

 

July 22, 2006

 

July 20, 2006

In the past eight days, Hezbollah has fired nearly a thousand rockets into Israel, causing 15 civilian casualties--less than 2 people per hundred missiles--which doesn't seem like a very efficient way to kill people. The basic problem (assuming you're Hezbollah) seems to be that the accuracy of the missiles they're using is pretty bad, on the order of a few kilometers. Still, if you look at aerial photos of Israel, it's pretty densely populated. You'd think you could hit and destroy inhabited houses more than 2% of the time. Anyone have any more information on this?
 

July 19, 2006

Robert Tur, who filmed the Reginald Denny beating, is suing YouTube for distributing his footage:
Robert Tur says video he shot of the beating of trucker Reginald Denny during the 1992 Los Angeles riots was posted at YouTube without his permission and viewed more than 1,000 times. Tur says in his lawsuit, filed Friday in U.S. District Court, that YouTube is profiting from his work while hurting his ability to license his video.

"Mr. Tur's lawsuit is without merit," YouTube said in a statement. "YouTube is a service provider that complies with all the provisions of the Digital Millennium Copyright Act (DMCA), and therefore is entitled to the full protections of the safe harbor provisions of the Act."

Passed in 1998 to protect copyright holders from technology that facilitated piracy, the DMCA also offered protection to Web service providers by limiting their liability in cases where their customers were found guilty of copyright violation.

Those in the video-sharing sector have for months expected someone to challenge YouTube in court. The San Mateo, Calif.-based company lets users post videos to its site without prescreening them, and a staggering amount of copyright video exists on the site. YouTube prohibits the uploading of such material but has also benefited in the past when someone has posted a professionally made clip that catches fire with the public.

Copyright infringement on public content sites like YouTube and Google Video--like pedophiles on MySpace--is hard to tackle. The problem in this case is that it's really hard for the site to identify copyrighted material. It's true that YouTube scrubs their site for pornography, but only a small fraction of unauthorized material is as easy to identify as porn. Sure, you know that when you see the Daily Show on YouTube it's probably not owned by the person who uploaded it--though did Comedy Central authorize it or not?--but what about some random music video? It's not like YouTube has a copy of everything that's ever been copyrighted.

The regime imposed by the DMCA implies that it's the responsibility of the copyright holder to somehow monitor these sites. This is more tractable in some sense, since they at least know what they own, but it's not really practical either. It might be possible to write a tool that did this kind of search for each copyright holder, but you're talking about a pretty significant bandwidth and computational cost. It's certainly well outside the capabilities of your average home videographer. That said, I don't know that much about video search. Do any readers have a sense of what it would look like to do a fuzzy search of millions of hours of video against every video on YouTube?

 

July 18, 2006

Koblitz and Menezes are at it again. Back in 2004, they published Another Look at "Provable Security", arguing that the reduction proofs that are de rigueur for new cryptosystems don't add much security value. (See here for a summary.) Last week, K&M returned to the topic with Another Look at "Provable Security" II, which is about the difficulty of interpreting the reduction results. They take on the proofs for a number of well-known systems and argue that they don't show what you would like.

One of the themes that shows up repeatedly is what's called "non-tightness of reduction". Recall that the idea behind a reduction proof is to show that if you could break the cipher (recover the key, plaintext, whatever you've modelled it as) then you could use that ability to get leverage on some believed-hard problem. So, for instance, you might want to prove that breaking your algorithm was "equivalent" to letting you factor large products of primes. There's generally some additional effort required to solve the hard problem, so you get a result where being able to break the cryptosystem with effort E implies being able to solve the hard problem with effort cE.1 If c is small, then the reduction is called "tight". If c is large (and we're talking real large, like big powers of two) then it's called "non-tight". K&M write:

Suppose that researchers have been able to obtain a highly non-tight reduction from a hard mathematical problem to breaking a protocol2. There are various common ways to respond to this situation:

  1. Even a non-tight reduction is better than nothing at all. One should regard the cup as half-full rather than half-empty, derive some reassurance from what one has, and try not to think too much about what one wishes one had.
  2. Even though the reduction is not tight, it is reasonable to expect that in the future a tighter reduction will be found.
  3. Perhaps a tight reduction cannot be found for the protocol in question, but a small modification of the protocol can be made in such a way as to permit the construction of a tight reduction -- and we should regard this reduction as a type of assurance about the original protocol.
  4. A tight reduction perhaps can be obtained by relaxing the underlying hard problem (for example, replacing the computational Diffie-Hellman problem by the decision Diffie-Hellman problem). [This is a very popular move -- EKR]
  5. Maybe the notion of security is too strict, and one should relax it a little so as to make possible a tight reduction.
  6. Perhaps the protocol is secure in practice, even though a tight reduction may simply not exist.
  7. Perhaps the protocol is in fact insecure, but an attack has not yet been discovered.

These seven points of view are not mutually exclusive. In fact, protocol developers usually adopt some combination of the first six interpretations -- but generally not the seventh.

K&M go on to show a number of depressing examples of this and other problems. Probably the most depressing example they give of a proof that doesn't show what you'd like is the Blum Blum Shub pseudorandom number generator, which is based on factoring. There are proofs that provide a lower bound on the adversary's effort (and hence on the security of the algorithm). Unfortunately, when you plug in reasonable-sounding parameters (a 1024-bit modulus) and the current best factoring algorithms, you discover that the formula for the lower bound tells us that the attacker's lower bound is a stunning 2^-199 operations, which isn't that useful. In order to get a reasonable security guarantee, you need to use a 5000-bit modulus and extract only one bit per crank of the generator, which is far too expensive for practical use.
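
For readers who haven't seen it, the generator itself is simple: repeatedly square modulo a product of two primes (each congruent to 3 mod 4) and output a bit per squaring. A toy sketch, with a toy-sized modulus that is obviously not secure:

    # Toy Blum Blum Shub: square modulo n = p*q each step and output the
    # low bit. The tiny primes are only there to show what "one bit per
    # crank" means.
    def bbs_bits(seed, n, count):
        x = (seed * seed) % n
        bits = []
        for _ in range(count):
            x = (x * x) % n        # one "crank" of the generator
            bits.append(x & 1)     # extract a single bit
        return bits

    p, q = 11, 23                  # both congruent to 3 mod 4 (toy values)
    print(bbs_bits(seed=3, n=p * q, count=16))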

Other topics covered include the security of RSA (including OAEP), ECDSA, and a number of short signature schemes.

1. I'm simplifying here, but it's close enough for the purpose of explanation.
2. This is cryptographer jargon. In networking, protocol means "the bits that go on the wire" but in cryptography, it means something more like "this is the set of cryptographic messages that are sent between A and B". For most simple systems, you can read "protocol" as "algorithm".

UPDATE: Fixed the title of the new paper. Thanks to Perry Metzger for pointing out the error.

 

July 17, 2006

Writing in Slate, Sydney Spiesel points to this really interesting paper by Keith et al. in the International Journal of Obesity. The researchers propose 10 potential reasons for the rise of obesity in the US:
We do not review all plausible contributors to the epidemic but select those that are most interesting and for which the totality of current evidence is strongest. Figure 1 portrays the secular increase in a number of key indicators of these putative causal influences. For most Additional Explanations, we offer the conclusion that a factor (e.g., X) that has contributed to the epidemic will logically follow acceptance of two propositions: (1) X has a causal influence on human adiposity and (2) during the past several decades, the frequency distribution of X has changed such that the relative frequency of values of X leading to higher adiposity levels has increased. In the absence of countervailing forces, if both propositions are true, obesity levels will increase. Therefore, for postulated factors supported by this line of propositional argument (Additional Explanations 1-7), we evaluate evidence addressing whether the factor can increase fatness and whether the factor's frequency distribution has changed in the obesogenic direction. For the remaining Additional Explanations, propositional arguments vary in form and are outlined separately.

The 10 explanations are:

  1. sleep debt
  2. endocrine disruptors (industrial pollution)
  3. reduction in variability in ambient temperature (air conditioning)
  4. decreased smoking
  5. pharmaceutical iatrogenesis
  6. changes in distribution of ethnicity and age
  7. increasing gravida age
  8. intrauterine and intergenerational effects
  9. greater BMI is associated with greater reproductive fitness, yielding selection for obesity-predisposing genotypes
  10. assortative mating and floor effects

(Figure 1 of the paper plots the secular increase in each of these putative factors.)

Explanation 8 (intrauterine and intergenerational effects) is the most intriguing. The authors propose that extreme over- and underfeeding in utero may contribute to later-life obesity. There's some evidence that this effect can persist across generations, potentially creating a positive feedback loop.

As the authors admit, the evidence is basically correlative, but so is the evidence for the big two explanations (food marketing practice/technology and decreased physical activity). The key question is what the relative size of each effect is, but as the authors point out, we don't really know that yet.

 

July 16, 2006

When in Quebec I drink a fair amount of Unibroue (Lisa informs me it's pronounced Unibrew, not Unibrow). Anyway, I stopped by Trader Joe's today to buy some stuff and was just thinking that maybe they carried Unibroue when what do I find but Trader Joe's Vintage Ale. Big bottle? Check. Cork with wire twist? Check. High alcohol content? Check. "Dark Ale on Lees"? Check. Font looks like Unibroue's? Check. Look, here at the bottom it says it's "brewed by Unibroue". Needless to say, I bought two bottles. It's a very dark beer with a big head, velvety mouth feel, a sort of floral/fruity aftertaste, and that high alcohol kicker. I advise you to buy some. $4.99/bottle (13.5 ml alcohol/dollar).
 

July 14, 2006


BACKGROUND [cribbed from previous postings] 
HTTP has two native authentication mechanisms, Basic Authentication
(passwords in the clear in an HTTP header) and Digest Authentication
(a challenge-response handshake also in the HTTP header). Both
mechanisms depend on the client proving the possession of a shared
secret string (a password in all real cases) to the server. The server
is unauthenticated.
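
For concreteness, here's roughly what the Basic mechanism puts on the wire (the credentials are made up):

    # What HTTP Basic Authentication actually sends: base64 of
    # "user:password", which is cleartext for all practical purposes.
    import base64

    def basic_auth_header(user, password):
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        return "Authorization: Basic " + token

    print(basic_auth_header("alice", "s3cret"))
    # Authorization: Basic YWxpY2U6czNjcmV0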

For a variety of reasons, including bad UI on the clients, nearly all
real-world Web applications use neither mechanism but rather use an
HTML form to prompt the user for his password. This password is then
sent in the clear over the HTTP connection.  The standard security
advice for how to secure this data is to use HTTPS (HTTP over SSL/TLS)
to encrypt the entire channel and use the server's certificate to
authenticate the server.  Practical difficulties with checking the
certificate have led to phishing attacks, in which the attacker
gathers the user's password and can then impersonate the user. An
additional problem is that it's hard for Web Services apps (e.g., DAV)
to use this kind of mechanism because HTML may not be involved and so
if it works at all you end up resorting to screen-scraping.

In principle, if you run HTTPS, you can leverage HTTPS's client
authentication mechanisms (client certificates and "pre-shared keys")
to authenticate the client, but in practice this is not done, probably
due to UI reasons and lack of familiarity with the TLS mechanisms
(this applies in particular to PSK).


PROBLEMS PEOPLE ARE TRYING TO SOLVE
Roughly speaking, people seem to be trying to solve three problems:

1. A phishing-resistant authentication mechanism.
2. "Single-signon" mechanisms which let you authenticate to
   multiple services with a single act.
3. A mechanism to let you prove facts about yourself to 
   some service providing party. E.g., I'm over 21.

You can find a long discussion/taxonomy about this here:
http://www.educatedguesswork.org/movabletype/archives/2006/06/notes_on_web_au.html


BOF REPORT
Agenda:
1. There was a modestly long and not-very-productive discussion of
   terminology which sort of led to not much. 

2. Discussion of a bunch of potential problems we might want to
   solve based on the taxonomy at the above link. This resulted
   in adding a few. No real attempt was made to decide which
   were in and out of scope.

3. Sam Hartman's presentation on Webauth and Phishing Requirements

   http://www.ietf.org/internet-drafts/draft-hartman-webauth-phishing-01.txt

   I nitpicked on this a little bit, claiming that mutual authentication
   might or might not be required.
   
4. An attempt to focus on different complexes of things that people are
   trying to solve.

   The eventual list ended up as follows:

1. Fix HTTP authentication, with at least attention to the
   phishing issue.
2. Cross-site authentication (authenticate to site X and then
   leverage that to authenticate to site Y).
3. Ability to have a third party assert claims about you to
   relying parties.
4. "Eliot's Dad's problem": be able to have a single small
   set of credentials that you can use to authenticate to
   a lot of sites.

There seemed to me to be enthusiasm for working on 1, 2, and 4
and fear that 3 was too complicated. 

The notion here isn't to form a new WG out of this. Rather, it's
to figure out what kind of BOFs to charter. IMO, none of this
is ready for charter as-is.

 

July 12, 2006

This Wired article talks about the glories of OpenDNS, a DNS caching service. OpenDNS claims to offer three benefits:
  • Performance -- they maintain a local cache so that name resolution is allegedly faster.
  • Anti-phishing -- they won't resolve the DNS names of known phishing sites.
  • Typo-correction -- if you type a non-existent domain name, it will return a search page with potential results.

Let's talk about the performance claim first:

In return, sites like the notoriously sluggish MySpace.com load significantly faster, thanks to the way OpenDNS caches IP addresses.

The background you need here is that DNS is a distributed database. Resolving a name like www.rtfm.com requires going to the root servers, which point you to the com servers, which point you to the rtfm.com servers, which give you the IP address of www.rtfm.com. Typically, your local name server (either operated by your ISP or by your local IT department) does all of this for you (in what's called recursive resolution) and then caches the result. So, if you point at OpenDNS rather than your local resolver, there's a higher chance that its cache will already be primed with the response, so that you can skip the resolution.

How much of a difference does this make? Not much. First, name resolution is typically very fast, on the order of a second or so, which is much faster than your typical web site. I just tried it from my work address and it took .1s. Second, the result is cached, not only in the local nameserver but also in your browser, so you only get lag when you initially go to a Web site, not when you're clicking around inside it. (Cache expiry times vary but we're talking minutes to hours.)
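
If you want to repeat the measurement yourself, something like this is enough to see the cached vs. uncached difference (the numbers depend entirely on your resolver and network):

    # Time a lookup through whatever resolver this machine is configured
    # to use; running it twice shows the effect of caching.
    import socket, time

    def resolve_time(name):
        start = time.time()
        socket.getaddrinfo(name, 80)
        return time.time() - start

    print("first lookup: ", resolve_time("www.rtfm.com"))
    print("second lookup:", resolve_time("www.rtfm.com"))   # usually much faster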

The second issue is anti-phishing. Basically, what OpenDNS is doing is maintaining a blacklist of sites that it thinks are phishing sites. It then refuses to resolve those names. There are already existing anti-phishing blacklist systems such as Microsoft Phishing Filter, Google Safe Browsing, etc. Because these tools run on the client, they can take advantage of other cues about phishing and do a better job than a pure blacklist solution (which tends to get out of date). Given that, it's hard to see what OpenDNS's stuff brings to the party.

The final argument is typo-correction (reminiscent of Sitefinder). Again, this is something that's easily done at the client side, and Firefox, at least, treats some things typed in the address bar as things that should be searched on (I'm not sure I understand the algorithm here--it's certainly possible that there's some extension or whatever that already does this). Anyway, it seems like you'd much rather have the search engine of your choice do this, rather than whatever results OpenDNS decides to give you (including their sponsored results). So, it's not clear what the value of this service is either.

 

July 11, 2006

The topic of how long a cryptographic key you should use for various security levels comes up fairly often. Visit www.keylength.com for all your key length needs, including results based on a variety of models.
 

July 10, 2006

Sorry about the comment spam problem. I have remote posting technology but I'm still working on my remote comment spam removal technology.
 
I'm in Montreal for IETF 66. At this point, being able to make cell phone calls is pretty much a requirement for coordination1. I have Sprint and after several phone calls to customer service, I learned (not exactly to my surprise) that roaming in Canada is fiendishly expensive ($1.50/minute). For $3/month, you can get the Canada Roaming Package, which reduces your per-minute charge to the still exorbitant but much more reasonable $.30. Just so you know.

1. Mobile communications are one of those positive feedback situations. When nobody has cell phones, then people make upfront arrangements (I'll be at home at time XXX, meet you at location YYY at time ZZZ) in order to make subsequent coordination easier. But when people have cell phones, they don't bother to coordinate in advance because they figure they'll be able to do it by cell. But then if you're the guy without a cell phone it's horrifically inconvenient, so there's a lot of incentive to get one.

 

July 7, 2006

Caught Geoffrey Nunberg on NPR on his new book, "Talking Right: How Conservatives Turned Liberalism into a Tax-Raising, Latte-Drinking, Sushi-Eating, Volvo-Driving, New York Times-Reading, Body-Piercing, Hollywood-Loving, Left-Wing Freak Show". I haven't read the book, but a lot of the discussion seemed to be about the relevance, or rather irrelevance of cultural signifiers like eating sushi, driving Volvos, etc.

OK, I can see how reading the Times or having body piercings can be seen as reflective of one's political beliefs. After all, the Times's coverage is liberal, and body-piercing certainly could be seen as a way of signalling that you're part of the counterculture, but sushi and lattes? Obviously, some people don't like sushi, but as Nunberg points out, it's not like whether you like it or not has anything to do with whether you voted for Bush.

Of course, there is one very obvious thing that sushi, lattes, and Volvos have in common: they're all foreign. And of course only communists--er, liberals--like anything foreign.

 

July 5, 2006

David Post over at Volokh Conspiracy complains about something that's bugged me for years:
Pretty much every day, in pretty much every newspaper in the country, there is a story that goes something like this [taken from today's WSJ]: "Yesterday, the Dow Jones industrials fell xxx points to yyyy on new concerns about interest rates and anxiety over North Korea's missile tests." Or the Dow rose, due to "increasing optimism about prospects for peace in the Mideast." Or whatever. It's complete and utter nonsense. The market did in fact fall yesterday. But how could anyone possibly know that it was due to "concerns about interest rates," or "anxiety about North Korea's missile program"?

Quite true.

The other thing that really annoys me is that frequently the size of the movement is tiny. If the Dow is up 30 points (.3%), that's well within the range of random variation. It doesn't need any explanation at all. I guess if the announcers said "the markets didn't really do anything today at all and we wouldn't understand it if they did" people might not be as interested in tuning in.

 

July 4, 2006

Takeru Kobayashi, the Lance Armstrong of professional eating [*]:

Kobayashi ate 53 3/4 hot dogs in 12 minutes to win his 6th straight Nathan's hot dog eating contest. Imagine eating 53 3/4 hot dogs in 12 hours...

 
According to news reports, the DPRK's missile test failed and the space shuttle launch succeeded. It could easily have been the other way around.
 

July 3, 2006

According to the Times, CipherTrust reports that clickthrough rates for pornography spam are 280X higher (5.6%) than for pharmaceutical spam (0.02%). Their explanation:
"Successful spam is about impulse purchases," said Francis deSouza, a vice president at Symantec, which makes antivirus software. "Things like home mortgages have a lower success rate than things you'd buy on impulse. Things like Viagra, porn."

Paul Q. Judge, chief technology officer of Ciphertrust, was philosophical. "If you look at some of the oldest and most successful forms of business on earth, they revolve around sex," he said.

I'm sure part of it is impulse control, but it's worth noting that your average porn web site gives you a bunch of free samples before requiring you to fork over money, so there's actually value in clicking through even if you have no intention of buying. By contrast, there's no reason to click through on pharmaceutical spam unless you're planning to buy.

UPDATE: Corrected 280% to 280X.

 

July 2, 2006

Nearly all symmetric encryption algorithms use unstructured keys, i.e., bit strings of a certain length. If we model an encryption algorithm as a function E
C = E(K, M)
where C is the ciphertext, M is the plaintext, and K is the key, then the way you generate K is simply to choose a random bit string of the appropriate length. The interesting thing is that some encryption algorithms have what are called "weak keys", which have special (bad) properties.

For example, DES has four weak keys, which have the property that encryption and decryption are symmetrical. Thus, if you have a weak key K_w, then

D(K_w, M) = E(K_w, M)

DES also has 6 pairs of semi-weak keys: pairs of keys K_w1 and K_w2 such that:

D(K_w1, M) = E(K_w2, M)

Back in the old days, it was widely believed that you should make an attempt not to use weak keys (i.e., you would check generated keys and reject them if they were weak) but it seems to me that recently opinion has turned towards ignoring the issue. Why?

First, it's not clear how bad choosing a weak key is for a real algorithm. In the case of DES, for example, the weak keys still encrypt the data, they just do it in such a way that if you ever encrypt it again with the same key it decrypts it instead. This is a problem in theory, but in practice it's hard to imagine a common scenario in which this would happen, especially when you're using the algorithm in CBC mode. Remember that encrypting the same data twice in CTR mode always decrypts it, so if you want to build a secure system that's algorithm-agile you have to be careful about this anyway.

Second, the chance of randomly1 choosing a weak key in DES is very low. Even if you count semi-weak keys as weak, the chance of choosing one is about 1/2^52. What do you figure the chance that something will go wrong with your hardware or software and you'll fail to encrypt the data entirely is?
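
If you did want to do the traditional check, it amounts to a table lookup against DES's handful of weak keys; here's a sketch, along with the probability computation:

    # Traditional weak-key rejection for DES. Only the four weak keys are
    # listed; the 12 semi-weak keys would be handled the same way. Parity
    # bits are ignored for simplicity.
    import os

    DES_WEAK_KEYS = {
        bytes.fromhex("0101010101010101"),
        bytes.fromhex("fefefefefefefefe"),
        bytes.fromhex("e0e0e0e0f1f1f1f1"),
        bytes.fromhex("1f1f1f1f0e0e0e0e"),
    }

    def generate_des_key():
        while True:
            key = os.urandom(8)
            if key not in DES_WEAK_KEYS:
                return key

    # Chance a random key is weak or semi-weak: 16 out of 2^56.
    print(16 / 2**56)      # ~2.2e-16, i.e., about 1/2^52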

But let's ignore these issues and consider a case where weak keys were really weak, like they turn the encryption algorithm into the identity function. Now consider things from the perspective of an attacker. They're given some "ciphertext" X and want to try to decode it. To mount a brute force known plaintext attack, we do:

  1. For each potential key K_i
  2. Compute M'=D(K_i,X)
  3. If M' looks like a plausible message then exit.
  4. else try the next key.

If the cost to do a decryption is d, the cost to see if a message is plausible is p, and there are k keys, then the average running time of this algorithm is k(d+p)/2.

Now, if there are w weak keys, then the attacker does a somewhat different algorithm:

  1. Check to see if the message is plausible as-is.
  2. For each potential key K_i
  3. Compute M'=D(K_i,X)
  4. If M' looks like a plausible message then exit.
  5. else try the next key.

The running time of this algorithm is p + (k-w)(d+p)/2, so the advantage to the attacker is (approximately) w(d+p)/2, so it starts to look like you should avoid weak keys. But consider that if you do that--and we assume that the attacker knows how your implementation behaves--then he can run the original algorithm just skipping the weak keys, at a cost of (k-w)(d+p)/2, which is slightly less than the cost if the message sender just ignores the weak keys. The problem here is that it's the existence of weak keys, not their use by message senders, that reduces the key space of the algorithm.
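
Plugging in symbols (and DES-sized numbers) makes the comparison concrete; the unit costs below are arbitrary and only the relative sizes matter:

    # Expected costs from the analysis above, with illustrative unit costs.
    k = 2**56          # size of the key space (DES-sized)
    w = 16             # weak + semi-weak keys
    d, p = 1.0, 0.1    # per-key decryption cost and plausibility-check cost

    normal_search = k * (d + p) / 2             # senders use any key
    exploit_weak  = p + (k - w) * (d + p) / 2   # check "as-is" first, then skip weak keys
    senders_skip  = (k - w) * (d + p) / 2       # senders reject weak keys, attacker skips them

    print(normal_search - exploit_weak)    # ~ w*(d+p)/2: the attacker's tiny head start
    print(senders_skip < normal_search)    # True: rejecting weak keys shrinks the search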

Now, in the particular case where the weak keys turn encryption into a no-op, you do have to worry about automatic traffic sniffers designed to data mine ordinary unencrypted traffic. Such packet sniffers would of course automatically pick up any "encrypted" data processed with weak keys. However, as a matter of practice, this doesn't happen with real algorithms. First, as previously noted DES weak keys aren't the identity function. Second, the identity property would only apply in ECB, not with CBC or CTR. So, some processing would be required (even if it's much faster than trying a new key) prior to checking whether the plaintext were plausible, which brings us back to the analysis before, with a new term for the "fast" processing. Once again discarding weak keys makes the attacker's job very slightly easier.

I should emphasize that this issue is mostly theoretical. No popular algorithm has a significant enough number of really weak keys to give you a significant chance of generating one by accident. But that's all the more reason not to incur the implementation complexity of bothering to reject them.2

1. Though note that all zeroes and all ones are both weak keys. One might imagine that if your PRNG failed you would get one of these. So, as a special case you might want to check for these to make sure your system is still functioning. On the other hand, there's always the chance that the check itself will screw something up...
2. The analysis here is about encryption. As this Wikipedia article points out, the situation is different in MAC modes, but standard practice these days is to use hash-based MACs rather than encryption algorithm based MACs--though of course those might have weak keys too.

 
The Internet is a packet-oriented network at many layers. In compliance with the Sapir-Whorf hypothesis, and in order to maximize--er, minimize--confusion, each protocol tends to invent its own name for chunks at its layer of abstraction. Here are some:
Name        Protocol
Cell        ATM
Frame       Ethernet
PDU         Lots of ASN.1 protocols
Packet      IP, PGP, RTP
Datagram1   IP, UDP
Segment     TCP
Record      SSL/TLS
Message     RFC 822 e-mail, IKE, HTTP
Stanza      Jabber/XMPP
Chunk       MSRP
BIU         SNA

I've certainly heard arguments that say TCP segments are somehow different from packets (as in IP) because packets are self-contained whereas TCP segments, SSL records, etc. are part of an ongoing transaction. Unfortunately, as far as naming goes, this distinction doesn't really hold up under common usage. For instance, RTP "packets" are part of a stream.

Any other names I haven't thought of?

1. Chris Walsh
2. Terence Spies

 

July 1, 2006

I've reviewed the two NEA drafts (draft-thomson-nea-problem-statement-03 
and draft-khosravi-nea-requirements-01) and have some high-level 
concerns, discussed below.