
July 31, 2005

Data from the great wine taste test

A common topic of conversation among my friends, several of whom are wine snobs and several of whom (myself included) are not, is the extent to which wine quality really varies by price and general snob appeal. At a recent party at our house, Cullen Jennings decided to run an experiment in blind tasting. He bought three pairs of wines, each pairing one cheap bottle with one expensive one, and encouraged people to try to determine which was which.

Cullen Jennings's message describing the raw results, without any real analysis, is below.

Red is 1
Green is 2
Orange is 3

In order of price

(These were under $15)
1B - Castoro Cellars - 2002 Paso Robles
2B - Alexander Valley Vineyards - 2002 - Alexander Valley
3A - Chateau Souverain - 2001 - Alexander Valley

(These were over $50)
2A - Januik, champoux vineyard - 2002 - Columbia Valley
1a - Ridge - 2001 - Santa Cruz Mountains
3b - Chimney Rock, Elvage, 2001 - Stags Leap, Napa

The correct answer for the sets is "a,a,b"

The raw data is:

Participant    Set#1    Set#2    Set#3
31    a    b    a
40    a    a    b
22    b    a    b
30              b
20    a    b    a
21    a    b    b
40    a    a    b
18    a    b    b
35    a    a    a
38    b    b    a
28    b        
36    a    b    a
19    a    b    a
34    a    b    a
4     a    b    a
39    b    a    b
29    a    a    
25    a    a    b
23    a    b    b

78% of people correctly identified the expensive wine in Set #1, and most people said this set was easier than the other two. In the second set 41% got it right, and in the third 53% did.

Interestingly, of the people who got the second set right, 83% also got the third set correct.
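The percentages above can be rechecked from the raw table with a short Python sketch. The transcription rests on one assumption about the ragged rows: participant 30's lone "b" is read as a Set#3 answer and participant 28's lone "b" as a Set#1 answer. Blank cells are left out of each denominator.

```python
# Answers transcribed from the table above; None marks a blank cell.
# Assumption: participant 30's lone "b" is a Set#3 answer and
# participant 28's lone "b" is a Set#1 answer.
CORRECT = ("a", "a", "b")
ANSWERS = [
    ("a", "b", "a"), ("a", "a", "b"), ("b", "a", "b"), (None, None, "b"),
    ("a", "b", "a"), ("a", "b", "b"), ("a", "a", "b"), ("a", "b", "b"),
    ("a", "a", "a"), ("b", "b", "a"), ("b", None, None), ("a", "b", "a"),
    ("a", "b", "a"), ("a", "b", "a"), ("a", "b", "a"), ("b", "a", "b"),
    ("a", "a", None), ("a", "a", "b"), ("a", "b", "b"),
]

def percent_correct(set_index, rows=ANSWERS):
    """Percentage of non-blank answers for one set that match the key."""
    given = [row[set_index] for row in rows if row[set_index] is not None]
    return 100 * sum(answer == CORRECT[set_index] for answer in given) / len(given)

for i in range(3):
    print(f"Set #{i + 1}: {percent_correct(i):.0f}% correct")

# Restrict to the people who got Set #2 right, then score their Set #3 answers.
got_set2 = [row for row in ANSWERS if row[1] == CORRECT[1]]
print(f"Set #3 among those right on Set #2: {percent_correct(2, got_set2):.0f}%")
```

With 17 or 18 answers per set, each of these percentages moves by five or six points when a single taster changes an answer.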

Posted by ekr at 11:05 PM | Comments (2) | TrackBack

Comments on draft-allman-dkim-base-00.txt

From my comments on the ietf-mailsig mailing list. Below the fold.
1.0 Background
DomainKeys Identified Mail (DKIM) is basically an anti-forgery
measure. The idea is that MTAs serving a given domain will
digitally sign messages they forward. Thus, recipients will
be able to detect forgery by absence of a valid signature.
Signing keys are made available via DNS.


2.0 General Comments

2.1 Lack of Complete Motivation

The context in which DKIM (and other similar proposals) is introduced
is that of spam/phishing prevention. Much (most?) spam is forged
and therefore it's often believed that stopping forgery is a 
first step to stopping spam/phishing. This is a contentious
argument with points on both sides, but this document doesn't really
address it at all. It just hand-waves in that direction in the 
introduction:
   
   The ultimate goal of this framework is to prove
   and protect message sender identity and the integrity of the messages
   they convey while retaining the functionality of Internet email as it
   is known today.  Proof and protection of email identity, including
   repudiation and non-repudiation, may assist in the global control of
   "spam" and "phishing".

In my opinion, a substantial new piece of work like this needs 
to start with a threat analysis of the problem and an attempt 
to determine whether the proposed solution is likely to actually
help. I don't see any of this here. I appreciate that the authors
don't want to get drawn into an extended debate, but I'm also
quite uncomfortable with starting a major new effort without 
having some level of comfort that it will actually work.


2.2 Duplication of Existing Mechanisms

DKIM really consists of two loosely coupled protocols.

     (1) A message signing protocol with characteristics somewhat
	 reminiscent of PEM.  
     (2) A key retrieval protocol to retrieve the keys used
	 for (1), as well as to deliver policy about what
	 messages are expected to be signed.

There's no reason why these two need to be coupled at all.
Indeed, it would make a lot of sense to couple a generic
message signature protocol to the key retrieval mechanism
described here.


2.2.1 Message Signing Protocol

At the time of this writing, the IETF has RFCs documenting no
fewer than four protocols for cryptographically signing e-mail
messages: PEM (RFC1421-1424), MOSS (RFC 1848), S/MIME (RFC 3850-3853),
and OpenPGP (RFC 2440, RFC 3156). All of these were designed to
solve the same basic problem: cryptographically protecting
a message so that it could survive transport via as many
MTAs as possible and still be readable (and verifiable) on
the other end.

DKIM introduces yet another such protocol, one which is conspicuously
less elegant than the currently dominant Security/Multiparts
approach (RFC 1847). What motivation there is for this choice
appears in S 1.1:

      the message signature is written to the message header fields so
      that neither human recipients nor existing MUA software are
      confused by signature-related content appearing in the message
      body

As it happens, this was the design that PEM used but was discarded for
S/MIME. As I recall, part of the motivation was the concern at the
time that the headers would be mangled or removed by MTAs.

I'm not enough of a mail expert to have an informed opinion
on which packaging is superior from the perspective of compatibility,
however it seems to me that the compatibility requirements for 
{S/MIME,OpenPGP} and this application are essentially similar.
If the approach described in DKIM is superior, then shouldn't
we be retargeting S/MIME and OpenPGP to use it? And if not, then
shouldn't DKIM be using S/MIME or OpenPGP?

2.3 General Complexity
The interaction of this design with all the mail headers seems
extremely complex and easy to screw up. Maybe that's necessary
and I don't have an easy way to simplify it, but it's worth
noting.

3.0 Detailed Comments

Below, quotes are set off by -----
S 1.1
------
   The approach taken by DKIM differs from previous approaches to
   message signing (e.g.  S/MIME, OpenPGP) in that:

   o  the message signature is written to the message header fields so
      that neither human recipients nor existing MUA software are
      confused by signature-related content appearing in the message
      body

   o  there is no dependency on public and private key pairs being
      issued by well-known, trusted certificate authorities

   o  there is no dependency on the deployment of any new Internet
      protocols or services for public key distribution or revocation.
-----

The first point does not apply to PEM (RFC 1421). The second and
third points are sort of misleading since OpenPGP and S/MIME 
can work perfectly well with e.g., self-signed certificates. 
And with the same level of effort required to distribute DKIM
keys you could build key distribution systems for S/MIME and
OpenPGP with similar levels of assurance.


-----
   o  does not require the use of a trusted third party (such as a
      certificate authority or other entity) which might impose
      significant costs or introduce delays to deployment

   o  can be deployed incrementally.
-----

I'm not convinced that this *can* be deployed incrementally.
The key here is to focus on the amount of information provided
by a message being unsigned. If most of the internet is sending
unsigned messages, then the negative weight you can assign to 
someone not being signed is necessarily very small and so it's
not very useful for spam triage. This is exactly the kind of
question which needs to be addressed in a security analysis of this
class of system.
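
One rough way to put a number on "the amount of information provided by a message being unsigned" is a likelihood ratio. The sketch below makes two assumptions that are not in the draft or in these comments: that essentially no spam carries a valid signature, and that some fraction of legitimate mail does. The function name and the sample fractions are illustrative only.

```python
import math

def unsigned_evidence_bits(legit_signed_fraction, spam_signed_fraction=0.0):
    """log2 of P(unsigned | spam) / P(unsigned | legitimate).

    Both fractions are hypothetical inputs. The result is the evidence, in
    bits, that an unsigned message is spam. legit_signed_fraction must be
    below 1.0.
    """
    p_unsigned_given_spam = 1.0 - spam_signed_fraction
    p_unsigned_given_legit = 1.0 - legit_signed_fraction
    return math.log2(p_unsigned_given_spam / p_unsigned_given_legit)

for fraction in (0.05, 0.50, 0.95):
    bits = unsigned_evidence_bits(fraction)
    print(f"{fraction:.0%} of legitimate mail signed: {bits:.2f} bits against an unsigned message")
```

Under these assumptions the evidence stays well under one bit until about half of legitimate mail is signed, which is the incremental-deployment concern in numerical form.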


S 1.4:
-----
   DKIM differs from traditional hierarchical public-key systems in that
   no key signing infrastructure is required; the verifier requests the
   public key from the claimed signer directly.
-----
I wouldn't characterize a DNS query to a server quite possibly not
controlled by the same person as the MTA as "directly". For instance,
I operate my own MTA but name service is provided by my ISP.


S 3.3.1:
------
   The rsa-sha1 Signing Algorithm computes a SHA-1 hash of the message
   header field and body as described in section XINDX below.  That hash
   is then encrypted by the signer using the RSA algorithm and the
   signer's private key.  The hash MUST NOT be truncated or converted
   into any form other than the native binary form before being signed.

   More formally, the algorithm for the signature using rsa-sha1 is:


   RSA(SHA1(canon_message, DKIM-SIG), key)
------
It would be great if people would stop using the word "encrypted" in
the context of RSA signatures. Yes, it's true that you're taking
the plaintext (well, hash) and doing M^d and that that's the same
operation as the RSA encryption operation, but it's not encryption
in any real sense because everyone can decrypt it. Moreover, it
confuses the issue because DSA signing is totally different.

Also, "RSA" isn't a single function because of the padding issue.
You need to specify PKCS#1 something or other. There's a normative
ref to RFC 3447 but nothing in the text.

Also, SHA-1 doesn't take two arguments. I assume you mean concatenation,
in which case I recommend:

   RSA(SHA1(canon_message || DKIM-SIG), key)
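
As a minimal sketch of the concatenation reading, using Python's standard hashlib: feeding the two inputs to SHA-1 one after the other gives the same digest as hashing their concatenation. The byte strings are made-up placeholders, not output of the draft's canonicalization, and the RSA step (with whatever PKCS#1 padding is chosen) is left out because the standard library has no RSA.

```python
import hashlib

# Hypothetical stand-ins for the canonicalized message and the
# DKIM-Signature header with an empty b= value.
canon_message = b"from:alice@example.com\r\n\r\nHello\r\n"
dkim_sig = b"dkim-signature:a=rsa-sha1;b="

# SHA1(canon_message || DKIM-SIG) computed in one shot...
one_shot = hashlib.sha1(canon_message + dkim_sig).digest()

# ...and incrementally, which is how a streaming implementation would do it.
incremental = hashlib.sha1()
incremental.update(canon_message)
incremental.update(dkim_sig)

assert incremental.digest() == one_shot
print(one_shot.hex())
```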


S 3.3.3:
-----
   o  Larger keys impose higher CPU costs to verify and sign email
-----

The CPU cost to verify even fairly large RSA signatures is extremely
small.


S 3.4:
------
      algorithms.  To avoid this attack, signers should be wary of using
      this tag, and verifiers might wish to ignore the tag or remove
      text that appears after the specified content length.
------
If verifiers ignore the tag, the signature will fail to verify,
right? That doesn't seem like what we want.


S 3.5:
------
         INFORMATIVE RATIONALE:  The authors understand that SHA-1 has
         been theoretically compromised.  However, viable attacks
         require the attacker to choose both sets of input text; given a
         preexisting input (a "preimaging" attack), it is still hard to
         determine another input that produces an SHA-1 collision, and
         the chance that such input would be of value to an attacker is
         minimal.  Also, there is broad library for SHA-1, whereas
         alternatives such as SHA-256 are just emerging.  Finally, DKIM
         is not intended to have legal- or military-grade requirements.
         There is nothing inherent in using SHA-1 here other than
         implementer convenience.  See
          for
         a discussion of the security issues.
-------
I'm not sure that I agree with this analysis. If the model for the
MTA is that it's going to blindly sign any message from someone
who is authorized, then I agree with you, but that's not the
only model. Consider an MTA which does some outgoing spam filtering
and only signs messages it thinks aren't spam--this is to contain
botnet compromise. 

In this case, the attacker prepares two colliding messages, one
spam and one innocuous. He gets the MTA to sign the innocuous one
and then substitutes on output. Note that this is possible because
of the extension property of SHA-1 and the choice to put the message
contents first in the SHA-1.
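
The extension property can be illustrated with a toy hash. The sketch below is not SHA-1: it is a made-up Merkle-Damgard construction with a 16-bit state, chosen so that a collision can be found by brute force in well under a second. The block size, padding, and all names are assumptions for illustration. The point it shows is that two equal-length, block-aligned messages that collide keep colliding when the same suffix (here a stand-in for the DKIM-Signature header) is appended.

```python
import hashlib
from itertools import count

BLOCK = 8            # toy block size in bytes
IV = b"\x00\x00"     # toy 16-bit initial state

def compress(state, block):
    # Toy compression function: truncate SHA-256 to 16 bits so collisions are cheap.
    return hashlib.sha256(state + block).digest()[:2]

def toy_hash(msg):
    # Merkle-Damgard iteration: pad with 0x80, zeros, and the message length.
    padded = msg + b"\x80"
    padded += b"\x00" * (-len(padded) % BLOCK)
    padded += len(msg).to_bytes(BLOCK, "big")
    state = IV
    for i in range(0, len(padded), BLOCK):
        state = compress(state, padded[i:i + BLOCK])
    return state

def find_block_collision():
    # Birthday search for two distinct one-block messages with the same state.
    seen = {}
    for n in count():
        block = n.to_bytes(BLOCK, "big")
        state = compress(IV, block)
        if state in seen:
            return seen[state], block
        seen[state] = block

innocuous, spam = find_block_collision()
dkim_sig = b"dkim-signature:a=rsa-sha1;b="   # hypothetical suffix hashed after the body

assert innocuous != spam
assert toy_hash(innocuous + dkim_sig) == toy_hash(spam + dkim_sig)
print("collision survives the appended header:", toy_hash(spam + dkim_sig).hex())
```

If the header were hashed first, the attacker would need a collision that starts from a state he cannot fix in advance.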

S 4:
-----
   Further confusion could occur with multiple signatures added at the
   same logical "depth".  For example, a signer could choose to sign
   using different signing or canonicalization algorithms.  However,
   even this is problematic because some of those signatures will
   inevitably have to sign some of the others (and at very minimum must
   be presented to the verification algorithm in the same order as
   presented to the signature algorithm).
-----
Note that this is a result of the particular signature formatting
choices used by the designers. S/MIME handles parallel signatures
with no problem.

S 5.2.2:
-----
      INFORMATIVE ADMONITION:  Despite the fact that [RFC2822] permits
      header field blocks to be reordered (with the exception of
      Received header fields), reordering of signed replicated header
      fields by intermediate MTAs will cause DKIM signatures to be
      broken; such anti-social behavior should be avoided.
-----
How likely is this to happen? If current MTAs behave this way, 
then that rather weakens the claim that DKIM is incrementally
deployable.


S 6.4:
-----
   o  Based on the algorithm indicated in the "a=" tag,

      *  Compute the message hash from the canonical copy as described
         in section XINDX.  Note that this requires presenting the
         "nowsp" canonicalized DKIM-Signature header field to the hash
         algorithm after the body of the message, and with the "b="
         value treated as the empty string.

      *  Decrypt the signature using the signer's public key.

   o  Compare the decrypted signature to the message hash.

      INFORMATIVE IMPLEMENTER'S NOTE:  Implementations might wish to
      initiate the public-key query in parallel with calculating the
      hash as the public key is not needed until the final decryption is
      calculated.
-----
Again, please avoid the terminology of digital signature verification
as decryption.

S 9.3:
-----
   Since the key servers are distributed (potentially separate for each
   domain), the number of servers that would need to be attacked to
   defeat this mechanism on an Internet-wide basis is very large.
   Nevertheless, key servers for individual domains could be attacked,
   impeding the verification of messages from that domain.  This is not
   significantly different from the ability of an attacker to deny
   service to the mail exchangers for a given domain, although it
   affects outgoing, not incoming, mail.
-----
I think this analysis misses something important: DKIM depends on
wide deployment of signing infrastructure in order to make 
a message being not signed meaningful. If popular senders
are regularly unverifiable, this strongly reduces recipients'
incentive to make decisions based on DKIM settings.


S 9.4:
------
   To systematically thwart the intent of DKIM, an attacker must conduct
   a very costly and very extensive attack on many parts of the DNS over
   an extended period.  No one knows for sure how attackers will
   respond, however the cost/benefit of conducting prolonged DNS attacks
   of this nature is expected to be uneconomical.
-----
Wait, isn't this exactly the kind of attack pharmers mount?


S 9.5:
Wouldn't one defense against this be to match the To: lines?

Posted by ekr at 6:06 AM | Comments (6) | TrackBack

July 30, 2005

Back on the air

I'm in Paris for IETF 63 and this is the first time I've had Internet connectivity since I got here, hence the light blogging. Things should resume tomorrow.

For now, a few superficial observations about Paris:

In other news, the IESG has approved TLS 1.1 (RFC 2246bis), a minor update to TLS.

Posted by ekr at 9:38 AM | Comments (4) | TrackBack

July 28, 2005

Notes to self

  1. Do not attempt to update the operating system for your laptop the day before you leave on a trip.
  2. The Panasonic W-2 needs a custom kernel in order to work with the Wavelan wireless card. Forgetting this and building with GENERIC leads to failure to connect to the network and likely panic attacks.

Posted by ekr at 8:04 AM | Comments (2) | TrackBack

July 27, 2005

Who should pay for your identity theft protection?

Bruce Schneier writes:
Wells Fargo is profiting because its customers are afraid of identity theft:

The San Francisco bank, in conjunction with marketing behemoth Trilegiant, is offering a new service called Wells Fargo Select Identity Theft Protection. [here--EKR] For $12.99 a month, this includes daily monitoring of one's credit files and assistance in dealing with cases of fraud.
It's reprehensible that Wells Fargo doesn't offer this service for free.

Actually, that's not true. It's smart business for Wells Fargo to charge for this service. It's reprehensible that the regulatory landscape is such that Wells Fargo does not feel it's in its best interest to offer this service for free. Wells Fargo is a for-profit enterprise, and they react to the realities of the market. We need those realities to better serve the people.

I've been doing some thinking about what kind of regulatory regime would make sense. The following are some preliminary, partly thought out notes on the topic.

To a first order, there are four kinds of identity theft to be concerned with here:

  1. Where your information is stolen from some other vendor and used to defraud you at yet another vendor, and WF isn't involved at all.
  2. Where your information is stolen from some other vendor and used to defraud you at WF (e.g., to open a new WF account or suck money out of yours), but WF is following their normal (admittedly, inadequate) authentication procedures.
  3. Where your information is stolen from WF and used to defraud you at WF.
  4. Where your information is stolen from WF and used to defraud you at another vendor.

As far as I can tell, Select Identity Theft Protection is designed to help you deal with all of these. (Note: I'm not offering an opinion about how well it actually works.)

It seems pretty clear that WF isn't at fault in case (1).

It's arguable that they're not at fault in case (2) either. After all, WF uses the same identity information to authenticate you as everyone else, so if someone steals that information from (say) BofA, then you're pretty much hosed. This is especially true if you don't have any accounts with WF and the attacker is opening a new one, since WF has a pretty limited repertoire of ways to authenticate you at that point. Now, it's arguable that WF should do a better job of confirming that it's really me who wants a credit card, but it's hard to see how this could be ameliorated by offering me a free anti-identity-theft service if we don't have any prior relationship. How would they even provide such a service? [1]

Now, in cases (3) and (4), WF could certainly offer me this service for free. But what regulatory incentives would cause them to want to? It seems to me that there are four basic regulatory responses (aside from simply mandating this service be offered):

  1. When a vendor/institution is responsible for letting your data leak they get punished.
  2. When a vendor/institution is responsible for letting your data leak they are liable for your costs--or at least their punishment scales with your losses.
  3. When a vendor/institution is defrauded by identity theft (i.e., someone who got the data somewhere else) they get punished.
  4. When a vendor/institution is defrauded by identity theft (i.e., someone who got the data somewhere else) they can't come after you for the money.

The current regulatory regime is some approximate combination of (1) and (4). But neither of these offers WF any incentive to offer this kind of global anti-fraud program, which is focused on compromise containment for their customers (i.e., cases (3) and (4)). Similarly, rule (3) doesn't offer WF any incentives, except in case (4). They certainly wish that other financial institutions would offer anti-fraud programs, but offering their own anti-fraud program wouldn't help because it's not their current customers that are being defrauded but new customers (and offering an anti-fraud program to the fraudsters doesn't make much sense.)

That leaves us with rule (2), which I'm guessing is the kind of thing that Bruce is thinking of. In this case, WF certainly does have an incentive to contain compromise of their customers' data, and so some incentive to offer you this service. However, it's hard to get the incentive level right. In general only a fraction of an institution's customers will have their information compromised, so the advantage to WF of giving free protection to any customer in case he might be compromised in the future is fairly small. That's not a big deal if what you're offering is insurance, since most of the cost of that is the payoff. However, if there's a substantial cost to just running the program even if your users don't have their data compromised, then the situation is a little different and it's unlikely to be efficient for the institution to offer free protection.

A related problem is that it's hard to determine responsibility. Since there's so much data leakage going on, there's a real chance that my data will be leaked multiple times. If that happens and then I'm the victim of fraud, who pays off? The obvious thing to do here is to split the penalty between all of the institutions who let your data leak, rather than trying to figure out which leak was responsible--something that likely requires too much investigation. Even this sort of penalty requires a fair amount of effort to impose, since we need to match up leaks with victims.

Of course, this sort of splitting has an obvious collective action problem: say it costs $10/month in aggregate to provide this kind of service for a customer. Even if it's worth $10/month in fines to the financial institutions in aggregate, once it's split over the number of institutions I have accounts with, it may not be worth it for any individual institution to pay for protection. On the other hand, if we make each institution bear the full cost, we get an inefficiently large amount of protection. By contrast, if I'm contracting for this service myself, I know how much it's worth and there's no collective action problem. I'm not sure that there's a regulatory regime that produces an equally efficient allocation of effort of this type.

Note that this argument doesn't apply as much to the provision of system security for my data, as opposed to monitoring after it's stolen, since only the institutions can secure my data. Moreover, we can get past the collective action problems by fining the institutions the expected value of the loss, without worrying about the impact of compromise containment measures.

1. Note that WF could make it harder for me to open a second account once I have a first one, e.g., by having some private authenticator. That would probably be useful.

Posted by ekr at 10:59 PM | Comments (5) | TrackBack

July 26, 2005

Use a fake domain name, get 10 years

The Children's Safety Act (introduced in the House) would extend the penalty for "using misleading domain names to direct children to harmful material on the internet" to between 10 and 30 years. [Click and then go to Title IV.] I understand that everyone wants to look tough on pornography, but we're going to lock someone up for 10 years for using a misleading domain name? For reference, the penalty for rape in California is between 3 and 8 years (California Penal Code S261(a)).

UPDATE: Replaced specific permalinks which don't work with annoying general permalinks which do. Thanks, legislative search interfaces!

Posted by ekr at 10:03 PM | Comments (2) | TrackBack

July 25, 2005

The Tenure Track

Donald Heller proposes a hot new reality show:
Everybody has loved watching the competition to see which of those spunky little 18 year-olds on The Scholar is going to receive the scholarship. But those kids are so bright and overachieving that the audience knows that all of them, not just the winner, will end up going to college somewhere. But think about how much more interesting the competition will be as graduate students battle it out for the holy grail of American higher education: a tenure-track faculty position! With so few graduating Ph.D.s landing one of these babies, the competition in this reality show will make Survivor look like a walk in the park.

Heller goes on to describe the various hoops that students would be made to jump through on his show. If you're an academic it will probably sound pretty familiar.

Posted by ekr at 10:14 PM | TrackBack

July 24, 2005

Back of the envelope cost/benefit analysis of bag searches

Let's try to do the cost/benefit analysis of the NYC subway search program. Say that you can search one passenger a minute. That seems like a reasonable compromise between a quick look in the bag (10 seconds) and a TSA-style dump-everything-out-and-check-for-explosives search (5 minutes). So, on a given working day, a single officer can check about 480 people. (At a commensurate cost to whatever other law enforcement activities they're engaging in, though since presumably a lot of that time is spent just being present and deterring crime, that's probably only partially lost.)

So, how well does this actually work? On a given day there are about 5 million passengers riding the NY subway. We're told that the Transit Bureau has about 3,000 officers, so in principle they can check anywhere between zero and roughly 29% of the subway passengers (3,000 officers x 480 searches out of 5 million riders). So, what's the marginal value of tasking another officer to search passengers?

If searches are conducted randomly, then on average each additional officer devoted to searching will increase the chance of detecting a terrorist by 480/5 million, or about 10^-4. Assume, charitably, that when terrorists are detected they just walk away rather than blowing themselves up (a line of people waiting to be searched makes a rather nice target). If we also assume that about 100 people a year will be killed by subway bombings (to date nobody has been killed in NYC subway bombings and the London bombing only killed 56 people), then each police officer saves .01 lives a year, at a cost per life of about $10,000,000, near the top end of the standard estimates for the statistical value of a human life.
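
The arithmetic above can be laid out as a short Python sketch. Two inputs are assumptions rather than figures from the post: an eight-hour shift (which is what 480 one-minute searches implies) and an officer cost of $100,000 a year (which is what .01 lives at $10,000,000 per life implies).

```python
# Figures from the post.
DAILY_RIDERS = 5_000_000
ASSUMED_DEATHS_PER_YEAR = 100

# Assumptions implied by, but not stated in, the post.
SHIFT_MINUTES = 8 * 60             # one search a minute over an eight-hour shift
OFFICER_COST_PER_YEAR = 100_000    # dollars, hypothetical

# Chance that one officer's random searches catch a given rider.
detection_probability = SHIFT_MINUTES / DAILY_RIDERS

# Expected lives saved per officer per year, if detection stops the attack.
lives_saved_per_officer = detection_probability * ASSUMED_DEATHS_PER_YEAR

cost_per_life = OFFICER_COST_PER_YEAR / lives_saved_per_officer

print(f"detection probability per officer: {detection_probability:.1e}")
print(f"lives saved per officer per year:  {lives_saved_per_officer:.4f}")
print(f"cost per life saved:               ${cost_per_life:,.0f}")
```

The result scales inversely with the assumed death rate: at 10 deaths a year instead of 100, the cost per life is ten times higher.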

Of course, this depends on some pretty charitable assumptions, namely:

  1. That the rate of attempted attacks will be substantially higher than it has been to date.
  2. That people don't blow themselves up when detected.
  3. That people who are detected don't just come back later and try again.
  4. That you can do a reasonable search in a minute. The TSA secondary screenings I've been through seem to take more like five.
  5. That the terrorists won't shift to some new target. Railway stations are good, but so are airports (outside the security perimeter), shopping malls, etc.

I'm not sure I believe any of these. And if you do, then you should be in favor of searching nearly everyone, since the cost/benefit numbers look the same for nearly any search fraction up to the point where the probability of a successful attack is so low that the terrorists just choose another target. I'm not sure exactly where this point is, but I expect you need to get above 25%. I'm extremely skeptical that a percent or so will get the job done.

Posted by ekr at 9:15 PM | Comments (9) | TrackBack

July 23, 2005

And on the other side...

Driving North on US 101 around Woodside Road there's a big billboard advertising Michael Savage's radio show.

Driving South on US 101 around Woodside Road there's a big billboard advertising Al Franken's radio show.

Last night I realized they're on opposite sides of the same gantry.

Posted by ekr at 8:13 AM | TrackBack

July 22, 2005

Curse you JNI

When you call System.loadLibrary("foo") Java tries to load libfoo.so. That's actually sensible behavior, but unfortunately the error message doesn't tell you what file it's trying to load. It just says:
Exception in thread "main" java.lang.UnsatisfiedLinkError: no foo in java.library.path
Which doesn't actually help that much if you've forgotten to put "lib" in front of your file name.

Posted by ekr at 6:33 PM | Comments (1) | TrackBack

July 21, 2005

Welcome to being searched on the subway

The New York Police are going to start "randomly" searching people's bags on the subway.
At some of the busiest of the city's 468 stations, riders will be asked to open their bags for a visual check before they go through the turnstiles. Those who refuse will not be permitted to bring the package into the subway but will be able to leave the station without further questioning, officials said.

Police Commissioner Raymond W. Kelly promised "a systematized approach" in the searches and said the basis for selecting riders for the checks would not be race, ethnicity or religion. The New York Civil Liberties Union questioned the legality of the searches, however, and Mr. Kelly said department lawyers were researching the legal implications.

"Every certain number of people will be checked," Mr. Kelly said. "We'll give some very specific and detailed instructions to our officers as to how to do this in accordance with the law and the Constitution."

...

Mr. Browne, the police spokesman, said, "Obviously we're going to use common sense for someone that appears to be an imminent threat." For example, he said, if a passenger with a large package had both fists clenched, police officers would be justified in searching him. Anyone found to be holding illegal drugs or weapons is subject to arrest, he said.

A few observations here:

Posted by ekr at 9:59 PM | Comments (5) | TrackBack

July 20, 2005

New hash function: VSH

With the compromise of MD5 and the recent attacks on SHA-1, we should expect to see quite a bit of activity in hash functions in the next few years. As with the AES competition, expect to see a lot of new hash functions that look quite a bit like the old hash functions. Also as with AES, that's a little distressing since our methods for assuring ourselves that these algorithms are secure aren't very satisfactory (as recent events have shown).

As I've mentioned before, the cryptographic community has pretty much decided that new asymmetric (public key) algorithms must be provably reducible to some hard (or at least believed hard) problem. Unfortunately, these algorithms tend to be much slower than the bit mashing algorithms that dominate symmetric cryptography. All this is to provide background for Contini, Lenstra, and Steinfeld's Very Smooth Hash (VSH), a number theoretic hash function for which the authors claim collision-resistance is reducible to factoring. The variant that provides security equivalent to 1024-bit RSA is about 25x slower than SHA-1.

Posted by ekr at 6:50 AM | Comments (5) | TrackBack

July 19, 2005

Why VoIP over TCP and/or SSL sounds like crap (I)

The difficulty of using IPsec VPNs has made SSL-based VPNs an increasingly popular networking technology. If your enterprise is using VoIP (isn't everyone?), then it's natural to want to carry that traffic over the SSL VPN. Unfortunately, this doesn't work very well.

The source of the problem is that SSL/TLS runs over TCP. TCP is designed to provide a channel with a number of properties: it is stream-oriented, in-order, and reliable.

What this implies is that TCP views data as a single long stream of data. It's convenient to think of the data as being a series of bytes numbered 1-N. In order to transmit it, the data is broken up into a sequence of packets, each with its own sequence number. Those packets are independently transmitted over the network. On the receiving side, these packets are reassembled into a stream and delivered to the application (and hence to your ear) as soon as they're available.

Here's the simplest example:

Time    Sent    Received    Delivered
1       1-5
2               1-5         1-5
3       6-10
4               6-10        6-10

At time 1, the sender transmits a single packet containing bytes 1-5. At time 2, it's received by the receiver, who passes it on to the application. At time 3, the sender transmits another packet containing 6-10. At time 4, the receiver receives that packet, and delivers it to the application. Data is delivered to the application as soon as it's available and (here's the key point) in order. Consider the next case:

Time    Sent    Received    Delivered
1       1-5
2       6-10
3               6-10
4               1-5         1-10

In this case, the sender sends two separate packets, one containing 1-5 and the other 6-10. They're sent in order but received out of order. At time 3, the receiver receives the packet containing 6-10. However, since it hasn't received 1-5, this packet is out of order so it doesn't deliver it. Rather, it waits until it receives bytes 1-5 at time 4 and then delivers all the bytes together. This is the "in-order" feature. Note that TCP doesn't preserve packet boundaries: the application can't tell whether the data was transferred as one packet or ten or what order things were received in. This is the stream-oriented feature.
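
The in-order behaviour in the second table can be sketched in a few lines of Python. This is a toy model, not how any real TCP stack is written: the class name is made up, segments are assumed to arrive whole, and bytes are numbered from 1 as in the tables.

```python
class InOrderReceiver:
    """Holds back out-of-order segments and releases bytes only in sequence."""

    def __init__(self):
        self.next_byte = 1    # first byte number not yet delivered
        self.pending = {}     # first byte number -> payload being held back

    def receive(self, first_byte, payload):
        """Accept one segment and return the bytes that can now be delivered."""
        if first_byte >= self.next_byte:       # ignore data already delivered
            self.pending[first_byte] = payload
        deliverable = []
        while self.next_byte in self.pending:  # release any contiguous run
            chunk = self.pending.pop(self.next_byte)
            deliverable.extend(chunk)
            self.next_byte += len(chunk)
        return deliverable

rx = InOrderReceiver()
held = rx.receive(6, list(range(6, 11)))      # time 3: bytes 6-10 arrive first
released = rx.receive(1, list(range(1, 6)))   # time 4: bytes 1-5 arrive
print(held)       # nothing can be delivered yet
print(released)   # bytes 1-10 come out together
```

The application only ever sees the flat list of bytes, so it cannot tell how many packets carried them.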

Remember that I said that TCP was reliable. Packet networks are fairly unreliable; packets can get damaged, lost, or rerouted. TCP imposes a reliable abstraction on top. The way that this works is that the receiver sends acknowledgements (ACKs) indicating which packets it has received. An example is shown in the figure below:

In this figure, the sender sends two packets in sequence, one containing bytes 1-5 and one containing bytes 6-10. The receiver responds with an acknowledgement that it's received bytes up to byte 10. One important thing to notice is that the sender doesn't send bytes 11-15 until he gets the ACK. This illustrates another important feature of TCP: flow control. TCP uses the ACKs from the recipient to control the flow of data from the sender. When the network gets congested, packets start getting dropped, the sender stops getting ACKs as fast and responds by reducing the sending rate. This responsiveness to network conditions is a key part of TCP.

If the recipient doesn't acknowledge a packet the sender retransmits it. This looks something like this:

In this example, the sender sends the same packets as in the previous figure but the first one gets lost. What the receiver sees is just the second packet containing bytes 6-10. It can't deliver these since the first packet is missing, so it waits for the sender to retransmit (Note to nerds: I'm assuming that selective ACK isn't in use here). After a while (typically a second or so) the sender notices that it hasn't received an ACK and retransmits both packets. When the receiver sees the retransmitted packets, it acknowledges them. This retransmission and acknowledgement function is what makes TCP reliable--the sender keeps trying to send the data until it gets an ACK or it concludes that the network is fatally broken and terminates the connection. Note that the receiver has now seen two copies of bytes 6-10, but the duplicate is easy to recognize and discard. At the same time as the receiver sends the ACK, it delivers the completed bytes 1-10 to the application.
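Here's a rough sketch of that timeout-and-retransmit loop. It's a toy under loud assumptions: `simulate_sender` is a hypothetical name, all packets go out at time 0, propagation delay is ignored, retransmissions always arrive, and (as in the text) there's no selective ACK, so everything after the last in-order byte is resent.

```python
def simulate_sender(packets, lost_first_try, timeout_s=1.0):
    """Toy model of timeout-based retransmission with cumulative ACKs.

    packets: contiguous (first_byte, last_byte) ranges, all sent at time 0.
    lost_first_try: indexes of packets dropped on their first transmission.
    Returns a list of (time, event) tuples.
    """
    def highest_in_order(received):
        # Cumulative ACK: last byte of the longest unbroken prefix received.
        acked = packets[0][0] - 1
        for pkt in packets:
            if pkt not in received:
                break
            acked = pkt[1]
        return acked

    log, received = [], set()
    for i, pkt in enumerate(packets):
        log.append((0.0, f"send {pkt[0]}-{pkt[1]}"))
        if i not in lost_first_try:
            received.add(pkt)

    acked = highest_in_order(received)
    ack_time = 0.0
    if acked < packets[-1][1]:
        # No ACK for everything: after the timeout, resend all unacked data.
        ack_time = timeout_s
        for pkt in packets:
            if pkt[1] > acked:
                log.append((timeout_s, f"retransmit {pkt[0]}-{pkt[1]}"))
                received.add(pkt)
        acked = highest_in_order(received)
    log.append((ack_time, f"ack through {acked}"))
    return log


for event in simulate_sender([(1, 5), (6, 10)], lost_first_try={0}):
    print(event)
```

With the first packet lost, the log shows both packets being resent at the one-second mark, which is where the dropout discussed later comes from.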

We're now ready to see how these features interact with VoIP. Voice traffic consists of a series of samples taken at regular intervals, for instance every 20 milliseconds. If each sample is 20 bytes, this gives you a sequence of 20 byte packets at times 0 ms, 20 ms, 40 ms, 60 ms, etc. In order for the voice to sound the same on the receiving end as it did on the sending end, these samples need to be played at the same intervals. There's some propagation delay here but you still need to play at the same rate. So, if the propagation delay was 50 ms you'd get something like this:

Sample #    Time Sampled (ms)    Time Played (ms)
1           0                    50
2           20                   70
3           40                   90
4           60                   110
5           80                   130

Now, consider what happens if sample 3 is lost in transmission. Ordinary VoIP systems use UDP, in which the packets are independent and are delivered as soon as they are received, no matter what order they are in. So, what happens is that the receiving application sees packets 1 and 2, a 20 millisecond blank spot, and then packets 4 and 5 (I'm oversimplifying here, since the timing isn't that precise, but this is the general idea). Now, the receiver doesn't have sample 3, but it's still got sample 4 scheduled for 110 ms. There are three basic strategies for dealing with this:

  1. Play 20 ms of silence in place of the dropped sample 3.
  2. Try to guess what would have been there by some form of interpolation/extrapolation.
  3. Repeat the last sample.
None of these options sound perfect but they're basically ok as long as not too many samples are lost. The standard procedure appears to be (3) replay the last sample. It's easy and has about the right spectral properties to not sound too awful.
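The three strategies are easy to sketch. In this illustration each "sample" is a single number standing in for a 20 ms frame, and `conceal` is a hypothetical helper, not anything from a real VoIP stack. Real interpolation works on waveforms; the midpoint here is just a placeholder for the idea.

```python
def conceal(samples, strategy="repeat", silence=0):
    """Fill in lost samples; None marks a sample that never arrived."""
    out = []
    for i, s in enumerate(samples):
        if s is not None:
            out.append(s)
        elif strategy == "silence":
            out.append(silence)
        elif strategy == "repeat":
            # Replay whatever was played last (silence if nothing yet).
            out.append(out[-1] if out else silence)
        elif strategy == "interpolate":
            # Midpoint of the neighbours; needs the *next* sample,
            # so the receiver has to wait for it before playing.
            prev = out[-1] if out else silence
            nxt = next((x for x in samples[i + 1:] if x is not None), prev)
            out.append((prev + nxt) / 2)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return out


received = [10, 20, None, 40, 50]       # sample 3 was lost
print(conceal(received, "silence"))      # [10, 20, 0, 40, 50]
print(conceal(received, "repeat"))       # [10, 20, 20, 40, 50]
print(conceal(received, "interpolate"))  # [10, 20, 30.0, 40, 50]
```

Note that repeating the last sample needs no lookahead at all, which is part of why it's the easy option.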

The problem is that this doesn't work with TCP. Instead, what happens is that when sample 3 is lost, the TCP implementation sits on samples 4 and 5 until it receives sample 3. This means that it's waiting for the sender to retransmit that sample. As we discussed before, this takes on the order of a second. During that time period, the receiver has no real choice but to play silence, so this is perceived as a dropout.
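To make the difference concrete, here's a small sketch comparing when samples reach the application under each model. The function name and the 1090 ms retransmission arrival time are assumptions chosen for illustration (roughly a one-second timeout on top of the 50 ms delay used above).

```python
def delivery_times(arrival_ms, in_order):
    """Map sample number -> time (ms) it is handed to the application.

    arrival_ms: dict of sample number -> arrival time at the receiver.
    in_order=False models UDP: each sample is delivered when it arrives.
    in_order=True models TCP: a sample waits for every earlier sample,
    so arrival_ms must contain every sample (lost ones at the time
    their retransmission arrives).
    """
    if not in_order:
        return dict(arrival_ms)
    delivered, latest = {}, 0
    for n in sorted(arrival_ms):
        latest = max(latest, arrival_ms[n])
        delivered[n] = latest
    return delivered


# UDP: sample 3 is lost and simply never shows up.
udp = delivery_times({1: 50, 2: 70, 4: 110, 5: 130}, in_order=False)
# TCP: sample 3 arrives by retransmission, assumed to land at 1090 ms.
tcp = delivery_times({1: 50, 2: 70, 3: 1090, 4: 110, 5: 130}, in_order=True)
print(udp)  # {1: 50, 2: 70, 4: 110, 5: 130}
print(tcp)  # {1: 50, 2: 70, 3: 1090, 4: 1090, 5: 1090}
```

In the UDP case the application is missing one sample but gets 4 and 5 on schedule. In the TCP case samples 4 and 5 were sitting in the receiver's buffer the whole time but aren't released until sample 3 shows up.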

Once the retransmission happens, the receiver needs to try to recover. If all has gone well, the sender has sent not only sample 3, but most of the samples that would have fit in the next second or so. At this point the sender and receiver are synchronized from a network perspective, but the speaker on the receiver's computer is hopelessly behind. The usual procedure is just to start playing the sound where you would have been if the loss and retransmit had never occurred, so it just sounds like a 1 second dropout. This gives something like this:

Time (ms)    Sample played
50           1
70           2
90-1090      Silence
1110         52
1130         53

Obviously, if there's any reasonable rate of packet loss, this starts to sound pretty terrible. But things can get even worse. Remember the flow control feature of TCP? If enough packets get lost, then when the sender retransmits, he'll have a big backlog of untransmitted samples. This takes a while to work through and either the listener gets delayed audio (which sounds really weird) or has to endure a multi-second dropout, which is basically intolerable.

At this point it's worth asking why streaming audio doesn't sound terrible even though it runs over TCP. The reason is that the recipients buffer seconds to minutes worth of audio before they start playing. That way, if there's a packet loss, they just keep right on playing out of the buffer with no interruption. If there's a big enough network problem, you can empty the buffer and that's why you'll sometimes see streaming audio or video pause, but in general this strategy works fine if you have a big enough buffer. Unfortunately, you can't use this strategy for voice because it's interactive. It would be fairly intolerable to have to wait 10 seconds after you've said something before you started hearing the other person's reply.
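The buffering trade-off is simple arithmetic. A sketch, assuming the 20 ms samples used earlier in this post (the function names are made up):

```python
import math


def prebuffer_frames(stall_ms, frame_ms=20):
    """Frames that must already be buffered to keep playing through a stall."""
    return math.ceil(stall_ms / frame_ms)


def added_delay_ms(frames, frame_ms=20):
    """Extra end-to-end delay introduced by holding that many frames."""
    return frames * frame_ms


frames = prebuffer_frames(1000)          # ride out a one-second retransmit
print(frames, added_delay_ms(frames))    # 50 1000
```

Riding out a single one-second retransmission means holding 50 samples, which adds a full second of delay in each direction. That's fine for a radio stream and hopeless for a conversation.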

Planned future posts in this series:

  1. Why congestion control makes the problem worse
  2. Why ACK spoofing is a bad idea.
  3. Why you shouldn't use multiple TCP connections to reduce delay for VoIP

Acknowledgement: Thanks to Cullen Jennings for review and helpful suggestions.

Posted by ekr at 9:51 AM | Comments (6) | TrackBack

July 18, 2005

Digital signatures for DVDs

Ed Felten points out that HD-DVD players will only play disks that are signed by some authorized manufacturer:
The technical details are in the AACS Pre-recorded Video Book Specification. The digital imprimatur is called a "content certificate" (see p. 5 for overview), and is created "at a secure facility operated by [the AACS organization]" (p. 8). It is forbidden to publish any work without an imprimatur, and player devices are forbidden to play any work that lacks an imprimatur.

Like the original imprimatur, the AACS one can be revoked retroactively. AACS calls this "content revocation". Every disc that is manufactured is required to carry an up-to-date list of revoked works. Player devices are required to keep track of which works have been revoked, and to refuse to play revoked works.

The AACS documents avoid giving a rationale for this feature. The closest they come to a rationale is a statement that the system was designed so that "[c]ompliant players can authenticate that content came from an authorized, licensed replicator" (p. 1). But the system as described does not seem designed for that goal — if it were, the disc would be signed (and the signature possibly revoked) by the replicator, not by the central AACS organization. Also, the actual design replaces "can authenticate" by "must authenticate, and must refuse to play if authentication fails".

It seems to me that there are two basic rationales for this. The first is to allow the AACS to charge rents for the privilege of recording DVDs. This lines up nicely with their encryption scheme which is designed at least partially to extract rents from the electronics manufacturers for the privilege of manufacturing DVD players.

But there's another likely reason and it's about copy protection. DVDs (and especially HD-DVDs) are too large to efficiently transfer over the Internet as-is. Most file sharing is of compressed files, which are, naturally, much smaller. Now, these compressed files won't be signed and because the compression is lossy you won't be able to reconstruct a version that matches the signature. So, at least one purpose of the signature is to stop you from playing your Internet file-shared content on a convenient platform. It won't stop you from plugging your computer into your TV, of course, but that's a much bigger pain than just burning a DVD and popping it into your standard DVD player, which you can do now.
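The point about lossy compression can be illustrated with a plain hash. This is only a sketch: SHA-256 is a stand-in (I'm not claiming it's what AACS uses), the signature step itself is omitted, and dropping every other byte is a crude placeholder for a real lossy codec.

```python
import hashlib


def digest(content: bytes) -> str:
    """Digest that a signature over the content would cover."""
    return hashlib.sha256(content).hexdigest()


original = b"frame data " * 1000
signed_digest = digest(original)   # what the authority would have signed

# Stand-in for lossy re-encoding: throw away every other byte.
reencoded = original[::2]

print(digest(original) == signed_digest)    # True
print(digest(reencoded) == signed_digest)   # False
```

Since the original bytes can't be recovered from the compressed copy, there's no way to get back to content that matches the signed digest.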

Another attractive feature of this design (at least from the perspective of the AACS) is that it's a lot less brittle than encryption-based schemes. All of the encryption schemes are susceptible to someone extracting the keys from their DVD player. But with signature systems, the players only have the public key so cracking them doesn't do you any good. Sure, you might be able to bypass the signature checking on your own player, but that's not really something that scales that well.

Of course, the obvious downside of this strategy is that it makes it a big pain for people like you and me to burn our own DVDs. But I wouldn't be surprised if there was a solution for that. Notice that Intel and Microsoft are part of the AACS consortium. A trusted computing module would be a very convenient way to let users sign their DVDs, subject to whatever DRM restrictions are built into the platform.

Posted by ekr at 8:04 AM | Comments (6) | TrackBack

July 17, 2005

More on Disney

Over on Interesting-People, Bill Rogers claims that being biometrically scanned at Disney World is optional:
1. Disney does not need this info.
2. It's voluntary and only applies to season ticket holders. You have the option of paying cash to enter the park.
3. Disney, like most smart companies, is trying to stop multiple people from using the same pass to stop fraud.
4. This policy does NOT apply to all visitors

It's really not Mickey Mouse but a good business decision. Unfortunately, it's being mis-interpreted by the press.

Also, reader Perry Metzger writes in to say that he thinks (but isn't sure) that it's not fingerprint scanning but rather hand geometry scanning (at least for the fingers Disney scans). The original news article is bad enough that it's hard to tell definitively either way.

Posted by ekr at 5:19 PM | Comments (5) | TrackBack

July 16, 2005

Welcome to Disney World, please let us scan your fingers

Blake Ramsdell writes in to point out that Disney World has started fingerprinting visitors:
Tourists visiting Disney theme parks in Central Florida must now provide their index and middle fingers to be scanned before entering the front gates.

The scans were formerly for season pass holders but now everyone must provide their fingers, Local 6 News reported. They have reportedly been phased in for all ticket holders during the past six months, according to a report.

Disney officials said the scans help keep track of who is using legitimate tickets, Local 6 News reported.

This coverage is pretty vague (legitimate tickets? That seems easy to deal with with non-biometric anti-counterfeiting measures) but based on the fact that it used to be just season ticket holders and now it's everyone, it seems pretty clear what's going on. The following table shows Disney World's price structure:

Days             Price Per Day
                 Ages 10+    Ages 3-9
1 Day Ticket     $59.75      $48.00
2 Day Ticket     $59.50      $48.00
3 Day Ticket     $57.00      $45.67
4 Day Ticket     $46.25      $37.00
5 Day Ticket     $38.60      $31.00
6 Day Ticket     $32.67      $26.17
7 Day Ticket     $28.43      $22.86
8 Day Ticket     $25.25      $20.25
9 Day Ticket     $22.78      $18.22
10 Day Ticket    $20.80      $16.70

As you can see the price per day drops pretty sharply the more days you buy. This is an obvious price discrimination move: the marginal value to you of staying your 9th day at Disney World is surely less than the value of staying your first day. However, as with all price discrimination, it only works if the seller can stop arbitrage between people they're trying to sell cheaply to and people they want to charge full price. I expect that the concern here is that people will buy 10-day tickets and then share them, thus bringing Disney's price per guest down to the 10 day price.
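It's even starker if you look at the marginal price of each extra day rather than the average. A quick sketch using the adult (Ages 10+) column from the table; the totals are just days times price per day, so they may be off by a cent or two from the actual ticket prices.

```python
# Adult price per day, keyed by ticket length in days (from the table).
per_day_adult = {1: 59.75, 2: 59.50, 3: 57.00, 4: 46.25, 5: 38.60,
                 6: 32.67, 7: 28.43, 8: 25.25, 9: 22.78, 10: 20.80}


def total_price(days):
    """Approximate ticket price: days times the per-day price."""
    return round(days * per_day_adult[days], 2)


def marginal_price(days):
    """Extra cost of an N-day ticket over an (N-1)-day ticket."""
    previous = total_price(days - 1) if days > 1 else 0.0
    return round(total_price(days) - previous, 2)


for days in per_day_adult:
    print(days, total_price(days), marginal_price(days))
```

By these numbers a 10-day ticket costs about $208, and the tenth day adds only about $3 over a 9-day ticket. Ten people splitting one 10-day ticket would each pay $20.80 instead of $59.75, which is exactly the arbitrage Disney needs to block.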

This is where the fingerprints come in: they stop people from sharing their tickets, thus letting Disney continue to collect their monopoly rents. This is a fairly standard solution, of course; back when I used to go to Great America, they would put your photo on your season ticket pass and the guards would ostensibly check it when you went by. In practice, of course, they didn't check it very carefully and the low-resolution bitmap photos they used make everyone look the same anyway. I imagine that the problem is even worse when most of the customers are kids, who, at least to my eyes, look more similar than adults.

All that said, I wouldn't be that excited about giving Disney my fingerprints, though I can't necessarily put my finger (hah hah) on why. If nothing else, wide use of fingerprints for casual authentication represents a serious threat to their use in high-value authentication contexts. It's well known that given someone's fingerprints it's possible to make fake fingers that will fool sensors. Disney says they're not collecting fingerprints but just a representation of the unique features, but it's known to be possible to reverse these representations in some cases, so this isn't as comforting as it should be. Imagine the effects of a compromise of Disney's database should fingerprint authentication ever become widespread.

Certainly, there are ways to make this more private. For instance, you could encode the user's fingerprints on a smart card (digitally signed?) or even a 2-d bar code like with passports and never store them in a back-end database. That way, at least you wouldn't have some database that could be compromised. You'd still need to trust Disney, but not as far.

This leaves us with the question of whether there's a way to salvage Disney's business model without recording people's fingerprints. The other common technique that people use is to have tamper-resistant wristbands. In practice, however, these just aren't that hard to bypass, and when sums of money on the order of $20-50 are at stake you can bet people will, so I think this is a non-starter.

Another option would be to sell a one-day, no in-and-out ticket that doesn't require any kind of authentication. This would let people who wish to maintain their privacy do so, but let you offer a discount to people who don't care. This seems like it would be hard to explain to customers. You could fingerprint people only on the way out, in the same way as clubs hand-stamp you on the way out. That would make the value proposition a little clearer.

The final option, of course, is to stop price discriminating. It's hard to believe that any sane monopolist would do that. Come to think of it, given the ease of forging fingerprints, I guess we should be glad they're not asking for a DNA sample.

Posted by ekr at 7:47 PM | Comments (2) | TrackBack

July 15, 2005

Deploying a New Hash Algorithm

Steve Bellovin and I have a new paper up (submitted to the NIST Hash Function Workshop):
Deploying a New Hash Algorithm
Steve Bellovin and Eric Rescorla

As a result of recent discoveries, the strength of hash functions such as MD5 and SHA-1 have been called into question. Regardless of whether or not it is necessary to move away from those now, it is clear that it will be necessary to do so in the not-too-distant future. This poses a number of challenges, especially for certificate-based protocols. We analyze S/MIME, TLS, and IPsec. All three require protocol or implementation changes. We explain the necessary changes, show how the conversion can be done, and list what measures should be taken immediately.

PDF, PS.

UPDATE: Fixed link to PS file. Thanks to Jens Kubieziel for pointing this out.

Posted by ekr at 10:34 PM | Comments (4) | TrackBack

Some ethical questions

Imagine that you have a magic button that you could push that would let you kill someone with no consequences to yourself.

  1. Who is the least objectionable person who you would be willing to have killed and why?
  2. Who is the most objectionable person who you would not be willing to have killed and why?
  3. Who is the least objectionable person who you would be happy to hear was dead and why?
  4. Who is the most objectionable person who you would not be happy to hear was dead and why?
  5. If the set of people who you would be happy to hear were dead isn't the same as the set of people who you would be willing to kill, please explain why.

The last question is particularly tricky for consequentialists.

UPDATE: rephrased question 5 to make it clearer.

Posted by ekr at 2:57 PM | Comments (5) | TrackBack

July 14, 2005

Why are cars so cheap?

When Mrs. Guesswork ordered her Toyota Prius, almost a year ago, there was a six month wait list (now down to a few weeks). Despite that, the dealer charged her exactly MSRP and didn't indicate in any way that if we offered him more money we could get the car sooner. This isn't true just for the Prius. Lots of "exclusive" cars like the Audi RS6 and various Porsche and AMG models have long wait lists as well. Why? Here are some potential explanations, with why I don't like them. So, none of these explanations is very satisfactory, but I don't have any better ones to offer.

Posted by ekr at 8:53 PM | Comments (8) | TrackBack

July 11, 2005

Thoughts on the O'Hare experience

I'm currently experiencing the fun of flying through O'Hare and I've got a suggestion for an improvement to the terminal environment:

In the name of all that does not suck, put in some more power outlets.

In my tour of about 5 different gates, I've found exactly two wall plates with 2 plugs each. One plug on the first wall plate was being used to charge those battery-powered carts that airport personnel zip around in and I wasn't brave enough to unplug it. Another desperate traveler was sitting in the cart using the other plug. The other wall plate was about 4 gates down embedded in the base of a metal pillar. Understand that when I say embedded I speak loosely, as a more accurate description would be "hanging off of"; it appears that the AC cables were the only thing connecting the faceplate to the pillar. There also appears to be something wrong with the springs that hold your plug into the outlet, since my plug won't stay more than about half in. I'm not a licensed electrician, but I'm pretty sure that's not code.

Don't get me wrong, there are plenty of those brass floor power boxes with the hinged doors. In any other environment those boxes would actually contain power plugs, but here in Chicago, they've apparently decided that a better use for them would be to store trash.

Outstanding!

Posted by ekr at 9:59 PM | Comments (6) | TrackBack

July 10, 2005

A field guide to teleportation options

Teleportation is a staple of science fiction novels, but not all teleportation is created equal. This post attempts to provide a field guide. The entries below are roughly in order of my estimate of how attractive they are as methods of travel. Note, this unavoidably contains spoilers for a few SF books (though in most of them teleportation is just background). You have been warned.

Wormholes, tesseracts, etc.
This is the standard science-fiction teleport. We somehow connect space-time at point A to space-time at point B and then just walk through. This is pretty much the optimal form of travel: fast travel with no sticky philosophical problems about identity (though if you get there faster than the speed of light, you have to worry about causality violation and all its philosophical problems).

Examples: Hyperion, Pandora's Star, A Wrinkle In Time.

Quantum teleportation
When you read about teleportation in the popular press, the articles are typically about some sort of quantum entanglement. You arrange for system A and system B to be in an entangled quantum state, allowing you to move system B into the same state as system A (while destroying the state at A). This has actually been demonstrated in the lab with a pair of particles, which is only about 25 or so orders of magnitude removed from the size of a human being.

The obvious downside of quantum teleportation is that the local copy of your body is destroyed. Now, from a quantum physics perspective, this isn't really a problem: once teleportation is complete, the new system is in exactly the same quantum state as the original system, so they're interchangeable. However, many people don't find this satisfactory, since the original you is, after all, gone.

Examples: Spin State.


Scan and transmit
A more "conventional" approach is to use some sort of high resolution scanning to capture an exact image of your body. You then transmit it from point A to point B where a copy is assembled. The philosophical problems here are much more severe than with quantum teleportation, because there are two copies. You can certainly treat the system as a duplicator, in which case you're not really travelling, and worse yet, scattering copies all over the world. Alternately, you can destroy the first copy, but this starts to feel much more like a death machine than a teleporter.

What makes the destroy-the-original variant particularly problematic is the possibility that there are temporarily two copies of you and then you destroy one of them. They have the opportunity to diverge in the time between the duplication and the destruction, at which point you're killing a distinct person. In this respect, any teleportation process which inherently destroys the original is superior since there is no possibility for divergence. How you feel about this goes to the heart of your opinions about identity. This obviously gives people pause. Parfit's Reasons and Persons has a particularly good analysis of this case.

A related variant is an inherently destructive scan (with the idea that a really deep scan is destructive) + transmit. This has similar philosophical issues to quantum teleportation. The difference is primarily one of mechanism, though you would expect that a quantum copy would be more accurate.

Examples: Think Like A Dinosaur, Just Peace in Vinge's Threats and Other Promises.

Mental transmission
Finally, consider the possibility of transporting just your personality. At the other end, it gets implanted in some host body. Once again, we have both the destructive scanning option and the duplication option. To make matters worse, we have the uploading into a computer option. As before, things are a lot easier if the scanning process is inherently destructive, since we don't have any chance of there being multiple copies.

Examples: Altered Carbon, Mindswap, Warpath. Also, see Greg Egan's Axiomatic and Diaspora for treatments of uploading, copying, and mental identity.

Posted by ekr at 10:16 PM | Comments (2) | TrackBack

July 9, 2005

When associating is outlawed...

A Florida man has been arrested and charged with a felony for using an open wireless AP [*]:
NEW YORK (CNN/Money) - Police have charged a Florida man with a third-degree felony charge, after he was arrested for accessing a St. Petersburg resident's wireless Internet network without permission.

According to the police, Benjamin Smith III was seen by Richard Dinon outside Dinon's home on the night of April 20, 2005, sitting in a parked SUV and using a laptop computer. When Dinon went outside to deposit his trash, Smith quickly closed the laptop and tried to hide it.

Dinon also stated that he later observed foreign icons on his home computer screen, and suspected that Smith, 41, may have been using his network. He called police and an officer confronted Smith at 11:30 p.m., two hours after the initial sighting.

"The arresting officer wasn't initially sure a violation took place," said George Kajtsa of the St. Petersburg Police Department. "He consulted our legal staff and they looked up the relevant statute."

The charge, unauthorized access to a computer network, applies to all varieties of computer network breaches, and gives prosecutors considerable leeway depending on the severity. It carries a potential sentence ranging from probation to 5 years in prison.

I don't understand the law of what constitutes unauthorized access very well, but I do know that it's extremely common for people to leave their APs open--either intentionally or out of ignorance--and that many people will simply use whatever open AP they happen to find. In addition, many operating systems will automatically associate with open APs, so it's not even clear that this is a voluntary act on the part of the user. MacOS X, for instance, pops up a dialog box, but it's hard to imagine that most users understand what's going on. It would certainly be very easy to think you were connecting to an authorized AP when in fact you were connecting to someone else's open AP--it doesn't help that it's very common for people to leave the default SSID (e.g., "linksys"). All that said, seeing as Mr. Smith was parked outside Mr. Dinon's home I think it's clear he knew what he was doing.

In any case, as a home user you need to be aware of the threat environment. Illegal or not, if you have an open AP you should expect that people will use your net connection. If you don't want that, close down your network. Even WEP--though insecure--is probably good enough for most purposes. Few ordinary users are going to bother to try to crack a WEP-secured network.

Posted by ekr at 11:47 PM | Comments (2) | TrackBack

Why people don't like national ID cards

If you wonder why people are opposed to national ID cards, you might want to read Perry Metzger's message from the cryptography mailing list here:
Perhaps I can explain why I am.

I do not trust governments. I've inherited this perspective. My grandfather sent his children abroad from Speyer in Germany just after the ascension of Adolf Hitler in the early 1930s -- his neighbors thought he was crazy, but few of them survived the coming events. My father was sent to Alsace, but he stayed too long in France and ended up being stuck there after the occupation. If it were not for forged papers, he would have died. (He had a most amusing story of working as an electrician rewiring a hotel used as office space by the Gestapo in Strasbourg -- his forged papers were apparently good enough that no one noticed.) Ultimately, he and other members of the family escaped France by "illegally" crossing the border into Switzerland. (I put "illegally" in quotes because I don't believe one has any moral obligation to obey a "law" like that, especially since it would leave you dead if you obeyed.)

Anyway, if the governments of the time had actually had access to modern anti-forgery techniques, I might never have been born.

To you, ID cards are a nice way to keep things orderly. To me, they are a potential death sentence.

There's almost no need in your ordinary life to be positively physically identified. Yes, I know you end up showing your ID several times a week to register for video accounts or cash checks, but that's because all of our mechanisms for ensuring trustworthiness are tied up in physical identity. The people looking at your ID mostly don't care who you are. They just want to know that they're going to get paid. It's easy to construct systems with that property that aren't actually ID cards. In fact, you've probably noticed that you can regularly use your credit card without showing ID--that's because the merchant doesn't absorb the cost of credit card fraud, and the credit card company wants your credit card to be easy to use.

Posted by ekr at 6:58 AM | Comments (10) | TrackBack

July 8, 2005

Cooling for performance

Kevin Dick pointed me to this interesting article about the effect of cooling on human performance. It turns out that if you cool down your body between sets you can get a very substantial performance improvement:
Like any athlete, Weir is well acquainted with his normal performance range. Like any athlete, Weir looks for an edge. A few years ago, he was intrigued when he heard about a device (called at various times the RTX, Core Control or simply The Glove) invented by a pair of Stanford biologists. Using the device to lower his core body temperature between sets, he was able to lift 495 pounds in four sets of squats instead of his normal two. He usually does squats only on Mondays, but he decided to try a second series a few days later. That Friday, he was able to increase the weight to 545 pounds. "I was surprised the sets felt so good," he says, but adds that the real test came the following Monday. Weir, 44, expected to see significant performance degradation due to the extra Friday workout. Not only did he not see the decay, he increased weight with every set. "The RTX (for rapid thermal exchange) cooling device is a very serious piece of equipment," he says. "At my age, you don't expect to be setting personal bests during workouts." He trained with the cooling equipment for the 2002 Commonwealth Games, and placed third in the discus. His oldest competitor was 15 years younger.

The RTX is a gizmo that applies cold and vacuum to your hand, thus lowering your core temperature. It's well known that people's performance drops off pretty badly in heat and I, like a lot of people, perform better in cold, but it's rather surprising that you can get this large a performance improvement by cooling people down through their hand, especially when you're only cooling between sets.

Posted by ekr at 10:00 PM | Comments (3) | TrackBack

July 7, 2005

Lessons from today's bombing

Posted by ekr at 10:06 PM | TrackBack

DNS poisoning to block P2P?

Constitutional Code reports that the German music rights organization wants ISPs to set their DNS servers to block resolution of eDonkey link servers:
The German rights organisation for composers, lyricists and publishers, GEMA, has asked 42 access providers to poison their DNS servers in order to block sites that provide links to eDonkey files. In short, DNS poisoning obstructs the process of converting a URL to a numeric IP address. The GEMA apparently expects the access providers to configure their DNS servers so that "inquiries by end-users are not passed to the correct server, but to an invalid or another pre-defined side." The GEMA also demands that the providers sign a testimony, with which they commit themselves to ensure full blockage under a contractual penalty of 100.000 euro if any of their customers can still reach the targeted site after July 25th.

This is a truly bad idea. As CoCo points out it's trivial to bypass your local ISP's DNS server and get name resolution from somewhere else. Clients can be easily programmed to do this, and if the ISPs actually accede to this demand, you can be sure that the client authors will waste no time in doing so. All that will be left is the collateral damage to all the sites which run other, legitimate services.
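To see why this is so easy to bypass, note that a DNS lookup is just a small UDP packet that can be addressed to any resolver, not only the one your ISP hands out. Here's a minimal sketch using only the Python standard library; the function names are mine, and the reply is returned raw rather than parsed.

```python
import socket
import struct


def build_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: length-prefixed labels, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (Internet).
    return header + qname + struct.pack("!HH", 1, 1)


def ask(resolver_ip: str, hostname: str, timeout: float = 2.0) -> bytes:
    """Send the query directly to resolver_ip over UDP; return the raw reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(hostname), (resolver_ip, 53))
        reply, _ = sock.recvfrom(512)
        return reply


print(len(build_query("example.com")))  # 29
```

A client would call something like `ask("192.0.2.53", "example.com")`, where the address is a documentation placeholder for whatever non-ISP resolver it chooses. Nothing in the protocol ties the client to the ISP's server.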

Posted by ekr at 9:41 PM | TrackBack

July 6, 2005

Anonymity and the Pentagon Papers

Journalists defending the right to keep sources anonymous love to cite the Pentagon Papers. Here's Shapiro again:
There have been acrobatic efforts to distinguish between good leaks (say, the Pentagon Papers) and bad leaks (Plame's CIA position). But who is going to make these hair-splitting distinctions? And on what grounds? Those who scream "national security" or even (hysterically) "treason" over the flaming of Plame should recall that these very same arguments were brandished by the Nixon administration against the publication of the Pentagon Papers.

It's absolutely true that the Nixon administration wanted to suppress the Pentagon Papers, but remember that although the papers were anonymously sourced, Ellsberg's identity was known extremely early. The Pentagon Papers were published June 13, 1971 and Ellsberg surrendered to the FBI June 28. [*]

For what it's worth, here's Ellsberg himself:

Yet ALL of these methods are worth considering, even the most professionally risky, when a war's worth of lives are at stake. They range from anonymously or (more effectively) personally providing DOCUMENTS to the "oversight" committees--least risky, but possibly least effective--to providing them to members of other relevant committees, such as Foreign Relations or Armed Services, or to known-sympathetic members of Congress, to, at the other extreme, providing documents in large quantity (after, of course, using your own best professional judgment to exclude any that would, in your cautious opinion, actually endanger individuals or harm national security) to the newspapers, in such quantity that your identity is likely to become known. A press conference is even a possibility; but it is not at all essential to the effectiveness of what you're doing that your own name be made public, as a red flag to the Administration. I myself (see my book) would have greatly preferred to be an anonymous source, rather than to challenge the Administration to prosecute me; I only revealed myself after it was clear they were going to prosecute anyway, and then I revealed myself as the source in order to deflect suspicion, as best I could, from others.

But in any case, the important thing, practically speaking, is to PROVIDE DOCUMENTS. When it comes to contradicting the president, and alerting the public to a situation which, in your best judgment, the national security and many human lives are endangered by the Administration's improper secrecy, concealment and deception, there is no substitute for documents.

Ellsberg was actually charged, but he got off after it came out that the Nixon administration had burgled his psychiatrist's office in an attempt to dig up dirt on him.

Posted by ekr at 9:51 PM | TrackBack

Do we need journalists to enable leaks?

Walter Shapiro makes the standard argument for why journalists shouldn't have to disclose their sources:
The Bush White House has been the most locked-down in history for reporters. And future administrations, even Democratic ones, are likely to emulate this nearly impenetrable Karen Hughes-inspired, message-discipline approach, under which even innocuous unauthorized conversations with the press can be potential firing offenses. As a result, the only way that even a glimmer of truth can emerge from places like the White House, the Pentagon, and the CIA will be if government officials trust reporters to keep their identities secret. That means that reporters must stand their ground amid the predictable frenzy of leak investigations. It is not an appealing bargain if a reporter promises to protect a source as long as it is convenient.

Reporters serve two important functions in the propagation of leaks (or news in general):

  1. They disseminate the information.
  2. They vouch for its accuracy.

The Internet has rendered the first function more or less obsolete. Sure, back in the old days you needed someone with a printing press to publicize the story, but today anyone with an Internet connection can put up a web site and send an anonymous pointer to a random reporter. If that fails, they can e-mail Atrios, Powerline, or Drudge. If the story is interesting and even vaguely plausible it will be picked up by someone, regardless of the provenance.

That leaves us with vouching for the accuracy of the information. This is clearly an important function, since you're hardly in a position to verify something like the Pentagon Papers yourself. However, we need to ask how important knowing the source's identity is to allowing the reporter to do this. There are three major cases here:

  1. The information can be independently authenticated. I.e. it's on official paper, contains secret information the reporter otherwise knows, is digitally signed (hah!), etc. In this case, it's not really important to know who the leaker is, since the reporter can do their own checking.
  2. The information isn't directly reportable but is a useful lead. It's not really that important for the reporter to know the source here either. True, they potentially risk wasting a bunch of time tracking down the lead to find it's bogus, but that's not exactly a crippling blow to journalism--it just makes it a slightly more expensive profession to pursue.
  3. The information can't be independently authenticated. In this case you're basically taking the reporter's word that they've checked out the source and it's legit.

The third case is pretty much the only one in which it's critical for the reporter to know the source's identity, because they're personally vouching for the story on the basis of the source's position. Obviously, if journalists don't manage to protect their sources, then this would tend to deter this kind of non-anonymous leaking. Of course, this doesn't mean a fatal blow for journalism, but we'd need alternative ways to vet sources and information.

The obvious procedure is to build a long-term relationship with an anonymous source (the literature on pseudonymous communication is extensive). The source starts by providing information which is semi-secret and independently verifiable. After they've built a reputation for providing useful information to the journalist, the journalist can start to trust unauthenticatable information. As I understand it, this is much the same procedure that one uses to vet information from foreigners who offer to provide intelligence--a case where you can't trust them based on their position because they could be working for a foreign intelligence agency to entrap you. Obviously this wouldn't work as well and would make one-time unauthenticated leaks difficult. But it's not clear how often that really happens. It's certainly not clear that it happens enough that removing the possibility would destroy journalism.

It's also worth noting that the value of category (3) leaks derives almost entirely from our ability to trust that reporters can adequately vet the accuracy of information--since they are concealing the information we would need to vet it ourselves. In the wake of the CBS debacle, the public's willingness to trust journalists in that way--justified or not--has probably decreased quite a bit. It's particularly ironic that Judith Miller is the journalist in question here, since it's precisely her inability to accurately assess her sources' credibility that led her to get her reporting on WMDs so badly wrong.

Posted by ekr at 8:57 PM | Comments (3) | TrackBack

July 5, 2005

Bootstrapping authentication

A post on Interesting People today (originally on Cypherpunks) about the difficulty of getting government-issued ID in the post-RealID era:
For those of you who may have missed it, today was the first day of the
new "Real ID Act", a/k/a, the American Nazification Papers Act.  I
wouldn't have know myself except that I recently moved, and wanted to
exchange my current Illinois drivers license for a Missouri one.

Not so fast...

"You have a passport?"

"No, I don't travel."

"A certified copy of your original birth certificate?"
"Haven't had one since I was born, fifty years ago. And since I was born
about 1500 miles from here, getting one is no small task."

"Too bad. Your old license is invalid and you can't get another one in
any state, starting today, without at least one of the two documents, PLUS
secondary ID to back them up."

Even though I have a current license, and even though I am in their system
as having held a valid Missouri license for 15+ years, photo included,
none of it is good enough.

OK, so I have no choice, I go to the post office to get a Passport - same
thing.

Fine, I'll just order the birth certificate and get it over with, right?

Wrong. New York wants affirmative proof of identity for a copy now:
passport or your [missing] original birth certificate. Anyone else see a
circular problem here?

"I need a new birth certificate because the old one was lost about forty
years ago. And I don't have a passport to prove my identity."

"Get your parents to testify who you are, and make sure they bring their
passports."

"They are both dead."

"Sorry Sir, I'm afraid we won't be able to help you then."
<click>

Ignore the Godwin's Law violation and just focus on the story. In order to get a strong government-issued ID you have to prove your identity, which is done, naturally, using a strong ID, which was the problem in the first place. It sounds like something out of Gilbert and Sullivan, but it's a real problem for any system that wants to strongly authenticate people.

The real problem is that the whole notion of personal identity is a fairly fuzzy one. In some sense, the notion of discrete personal identity is created by possessing strong personal identification. Historically we've had a variety of mechanisms for mapping that fuzzy personal identity onto strong forms of identity, but they never achieved the goal--nor were really intended to--of making it hard for you to get a new identity that didn't belong to you, because it's not clear what that means anyway. If you're going to be in the business of issuing strong personal identities, you need to either preserve those weak mechanisms or stop pretending that the strong personal identities are tied to the more amorphous information that preceded the issuance of the strong identity.

Posted by ekr at 1:11 PM | Comments (2) | TrackBack

July 4, 2005

Is it good to have taste?

Eu-Jin Goh and I spent some time today discussing the question of whether it's good to have taste. Lots of products span a wide range of quality and price, but with differences that are purely aesthetic and take practice to appreciate. The classic example here is wine, which spans at least 5 orders of magnitude in terms of price. Now, all (or at least most) wine contains alcohol and to an unpracticed taste, the differences between a $5 bottle of wine and a $500 bottle of wine aren't that great. So, is it good for you to have taste?

The obvious advantage of having taste is that you get to enjoy really nice stuff. For a long time I didn't really like sushi, so it all tasted much the same. On the other hand, now that I've acquired a taste for sushi, I really appreciate good sushi, and I enjoy it much more than I ever did before. So, that's on the plus side. There are downsides, however. The obvious one is that I'm much more acutely aware of how bad bad sushi is. Whereas before I could just eat it, now it's basically intolerable. This means that if I want to eat sushi, I have to eat good sushi, which is much more expensive.

I don't think there's a definitive answer here: it depends on how easily you can afford the good stuff. If you can, then you're probably getting more pleasure from having taste than it's costing you. On the other hand, if you can't afford good stuff, you're just torturing yourself by developing good taste.

Posted by ekr at 9:46 PM | Comments (9) | TrackBack

July 2, 2005

125 open science problems

Science magazine is running a 125th anniversary special with 25 major open problems and 100 not-so-major ones (no subscription required). It's kind of a mixed bag.

None of the articles go into much depth, but they do provide an overview of some of today's interesting scientific problems--certainly enough to give you a sense of what might be interesting for further reading.

Posted by ekr at 7:55 AM | Comments (2) | TrackBack

July 1, 2005

Canada to block US drug exports

Canada is planning to tighten its rules to make it more difficult for Americans to import drugs from across the border. The incentive seems to be that US drug companies are starting to restrict their shipments to Canada in an effort to discourage export. Obviously, this creates shortages.

Anyone who understood economics always understood that this was the likely outcome of drug reimportation. Th