January 23, 2012

You have to have used git to really understand this one, but...
[16] git checkout f4a56
Note: checking out 'f4a56'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b new_branch_name

HEAD is now at f4a560b... Foo
As you may have gathered from this long warning, you most likely don't want to be in detached HEAD state; you probably just meant to create a branch, or wanted to roll back a commit, and typed the wrong thing. That's why there are lots of pages about what this means and how to get yourself out. My contribution to this literature can be found below the fold.

 

January 22, 2012

On my way to Red Rock today to do some work, I looked in my wallet to see if I had enough money to afford my hot chocolate (paying for a $3.50 drink with a credit card is a pretty lame move). Here's what I found:

After some sorting, it comes out as follows...

Currency  Count  Value (nominal)  Value (USD)
USD           3                3         3.00
CAD           7              100        98.55
CZK           2            2,100       106.40
GBP           1               10        15.55
EUR           1               20        25.79
INR           1              100         1.99
RUB           9            1,570        49.97
Total        24                -       301.25

In other words, out of 24 total pieces of paper valued at over $300, I had three spendable pieces of paper valued at $3. Oh, and a couple of United beverage vouchers which expire in 9 days. I ended up going to the ATM.
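For the record, the totals do check out. Here's a quick sketch in Python that re-adds the table. It uses the USD values exactly as they appear in the table (it doesn't look up any exchange rates), and `WALLET`, `wallet_totals`, and `spendable` are names I made up for illustration.

```python
from decimal import Decimal

# Rows copied from the wallet table: (currency, number of notes,
# nominal value in that currency, value in USD as given in the table).
WALLET = [
    ("USD", 3, Decimal("3"), Decimal("3.00")),
    ("CAD", 7, Decimal("100"), Decimal("98.55")),
    ("CZK", 2, Decimal("2100"), Decimal("106.40")),
    ("GBP", 1, Decimal("10"), Decimal("15.55")),
    ("EUR", 1, Decimal("20"), Decimal("25.79")),
    ("INR", 1, Decimal("100"), Decimal("1.99")),
    ("RUB", 9, Decimal("1570"), Decimal("49.97")),
]


def wallet_totals(rows):
    """Return (total notes, total USD value) across all currencies."""
    notes = sum(count for _, count, _, _ in rows)
    usd = sum((value for _, _, _, value in rows), Decimal("0"))
    return notes, usd


def spendable(rows, local_currency="USD"):
    """Return (notes, USD value) for just the locally spendable currency."""
    return wallet_totals([row for row in rows if row[0] == local_currency])


notes, usd = wallet_totals(WALLET)
local_notes, local_usd = spendable(WALLET)
print(f"{notes} notes worth ${usd}; spendable: {local_notes} notes worth ${local_usd}")
```

Using `Decimal` rather than floats keeps the cents exact, so the sum comes out to precisely $301.25 rather than something like 301.24999999999997.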

 

January 21, 2012

You've of course heard by now that much of the Internet community thinks that SOPA and PIPA are bad, which is why on January 18, Wikipedia shut itself down, Google put a black bar over its logo, etc. This opinion is shared by much of the Internet technical community, and in particular much has been made of the argument by Crocker et al. that DNSSEC and PIPA are incompatible. A number of the authors of the statement linked above are friends of mine, and I agree with much of what they write in it, but I don't find this particular line of argument that convincing.

Background
As background, DNS has two kinds of servers:

  • Authoritative servers, which host the records for a given domain.
  • Recursive resolvers, which are used by end-users for name mapping. Typically they also serve as a cache.

A typical configuration is for end-user machines to use DHCP to get their network configuration data, including IP address and the DNS recursive resolvers to use. Whenever your machine joins a new network, it gets whatever resolver that network is configured for, which is frequently whatever resolver is provided by your ISP. One of the requirements of some iterations of PIPA and SOPA has been that recursive resolvers would have to block resolution of domains designated as bad. Here's the relevant text from PIPA:

(i) IN GENERAL- An operator of a nonauthoritative domain name system server shall take the least burdensome technically feasible and reasonable measures designed to prevent the domain name described in the order from resolving to that domain name's Internet protocol address, except that--
(I) such operator shall not be required--
(aa) other than as directed under this subparagraph, to modify its network, software, systems, or facilities;
(bb) to take any measures with respect to domain name lookups not performed by its own domain name server or domain name system servers located outside the United States; or
(cc) to continue to prevent access to a domain name to which access has been effectively disabled by other means; and ...
(ii) TEXT OF NOTICE.-The Attorney General shall prescribe the text of the notice displayed to users or customers of an operator taking an action pursuant to this subparagraph. Such text shall specify that the action is being taken pursuant to a court order obtained by the Attorney General.

This text has been widely interpreted as requiring operators of recursive resolvers to do one of two things:

  • Simply cause the name resolution operation to fail.
  • Redirect the name resolution to the notice specified in (ii).

The question then becomes how one might implement these.

Technical Implementation Mechanisms
Obviously if you can redirect the name, you can cause the resolution to fail by returning a bogus address, so let's look at the redirection case first. Crocker et al. argue that DNSSEC is designed to secure DNS data end-to-end to the user's computer. Thus, any element in the middle which modifies the DNS records to redirect traffic to a specific location will break the signature. Technically, this is absolutely correct. However, it is mitigated by two considerations.

First, the vast majority of client software doesn't do DNSSEC validation. Instead, if you're resolving some DNSSEC-signed name and the signature is being validated at all, it's most likely being validated by some DNSSEC-aware recursive resolver, like the ones Comcast has recently deployed. Such a resolver can easily modify whatever results it is returning, and that change will be undetectable to the vast majority of client software (i.e., to any non-DNSSEC software).[1] So, at present, a rewriting requirement looks pretty plausible.

Crocker et al. would no doubt tell you that this is a transitional stage and that eventually we'll have end-to-end DNSSEC, so it's a mistake to legislate new requirements that are incompatible with that. If a lot of endpoints start doing DNSSEC validation, then ISPs can't rewrite undetectably. They can still make names fail to resolve, though, via a variety of mechanisms. About this, Crocker et al. write:

Even DNS filtering that did not contemplate redirection would pose security challenges. The only possible DNSSEC-compliant response to a query for a domain that has been ordered to be filtered is for the lookup to fail. It cannot provide a false response pointing to another resource or indicate that the domain does not exist. From an operational standpoint, a resolution failure from a nameserver subject to a court order and from a hacked nameserver would be indistinguishable. Users running secure applications have a need to distinguish between policy-based failures and failures caused, for example, by the presence of an attack or a hostile network, or else downgrade attacks would likely be prolific.[12]

...

12. If two or more levels of security exist in a system, an attacker will have the ability to force a "downgrade" move from a more secure system function or capability to a less secure function by making it appear as though some party in the transaction doesn't support the higher level of security. Forcing failure of DNSSEC requests is one way to effect this exploit, if the attacked system will then accept forged insecure DNS responses. To prevent downgrade attempts, systems must be able to distinguish between legitimate failure and malicious failure.

I sort of agree with the first part of this, but I don't really agree with the footnote. Much of the problem is that it's generally easy for network-based attackers to generate situations that simulate legitimate errors and/or misconfiguration. Cryptographic authentication actually makes this worse, since there are so many ways to screw up cryptographic protocols. Consider the case where the attacker overwrites the response with a random signature. Naturally the signature is unverifiable, so the resolver's only option is to reject the records, as prescribed by the DNSSEC standards. At this point you have effectively blocked resolution of the name. It's true that the resolver knows that something is wrong (though it can't distinguish between attack and misconfiguration), but so what? DNSSEC isn't designed to allow name resolution in the face of a DoS attack by in-band active attackers. Recursive resolvers aren't precisely in-band, of course, but the ISP as a whole is in-band, which is one reason people have talked about ISP-level DNS filtering of all traffic, not just filtering at recursive resolvers.
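Here's a toy model of the random-signature case, with a loud caveat: real DNSSEC uses public-key signatures (RRSIG records checked against the zone's DNSKEY), not HMAC. I'm using HMAC from the Python standard library purely as a stand-in so the sketch runs; `sign`, `validate`, and the keys are hypothetical. The thing to notice is the last line: an attacker's garbage signature and an operator's mistake look identical to the validator, and in both cases the name simply doesn't resolve.

```python
import hashlib
import hmac
import os

# Stand-in only: DNSSEC validates public-key signatures, not HMACs.
# What matters here is the validator's outcome, not the crypto primitive.
ZONE_KEY = b"hypothetical-zone-key"


def sign(record: bytes, key: bytes = ZONE_KEY) -> bytes:
    return hmac.new(key, record, hashlib.sha256).digest()


def validate(record: bytes, signature: bytes, key: bytes = ZONE_KEY) -> str:
    """Return 'secure' if the signature verifies, else 'bogus'."""
    if hmac.compare_digest(sign(record, key), signature):
        return "secure"
    # The resolver must discard the record either way; it can't tell an
    # attacker's random bytes from an operator's misconfiguration.
    return "bogus"


record = b"www.example.com. 300 IN A 192.0.2.1"

honest = validate(record, sign(record))
attacked = validate(record, os.urandom(32))                   # signature overwritten in transit
misconfigured = validate(record, sign(record, b"stale-key"))  # zone signed with the wrong key

print(honest, attacked, misconfigured)
```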

Note that I'm not trying to say here that I think that SOPA and PIPA are good ideas, or that there aren't plenty of techniques for people to use to evade them. I just don't think that it's really the case that you can't simultaneously have DNSSEC and network-based DNS filtering.

 

1. Technical note: As I understand it, a client resolver that wants to validate signatures itself needs to send the DO flag (to get the recursive resolver to return the DNSSEC records) and the CD flag (to suppress validation by the recursive resolver). This means that the recursive resolver can tell when it's safe to rewrite the response without being detected. If DO isn't set, then the client won't be checking signatures. If CD isn't set, then the recursive resolver can claim that the name was unvalidatable and generate whatever error it would have generated in that case. (Comcast's deployment seems to generate SERVFAIL for at least some types of misconfiguration.)
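The logic in this footnote boils down to a small decision table. The sketch below is just my reading of it, not code from any DNSSEC implementation or from Comcast; `Action` and `filtering_options` are made-up names, and the SERVFAIL branch assumes the resolver answers a filtered name the same way it answers a zone that fails validation.

```python
from enum import Enum


class Action(Enum):
    REWRITE = "rewrite the answer; the client can't detect it"
    SERVFAIL = "return SERVFAIL, as if validation had failed"
    DETECTABLE = "client validates itself; any rewrite would be detected"


def filtering_options(do_bit: bool, cd_bit: bool) -> Action:
    """Hypothetical model of what a filtering recursive resolver can get
    away with, based only on the DO and CD flags in the client's query."""
    if not do_bit:
        # The client didn't ask for DNSSEC records, so it can't be checking signatures.
        return Action.REWRITE
    if not cd_bit:
        # The client relies on the resolver to validate, so the resolver can
        # claim the name didn't validate and send its usual error.
        return Action.SERVFAIL
    # DO and CD both set: the client is validating on its own. The resolver can
    # still refuse to answer, but a rewritten answer would fail validation.
    return Action.DETECTABLE


for do_bit in (False, True):
    for cd_bit in (False, True):
        print(f"DO={do_bit!s:5} CD={cd_bit!s:5} -> {filtering_options(do_bit, cd_bit).value}")
```

Note that even in the last case the resolver can still make the name fail to resolve; it just can't make it resolve to somewhere else without the client noticing.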

 

January 11, 2012

In Dahlia Lithwick's report on FCC v. Fox (about the FCC's TV indecency policy), she writes:
Justice Stephen Breyer raises a question about why the ABC ass case is being heard together with the fleeting-expletives case. Justice Ginsburg asks whether Hair could be broadcast on network television (Verrilli: "Serious questions") and then whether the opera Metropolis could be broadcast (Verrilli: "Context-based approach"). Then Justice Anthony Kennedy interrupts the parade of naked horrible to clarify: "What you're saying is that there is a public value in having a particular segment of the media with different standards than other segments." Verrilli replies that, yes, this is about preserving "a safe haven where if parents want to put their kids down in front of the television at 8:00 p.m. they're not going to have to worry about whether the kids are going to get bombarded with curse words or nudity."

Because if you want that, you can find it in the back seat of my car, at rush hour when we're late for Kung Fu. Just ask my children.

Kennedy replies that the V-chip is available and that "you ask your 15-year-old, or your 10-year-old, how to turn off the chip. They're the only ones that know how to do it."

I'm not saying this isn't true (though I rather suspect it's more likely that parents don't know how to turn on the V-chip [explanation here, in case you don't know] than that they don't know how to turn it off). However, I think this discussion illustrates pretty clearly the confusion over the problem that people are trying to solve. (The term "threat model" as applied to children probably sounds funny to non-parents.) In any case, there are two different things one might be trying to accomplish with respect to potentially objectionable content:

  • Prevent children from inadvertently accessing objectionable content.
  • Prevent children from intentionally accessing objectionable content.

If your objective is the former, then the V-chip works fine (except for the horrible UI); you just configure your device to suppress objectionable content. The sort of content-based regulation the FCC is engaging in works as well, assuming you don't let your kids watch TV except in the "safe" period, but it's a very inefficient mechanism compared to the V-chip, being both overbroad (affecting everyone, including people who don't have children) and not very effective, as it applies only to broadcast TV.

On the other hand, if your model is to prevent children from intentionally accessing objectionable content, and you further expect them to attempt to bypass content controls, then restrictions on broadcast TV don't do very much, given that (a) if you have cable, your kids can just tune to unrestricted channels and (b) a large number of other sources of such content exist on the Internet. Blocking the tiny sliver of such content you still get through broadcast TV mostly looks silly and anachronistic. Though I guess it's less silly if you get hit with a huge fine for breaching the rather unclear rules.

 

December 31, 2011

Spent some of today getting my 2011 charitable donations out of the way, so I've been experiencing a lot of different Web forms. Remember, these people want my money, so it would be nice if they didn't make the experience so irritating. On that basis, here are some things not to do:
  • Refuse to accept spaces or dashes in my credit card number, phone number, Social Security number, etc. Don't force me into your stupid format; parse whatever I send you. Here, let me help. The following JS code strips out spaces and dashes: input = input.replace(/[ \-]/g, ""); For an appropriately huge consulting fee I'll show you how to replace periods and pluses, too.
  • Force me to tell you what kind of credit card I have. This information is encoded in the leading digits of the credit card number. This table may help. I know that things change, but seriously, you could at least try to guess.
  • Force me to select "USA" out of the end of an incredibly long drop-down list of countries. You can generally determine someone's country by looking at their IP address, and I can certainly understand not wanting to bother with that, but if most of your customers are American, it's silly to force them to scroll all the way to the end out of a misguided notion of national equity. Make my life easy and put the USA as the first item in the list, people.
  • Make me enter my state and my zip code. In nearly all cases, the zip code encodes the state.
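For anyone building one of these forms, here's a sketch in Python of the first two items: accept whatever the user types, and guess the card type from the leading digits. The prefix table covers only a few well-known ranges and is not an authoritative issuer list (prefixes do change, as noted above), so treat the result as a guess to pre-fill the card-type field, not as validation. `normalize`, `guess_card_brand`, and `CARD_PREFIXES` are made-up names.

```python
import re

# Leading-digit prefixes for a few major card brands. Deliberately
# incomplete; newer ranges (e.g., additional Mastercard prefixes) are omitted.
CARD_PREFIXES = [
    ("American Express", ("34", "37")),
    ("Visa", ("4",)),
    ("Mastercard", tuple(str(n) for n in range(51, 56))),
    ("Discover", ("6011", "65") + tuple(str(n) for n in range(644, 650))),
]


def normalize(raw: str) -> str:
    """Strip the spaces and dashes people naturally type into numbers."""
    return re.sub(r"[ \-]", "", raw)


def guess_card_brand(raw: str):
    """Guess the card brand from the leading digits, or return None."""
    number = normalize(raw)
    for brand, prefixes in CARD_PREFIXES:
        if number.startswith(prefixes):  # str.startswith accepts a tuple
            return brand
    return None


print(guess_card_brand("4111 1111-1111 1111"))  # Visa
```

Returning None rather than raising an error matters: if the guess fails, the form can fall back to asking, instead of rejecting the user.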

Also, though it's not a Web form issue, I wish there were some way to tell these organizations not to ask me for donations during the year. I give once a year, at the end of the year; it's just a matter of convenience. Sending me a bunch of physical letters asking for money just wastes your fundraising dollars and my time.

 

December 22, 2011

Mark Garrison has a rather odd article in Slate arguing that we need expert advice to order beer in restaurants:
It's a busy night at the D.C. restaurant Birch & Barley, as well as its casual upstairs sister joint, ChurchKey. Greg Engert is guiding me through his beverage list with all the knowledge, talent, and grace one would expect from an award-winning sommelier. With a couple crisp queries, he learned enough to make some intriguing recommendations. He didn't flaunt his knowledge about food and drink, but when I had questions, he gave precise answers about the flavor, aroma, producer, pairing potential, and even the history of the available beverages. Fortunately, there was no attempt at upselling, the odious sin far too many sommeliers commit, a big reason why many diners are suspicious of the entire profession.

...

There may be agreement in the industry that great beer deserves top-notch service, but there's not yet a consensus on what that means. In fact, there's not even agreement on what to call a well-trained beer server. Engert's job title is beer director, but he doesn't mind being called a beer sommelier. (He has put some thought into this.) Some in the beer community find this term problematic, since "sommelier" is tied to the wine world and may imply a professional certification that doesn't exist.

...

The program's website states the claim that wine sommeliers might have known enough to choose a good beer for you a few decades ago, but now "the world of beer is just as diverse and complicated as wine. As a result, developing true expertise in beer takes years of focused study and requires constant attention to stay on top of new brands and special beers." So Daniels set out to build a testing and certification program to create a standard level of knowledge and titles that would signify superior beer knowledge to consumers, similar to the way a Court of Master Sommeliers credential does for wine.

Look, I love beer, don't like wine, and am well aware of the lousy beer service one typically gets at restaurants, so I'm generally in favor of anything that improves beer quality. But the main problem isn't that there's nobody at the restaurant who understands beer. It's that the beer selection at restaurants sucks. To take one recent example, I ate at the Los Altos Grill the other night: they had a page of wines and three beers on tap. This isn't uncommon; in fact it's not uncommon for restaurants to have solid wine lists but only bottled beer, and only a few varieties of bottles at that. The question I have for waiters isn't "what beer do you recommend", but rather "is Peroni really the best beer you have?"

In large part, the culprit here is customer demand: people who eat at high-end restaurants tend to prefer wine to beer, so those restaurants naturally have lousy beer selections. But I suspect that the chemistry of beer has a lot to do with it as well. Wine can last years in the bottle (and many wines are better when aged), but bottled beer has a shelf life measured in months, with draft beer going bad in a few weeks. So, unlike wine, you can't afford to stock any beer that people don't order fairly frequently, since there's too high a chance it will go bad before someone orders it. I suspect that this is why most restaurants keep such a small beer selection. (Anyone with contacts in the restaurant business should feel free to chime in here.)
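To make the stocking argument a bit more concrete, here's a toy model. It assumes (my assumption, not anything I know about how restaurants actually order) that orders for a given beer arrive at random at a steady rate, i.e., as a Poisson process, and asks what fraction of a batch gets sold before it goes stale. `expected_fraction_sold` is a made-up name, and any numbers you plug in are hypothetical.

```python
import math


def expected_fraction_sold(orders_per_week: float, shelf_life_weeks: float, stock: int) -> float:
    """Expected fraction of `stock` servings sold before they go stale,
    assuming orders arrive as a Poisson process (a modeling assumption)."""
    mean_orders = orders_per_week * shelf_life_weeks
    # For X ~ Poisson(mean_orders): E[min(X, stock)] = sum over k < stock of P(X > k).
    p_k = math.exp(-mean_orders)  # P(X = 0)
    cdf = p_k                     # P(X <= 0)
    expected_sold = 0.0
    for k in range(stock):
        expected_sold += 1.0 - cdf      # add P(X > k)
        p_k *= mean_orders / (k + 1)    # now P(X = k + 1)
        cdf += p_k                      # now P(X <= k + 1)
    return expected_sold / stock


# Hypothetical: a 24-bottle case of a beer ordered about once a week,
# with a 12-week shelf life, versus a beer ordered 50 times a week.
print(expected_fraction_sold(1, 12, 24))
print(expected_fraction_sold(50, 12, 24))
```

The slow seller can't do better than 12 expected orders against 24 bottles, so at least half the case is expected to go stale, while the popular beer sells essentially all of it. That asymmetry is the whole argument for a short beer list.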

The major exception here is restaurants that specialize in beer (Garrison's example of Birch & Barley advertises itself as "a completely unique food and beer experience celebrating a full spectrum of styles, traditions, regions and flavors"). If you're that kind of restaurant you probably get enough volume to keep a large inventory without things getting too stale—though I do wonder what the oldest bottle on their shelves tastes like.

 

December 18, 2011

The first step in most Internet communications is name resolution: mapping a text-based hostname (e.g., www.educatedguesswork.org) to a numeric IP address (e.g., 69.163.249.211). This mapping is generally done via the Domain Name System (DNS), a global distributed database. The thing you need to know about the security of the DNS is that it doesn't have much: records are transmitted without any cryptographic protection, either for confidentiality or integrity. The official IETF security mechanism, DNSSEC, is based on digital signatures and so offers integrity, but not confidentiality, and in any case has seen extremely limited deployment. Recently, OpenDNS rolled out DNSCrypt, which provides both encrypted and authenticated communications between your machine and a DNSCrypt-enabled resolver such as the one operated by OpenDNS. DNSCrypt is based on DJB's DNSCurve, and I've talked about comparisons between DNSSEC and DNSCurve before, but what's interesting here is that OpenDNS is really pushing the confidentiality angle:
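To see just how little protection plain DNS has, here's a sketch that builds an ordinary query in the RFC 1035 wire format and then reads the name back out of the raw bytes, the way an eavesdropper on the path would. It's simplified (a single A-record question, no EDNS, no name compression), and `build_query` and `observed_name` are names I made up.

```python
import struct


def build_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in hostname.split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def observed_name(packet: bytes) -> str:
    """What a passive observer reads off the wire: the queried name, in the clear."""
    pos, labels = 12, []  # the question section starts right after the 12-byte header
    while packet[pos] != 0:
        length = packet[pos]
        labels.append(packet[pos + 1 : pos + 1 + length].decode("ascii"))
        pos += 1 + length
    return ".".join(labels)


query = build_query("www.educatedguesswork.org")
print(observed_name(query))  # www.educatedguesswork.org
```

No key, no decryption step: the hostname is sitting right there in the packet. That is the gap DNSCrypt closes between you and the resolver.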

In the same way the SSL turns HTTP web traffic into HTTPS encrypted Web traffic, DNSCrypt turns regular DNS traffic into encrypted DNS traffic that is secure from eavesdropping and man-in-the-middle attacks. It doesn't require any changes to domain names or how they work, it simply provides a method for securely encrypting communication between our customers and our DNS servers in our data centers. We know that claims alone don't work in the security world, however, so we've opened up the source to our DNSCrypt code base and it's available on GitHub.

DNSCrypt has the potential to be the most impactful advancement in Internet security since SSL, significantly improving every single Internet user's online security and privacy.

Unfortunately, I don't think this argument really holds up under examination. Remember that DNS is mostly used to map names to IP addresses. Once you have the IP address, you need to actually do something with it, and generally that something is to connect to the IP address in question, which tends to leak a lot of the information you encrypted.

Consider the (target) case where we have DNSCrypt between your local stub resolver and some recursive resolver somewhere on the Internet. The class of attackers this protects against is those which have access to traffic on the wire between you and the resolver. Now, if I type http://www.educatedguesswork.org/ into my browser, what happens is that the browser tries to resolve www.educatedguesswork.org, and what the attacker principally learns is (1) the hostname I am querying for and (2) the IP address(es) that were returned. The next thing that happens, however, is that my browser forms a TCP connection to the target host and sends something like this:

GET / HTTP/1.1
Host: www.educatedguesswork.org
Connection: keep-alive
Cache-Control: max-age=0
...

Obviously, each IP packet contains the IP address of the target and the Host header contains the target host name, so any attacker on the wire learns both. And since this information is generally sent over the same access network as the DNS request, the attacker learns all the information they would have had if they had been able to observe my DNS query. [Technical note: when Tor is configured properly, DNS requests are routed over Tor, rather than over the local network. If that's not true, you have some rather more serious problems to worry about than DNS confidentiality.]
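Pulling the hostname out of that request takes only a few lines. Here's a sketch of what the eavesdropper does with the request shown above; it handles plaintext HTTP/1.x only, and `observed_host` is a made-up name.

```python
def observed_host(request: bytes):
    """Return the Host header from a plaintext HTTP/1.x request, as any
    on-path observer could, or None if there isn't one."""
    head = request.split(b"\r\n\r\n", 1)[0]
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":  # header names are case-insensitive
            return value.strip().decode("ascii")
    return None


request = (
    b"GET / HTTP/1.1\r\n"
    b"Host: www.educatedguesswork.org\r\n"
    b"Connection: keep-alive\r\n"
    b"Cache-Control: max-age=0\r\n"
    b"\r\n"
)
print(observed_host(request))  # www.educatedguesswork.org
```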

"You idiot," I can hear you saying, "if you wanted confidentiality you should have used SSL/TLS." That's true, of course, but SSL/TLS barely improves the situation. Modern browsers send the host name of the target server in the clear in the TLS handshake, using the Server Name Indication (SNI) extension (you can see whether your browser does it here), so the attacker learns exactly the same information whether you are using SSL/TLS or not. Even if your browser doesn't send SNI, the hostname of the server is generally in the server's certificate. Pretty much the only time that a useful (to the attacker) hostname isn't in the certificate is when there are a lot of hosts hidden behind the same wildcard certificate, such as when your domain is hosted using Heroku's "piggyback SSL". But this kind of certificate sharing only works well if your domain is subordinated behind some master domain (e.g., example-domain.heroku.com), which isn't really what you want if you're going to offer a serious service.
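The SNI point is easy to check for yourself. The sketch below has Python's `ssl` module write a real ClientHello into in-memory buffers (no network connection involved) and then parses the server name back out of the unencrypted bytes. It assumes the whole ClientHello fits in the first TLS record, which it normally does; `make_client_hello` and `extract_sni` are hypothetical helper names.

```python
import ssl
import struct


def make_client_hello(hostname: str) -> bytes:
    """Have Python's ssl module emit a real ClientHello into a memory buffer."""
    ctx = ssl.create_default_context()
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: no server answers, but the ClientHello has been written
    return outgoing.read()


def extract_sni(record: bytes):
    """Parse the SNI host name out of an unencrypted ClientHello record,
    the way a passive observer could. Returns None if there isn't one."""
    if record[0] != 0x16 or record[5] != 0x01:  # handshake record / ClientHello
        return None
    pos = 5 + 4                                          # record header, handshake header
    pos += 2 + 32                                        # client_version, random
    pos += 1 + record[pos]                               # session_id
    pos += 2 + struct.unpack_from("!H", record, pos)[0]  # cipher_suites
    pos += 1 + record[pos]                               # compression_methods
    end = pos + 2 + struct.unpack_from("!H", record, pos)[0]
    pos += 2
    while pos + 4 <= end:
        ext_type, ext_len = struct.unpack_from("!HH", record, pos)
        pos += 4
        if ext_type == 0:  # server_name extension
            # list length (2), name_type (1; 0 = host_name), name length (2), name
            if record[pos + 2] == 0:
                name_len = struct.unpack_from("!H", record, pos + 3)[0]
                return record[pos + 5 : pos + 5 + name_len].decode("ascii")
        pos += ext_len
    return None


hello = make_client_hello("www.educatedguesswork.org")
print(extract_sni(hello))  # www.educatedguesswork.org
```

Nothing here needs any key material: the name is readable before encryption has even been negotiated.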

This isn't to say that one couldn't design a version of SSL/TLS that didn't leak the target host information quite so aggressively—though it's somewhat harder than it looks—but even if you were to do so, it turns out to be possible to learn a lot about which sites you are visiting via traffic analysis (see, for instance here and here). You could counter this kind of attack as well, of course, but that requires yet more changes to SSL/TLS. This isn't surprising: concealing the target site simply wasn't a design goal for SSL/TLS; everyone just assumed that it would be clear what site you were visiting from the IP address alone (remember that when SSL/TLS was designed, it didn't even support name-based virtual hosting via SNI). I haven't seen much interest in changing this, but unless and until we do, it's hard to see how providing confidentiality for DNS traffic adds much in the way of security.

 

December 8, 2011

I've been meaning to write something about espresso and the various technology options for making it, but never got around to it. Now I have. I'm not an espresso-making expert, but I'm a guy who cares about espresso, has a moderate but not extreme budget, and can pull a fairly solid shot. As such, this might or might not be useful to you. There are many articles like this, but this one is mine.

The discussion below is restricted to what are called "semi-automatic" machines: those where you grind the coffee yourself but the machine has controls designed to regulate temperature and pressure. "Super-automatic" machines, where you put in beans and water and they put out coffee, are out of scope here.

Consistency
The basic principle of espresso is simple: you grind up the coffee, pack it down and then force heated water through under pressure. The difference between swill and pure liquid perfection is in the details. Moreover, if you're going to get the details right, the first thing you need to do is get them consistent; the exact procedures and settings you need differ with each coffee and each machine, but if you can be consistent then you can dial them in over time. [Aside: when I took machining in college, the first thing the instructor told me was that machining wasn't about cutting metal, it was about measurement. If you could measure accurately, you could cut accurately.] The major variables you need to control are:

  1. The coffee itself.
  2. The grind.
  3. The amount of coffee.
  4. The dispersal into the portafilter basket and the tamp.
  5. Water temperature.
  6. Water pressure.

The coffee is something you buy, so you have some control over it but not complete control. With the right grinder, you can completely control the grind and the amount of coffee. Dispersal and tamp is a matter of personal technique and practice. With the right espresso machine, you can control water temperature quite precisely and with any pump machine, pressure control should be quite good. So, as you can tell, this is primarily a matter of getting good equipment.

Grinder
The grinder thing is pretty simple: get a burr grinder with enough adjustments. Don't get a doser. Get one with a timer. A little elaboration: blade grinders (the cheap canister ones that you can buy for $20-$40) don't do a good job of getting you a consistent grind. The individual grounds aren't the same size and you can't control the overall size except by grinding longer. Don't buy one. You want a burr grinder and you want one that allows you to adjust the grind finely and over a large range. Different beans require different grinder settings, so easy adjustment matters if you change beans much.

The reason you want a timer is to let you control the amount of coffee you grind. This is a parameter people usually specify by mass, but using a scale is a pain in the ass. Grind time is a good proxy here. What I typically do is make some test shots and then set the grind time on my grinder (it has 3 presets). Then when I want to pull a shot I just put the portafilter under the grinder and hit the right preset button. None of this requires much thought once you get it wired.
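The calibration I do with test shots amounts to fitting a rate. Here's a minimal sketch, assuming the grinder's output is roughly proportional to grind time for a given bean and setting (a reasonable approximation, but still an assumption; change either and you need to recalibrate). The test-shot numbers are hypothetical, and `grams_per_second` and `preset_seconds` are made-up names.

```python
def grams_per_second(test_shots):
    """Least-squares grind rate through the origin from (seconds, grams) test shots."""
    num = sum(t * g for t, g in test_shots)
    den = sum(t * t for t, _ in test_shots)
    return num / den


def preset_seconds(target_grams, test_shots):
    """Grind time to program into a preset for a given dose."""
    return target_grams / grams_per_second(test_shots)


# Hypothetical test shots: (grind time in seconds, dose in grams), weighed once.
shots = [(10.0, 14.0), (12.0, 16.8), (14.0, 19.6)]
print(round(preset_seconds(18.0, shots), 1))
```

Fitting through the origin encodes the assumption that zero seconds grinds zero grams; if your grinder retains a noticeable amount of coffee between shots, a fit with an intercept would be more honest.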

There are lots of good grinders. What I have is a Baratza Vario. There are two features I like about this. First, it has easy adjustments with two slides up front, one for macro (espresso versus drip) and one for micro (grind fineness once you've selected espresso). Second, it has timer presets, which, as I said earlier, is super-convenient. There's a rest for you to put the portafilter on while you grind, but you need to hold it there or it falls off. I notice that Baratza now makes a weight-based Vario W. This seems like a good idea, but I don't know how well it will work with espresso, since you don't want to grind into a hopper but right into your portafilter, and it's not clear how the scale integrates with that. One caution I would have with the Vario is that the really gross burr adjustments are done with a hex wrench (included). They're easy but kinda scary (keep turning until the motor starts to labor), so if that freaks you out, you might consider another choice.

Espresso Machine
There are a lot of choices in what kind of espresso machine you buy, but let's get something out of the way now: espresso machines have pumps. Yes, you can buy a cheap machine that works off steam pressure, but that's not what you want.

The central problem that dictates the design of an espresso machine is this: The water you use to make espresso needs to be at one temperature (~200 F). The water you use to steam your milk needs to be at steam temperatures (~250 F). If you're going to make milk drinks (I don't, but Mrs. G. does) then you need to somehow address this. There are four basic approaches that I've seen:

  • Have a single boiler and a switch that selects which temperature to maintain at (a single boiler machine).
  • Have two boilers, one at each temperature (a double boiler machine).
  • Have a boiler set to steam temperature and use a heat exchanger to heat your water to espresso temperature.
  • Have a boiler set to water temperature and an electric thermal block heating system to make steam.

Single boiler machines are basically a terrible solution for more than about one or two people if you want to make any kind of steamed milk drink. Here's what the procedure looks like if you want to make a latte: set the thermostat switch to "water"; pull a shot; set the thermostat switch to steam; wait for it to heat up; steam your milk. This is all reasonably fast because the boiler heats up fast. However, say you want to make another latte. Now you have to set the thermostat back to water and wait for it to cool down, which can take minutes. You can accelerate this some by just running water through the group head which pulls cool water out of the reservoir into the system, but basically it's a pain. I've used this kind of machine in an office setting and it sucks.

The obvious (and best) solution to this problem is to have two totally separate boilers, with one set to water and one set to steam. This is of course more expensive, especially since manufacturers seem to have decided to engage in a little market segmentation. To give you an example, Chris Coffee's cheapest double boiler is the Mini Vivaldi II at $1995. They'll sell you a Rancilio Silvia (a very nice single boiler) for $699. This isn't an uncommon pattern: many double boiler machines sell for more than twice what a good single boiler would cost. I don't know anyone who has bought two singles instead, but it's sure occurred to me.

The other two solutions are compromises. In a heat exchanger machine, the boiler is set to steam temperature and the water for the espresso runs through a tube set inside the boiler, heating up on the way (good description here). The idea is that the water heats up as it is being pulled out of the reservoir and onto the coffee. The obvious problem, however, is that when you're not pulling espresso, the water in the heat exchanger tube eventually heats up to the temperature of the steam, and so does the heavy metal group head, which has a lot of thermal inertia; at that point you're back where you started. Standard procedure here is a cooling flush, which means that you run some water through the (empty) portafilter/brew group to bring it down to the right temperature. Then you quickly pack the portafilter and pull your shot. This all requires some coordination.

About a year ago, QuickMill came out with a new machine (the Silvano), which has a single boiler for the water and a thermoblock for the steam. This has the advantage that you can tightly temperature control the water and the group head and still get decent steam fast. The steam isn't as good as it would be if you had an actual boiler, but it's pretty good, so it's a reasonable compromise. And since the water side is temperature controlled, you get to pull a predictable shot without much messing around, which is what I, at least, am after. It shouldn't be surprising at this point that I have a Silvano, which I'm pretty happy with. Here's what it looks like pulling a shot of Four Barrel Ethiopia Welena Suke Quto (and no, those two little spurts onto the backsplash are not intended. That's evidence of tamping error.)

Oh, one more thing: the water supply for espresso machines can either be plumbed (there is a water tube coming from your pipes) or unplumbed (there is a water reservoir you have to refill). Plumbed typically only comes on higher-end machines. I don't know if it's worth stepping up to one of those machines to get plumbed, but I do know that my Silvano is unplumbed and I wish it were plumbed. It's pretty annoying to have the shot all ready to go and realize you're out of water. Doubly annoying if it's your last shot's worth of coffee.

 

November 29, 2011

As I wrote earlier, many oversubscribed races use a performance-based qualification process as a way of selecting participants. What I mostly passed over, however, is whether different people should have to meet different qualifying standards. If your goal is to get the best people, you could simply pick the top X%. However, if you were to do that, you would get primarily men in the 20-40 age range. To see this, consider Ironman Canada 2011, which had 65 Hawaii qualifiers. If you just take the first 65 non-pro finishers, the slowest qualifier would be around 10:17. This standard would admit only two amateur women, Gillian Clayton (W30-34) at 10:01.58 (a pretty amazing performance, since she's 18 minutes ahead of the next woman) and Rachel Ross (W35-39) at 10:12.17, and no men 55 or above.
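To make the "just take the top X" rule concrete, here's a minimal Python sketch. It assumes a hypothetical list of finisher records (name, division label, finish time in seconds); the sample data is made up and is not the actual Ironman Canada results.

```python
from collections import Counter
from typing import NamedTuple


class Finisher(NamedTuple):
    name: str
    division: str  # e.g. "M30-34", "W35-39", or "PRO" (hypothetical labels)
    seconds: int   # finish time in seconds


def hms(seconds: int) -> str:
    """Format a time in seconds as H:MM:SS."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"


def top_n_overall(finishers: list[Finisher], n: int) -> list[Finisher]:
    """Ignore age and gender entirely: the n fastest amateurs qualify."""
    amateurs = [f for f in finishers if f.division != "PRO"]
    return sorted(amateurs, key=lambda f: f.seconds)[:n]


if __name__ == "__main__":
    # Made-up results for illustration only.
    results = [
        Finisher("A", "PRO", 30000),
        Finisher("B", "M30-34", 34000),
        Finisher("C", "W30-34", 36118),
        Finisher("D", "M25-29", 36500),
        Finisher("E", "M55-59", 41000),
    ]
    qualifiers = top_n_overall(results, 3)
    print("slowest qualifier:", hms(qualifiers[-1].seconds))
    print(Counter(f.division for f in qualifiers))
```

With three slots, the 55-59 finisher never gets a look, which is exactly the skew described above.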

If you're going to have a diversified field, then, you need to somehow adjust the qualifying standard for age and gender. The standard practice is to have separate categories for men and women and five-year age brackets within each gender. (Some races also have "athena" and "clydesdale" divisions for women and men, respectively, who are over a certain weight, but at least in triathlon these are used only for awards and not for Hawaii qualifying purposes.) However, it's well known that these categories do a fairly imperfect job of capturing age-related variation: "aging up" from the oldest part of your age group to the youngest part of the next one represents a significant improvement in your expected results.
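A quick sketch of why aging up helps, assuming plain five-year brackets aligned on multiples of five (real races often lump the youngest ages together, and the label format here is hypothetical):

```python
def age_group(gender: str, age: int) -> str:
    """Label for the five-year bracket containing `age`, e.g. ("M", 44) -> "M40-44".

    Assumes brackets aligned on multiples of five; real races may group
    the youngest ages (e.g. 18-24) differently.
    """
    low = (age // 5) * 5
    return f"{gender}{low}-{low + 4}"


def years_into_bracket(age: int) -> int:
    """0 means the youngest possible member of the bracket, 4 the oldest."""
    return age % 5


if __name__ == "__main__":
    for age in (44, 45):
        print(age, age_group("M", age), years_into_bracket(age))
```

A 44-year-old is the oldest person in M40-44; a year later the same athlete is the youngest in M45-49, racing people who are up to four years older.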

UPDATE: I forgot to mention that Western States 100 has a completely gender-neutral qualifying standard, but it's comparatively very soft.

 

November 28, 2011

One of the common patterns in endurance and ultra-endurance sports is to have one or two races that everyone wants to do (the Hawaii Ironman, the Boston Marathon, Western States 100, etc.). Naturally, as soon as the sport gets popular, more people want to do race X than the race organizers can accommodate. [Interestingly, this seems to be true no matter the size of the event: Hawaii typically has around 1800 participants, Boston over 20,000.] As a race organizer, then, you are faced with the problem of deciding how to pick the actual participants from those who want to participate.

The first problem seems to be deciding what to optimize for, with the two most common objectives being:

  • Choose fairly among everyone who wants to do the race.
  • Choose the best athletes.

Fair Selection
The easiest way to choose fairly is generally to run a lottery. You take applications for a race up until date X and then just draw however many entrants you want out of that list. [Note that there is always a yield issue, since some people who register never show because of injuries or whatever, so the number of actual participants is never totally predictable.] For races which are only mildly oversubscribed, it's more common to take entries until you're full and then close entry on the "you snooze, you lose" principle. Ironman Canada does this, but now it fills up right away every year, so you more or less have to be there the day after the race, when registration for the next year opens.
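Both fair-selection schemes are easy to state in code. This is a sketch, not any race's actual procedure: `show_rate` is a hypothetical parameter standing in for the yield problem, meaning the fraction of accepted entrants you expect to actually start.

```python
import random
from typing import Optional


def run_lottery(applicants: list[str], slots: int, seed: Optional[int] = None) -> list[str]:
    """Draw `slots` entrants uniformly at random, without replacement.

    If the race isn't oversubscribed, everyone gets in.
    """
    if len(applicants) <= slots:
        return list(applicants)
    rng = random.Random(seed)
    return rng.sample(applicants, slots)


def first_come_first_served(applicants_in_order: list[str], capacity: int) -> list[str]:
    """'You snooze, you lose': accept entries in arrival order until full."""
    return applicants_in_order[:capacity]


def entries_to_accept(target_starters: int, show_rate: float) -> int:
    """Entries to hand out so that, on average, about target_starters show up.

    show_rate is a hypothetical expected fraction of entrants who start;
    the real no-show rate varies by race and year.
    """
    return round(target_starters / show_rate)
```

For instance, if you expect 90% of entrants to start and want about 90 starters, you'd hand out about 100 entries. Since the no-shows are random, you still only hit the target on average.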

Merit-Based Selection
Choosing the best athletes is a more difficult proposition, since you first need to identify them. You might think that you could just hold one big qualifying race open to everyone who wants to race and pick the top X finishers, but this clearly isn't going to work. Since the size of the target event is generally (though not always) set to be about the maximum practical size of a race, a qualifying event big enough to pick out the top people would have to be much, much larger, well beyond the practical size. Instead, you have to have a set of qualifying races and draw the best candidates from each. In some cases this is easy: if you are choosing national teams for a world championship, you can just have each nation run its own qualifying race, and since each such race only needs to draw from a smaller pool, it's still manageable. However, many events (e.g., Ironman) aren't organized along national lines, so this doesn't work.

There are two basic strategies for drawing your qualifying candidates from a number of races. First, you can have a qualifying time. For instance, if I wanted to run the Boston Marathon, I would need to run some marathon under 3:10. Obviously, there is a lot of variation in how difficult any given race is, and so this leads to people forum shopping for the fastest race. It's extremely common to see marathons advertised as good Boston qualifiers. The key words here are "flat and fast." (A qualifying race can only have a very small amount of net downhill, so non-flat means uphill, which slows you down.) Of course, a qualifying time doesn't give you very tight control over how many people you actually admit, so you still have an admissions problem. As I understand it, Boston used to just use a first-come-first-served policy for qualifiers, but in 2012 they're moving to a rolling admissions policy designed to favor the fastest entrants. At the other end of the spectrum, Western States has its qualifying time set so that there are vastly more qualifiers than eventual participants (it looks to me like it's set so that practically anyone who can finish can qualify [observation due to Cullen Jennings]), and they use a lottery to choose among the qualifiers.
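Here's a rough sketch of the two ways of handling the admissions problem once you have a qualifying time. It assumes each entry records the runner's qualifying time and the standard for their group. Ranking by margin under the standard is my simplification of "favor the fastest entrants," not Boston's actual rules, and the lottery function is only roughly what Western States does.

```python
import random
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    name: str
    qualifying_seconds: int  # time run in some qualifying marathon
    standard_seconds: int    # standard for this runner's age/gender group


def beats_standard(e: Entry) -> bool:
    """A qualifier has to run under the standard."""
    return e.qualifying_seconds < e.standard_seconds


def admit_by_margin(entries: list[Entry], capacity: int) -> list[Entry]:
    """Rank qualifiers by how far under their standard they ran and fill
    the field from the top (ties keep their original order)."""
    qualified = [e for e in entries if beats_standard(e)]
    qualified.sort(key=lambda e: e.standard_seconds - e.qualifying_seconds, reverse=True)
    return qualified[:capacity]


def lottery_among_qualifiers(entries: list[Entry], capacity: int,
                             seed: Optional[int] = None) -> list[Entry]:
    """Soft standard, then a random draw among everyone who meets it."""
    qualified = [e for e in entries if beats_standard(e)]
    rng = random.Random(seed)
    return rng.sample(qualified, min(capacity, len(qualified)))
```

The first approach rewards forum shopping for a fast course, since your margin is what counts; the second makes the margin irrelevant once you're under the line.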

The other major approach is the one used for the Hawaii Ironman. The World Triathlon Corporation (which runs Hawaii) has designated certain races as "Hawaii qualifiers" (my understanding is that a race pays for this privilege), and each race gets a specific number of slots for each gender/age combination. The way this works is that if there are 5 slots in your age group, the top 5 finishers get them. If any of those people don't want their slot (for instance, they may have already qualified), the slot rolls down to the 6th person, and so on. All of this happens in person, the day of or the day after the race. This method gives the race organizer a very predictable race size but poses some interesting strategic issues for participants: because participants compete directly against each other for slots, what you want is to pick a qualifying race that looks like it is going to have a weak field this year. Unfortunately, just because a race had a weak field last year doesn't mean that will be true again, since everyone else is making the same calculation!
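The roll-down rule is essentially a greedy walk down the results. A minimal sketch, assuming each age group's results are a list of athlete names in finishing order and a hypothetical `accepts` callback that says whether an athlete takes the slot. It ignores whatever happens to slots that nobody in an age group claims.

```python
from typing import Callable


def allocate_slots(finish_order: list[str], slots: int,
                   accepts: Callable[[str], bool]) -> list[str]:
    """Offer one age group's slots in finishing order, rolling down past
    anyone who declines, until the slots or the finishers run out."""
    taken: list[str] = []
    for athlete in finish_order:
        if len(taken) == slots:
            break
        if accepts(athlete):
            taken.append(athlete)
    return taken


def allocate_all(results_by_group: dict[str, list[str]],
                 slots_by_group: dict[str, int],
                 accepts: Callable[[str], bool]) -> dict[str, list[str]]:
    """Run the roll-down independently for each gender/age group."""
    return {group: allocate_slots(results_by_group.get(group, []), n, accepts)
            for group, n in slots_by_group.items()}
```

With 5 slots and the 2nd-place finisher already qualified, the slots go to places 1, 3, 4, 5, and 6. Because slots are per group, how many slots you get depends only on who else shows up in your group, which is where the weak-field strategy comes from.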

Arbitrary Selection
One thing that I've only seen in ultrarunning is invitational events with arbitrary (or at least unpublished) selection criteria. For instance, here's the situation with Badwater:

The application submission period begins on February 1, 2012 and ends on February 15, 2012. A committee of five race staff members, one of whom is the race director, will then review and rank each application on a scale of 0 to 10. The ranks will be tallied on February 18 and the top 45 rookie applicants with the highest scores, and the top 45 veterans with the highest scores, will be invited (rookies and veterans compete separately for 45 slots for each category). At that time, or later, up to ten more applicants (rookie and/or veteran) may be invited at the race director's discretion, for a total of approximately 100 entrants, and 90 actual competitors on race day.

I guess that's one way to do it.
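Reading the Badwater rules literally, the published part of the ranking step looks something like the sketch below. It rests on assumptions: "tallied" is taken to mean summing the five committee scores, ties keep application order, and the race director's discretionary invitations are left out, since those criteria aren't published.

```python
from typing import NamedTuple


class Application(NamedTuple):
    name: str
    veteran: bool
    scores: tuple[int, ...]  # one 0-10 score from each of the five committee members


def tally(app: Application) -> int:
    """Assumes 'tallied' means summing the committee's scores."""
    if len(app.scores) != 5 or any(not 0 <= s <= 10 for s in app.scores):
        raise ValueError(f"bad scores for {app.name}: {app.scores}")
    return sum(app.scores)


def invite(apps: list[Application], per_category: int = 45) -> list[str]:
    """Invite the top scorers separately among rookies and veterans.

    Ties keep application order (an arbitrary choice for this sketch).
    """
    invited: list[str] = []
    for is_veteran in (False, True):
        pool = [a for a in apps if a.veteran == is_veteran]
        pool.sort(key=tally, reverse=True)
        invited.extend(a.name for a in pool[:per_category])
    return invited
```

Of course, the mechanical part is the easy bit; all the interesting decisions happen inside the committee members' heads when they assign the scores.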