Recently in Overthinking Category

 

April 9, 2012

The IETF RTCWEB WG has been operating on a fast track with an interim meeting between each IETF meeting. Since we needed to schedule a lot of meetings, I thought it might be instructive to try to analyze a bunch of different locations to figure out the best strategy. Here's a lightly edited version of my post to the RTCWEB WG addressing this issue.

Note that I'm not trying to make any claims about what the best set of venues is. It's obviously easy to figure out any statistic we want about each proposed venue, but how you map that data to "best" is a much more difficult problem. The space is full of Pareto optima, and even if we ignore the troubling philosophical question of interpersonal utility comparisons, there's some tradeoff between minimal total travel time and a "fair" distribution of travel times (or at least an even distribution).

METHODOLOGY
The data below is derived by treating both people and venues as airport locations and using travel time as our primary instrument.

  1. For each respondent to the current Doodle poll, assign a home airport based on their draft publication history. We're missing a few people, but it should be pretty complete. Since these people responded before the venue was known, the sample is at least somewhat unbiased.
  2. Compute the shortest advertised flight between each home airport and each candidate venue, using Kayak listings around one of the proposed interim dates (6/10 - 6/13), ignoring price but excluding "Hacker fares". [Thanks to Martin Thomson for helping me gather these.]

This lets us compute statistics for any venue and/or combination of venues, based on the candidate attendee list.
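As a rough sketch of the computation (the file formats here are guesses; the actual data and software are attached below):

    ## Sketch only: assumes home-airports has "person airport" pairs and
    ## durations.txt has "from to hours" rows (one-way travel time, listed in both directions).
    homes <- read.table("home-airports", col.names = c("person", "airport"),
                        stringsAsFactors = FALSE)
    dur   <- read.table("durations.txt", col.names = c("from", "to", "hours"),
                        stringsAsFactors = FALSE)

    ## Round-trip hours for one attendee/venue pair.
    round.trip <- function(home, venue)
      2 * dur$hours[dur$from == home & dur$to == venue]

    ## Average each attendee's round trip over the venue(s), then summarize across attendees.
    stats.for <- function(venues) {
      hours <- sapply(homes$airport,
                      function(h) mean(sapply(venues, function(v) round.trip(h, v))))
      c(mean = mean(hours), median = median(hours), sd = sd(hours))
    }

    stats.for("NYC")                   # single venue
    stats.for(c("SFO", "NYC", "LHR"))  # three-way rotation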

The three proposed venues:

  • San Francisco (SFO)
  • Boston (BOS)
  • Stockholm (ARN)

Three hubs not too distant from the proposed venues:

  • London (LHR)
  • Frankfurt (FRA)
  • New York (NYC) (treating all NYC airports as the same location)

Also, Calgary (YYC), since the other two chair locations (BOS and SFO) were already proposed as venues, and I didn't want Cullen to feel left out.

RESULTS
Here are the results for each of the above venues, measured in total hours of travel (i.e., round trip).

Venue         Mean         Median           SD
----------------------------------------------
SFO           13.5             11         12.2
BOS           12.3             11          7.5
ARN           17.0             21         10.7
FRA           14.8             17          7.3
LHR           13.3             14          7.5
NYC           11.5             11          5.8
YYC           14.9             13         10.2
SFO/BOS/ARN   14.3             13          3.6
SFO/NYC/LHR   12.7             11.3        3.7

XXX/YYY/ZZZ is a three-way rotation of XXX, YYY, and ZZZ.

Obviously, mean and median are intended to be some sort of aggregate measure of travel time. I don't have any way to measure "fairness", but SD is intended as some metric of the variation in travel time between attendees.

The raw data and software are attached. The files are:

home-airports: the list of people's home airports
durations.txt: the list of airport-airport durations
doodle.txt: the attendees list
pairings: the software to compute travel times
doodle-out.txt: the computed travel times for each attendee

This was a quick hack, so there may be errors here, but nobody has pointed out any yet.

OBSERVATIONS
Obviously, it's hard to know what the optimal solution is without some model for optimality, but we can still make some observations based on this data:

  • If we're just concerned with minimizing total travel time, then we would always meet in New York, since it has both the shortest mean and the shortest median travel time. But as I said above, this arguably isn't fair to people who live either in Europe or California, since they always have to travel.
  • Combining West Coast, East Coast, and European venues gives mean/median values comparable to (or at least not too much worse than) NYC's, with much lower SDs. So, arguably, that kind of mix is more fair.
  • There's a pretty substantial difference between hub and non-hub venues. In particular, LHR has a median travel time 7 hours less than ARN, and the SFO/NYC/LHR combination has a median/mean travel time about 2 hours less than SFO/BOS/ARN (primarily accounted for by the LHR/ARN difference). [Full disclosure, I've favored Star Alliance hubs here, but you'd probably get similar results if, for instance, you used AMS instead of LHR.]
  • Obviously, your mileage may vary based on your location and feelings about what's fair, but based on this data, it looks to me like a three-way rotation between West Coast, East Coast, and European hubs offers a good compromise between minimum cost and a flat distribution of travel times.

     

    August 22, 2011

    The process of turning raw wool into fabric by hand is extremely time consuming. Prior to the Industrial Revolution, the production process operated as a pyramid, with a large number of carders supporting a smaller number of spinners, who in turn supported an even smaller number of weavers [Note: weaving is much faster than the other two major techniques for turning yarn into cloth: knitting and crocheting]. I've heard varying numbers, but Wikipedia claims that the ratio was around 9:3:1.

    Isn't it interesting, then, that when you look at the list of common American surnames, which are often associated with occupations, "Weaver" appears at position 190 (.05% of the population) but "Spinner" appears at 1/50th the rate, at position 7393 (.001%)? Carder is at 4255 (.003%); Carter is, I would assume, a different profession. [The first 10 names, btw, are: Smith, Johnson, Williams, Jones, Brown, Davis, Miller, Wilson, Moore, Taylor.]

    I'm not attempting to claim that there's some direct relationship between last name frequency and historical occupation rates, but it's still entertaining to speculate on the cause. My initial suggestion was that carding and spinning were more likely to be women's work and of course in the West women's surnames don't get propagated. Mrs. Guesswork suggests that spinning and carding weren't professionalized the way that weaving was [prior to the invention of the spinning wheel, spinning technology was extremely low-tech], so you might spin or card in your spare time, but weaving requires enough capital equipment that you would expect it to be done professionally and thus be more likely to get a surname attached to it.

    Equally likely, of course, is that it's just coincidence, but what fun would that be?

     

    March 5, 2011

    As I've mentioned before, a world with a lot of vampires is a world with a blood supply problem. I recently watched Daybreakers, which takes this seriously; nearly everyone in the world is a vampire and the vampires farm most of the remaining humans for blood while sending out undeath squads to round up the rest. Obviously, this isn't a scalable proposition and sure enough the vampires are frantically trying to develop some kind of substitute for human blood before supplies run out.

    In a world where synthetic blood isn't possible, there's some maximum stable fraction of vampires, dictated by the maximum amount of blood that a non-vampire can produce divided by the amount of blood that a vampire needs to survive. According to Wikipedia, blood donations are typically around 500 ml and you can donate every two months or so, which works out to about 3 liters of blood per donor per year. Presumably, if you didn't mind doing some harm to the donors (e.g., if it's involuntary), you could get a bit more, but this still gives us a back-of-the-envelope estimate. I have no idea what vampires need, but if it's, say, a liter a day, then any more than about 1% of the population being vampires is unstable. This is of course a classic externality problem: being a vampire is cool, but not everyone can be a vampire, so if the vampires wish to avoid over-bleeding the remaining humans, they will need some sort of system to limit the creation of new vampires.
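    To make the arithmetic explicit, here's a quick back-of-the-envelope check in R (the liter-a-day figure is my guess, as above):

    donor.supply <- 0.5 * 6    # liters/year: ~500 ml per donation, roughly 6 donations/year
    vampire.need <- 1 * 365    # liters/year, assuming a liter a day
    # Stable if (1 - v) * donor.supply >= v * vampire.need, so:
    donor.supply / (donor.supply + vampire.need)    # ~0.008, i.e. a bit under 1%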

    Luckily, this is a relatively well-understood economics problem with a well-known solution: we simply set a hard limit on the number of vampires and then auction off the rights (cap-and-trade won't work well unless we have some way of turning vampires back into ordinary humans). I'd expect this to raise a lot of money, which we can then plow into synthetic blood research to hasten the day when everyone can be a vampire; either that, or research into better farming methods, the better to hasten the red revolution.

     

    January 9, 2010

    I'm in the market for a new motorcycle and have been looking at the BMW R1150GS/R1200GS. Like cars, motorcycles depreciate sharply the minute they roll off the lot, and because you're fairly likely to drop your bike anyway, most people I know figure you might as well buy pre-dropped and look for a used model. But once you're buying used, you have the problem of figuring out how much you should pay. KBB motorcycles isn't much help here because the market is small and the mileage varies a lot.

    An alternate approach is to mine the available data on what people are offering vehicles for and use it to build an analytical model for predicting prices; this lets us figure out what the appropriate asking price (which isn't the same as the fair price; more on this later) for a given vehicle is and identify outliers in either direction.

    Below, you can find the list of the relevant bikes on sale on CL for the past week or so:

    Asking Model Year Mileage
    1 7650 1150GS 2002 25000
    2 7900 1150GS 2001 54000
    3 14500 1200GSA 2006 3700
    4 8500 1200GS 2005 54000
    5 13700 1200GS 2007 3658
    6 7400 1150GSA 2004 60000
    7 5500 1100GS 1996 23000
    8 11500 1200GS 2005 12000
    9 7200 1150GS 2002 40000
    10 11950 1200GS 2008 29000
    11 9600 1200GS 2005 39000

    I used a simple OLS regression model to fit this data, using the model year and mileage for the bike. The result is:

    summary(fit2)
    
    Call:
    lm(formula = d2$Asking ~ d2$Year + d2$Mileage)
    
    Residuals:
          Min        1Q    Median        3Q       Max 
    -1360.040  -353.520  -150.358     2.140  1708.510 
    
    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
    (Intercept) -1.201e+06  1.889e+05  -6.359 0.000218 ***
    d2$Year      6.056e+02  9.423e+01   6.426 0.000203 ***
    d2$Mileage  -7.631e-02  1.578e-02  -4.836 0.001294 ** 
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
    
    Residual standard error: 975.1 on 8 degrees of freedom
    Multiple R-squared: 0.9108,	Adjusted R-squared: 0.8885 
    F-statistic: 40.84 on 2 and 8 DF,  p-value: 6.335e-05 
    

    Our model predicts that each year the bike is on the road it loses about $600 in value and that it loses about $76 for each 1000 miles it has. [Note that I'm treating mileage and age as independent variables; it might make more sense to try to estimate "excess" mileage over some base value, but I don't have the baseline data I would need.] In any case, we're doing pretty well here: with only two predictors we are accounting for around 90% of the price variation. We can see this visually by plotting the price points against the best fit plane, as below:

    library(scatterplot3d)    # 3D scatterplot with a fitted regression plane
    s3d <- scatterplot3d(d2$Asking~d2$Year+d2$Mileage,xlab="Year",ylab="Mileage",zlab="Asking")
    orig <- s3d$xyz.convert(d2$Year,d2$Mileage,d2$Asking)
    plane <- s3d$xyz.convert(d2$Year,d2$Mileage,fitted(fit2))
    i.negpos <- 1 + (resid(fit2)>0)    # 1 = below the fitted plane (blue), 2 = above (red)
    segments(orig$x,orig$y, plane$x,plane$y, col=c("blue","red")[i.negpos],lty=(2:1)[i.negpos])
    s3d$plane3d(fit2)

    (code ripped off from here).

    Points above the plane (shown with red lines) are likely too expensive and points below (with blue lines) are worth checking out to see if they're good deals.
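
    If you'd rather skip the plot, the same information is in the residuals; sorting by them puts the candidate deals at the top (a quick sketch, reusing d2 and fit2 from above):

    d2$residual <- round(resid(fit2))    # negative: asking price below the model's prediction
    d2[order(d2$residual), ]             # from most under-priced to most over-priced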

    Obviously, we're excluding a lot of variables here. We haven't captured the condition of the bike, how desperate/motivated the seller is to get rid of it, what accessories it has, etc. Looking more closely at the data, the two bikes that look most expensive relative to the model seem to come with a few more accessories, so this may have led the owners to think they could extract more money (I don't think this is really true, however, since often those items are valuable only to the original owner). For the purposes of selecting good deals, we would also like to know how flexible the seller's price is. It's possible that someone lowballing the price will also be less flexible because they've already built that discount into their price. On the other hand, they could be more motivated, so that could cut in the other direction. It would be interesting to get secondary data on how much these bikes actually sell for [you could get some of that information by seeing if repeated postings have lower prices], but while that data is available for houses I don't think it is for bikes.

     

    July 5, 2009

    The problem with climbing grades is that unlike running, cycling, lifting, etc., there's no objective measure of difficulty. Routes are just graded by consensus of other climbers, in this case the gym's routesetters. As a result, some routes are easier than others—and of course since different climbers have different styles, which routes are easiest depends on the climber as well—and as a practical matter some routes are really harder or easier than their rated grade. [1] Of course, given that there's no objective standard, you could argue that this isn't a meaningful statement, but that's not really true: a difficulty grade is really a statement about how many people can do a route, so if you have a bunch of routes which are rated at 5.10 and I can't climb any of them but I jump on a new route rated 5.10 and race up it with no effort, that's a sign it's not really a 5.10. This is actually a source of real angst to people just starting to break into a grade—at least for me—since if I can do it, I immediately expect that the rating is soft.

    It would be nice to have a more objective measurement of difficulty. While we can't do this just by measuring the route (the way we can with running, for instance) that doesn't mean the problem is insoluble; we just need to take a more sophisticated approach. Luckily, we can steal a solution from another problem domain: psychological testing. The situations are actually fairly similar: in both cases we have a trait (climbing skill, intelligence) which isn't directly measurable. Instead, we can give our subjects a bunch of problems which are generally easier the higher your level of ability. In the psychological domain, what we want to do is evaluate people's level of ability; in the climbing domain, we want to evaluate the level of difficulty of the problems. With the right methods, it turns out that these are more or less the same problem.

    The technique we want is called Item Response Theory (IRT). IRT assumes that each item (question on the test or route, as the case may be) has a certain difficulty level; if you succeed on an item, that's an indication that your ability is above that level. If you fail, that's an indication that your ability is below that level. Given a set of items of known difficulties, then, we can quickly home in on someone's ability, which is how computerized adaptive tests work. Similarly, if we take a small set of people of known abilities and their performance on each item, we can use that to fit the parameters for those items.

    It's typical to assume that the probability of success on each item is a logistic curve. The figure below shows an item with difficulty level 1.
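    For concreteness, here's a minimal sketch of such a curve for an item with difficulty 1 (discrimination fixed at 1, i.e. Rasch-style):

    theta <- seq(-4, 4, by = 0.1)    # ability
    b <- 1                           # item difficulty
    plot(theta, plogis(theta - b), type = "l",
         xlab = "Ability", ylab = "P(success)",
         main = "Item characteristic curve, difficulty = 1")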

    Of course, this assumes that we already know how difficult the items are, but initially we don't know anything: we just have a set of people and items without any information about how good/difficult any of them are. In order to do the initial calibration we start by collecting a large, random sample of people and have them try each item. You end up with a big matrix of each person and whether they succeeded or failed at each one, but since you don't know how good anyone is other than by the results of this test, things get a little complicated. The basic idea behind at least one procedure, due to Birnbaum (it's not entirely clear to me whether this is how modern software works; the R ltm documentation is a little opaque), is to use an iterative technique: you assign an initial set of abilities to each person and use those to estimate the difficulty of each item. Given those difficulty estimates, you re-fit to determine people's abilities, then use the new ability estimates to re-estimate the item difficulties, and iterate back and forth until the estimates converge, at which point you have estimates of both the difficulty of each item and the ability of each individual. (My description here is based on Baker.)

    As an example I generated some toy data with 20 items and 100 subjects with a variety of abilities and fit it using R's ltm package. The figure below shows the results with the response curves for each item. As you can see, having a range of items with different difficulties lets us evaluate people along a wide range of abilities:
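    The fit itself is only a few lines of R. This is a sketch of the same idea rather than the exact script behind the figure (a Rasch/1PL fit for simplicity; ltm() would give a 2PL):

    library(ltm)

    set.seed(1)
    n.items <- 20; n.subj <- 100
    ability    <- rnorm(n.subj)                        # latent abilities
    difficulty <- seq(-2, 2, length.out = n.items)     # item difficulties
    p    <- plogis(outer(ability, difficulty, "-"))    # P(success) for each subject/item pair
    resp <- matrix(rbinom(length(p), 1, p), nrow = n.subj)
    colnames(resp) <- paste0("Item", 1:n.items)

    fit <- rasch(resp)            # one-parameter logistic (Rasch) fit
    plot(fit, type = "ICC")       # response curves for all 20 items
    factor.scores(fit)            # ability estimates for the observed response patterns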

    Once you've done this rather expensive calibration stage, however, you can easily calculate someone's abilities just by plugging in their performance on a small set of items. Actually, you can do better than that: you can perform an adaptive test where you start with an initial set of items and then use the response on those items to determine which items to use next, but even if you don't do this, you can get results fairly quickly.

    That's nice if you're administering the SATs, but remember that what we wanted was to solve the opposite problem: rating the items, not the subjects. However, as I said earlier, these are the same problem. Once we have a set of subjects with known abilities, we can use that to roughly calibrate the difficulty of any new set of items/routes. So, the idea is that we create some set of benchmark routes and then we send our raters out to climb those routes. At that point we know their ability level and can use that to rate any new set of climbs.

    There's still one problem to solve: the difficulty ratings we get out of our calculations are just numbers along some arbitrary range (it's conventional to aim for a range of about -3 to +3 with the average around 0), but we want to have ratings in the Yosemite Decimal System (5.1-5.15a as of now). It's of course easy to rescale the difficulty parameter to match any arbitrary scale of our choice, but that's not really enough, because the current ratings are so imprecise. We'll almost certainly find that there are two problems A and B where A is currently rated harder than B but our calibrated scale has B harder than A. We can of course choose a mapping that minimizes these errors, but because so many routes are misrated it's probably better to start with a smaller set of benchmark routes where there is a lot of consensus on their difficulty, make sure they map correctly, and then readjust the ratings of the rest of the routes accordingly.

    Note that this doesn't account for the fact that problems can be difficult in different ways; one problem might require a lot of strength and another a lot of balance. To some extent, this is dealt with by having a smooth success curve which doesn't require that every 5.10 climber be able to climb every 5.10 route. However, ultimately if you have a single scalar ability/difficulty metric, there's only so much you can do in this regard. IRT can handle multiple underlying abilities, but the YDS scale we're trying to emulate can't, so there's not too much we can do along those lines.

    Obviously, this is all somewhat speculative—it's a lot of work and I don't get the impression that route setters worry too much about the accuracy of their ratings. On the other hand, at least in climbing gyms if you were able to integrate it into a system that let people keep track of their success in their climbs (I do this already but most people find it to be too much trouble), you might be able to get the information you needed to calibrate new climbers and through them get a better sense of the ratings for new climbs.

    Acknowledgement: This post benefitted from discussions with Leslie Rescorla, who initially suggested the IRT direction.

    [1] This seems to be especially bad for very easy and very hard routes. I think the issue with very easy routes is that routesetters are generally good climbers and so find all the routes super-easy. I'm not sure about harder problems, but it may be that they're near the limit of the routesetters' abilities and so heavily dependent on whether the route matches their style.

     

    March 27, 2009

    Sorry about the lack of content last week—I was at IETF and just didn't have time to write anything. I should have some more material up over the weekend. In the meantime, check out this photo of the bathroom sink at the Hilton where we were having the conference:

    That thing to the left of the sink is an automatic soap dispenser (surprisingly, powered by a battery pack underneath the sink). Now notice that the sink itself is manually operated. Isn't this kind of backwards? The whole point of automatic soap dispensers and sinks in bathrooms is to appeal to your OCD by freeing you from having to touch any surface which has been touched by any other human without being subsequently sterilized. But when you wash your hands, the sequence of events is that you turn on the water, wet your hands, soap up, rinse, and then turn off the water. So, if you have a manually operated sink, people contaminate the handles with their dirty, unwashed hands, which means that when you go to turn the sink off, your just-washed hands get contaminated again. The advantage of automatic faucets, then, is the automatic shutoff, which omits the last stage. By contrast, having the soap dispenser be automatic doesn't buy you that much because you only need to touch it before washing your hands. There's probably some analogy here to viral spread in computer systems, but for now let's just say that this is how security guys think.

     

    February 6, 2009

    Dan Savage addresses the difficult ethical issue of the mutual obligations of the laptop user and the coffee shop in which he works:
    Don't want people to sit in your cafe with their laptops? There's a simple solution: don't have WiFi. But if you're going to have WiFi then for fuck's sake have fucking WiFi. And if your WiFi isn't working, if it's down and it's gonna be down all day, you might wanna mention that to people before they wait in line, buy a coffee, leave a tip, sit down, and pull out their computers. Because then each and every one of those computer users is going to walk up to the counter and ask if you have WiFi. It's an asshole move to look at each laptop computer user/customer in turn like they've just asked you if you have herpes. And if it really kills you to sneer out, "Yeah, we have WiFi, but it's down," then put a little sign on the door that says the WiFi's out. Then laptop users won't bother you with their questions, their presence, or their patronage.

    UPDATE: And laptop users? Tip based on the amount of time you intend to spend in the cafe, not on the price of your beverage; buy your refills; share tables; and always remember that you're not actually in your office.

    I occasionally work in coffee shops, so this is a topic I've given some thought to. I think it's pretty clear that there's some implicit obligation for patrons to fork over some money occasionally and not just sit at a table (yes, yes, I realize that there's no contract requiring you to do so, but think about the equilibrium issues here: if nobody ever paid for their drinks you can bet that coffee shops would start forcing you to rent tables.) But this doesn't tell you how much to spend or how to allocate your payments between the coffee shop and the staff.

    If the shop is pretty full, I think it's reasonably clear: you're depriving the shop of space that could be used by paying customers so you should be buying a bit more than the average customer. The same logic holds for the staff, since presumably those customers would tip. If the shop is mostly empty, though, the situation seems a little more complicated. You're not costing the shop any money and WiFi is basically free for the shop to offer (the router is cheap and the Internet service is a fixed cost.) That doesn't mean you don't need to fork over any money, since, as I said, there's an implicit obligation, but I have no idea what the right amount is. I usually buy a drink when I come in and then maybe one every hour or two. It's not clear how much to tip the staff either: their work scales with the number of drinks you order, so my instinct is whatever fraction of your food and drinks you usually would tip.

    As far as the shop's obligation to you, the flip side of the implicit contract is that they will offer you Wi-Fi ("Wait", I hear you object, "why should you even think they have Wi-Fi, let alone rely on it?" That seems simple: some coffee shops advertise it and even in shops which don't many if not most of the customers are regulars and so know it's provided and often went to the shop explicitly to work.). Obviously, that doesn't mean it needs to work perfectly, but if they know it's hosed they should probably tell you before you've plonked down your money.

     

    January 30, 2009

    While listening to KQED's latest pledge drive, I noticed something funny about their thank-you gift schedule. This time, they offered the option of not taking any gift and instead having its value donated to the SF Food Bank. The schedule looks like this:

    Donation ($) Meals
    40 2
    60 5
    144 33
    360 180

    This seems strangely non-linear, which suggests something interesting: namely, that the fraction of your pledge that KQED spends on thank-you gifts, as opposed to funding their operations, grows with the size of the donation. There are way too few points here to do a proper fit, but I can't help myself. Playing around with curves a bit, a quadratic seems to fit pretty well, with parameters: Meals = .0014 * Donation^2 + 1.2. It's not just the $360 data point that throws it out of whack, either; there's apparent nonlinearity even in the first three points. (Again, don't get on me about overfitting: with only four points there's only so much you can do.)
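    For the curious, reproducing that fit is a one-liner in R (just least squares of meals on the square of the donation):

    donation <- c(40, 60, 144, 360)
    meals    <- c(2, 5, 33, 180)
    coef(lm(meals ~ I(donation^2)))    # intercept ~1.2, quadratic term ~0.0014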

    I'm not sure what this suggests about their business model. Naively, I would have expected the fraction of your donation that goes to gifts to go down as your gift went up. Indeed, you might have thought that they would take a small loss on the smallest pledges just to get people involved and then move to the upsell at some later date. Thinking about it some more, I guess the natural model is that KQED is trying to extract money from you up to the point where the marginal dollar they extract costs them a marginal dollar in gifts (or in this case, food bank donations), at which point they stop. So, as people's marginal utility of having given something, anything, to KQED declines, they need to keep jacking up gift quality faster than the size of the donation to keep extracting your cash. Other theories are of course welcome.

     

    January 19, 2009

    Mrs. G. and I were up in San Francisco last weekend and while on our way to Fog City News we ran into someone we knew. This was sort of surprising, so I got to thinking about how probable it was (or wasn't). Grossly oversimplifying, my reasoning goes something like this:

    The population of San Francisco is about 800,000. Let's call it 10^6. I know perhaps 100 people in the city at any given time. There are maybe 20-50 people on any given city block. Say I walk for an hour at 3 mph and that the average block is 100m long, so I walk about 50 blocks in that time and pass on the order of 10^3 people. If we assume people are randomly distributed (this is probably pessimistic, since I know that I spend most of my time in SF in a few places and I assume my friends tend to be somewhat similar) then I have a .9999 chance of not knowing any given person I pass. If we assume that these are independent events then I have a .9999^1000 chance of not knowing any of those people [technical note: this is really (999900/1000000) * (999899/999999) * ..., but these numbers are large enough and we've made enough other approximations that we can ignore this]. .9999^1000 ≈ .90, so if I walk around the city for an hour, I have about a 1/10 chance of meeting someone I know. That doesn't sound too far out of line.
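    Checking the arithmetic (and the approximation in the technical note) takes a couple of lines of R:

    known <- 100; city <- 1e6; passed <- 1000
    (1 - known / city)^passed                 # independence approximation: ~0.905
    i <- 0:(passed - 1)
    prod((city - known - i) / (city - i))     # without replacement: ~0.905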

     

    December 29, 2008

    One of Slate's odder sections is the "Green Lantern", where they take on some simple question like "should I buy a natural or artificial Christmas Tree" and try to analyze it from an environmental perspective. The most recent article asks whether you should throw away your leftovers or flush them down the garbage disposal. Unfortunately, the articles tend to be pretty useless: sometimes they have a real answer but often they thrash around for a while giving you the pros and cons of each option and conclude that maybe you should do A and maybe you should do B:
    The research is unambiguous about one point, though: Under normal circumstances, you should always compost if you can. Otherwise, go ahead and use your garbage disposal if the following conditions are met: First, make sure that your community isn't running low on water. (To check your local status, click here.) Don't put anything that is greasy or fatty in the disposal. And find out whether your local water-treatment plant captures methane to produce energy. If it doesn't--and your local landfill does--you may be better off tossing those mashed potatoes in the trash.

    Or maybe not... Here's another example:

    If these ideas don't excite you, the Lantern recommends putting the new cash toward insulating your family's home. Of course, whether this makes sense depends on your local climate and whether you buy or rent. (Likewise, the current state of your home will determine just how much insulation your $100 will buy.) For the rest of you, it might be wisest to replace any antiquated, energy-inefficient appliances you might have--along the lines spelled out here. (Let's put aside the complicated question of carbon offsets, which will be addressed in a future column. Suffice to say that they wouldn't be the Lantern's first choice.)

    I'm not saying I can do any better; rather, I think this is reflective of a systemic problem with this kind of overall cost/benefit analysis. While it's possible to measure the power consumption, carbon emissions, etc. of any particular microactivity, it's pretty hard to do an overall cost/benefit analysis of whether you should do A or B when each of them consists of a whole bunch of individual activities, all of which require their own analyses. The economist-type answer is to levy Pigouvian taxes on each individual component (e.g., carbon taxes) and then let the market sort things out. I don't know whether that would work any better, but I don't see people being able to do this kind of analysis for each individual purchasing decision either.