Recently in Overthinking Category

 

January 9, 2010

I'm in the market for a new motorcycle and have been looking at the BMW R1150GS/R1200GS. Like cars, motorcycles have a lot of depreciation the minute they pull off the lot, and because you're fairly likely to drop your bike anyway, most people I know figure you might as well buy pre-dropped and look for a used model. But once you're buying used you have the problem of figuring out how much you should pay. KBB motorcycles isn't much help here because the market is small and the mileage varies a lot.

An alternate approach is to mine the available data on what people are offering vehicles for and use this to build an analytical model for predicting prices; this lets us figure out what the appropriate asking (which isn't the same as fair; more on this later) price for a new vehicle is and identify outliers in either direction.

Below, you can find the list of the relevant bikes on sale on CL for the past week or so:

Asking Model Year Mileage
1 7650 1150GS 2002 25000
2 7900 1150GS 2001 54000
3 14500 1200GSA 2006 3700
4 8500 1200GS 2005 54000
5 13700 1200GS 2007 3658
6 7400 1150GSA 2004 60000
7 5500 1100GS 1996 23000
8 11500 1200GS 2005 12000
9 7200 1150GS 2002 40000
10 11950 1200GS 2008 29000
11 9600 1200GS 2005 39000

I used a simple OLS regression model to fit this data, using the model year and mileage for the bike. The result is:

summary(fit2)

Call:
lm(formula = d2$Asking ~ d2$Year + d2$Mileage)

Residuals:
      Min        1Q    Median        3Q       Max 
-1360.040  -353.520  -150.358     2.140  1708.510 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.201e+06  1.889e+05  -6.359 0.000218 ***
d2$Year      6.056e+02  9.423e+01   6.426 0.000203 ***
d2$Mileage  -7.631e-02  1.578e-02  -4.836 0.001294 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 975.1 on 8 degrees of freedom
Multiple R-squared: 0.9108,	Adjusted R-squared: 0.8885 
F-statistic: 40.84 on 2 and 8 DF,  p-value: 6.335e-05 

Our model predicts that each year the bike is on the road it loses about $600 in value and that it loses about $76 for each 1000 miles it has. [Note that I'm treating mileage and age as independent variables; it might make more sense to try to estimate "excess" mileage over some base value, but I don't have the baseline data I would need.] In any case, we're doing pretty well here: with only two predictors we are accounting for around 90% of the price variation. We can see this visually by plotting the price points against the best fit plane, as below:

s3d <- scatterplot3d(d2$Asking~d2$Year+d2$Mileage,xlab="Year",ylab="Mileage",zlab="Asking")
orig <- s3d$xyz.convert(d2$Year,d2$Mileage,d2$Asking)
plane <- s3d$xyz.convert(d2$Year,d2$Mileage,fitted(fit))
i.negpos <- 1 + (resid(fit)>0)
segments(orig$x,orig$y, plane$x,plane$y, col=c("blue","red")[i.negpos],lty=(2:1)[i.negpos])
s3d$plane3d(fit)
(code ripped off from here).

Points above the plane (shown with red lines) are likely too expensive and points below (with blue lines) are worth checking out to see if they're good deals.

Obviously, we're excluding a lot of variables here. We haven't captured the condition of the bike, how desperate/motivated the seller is to get rid of it, what accessories it has, etc. Looking more closely at the data, the two most comparatively expensive bikes seem to come with a few more accessories, so this may have led the owners to think they could extract more money (I don't think this is really true, however, since often those items are valuable only to the original owner). For the purposes of selecting good deals, we would also like to know how flexible the seller's price is. It's possible that someone lowballing the price will also be less flexible because they've already built that discount into their price. On the other hand, they could be more motivated, so that could cut in the other direction. It would be interested to get secondary data on how much these bikes actually sell for [you could get some of that information by seeing if repeated postings have lower prices], but while that data is available for houses I don't think it is for bikes.

 

July 5, 2009

The problem with climbing grades is that unlike running, cycling, lifting, etc. there's no objective measure of difficulty. Routes are just graded by consensus of other climbers, in this case the gym's routesetters. As a result, some routes are easier than others—and of course since different climbers have different styles, which routes are easiest depends on the climber as well—and as a practical matter some routes are really harder or easier than their rated grade.1 Of course, given that there's no objective standard, you could argue that this isn't a meaningful statement, but that's not really true: a difficulty grade is really a statement about how many people can do a route, so if you have a bunch of routes which are rated at 5.10 and I can't climb any of them but I jump on a new route rated 5.10, and race up it with no effort, that's a sign it's not really a 5.10. This is actually a source of real angst to people just starting to break into a grade—at least for me—since if I can do it, I immediately expect that the rating is soft.

It would be nice to have a more objective measurement of difficulty. While we can't do this just by measuring the route (the way we can with running, for instance) that doesn't mean the problem is insoluble; we just need to take a more sophisticated approach. Luckily, we can steal a solution from another problem domain: psychological testing. The situations are actually fairly similar: in both cases we have a trait (climbing skill, intelligence) which isn't directly measurable. Instead, we can give our subjects a bunch of problems which are generally easier the higher your level of ability. In the psychological domain, what we want to do is evaluate people's level of ability; in the climbing domain, we want to evaluate the level of difficulty of the problems. With the right methods, it turns out that these are more or less the same problem.

The technique we want is called Item Response Theory (IRT). IRT assumes that each item (question on the test or route, as the case may be) has a certain difficulty level; if you succeed on an item, that's an indication that your ability is above that level. If you fail, that's an indication that your ability is below that level. Given a set of items of known difficulties, then, we can can quickly home in on someone's ability, which is how computerized adaptive tests work. Similarly, if we take a small set of people of known abilities and their performance on each item, we can use that to fit the parameters for those items.

It's typical to assume that the probability of success on each item is a logistic curve. The figure below shows an item with difficulty level 1.

Of course, this assumes that we already know how difficult the items are, but initially we don't know anything: we just have a set of people and items without any information about how good/difficult any of them are. In order to do the initial calibration we start by collecting a large, random sample of people and have them try each item. You end up with a big matrix of each person and whether they succeeded or failed at each one, but since you don't know how good anyone is other than by the results of this test, things get a little complicated. The basic idea behind at least one procedure, due to Birnbaum, (it's not entirely clear to me if this is how modern software works; the R ltm documentation is a little opaque) is to use an iterative technique where you assign an initial set of abilities to each person and then use that to estimate the difficulty of each problem. Given those assignments, we can re-fit to determine people's abilities. You then use those estimates to reestimate the problem difficulties and iterate back and forth until the estimates converge, at which point you have estimate of both the difficulty of each item and the ability of each individual. (My description here is based on Baker).

As an example I generated some toy data with 20 items and 100 subjects with a variety of abilities and fit it using R's ltm package. The figure below shows the results with the response curves for each item. As you can see, having a range of items with different difficulties lets us evaluate people along a wide range of abilities:

Once you've done this rather expensive calibration stage, however, you can easily calculate someone's abilities just by plugging in their performance on a small set of items. Actually, you can do better than that: you can perform an adaptive test where you start with an initial set of items and then use the response on those items to determine which items to use next, but even if you don't do this, you can get results fairly quickly.

That's nice if you're administering the SATs, but remember that what we wanted was to solve the opposite problem: rating the items, not the subjects. However, as I said earlier, these are the same problem. Once we have a set of subjects with known abilities, we can use that to roughly calibrate the difficulty of any new set of items/routes. So, the idea is that we create some set of benchmark routes and then we send our raters out to climb those routes. At that point we know their ability level and can use that to rate any new set of climbs.

There's still one problem to solve: the difficulty ratings we get out of our calculations are just numbers along some arbitrary range (it's conventional to aim for a range of about -3 to +3 with the average around 0), but we want to have ratings in the Yosemite Decimal System (5.1-5.15a as of now). It's of course easy to rescale the difficulty parameter to match any arbitrary scale of our choice, but that's not really enough, because the current ratings are so imprecise. We'll almost certainly find that there are two problems A and B where A is currently rated harder than B but our calibrated scale has B harder than A. We can of course choose a mapping that minimizes these errors, but because so many routes are misrated it's probably better to start with a smaller set of benchmark routes where there is a lot of consensus on their difficulty, make sure they map correctly, and then readjust the ratings of the rest of the routes accordingly.

Note that this doesn't account for the fact that problems can be difficult in different ways; one problem might require a lot of strength and one require a lot of balance. To some extent, this is dealt with by the having a smooth success curve which doesn't require that every 5.10 climber be able to climb every 5.10 route. However, ultimately if you have a single scalar ability/difficulty metric, there's only so much you can do in this regard. IRT can handle multiple underlying abilities, but the YSD scale we're trying to emulate can't, so there's not too much we can do along those lines.

Obviously, this is all somewhat speculative—it's a lot of work and I don't get the impression that route setters worry too much about the accuracy of their ratings. On the other hand, at least in climbing gyms if you were able to integrate it into a system that let people keep track of their success in their climbs (I do this already but most people find it to be too much trouble), you might be able to get the information you needed to calibrate new climbers and through them get a better sense of the ratings for new climbs.

Acknowledgement: This post benefitted from discussions with Leslie Rescorla, who initially suggested the IRT direction.

1. This seems to be especially bad for very easy and very hard routes. I think the issue with very easy routes is that routesetters are generally good climbers and so find all the routes super-easy. I'm not sure about harder problems, but it may be that they're near the limit of routesetters abilities and so heavily dependent on whether the route matches their style.

 

March 27, 2009

Sorry about the lack of content last week—was at IETF and just didn't have time to write anything. I should have some more material up over the weekend. In the meantime, check out this photo of the bathroom sink at the Hilton where we were having the conference:

That thing to the left of the sink is an automatic soap dispenser (surprisingly, powered by a battery pack underneath the sink). Now notice that the sink itself is manually operated. Isn't this kind of backwards? The whole point of automatic soap dispensers and sinks in bathrooms is to appeal to your OCD by freeing you from having to touch any surface which has been touched by any other human without being subsequently sterilized. But when you wash your hands, the sequence of events is that you turn on the water, wet your hands, soap up, rinse, and then turn off the water. So, if you have a manually operated sink, people contaminate the handles with their dirty, unwashed hands, which means that when you go to turn the sink off, your just-washed hands get contaminated again. The advantage of automatic faucets, then, is the automatic shutoff, which omits the last stage. By contrast, having the soap dispenser be automatic doesn't buy you that much because you only need to touch it before washing your hands. There's probably some analogy here to viral spread in computer systems, but for now let's just say that this is how security guys think.

 

February 6, 2009

Dan Savage addresses the difficult ethical issue of the mutual obligations of the laptop user and the coffee shop in which he works:
Don't want people to sit in your cafe with their laptops? There's a simple solution: don't have WiFi. But if you're going to have WiFi then for fuck's sake have fucking WiFi. And if your WiFi isn't working, if it's down and it's gonna be down all day, you might wanna mention that to people before they wait in line, buy a coffee, leave a tip, sit down, and pull out their computers. Because then each and every one of those computer users is going to walk up to the counter and ask if you have WiFi. It's an asshole move to look at each laptop computer user/customer in turn like they've just asked you if you have herpes. And if it really kills you to sneer out, "Yeah, we have WiFi, but it's down," then put a little sign on the door that says the WiFi's out. Then laptop users won't bother you with their questions, their presence, or their patronage.

UPDATE: And laptop users? Tip based on the amount of time you intend to spend in the cafe, not on the price your beverage; buy your refills; share tables; and always remember that you're not actually in your office.

I occasionally work in coffee shops, so this is a topic I've given some thought to. I think it's pretty clear that there's some implicit obligation for patrons to fork over some money occasionally and not just sit at a table (yes, yes, I realize that there's no contract requiring you to do so, but think about the equilibrium issues here: if nobody ever paid for their drinks you can bet that coffee shops would start forcing you to rent tables.) But this doesn't tell you how much to spend or how to allocate your payments between the coffee shop and the staff.

If the shop is pretty full, I think it's reasonably clear: you're depriving the shop of space that could be used by paying customers so you should be buying a bit more than the average customer. The same logic holds for the staff, since presumably those customers would tip. If the shop is mostly empty, though, the situation seems a little more complicated. You're not costing the shop any money and WiFi is basically free for the shop to offer (the router is cheap and the Internet service is a fixed cost.) That doesn't mean you don't need to fork over any money, since, as I said, there's an implicit obligation, but I have no idea what the right amount is. I usually buy a drink when I come in and then maybe one every hour or two. It's not clear how much to tip the staff either: their work scales with the number of drinks you order, so my instinct is whatever fraction of your food and drinks you usually would tip.

As far as the shop's obligation to you, the flip side of the implicit contract is that they will offer you Wi-Fi ("Wait", I hear you object, "why should you even think they have Wi-Fi, let alone rely on it?" That seems simple: some coffee shops advertise it and even in shops which don't many if not most of the customers are regulars and so know it's provided and often went to the shop explicitly to work.). Obviously, that doesn't mean it needs to work perfectly, but if they know it's hosed they should probably tell you before you've plonked down your money.

 

January 30, 2009

While listening to KQED's latest pledge drive, I noticed something funny about their thank you gift schedule. This time, they offered the option to have you not take any gift but instead donate it to the SF Food Bank.. The schedule looks like this:

Donation ($) Meals
40 2
60 5
144 33
360 180

This seems strangely non-linear, which suggests something interesting, namely, that the fraction of your pledge that KQED uses to pay for thank you gifts as opposed to using to fund their operations. There's way too few points here to do a proper fit but I can't help myself. Playing around with curves a bit, a quadratic seems to fit pretty well, with parameters: Meals = .0014 * Donation^2 + 1.2. It's not just the $360 data point that throws it out of whack, either. There's apparent nonlinearity, even in the first three points. (Again, don't get on me about overfitting: with only four points there's only so much you can do.)

I'm not sure what this suggests about their business model. Naively, I would have expected the fraction of your donation that goes to gifts to go down as your gift went up. Indeed, you might have thought that they would take a small loss on the smallest pledges just to get people involved and then move to the upsell at some later date. Thinking about it some more, I guess the natural model is that KQED as trying to extract money from you up to the point where the marginal dollar they extract from you costs them a marginal dollar in gifts (or in this case food bank donations) at which point they stop. So, as people's marginal utility of having given something, anything, to KQED declines, they need to keep jacking up gift quality faster than the size of the donation to keep extracting your cash. Other theories are of course welcome.

 

January 19, 2009

Mrs. G. and I were up in San Francisco last weekend and while on our way to Fog City News we ran into someone we knew. This was sort of surprising, so I got to thinking about how probable it was (or wasn't). Grossly oversimplifying, my reasoning goes something like this:

The population of San Francisco is about 800,000. Let's call it 10^6. I know perhaps 100 people in the city at any given time. There are maybe 20-50 people on any given stretch of city block. Say I walk for an hour at 3 mph and that the average block is 100m long, so I walk for 50 blocks in that time and pass on the order of 10^{3} people. If we assume people are randomly distributed (this is probably pessimistic, since I know that I spend most of my time in SF in a few places and I assume my friends tend to be somewhat similar) then I have a .9999 chance of not knowing any given person I run into. If we assume that these are independent events then I have a .9999^{1000} chance of not knowing any of those people [technical note: this is really (999900/1000000) * (9998999/999999) * ..., but these numbers are large enough and we've made enough other approximations that we can ignore this.] .9999^1000 = .90 so if I walk around the city for an hour, I have about a 1/10 chance of meeting someone I know. That doesn't sound too far out of line.

 

December 29, 2008

One of Slate's odder sections is the "Green Lantern", where they take on some simple question like "should I buy a natural or artificial Christmas Tree" and try to analyze it from an environmental perspective. The most recent article asks whether you should throw away your leftovers or flush them down the garbage disposal. Unfortunately, the articles tend to be pretty useless: sometimes they have a real answer but often they thrash around for a while giving you the pros and cons of each option and conclude that maybe you should do A and maybe you should do B:
The research is unambiguous about one point, though: Under normal circumstances, you should always compost if you can. Otherwise, go ahead and use your garbage disposal if the following conditions are met: First, make sure that your community isn't running low on water. (To check your local status, click here.) Don't put anything that is greasy or fatty in the disposal. And find out whether your local water-treatment plant captures methane to produce energy. If it doesn't--and your local landfill does--you may be better off tossing those mashed potatoes in the trash.

Or maybe not... Here's another example:

If these ideas don't excite you, the Lantern recommends putting the new cash toward insulating your family's home. Of course, whether this makes sense depends on your local climate and whether you buy or rent. (Likewise, the current state of your home will determine just how much insulation your $100 will buy.) For the rest of you, it might be wisest to replace any antiquated, energy-inefficient appliances you might have--along the lines spelled out here. (Let's put aside the complicated question of carbon offsets, which will be addressed in a future column. Suffice to say that they wouldn't be the Lantern's first choice.)

I'm not saying I can do any better; rather I think this is reflective of a systemic problem with this kind of overall cost/benefit analysis. While it's possible to measure the power consumption, carbon emissions, etc. of any particular microactivity, it's pretty hard to do an overall cost/benefit analysis of whether you should do A or B when each of them consists of a whole bunch of individual activities, all of which require their own analyses. The economist type answer is to levy Pigouvian taxes on each individual component (e.g., carbon taxes) and then let the market sort things out. I don't know if that would work any better, though, but I don't see people being able to do this kind of analysis for each individual purchasing decision either.

 

October 18, 2008

Most of my running socks are Wigwam Ironman Ultralites, but I recently tried some of the Injinji Tetrasoks, which seem to be the preferred choice of a lot of the ultra guys. The Tetrasoks are more like gloves for your feet than socks: each toe is in an individual pocket, thus at least theoretically preventing blisters between the toes. Takes a little getting used to, but they're pretty comfortable, actually.

Anyway, this creates a new problem in terms of laundry: ordinarily, any pair of socks the same color make a pair, but tetrasoks are chiral, so if you just pick a random pair of socks you have a 50% chance of having two lefts or two rights, neither of which is very useful. So, with ordinary socks it's likely optimal to have all your socks the same because that minimizes your search cost (no matter what your laundry strategy is). However, the situation with tetrasoks is more complicated. Let's say that you have a pile of laundry and you pull items out one at a time. If a sock matches a sock you've already seen you pair it up. If not, you put it aside waiting for a match. If it takes time X to figure out what color each sock is, and tiem Y to tell whether it's a left or right sock, then if X > Y, then you want to get all socks the same color: you just have a working pile of socks (either left or right). When a sock comes in it's either the other side, in which case you pull a sock off the stack and put the pair away. If it's the same side, you put it on the stack.

On the other hand, if Y > X, it's a little trickier. You need to maintain a separate pile for each color. If you only have one of each color, then you only ever incur X, since the second of each color must be its match. On the other hand, if you have more than one of each color, then you incur Y + X because you need to know both which pile to look in and whether a sock is a match or just more of the same (this can be partially optimized when you've already paired up all but one pair of a given color, at which point you go back to incurring X for that color). So, the breakeven point here depends on how many different colors of socks the manufacturer makes. Injinji only makes three colors of this sock (white, black, tan) so it probably makes the most sense to buy all one color.

Extra Credit: Is the situation the same if you just throw all your socks in the drawer and then try to pick out socks later? Scanning for colors isn't the same as deciding what color something is...

UPDATE: Apparently this is a problem many others are concerned with as well.

 

October 4, 2008

An acquaintance recently brought me a homework problem which asked roughly "True or false: If you had x-ray vision you could read a book without opening it." Phrased like this, I'm reasonably confident the answer is "False", but it's a bit complicated to analyze.

First, we have to ask what x-ray vision is. It's natural to think of it as being like ordinary vision, i.e., your eyes are sensitive to x-rays. This doesn't make any sense, though: ordinary vision works because there is ambient light in the visible spectrum (from the sun, indoor lights, etc.) and it reflects off ordinary objects. What you're seeing is that reflected light. But there's next to no ambient x-ray radiation in the ordinary environment, so even if you could detect x-rays, it would be like trying to see in the dark.

So, in order for some sort of x-ray vision to work, you'd presumably need to be emitting some sort of radiation to illuminate the subject. Assuming that the source is attached to you, then x-ray absorption/transmission measurements (like with medical or dental x-rays) won't work, but x-ray reflection or x-ray fluorescence (XRF) will. The other possibility is that you're emitting some other sort of radition that the subject absorbs and then fluoresces in the x-ray region (really, it could be in any region, but remember we're calling it x-ray vision, not hard radiation fluorescing in the visible vision).

There are actually at least two aspects to this question:

  1. Can you use x-rays to distinguish ink from paper?
  2. Can you resolve the letters on individual pages?

I suspect that the answer to (1) is "Yes". Older inks, at least, often contained metals (e.g., iron gall ink), and so should have fairly different spectra from paper. In fact, this article describes the use of just such a technique: XRF to find iron gall inks in the Archimedes Palimpsest. It's less clear how modern inks would respond. I've seen some articles suggesting that x-ray fluorescence spectra of modern inks also are distinguishable from the background paper spectrum (see this article on cyan ink and this paper that mentions the use of XRF on postmark inks).

This only answers half the question, because books aren't just one page: they're a bunch of pages on top of each other and so even if the ink does fluoresce differently from the paper, you still need to read the book. There are a bunch of possible problems here: first, if you're using XRF possible your fluorescence isn't in the x-ray, it may get blocked by the paper on top of it. But let's assume you're irradiating in the hard x-ray and your fluorescence is in the soft x-ray and makes it out of the paper; we still have to ask whether you can spatially resolve the letters. This partly depends on the characteristics of your x-ray source.

If it's an uncollimated point source like a lightbulb, then you'll irradiate the whole book and each letter will be an independent emissions source, each of which will also be radially symmetrical, plus all the background noise from each sheet of paper. It's not at all clear you can disambiguate these sources. If you have a collimated source, the situation may be a little better, since then you just get a bunch of point sources along the line the beam is taking through the book). That's probably still not enough though, since the fluorescence from multiple radially symmetrical point sources still overlap on your eyes, and you need to distinguish light which is coming from incredibly small angular displacements (the thickness of a piece of paper apart at a distance of 10s of centimeters away). If you were able to take measurements from multiple independent angles like in a CT scanner, then you would have enough information to resolve things, though with an extremely large amount of computation. There are also multiple beam techniques like four-wave mixing, but that also seems biologically unlikely.

Bottom line, then, I think the answer is "False". I.e., it probably is possible to spectroscopically read books without opening them, but I doubt you could do it with any plausible (or even implausible) biological process.

Extra Reading: Man of Steel, Woman of Kleenex

 

September 12, 2008

Writing in the comments section, Nick Weaver expresses the not unreasonable concern that if you randomly select people who are eligible for extensions, people will try to game the system:
If you do the randomization suggestion, you really don't want to do it for a security conference, because then you will get security researchers playing games (eg, 3-4 different titles/abstracts/author orders, withdraw the lowest extending ones)

This really is the way security people think, but I think it is possible to design technical mechanisms to stop people from doing this. One obvious thing you could do is simply forbid any individual from being an author on two papers that apply for extensions. This sounds a bit restrictive but one could certainly argue that if you're asking for extensions on two papers, you don't have time to work on them both and should just focus on one.

It seems like there is a less restrictive approach, however, namely to use cut-and-choose. Nick's attack relies on people submitting extension requests for papers they don't really intend to submit. This implies a countermeasure where you force authors to prove that their requests are legitimate. You can't really do this for the papers where you reject their extension requests (since they could reasonably claim they're not ready to submit, which is why they asked for an extension), but you can for any request where you grant an extension:

Here's one natural approach:

  • Force people to submit all their requests at once before you tell them whether any are granted.
  • Randomly (cryptographically securely random) select whatever subset of the papers you're going to select.
  • Require the authors to actually submit papers for which requests were granted.

(Obviously you'd need to notify authors ahead of time that these were the terms).

Now, obviously, an author who is trying to cheat you can submit bogus papers (or slight variants), but I'm assuming that if you actually force them to submit, the PC can detect them and appropriately punish the authors.

I'm being deliberately vague around the words "require" and "punish" here, because we're now leaving the realm of cryptography and entering the world of politics. One thing you could do is simply refuse to accept any papers unless the authors submitted all the papers by that author. More severely, you could refuse to accept any papers from them in the future (this sanction is sometimes used in other cases of clear misconduct). In any case, it's clear we could create sanctions that authors wouldn't like.

Let's do the math on how well this would work. For mathematical convenience, assume that you are only submitting a single real paper and that if you're caught cheating is that all your submissions for this conference are summarily rejected. Assume further that if the extension isn't granted, you don't submit. Thus, we have the following outcomes:

OutcomePayoff
No extensions granted  0
One extension granted  1
>1 extension granted  0

If the probability of extensions being granted is p and you submit n times, then the probability of each of these events is:

OutcomeP(outcome)
No extensions granted  (1-p)n
One extension granted  p*(1-p)n-1*n
>1 extension granted  1-pn

However, because we've assumed that the payoffs for the first and third cases are zero, and the second case it's 1, the payoff is just the second case, i.e., p*(1-p)n-1*n. To find the best strategy for the submitter, we want to find the n that gives the maximum payoff. Since we can only choose discrete values of n it's easiest to find this value experimentally. Here's the relevant R code to find the maximum:

pay <- function(p,n){
    n* p * ((1-p) ^ (n-1))
}

max.pay <- function(p){
  n <- seq(1,100)
  pays <- sapply(n,pay,p=p)
  n[order(pays,decreasing=TRUE)][1]
}
(note 1: this code does not work properly for very small values of p where the optimal n value is > 100. That can be fixed by making the maximal n value bigger. note 2: we happen to know that there is a single maximum so we could do a loop that stopped as soon as p_{n+1} < p_n, but that was too much effort.)

It turns out that for all p >= .05, the optimal number of submissions is 1, which is the behavior you want to encourage. For p < .5, it makes sense for the submitters to do multiple submissions (with the number increasing as p increases). Accordingly, if you want to give extensions less than 1/2 the time, you need more draconian punishments (e.g., banning from future submissions) to enforce that.

I should note that this sort of suspicious attitude probably isn't the best way to get the result you want. In general, the academic community depends on people's voluntary compliance with norms (we don't really check that people don't make up their data, for instance), and if you simply tell people that this is the norm for this conference (and perhaps make them explicitly affirm their compliance), I expect they will do so. On the other hand, if you make it a game, they're likely to take it as a challenge to cheat (especially in the security community) (cf. the famous A fine is a price.). On the other hand, that's less fun than designing a new mechanism.

For extra credit, consider the following extensions to the model: (1) the punishment is greater than just having all your papers rejected. (2) authors have more than one paper. (3) submitting without an extension is less good than submitting with an extension but still of nonzero value.

January 2010

S M T W T F S
          1 2
3 4 5 6 7 8 9
10 11 12 13 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28 29 30
31            
Powered by Movable Type 4.23-en