An alternative approach is to mine the available data on what people are asking for their vehicles and use it to build an analytical model for predicting prices. This lets us figure out the appropriate asking price for a newly listed vehicle (which isn't the same as a fair price; more on this later) and identify outliers in either direction.

Below, you can find the list of the relevant bikes on sale on CL for the past week or so:

| | Asking ($) | Model | Year | Mileage (miles) |
|---|---|---|---|---|
| 1 | 7650 | 1150GS | 2002 | 25000 |
| 2 | 7900 | 1150GS | 2001 | 54000 |
| 3 | 14500 | 1200GSA | 2006 | 3700 |
| 4 | 8500 | 1200GS | 2005 | 54000 |
| 5 | 13700 | 1200GS | 2007 | 3658 |
| 6 | 7400 | 1150GSA | 2004 | 60000 |
| 7 | 5500 | 1100GS | 1996 | 23000 |
| 8 | 11500 | 1200GS | 2005 | 12000 |
| 9 | 7200 | 1150GS | 2002 | 40000 |
| 10 | 11950 | 1200GS | 2008 | 29000 |
| 11 | 9600 | 1200GS | 2005 | 39000 |

I used a simple OLS regression model to fit this data, using the model year and mileage for the bike. The result is:

    > summary(fit2)

    Call:
    lm(formula = d2$Asking ~ d2$Year + d2$Mileage)

    Residuals:
          Min        1Q    Median        3Q       Max
    -1360.040  -353.520  -150.358     2.140  1708.510

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept) -1.201e+06  1.889e+05  -6.359 0.000218 ***
    d2$Year      6.056e+02  9.423e+01   6.426 0.000203 ***
    d2$Mileage  -7.631e-02  1.578e-02  -4.836 0.001294 **
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 975.1 on 8 degrees of freedom
    Multiple R-squared:  0.9108,    Adjusted R-squared:  0.8885
    F-statistic: 40.84 on 2 and 8 DF,  p-value: 6.335e-05
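For anyone who wants to rerun the fit, here is a minimal sketch. It assumes the eleven listings from the table are typed into a data frame called `d2` (the name that appears in the output); the column names and the `data = d2` form of the formula are my choices for this sketch, made so that `predict()` works on new rows. The 2004 bike with 30,000 miles at the end is a made-up example, not a real listing.

```r
# Listings from the table; building d2 this way is an assumption of this sketch.
d2 <- data.frame(
  Asking  = c(7650, 7900, 14500, 8500, 13700, 7400, 5500, 11500, 7200, 11950, 9600),
  Year    = c(2002, 2001, 2006, 2005, 2007, 2004, 1996, 2005, 2002, 2008, 2005),
  Mileage = c(25000, 54000, 3700, 54000, 3658, 60000, 23000, 12000, 40000, 29000, 39000)
)

# Ordinary least squares: asking price on model year and mileage.
fit2 <- lm(Asking ~ Year + Mileage, data = d2)
print(summary(fit2))

# Predicted asking price for a hypothetical 2004 bike with 30,000 miles.
hypothetical <- data.frame(Year = 2004, Mileage = 30000)
predicted <- predict(fit2, newdata = hypothetical)
print(predicted)
```

Passing `data = d2` rather than writing `d2$Year` inside the formula only changes the coefficient labels, not the fit, but it is what lets `predict()` take a `newdata` frame.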

Our model predicts that the bike loses about $600 in value for each year it is on the road and about $76 for each 1,000 miles on the odometer. [Note that I'm treating mileage and age as independent variables; it might make more sense to try to estimate "excess" mileage over some base value, but I don't have the baseline data I would need.] In any case, we're doing pretty well here: with only two predictors we are accounting for around 90% of the price variation. We can see this visually by plotting the price points against the best-fit plane, as below:

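    library(scatterplot3d)

    # Scatter the listings in (Year, Mileage, Asking) space
    s3d <- scatterplot3d(d2$Asking ~ d2$Year + d2$Mileage,
                         xlab = "Year", ylab = "Mileage", zlab = "Asking")
    # Project each listing and its fitted value into 2D plot coordinates
    # (fit2 is the model summarized above)
    orig  <- s3d$xyz.convert(d2$Year, d2$Mileage, d2$Asking)
    plane <- s3d$xyz.convert(d2$Year, d2$Mileage, fitted(fit2))
    # 1 = on or below the plane, 2 = above it
    i.negpos <- 1 + (resid(fit2) > 0)
    # Join each point to the plane: solid red above, dashed blue below
    segments(orig$x, orig$y, plane$x, plane$y,
             col = c("blue", "red")[i.negpos], lty = (2:1)[i.negpos])
    # Draw the best-fit plane
    s3d$plane3d(fit2)

(Code ripped off from here.)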

Points above the plane (shown with red lines) are likely too expensive and points below (with blue lines) are worth checking out to see if they're good deals.
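The same above/below split can be read off as a table instead of a plot. This is a rough sketch, assuming the listings are rebuilt into a data frame `d2` as in the table; the added columns `Predicted`, `Residual` and `Side` are names I made up for illustration.

```r
# Listings from the table; building d2 this way is an assumption of this sketch.
d2 <- data.frame(
  Asking  = c(7650, 7900, 14500, 8500, 13700, 7400, 5500, 11500, 7200, 11950, 9600),
  Year    = c(2002, 2001, 2006, 2005, 2007, 2004, 1996, 2005, 2002, 2008, 2005),
  Mileage = c(25000, 54000, 3700, 54000, 3658, 60000, 23000, 12000, 40000, 29000, 39000)
)
fit2 <- lm(Asking ~ Year + Mileage, data = d2)

# Residual = asking price minus fitted price.
# Positive means the listing sits above the plane (asking more than predicted).
d2$Predicted <- round(fitted(fit2))
d2$Residual  <- round(resid(fit2))
d2$Side      <- ifelse(resid(fit2) > 0, "above", "below")

# Sort from furthest below the plane to furthest above it.
ranked <- d2[order(resid(fit2)), ]
print(ranked)
```

The first rows of `ranked` are the listings worth a closer look; the last rows are the ones asking the most relative to year and mileage.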

Obviously, we're excluding a lot of variables here. We haven't captured the condition of the bike, how desperate/motivated the seller is to get rid of it, what accessories it has, etc. Looking more closely at the data, the two most comparatively expensive bikes seem to come with a few more accessories, so this may have led the owners to think they could extract more money (I don't think this is really true, however, since often those items are valuable only to the original owner). For the purposes of selecting good deals, we would also like to know how flexible the seller's price is. It's possible that someone lowballing the price will also be less flexible because they've already built that discount into their price. On the other hand, they could be more motivated, so that could cut in the other direction. It would be interesting to get secondary data on how much these bikes actually sell for [you could get some of that information by seeing if repeated postings have lower prices], but while that data is available for houses I don't think it is for bikes.

Nice work! So I was thinking you could use eBay's "completed auctions" search to find data on how much they actually sold for... given that it's eBay you wouldn't necessarily always have a firm asking price, but you could get some selling prices.

Did you try running single variable regressions on year and mileage? Just looking at the data, you may have heteroscedasticity and multicollinearity issues.

Yeah, the results are pretty similar (-.098 per mile for mileage, 702 per year for model year). Also, the correlation between model year and mileage is -.21.
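These checks are quick to rerun. A minimal sketch, again assuming the listings are rebuilt into a data frame `d2` from the table; the variable names `slope_year`, `slope_mileage` and `r_year_mileage` are just for illustration.

```r
# Listings from the table; building d2 this way is an assumption of this sketch.
d2 <- data.frame(
  Asking  = c(7650, 7900, 14500, 8500, 13700, 7400, 5500, 11500, 7200, 11950, 9600),
  Year    = c(2002, 2001, 2006, 2005, 2007, 2004, 1996, 2005, 2002, 2008, 2005),
  Mileage = c(25000, 54000, 3700, 54000, 3658, 60000, 23000, 12000, 40000, 29000, 39000)
)

# One-predictor fits, for comparison with the two-predictor model.
slope_year    <- coef(lm(Asking ~ Year, data = d2))[["Year"]]        # dollars per model year
slope_mileage <- coef(lm(Asking ~ Mileage, data = d2))[["Mileage"]]  # dollars per mile

# Pearson correlation between the two predictors.
r_year_mileage <- cor(d2$Year, d2$Mileage)

print(c(year = slope_year, mileage = slope_mileage, cor = r_year_mileage))
```

A weak correlation between the predictors is what you want to see here: it means the two-predictor coefficients are not being distorted much by multicollinearity.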

Nicely done. Now, if you really want to go overboard, you could run a heteroscedasticity diagnostic. A priori, I would assume that the age and mileage distributions are rather differently shaped, so heteroscedasticity is a concern. The Glejser test and Spearman's rank correlation test were the two general tests back in the day. I don't know if things have advanced at all. R may have an easy built-in function for them.
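For what it's worth, both can be approximated with base R alone. The following is a rough sketch, not a packaged implementation: it assumes the listings are rebuilt into a data frame `d2` from the table, and it uses the simplest linear form of a Glejser-style regression (absolute residuals on the predictors). The names `abs_res`, `glejser`, `sp_year` and `sp_mileage` are made up for this example.

```r
# Listings from the table; building d2 this way is an assumption of this sketch.
d2 <- data.frame(
  Asking  = c(7650, 7900, 14500, 8500, 13700, 7400, 5500, 11500, 7200, 11950, 9600),
  Year    = c(2002, 2001, 2006, 2005, 2007, 2004, 1996, 2005, 2002, 2008, 2005),
  Mileage = c(25000, 54000, 3700, 54000, 3658, 60000, 23000, 12000, 40000, 29000, 39000)
)
fit2 <- lm(Asking ~ Year + Mileage, data = d2)
abs_res <- abs(resid(fit2))

# Glejser-style check: regress |residuals| on the predictors. A clearly
# non-zero slope would suggest the error spread changes with that predictor.
glejser <- summary(lm(abs_res ~ Year + Mileage, data = d2))
print(glejser$coefficients)

# Spearman rank correlation between |residuals| and each predictor.
# exact = FALSE because Year and Mileage both contain tied values.
sp_year    <- cor.test(abs_res, d2$Year,    method = "spearman", exact = FALSE)
sp_mileage <- cor.test(abs_res, d2$Mileage, method = "spearman", exact = FALSE)
print(c(year = sp_year$p.value, mileage = sp_mileage$p.value))
```

With only eleven listings neither test has much power, so a non-significant result here would not rule heteroscedasticity out.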

I am somewhat surprised that the drop is so great. Motorcycles are rather easier to fix than cars. Not least because you can bring them into the house and dismantle them in the living room.

I am sure Mrs Guesswork would not mind at all.