An alternative approach is to mine the available data on what people are asking for vehicles and use it to build an analytical model for predicting prices. This lets us figure out what the appropriate asking price for a given vehicle is (which isn't the same as a fair price; more on this later) and identify outliers in either direction.

Here are the relevant bikes listed for sale on CL over the past week or so:

# | Asking ($) | Model | Year | Mileage (miles)
---|---|---|---|---
1 | 7650 | 1150GS | 2002 | 25000
2 | 7900 | 1150GS | 2001 | 54000
3 | 14500 | 1200GSA | 2006 | 3700
4 | 8500 | 1200GS | 2005 | 54000
5 | 13700 | 1200GS | 2007 | 3658
6 | 7400 | 1150GSA | 2004 | 60000
7 | 5500 | 1100GS | 1996 | 23000
8 | 11500 | 1200GS | 2005 | 12000
9 | 7200 | 1150GS | 2002 | 40000
10 | 11950 | 1200GS | 2008 | 29000
11 | 9600 | 1200GS | 2005 | 39000

I fit a simple OLS regression model to this data, using the bike's model year and mileage as predictors. The result is:

    summary(fit2)

    Call:
    lm(formula = d2$Asking ~ d2$Year + d2$Mileage)

    Residuals:
          Min        1Q    Median        3Q       Max
    -1360.040  -353.520  -150.358     2.140  1708.510

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept) -1.201e+06  1.889e+05  -6.359 0.000218 ***
    d2$Year      6.056e+02  9.423e+01   6.426 0.000203 ***
    d2$Mileage  -7.631e-02  1.578e-02  -4.836 0.001294 **
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 975.1 on 8 degrees of freedom
    Multiple R-squared:  0.9108,    Adjusted R-squared:  0.8885
    F-statistic: 40.84 on 2 and 8 DF,  p-value: 6.335e-05
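As a sanity check, the same fit can be reproduced outside R. The following is a minimal sketch in plain Python (not the code used for this post), assuming the eleven listings in the table are exactly the data behind `d2`. The function and variable names are made up for illustration. It centers the predictors and solves the resulting 2x2 normal equations, which should land close to the coefficients and R-squared reported above.

```python
# Listings from the table: (asking $, model year, mileage)
listings = [
    (7650, 2002, 25000), (7900, 2001, 54000), (14500, 2006, 3700),
    (8500, 2005, 54000), (13700, 2007, 3658), (7400, 2004, 60000),
    (5500, 1996, 23000), (11500, 2005, 12000), (7200, 2002, 40000),
    (11950, 2008, 29000), (9600, 2005, 39000),
]


def mean(values):
    return sum(values) / len(values)


def fit_asking_on_year_and_mileage(rows):
    """OLS fit of asking ~ year + mileage (hypothetical helper).

    Centering the variables keeps the arithmetic well conditioned and
    reduces the problem to a 2x2 linear system for the two slopes.
    """
    a_bar = mean([r[0] for r in rows])
    y_bar = mean([r[1] for r in rows])
    m_bar = mean([r[2] for r in rows])
    da = [r[0] - a_bar for r in rows]
    dy = [r[1] - y_bar for r in rows]
    dm = [r[2] - m_bar for r in rows]

    # Sums of squares and cross-products of the centered variables
    syy = sum(v * v for v in dy)
    smm = sum(v * v for v in dm)
    sym = sum(p * q for p, q in zip(dy, dm))
    sya = sum(p * q for p, q in zip(dy, da))
    sma = sum(p * q for p, q in zip(dm, da))

    # Solve the 2x2 normal equations with Cramer's rule
    det = syy * smm - sym * sym
    b_year = (sya * smm - sym * sma) / det
    b_miles = (syy * sma - sym * sya) / det
    intercept = a_bar - b_year * y_bar - b_miles * m_bar

    # R-squared = 1 - residual sum of squares / total sum of squares
    fitted_dev = [b_year * p + b_miles * q for p, q in zip(dy, dm)]
    ss_res = sum((a - f) ** 2 for a, f in zip(da, fitted_dev))
    ss_tot = sum(v * v for v in da)
    return intercept, b_year, b_miles, 1 - ss_res / ss_tot


intercept, b_year, b_miles, r_squared = fit_asking_on_year_and_mileage(listings)
print(f"intercept = {intercept:.4g}")
print(f"$ per model year = {b_year:.1f}")
print(f"$ per mile = {b_miles:.5f}")
print(f"R-squared = {r_squared:.4f}")
```

Multiplying the per-mile slope by 1000 gives the per-1000-miles figure, which is easier to reason about when comparing listings.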

Our model predicts that each additional year of age takes about $600 off the asking price and that each 1,000 miles on the odometer takes off about $76. [Note that I'm treating mileage and age as independent variables; it might make more sense to try to estimate "excess" mileage over some baseline for the bike's age, but I don't have the baseline data I would need.] In any case, we're doing pretty well here: with only two predictors we are accounting for about 91% of the price variation (R-squared of 0.91). We can see this visually by plotting the price points against the best-fit plane, as below:

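    library(scatterplot3d)

    # Scatter the listings: asking price over model year and mileage
    s3d <- scatterplot3d(d2$Asking ~ d2$Year + d2$Mileage,
                         xlab = "Year", ylab = "Mileage", zlab = "Asking")
    # Project the observed and fitted points into 2D plot coordinates
    orig  <- s3d$xyz.convert(d2$Year, d2$Mileage, d2$Asking)
    plane <- s3d$xyz.convert(d2$Year, d2$Mileage, fitted(fit2))
    # 2 = positive residual (above the plane), 1 = otherwise
    i.negpos <- 1 + (resid(fit2) > 0)
    # Connect each point to the plane: red solid above, blue dashed below
    segments(orig$x, orig$y, plane$x, plane$y,
             col = c("blue", "red")[i.negpos], lty = (2:1)[i.negpos])
    # Draw the regression plane (fit2 is the model summarized above)
    s3d$plane3d(fit2)

(Code ripped off from here.)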

Points above the plane (shown with red lines) are likely too expensive and points below (with blue lines) are worth checking out to see if they're good deals.
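The same above/below split can be read off numerically. Here is a rough sketch in plain Python, again assuming the table rows are the data behind the fit. It plugs in the two slopes reported in the regression summary and uses the fact that an OLS plane with an intercept passes through the point of means, so the rounded intercept is not needed. The names are illustrative only.

```python
# Listings from the table: (row, asking $, model year, mileage)
listings = [
    (1, 7650, 2002, 25000), (2, 7900, 2001, 54000), (3, 14500, 2006, 3700),
    (4, 8500, 2005, 54000), (5, 13700, 2007, 3658), (6, 7400, 2004, 60000),
    (7, 5500, 1996, 23000), (8, 11500, 2005, 12000), (9, 7200, 2002, 40000),
    (10, 11950, 2008, 29000), (11, 9600, 2005, 39000),
]

# Slopes copied from the regression summary (rounded as printed there)
B_YEAR = 605.6       # dollars per model year
B_MILES = -0.07631   # dollars per mile

n = len(listings)
mean_asking = sum(r[1] for r in listings) / n
mean_year = sum(r[2] for r in listings) / n
mean_miles = sum(r[3] for r in listings) / n


def predicted_asking(year, miles):
    """Price on the fitted plane, written relative to the point of means."""
    return (mean_asking
            + B_YEAR * (year - mean_year)
            + B_MILES * (miles - mean_miles))


# Residual = asking minus predicted; sorted from best deal to most overpriced
residuals = sorted(
    (asking - predicted_asking(year, miles), row)
    for row, asking, year, miles in listings
)

for resid, row in residuals:
    side = "above plane" if resid > 0 else "below plane"
    print(f"listing {row:2d}: residual {resid:9.1f}  ({side})")
```

The listings at the top of the printed output are the ones priced furthest below the plane, and the ones at the bottom are priced furthest above it.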

Obviously, we're excluding a lot of variables here. We haven't captured the condition of the bike, how desperate/motivated the seller is to get rid of it, what accessories it has, etc. Looking more closely at the data, the two most comparatively expensive bikes seem to come with a few more accessories, so this may have led the owners to think they could extract more money (I don't think this is really true, however, since those items are often valuable only to the original owner). For the purposes of selecting good deals, we would also like to know how flexible the seller's price is. It's possible that someone lowballing the price will also be less flexible because they've already built that discount into their price. On the other hand, they could be more motivated, which would cut in the other direction. It would be interesting to get secondary data on how much these bikes actually sell for [you could get some of that information by seeing if repeated postings have lower prices], but while that data is available for houses, I don't think it is for bikes.