# Predicting motorcycle prices

I'm in the market for a new motorcycle and have been looking at the BMW R1150GS/R1200GS. Like cars, motorcycles have a lot of depreciation the minute they pull off the lot, and because you're fairly likely to drop your bike anyway, most people I know figure you might as well buy pre-dropped and look for a used model. But once you're buying used you have the problem of figuring out how much you should pay. KBB motorcycles isn't much help here because the market is small and the mileage varies a lot.

An alternative approach is to mine the available data on what people are asking for these bikes and use it to build a model for predicting prices. That tells us what the appropriate asking price (which isn't the same as a fair price; more on this later) for a given bike is, and lets us identify outliers in either direction.

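Below is the list of the relevant bikes for sale on Craigslist (CL) over the past week or so:

| # | Asking (USD) | Model | Year | Mileage |
|---|---|---|---|---|
| 1 | 7650 | 1150GS | 2002 | 25000 |
| 2 | 7900 | 1150GS | 2001 | 54000 |
| 3 | 14500 | 1200GSA | 2006 | 3700 |
| 4 | 8500 | 1200GS | 2005 | 54000 |
| 5 | 13700 | 1200GS | 2007 | 3658 |
| 6 | 7400 | 1150GSA | 2004 | 60000 |
| 7 | 5500 | 1100GS | 1996 | 23000 |
| 8 | 11500 | 1200GS | 2005 | 12000 |
| 9 | 7200 | 1150GS | 2002 | 40000 |
| 10 | 11950 | 1200GS | 2008 | 29000 |
| 11 | 9600 | 1200GS | 2005 | 39000 |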

I used a simple OLS regression model to fit this data, using the model year and mileage for the bike. The result is:

```
summary(fit2)

Call:
lm(formula = d2$Asking ~ d2$Year + d2$Mileage)

Residuals:
      Min        1Q    Median        3Q       Max
-1360.040  -353.520  -150.358     2.140  1708.510

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.201e+06  1.889e+05  -6.359 0.000218 ***
d2$Year      6.056e+02  9.423e+01   6.426 0.000203 ***
d2$Mileage  -7.631e-02  1.578e-02  -4.836 0.001294 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 975.1 on 8 degrees of freedom
Multiple R-squared: 0.9108,	Adjusted R-squared: 0.8885
F-statistic: 40.84 on 2 and 8 DF,  p-value: 6.335e-05
```
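
For anyone who wants to reproduce the fit, here is a minimal sketch. It assumes the listings from this post are typed into a data frame called `d2` with columns `Asking`, `Year`, and `Mileage` (names taken from the output above; the `Model` column name is just an illustrative label). It also uses the `data =` form of `lm()` rather than the `d2$...` form shown in the output, which should give the same estimates with shorter coefficient names.

```r
# Listings transcribed from this post (asking price in USD, mileage in miles).
d2 <- data.frame(
  Asking  = c(7650, 7900, 14500, 8500, 13700, 7400, 5500, 11500, 7200, 11950, 9600),
  Model   = c("1150GS", "1150GS", "1200GSA", "1200GS", "1200GS", "1150GSA",
              "1100GS", "1200GS", "1150GS", "1200GS", "1200GS"),
  Year    = c(2002, 2001, 2006, 2005, 2007, 2004, 1996, 2005, 2002, 2008, 2005),
  Mileage = c(25000, 54000, 3700, 54000, 3658, 60000, 23000, 12000, 40000, 29000, 39000)
)

# Ordinary least squares: asking price as a linear function of model year and mileage.
fit2 <- lm(Asking ~ Year + Mileage, data = d2)
print(summary(fit2))
```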

Our model predicts that a bike loses about \$600 in asking price for each year of age and about \$76 for each 1,000 miles on the odometer. [Note that I'm treating mileage and age as independent variables; it might make more sense to try to estimate "excess" mileage over some base value, but I don't have the baseline data I would need.] In any case, we're doing pretty well here: with only two predictors we are accounting for about 91% of the variation in asking price (R² = 0.91). We can see this visually by plotting the price points against the best-fit plane, as below:

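```
library(scatterplot3d)
s3d <- scatterplot3d(d2$Asking~d2$Year+d2$Mileage,xlab="Year",ylab="Mileage",zlab="Asking")
# Project the observed points and their fitted values into 2D plot coordinates.
orig <- s3d$xyz.convert(d2$Year,d2$Mileage,d2$Asking)
plane <- s3d$xyz.convert(d2$Year,d2$Mileage,fitted(fit2))
# Index 2 (red, solid) for positive residuals, index 1 (blue, dashed) for negative.
i.negpos <- 1 + (resid(fit2)>0)
segments(orig$x,orig$y, plane$x,plane$y, col=c("blue","red")[i.negpos],lty=(2:1)[i.negpos])
s3d$plane3d(fit2)
```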
(code ripped off from here).

Points above the plane (shown with red lines) are likely too expensive and points below (with blue lines) are worth checking out to see if they're good deals.
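
The same comparison can be done without a plot by sorting the listings by residual. This is a rough sketch under the same assumptions as the regression above: the data frame `d2` and its column names are reconstructed from the listings in this post, and `Predicted`, `Residual`, and `ranked` are names made up for illustration. A negative residual means the asking price sits below the plane.

```r
# Listings transcribed from this post (asking price in USD, mileage in miles).
d2 <- data.frame(
  Asking  = c(7650, 7900, 14500, 8500, 13700, 7400, 5500, 11500, 7200, 11950, 9600),
  Model   = c("1150GS", "1150GS", "1200GSA", "1200GS", "1200GS", "1150GSA",
              "1100GS", "1200GS", "1150GS", "1200GS", "1200GS"),
  Year    = c(2002, 2001, 2006, 2005, 2007, 2004, 1996, 2005, 2002, 2008, 2005),
  Mileage = c(25000, 54000, 3700, 54000, 3658, 60000, 23000, 12000, 40000, 29000, 39000)
)
fit2 <- lm(Asking ~ Year + Mileage, data = d2)

# Attach the model's predicted asking price and the residual (asking minus predicted).
d2$Predicted <- round(fitted(fit2))
d2$Residual  <- round(resid(fit2))

# Most negative residuals first: candidates for a closer look.
ranked <- d2[order(d2$Residual), ]
print(ranked)
```

Row names in `ranked` are the original listing numbers, so the output can be matched back to the list of bikes.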

Obviously, we're excluding a lot of variables here. We haven't captured the condition of the bike, how desperate/motivated the seller is to get rid of it, what accessories it has, etc. Looking more closely at the data, the two most comparatively expensive bikes seem to come with a few more accessories, so this may have led the owners to think they could extract more money (I don't think this is really true, however, since often those items are valuable only to the original owner). For the purposes of selecting good deals, we would also like to know how flexible the seller's price is. It's possible that someone lowballing the price will also be less flexible because they've already built that discount into their price. On the other hand, they could be more motivated, so that could cut in the other direction. It would be interesting to get secondary data on how much these bikes actually sell for [you could get some of that information by seeing if repeated postings have lower prices], but while that data is available for houses I don't think it is for bikes.

Nice work! So I was thinking you could use eBay's "completed auctions" search to find data on how much they actually sold for... given that it's eBay you wouldn't necessarily always have a firm asking price, but you could get some selling prices.

Did you try running single variable regressions on year and mileage? Just looking at the data, you may have heteroscedasticity and multicollinearity issues.

Yeah, the results are pretty similar (-.098 for mileage, 702 for age). Also, the correlation between age and mileage is -.21.
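
A sketch of those two checks, for anyone following along. It assumes the listings from the post are in a data frame `d2` with columns `Asking`, `Year`, and `Mileage`; the object names `fit_year`, `fit_mileage`, and `r` are illustrative. "Age" is taken here to mean model year, which is what the post's regression uses.

```r
# Listings transcribed from the post (asking price in USD, mileage in miles).
d2 <- data.frame(
  Asking  = c(7650, 7900, 14500, 8500, 13700, 7400, 5500, 11500, 7200, 11950, 9600),
  Year    = c(2002, 2001, 2006, 2005, 2007, 2004, 1996, 2005, 2002, 2008, 2005),
  Mileage = c(25000, 54000, 3700, 54000, 3658, 60000, 23000, 12000, 40000, 29000, 39000)
)

# One-predictor regressions: each slope absorbs whatever the omitted variable explains.
fit_year    <- lm(Asking ~ Year, data = d2)
fit_mileage <- lm(Asking ~ Mileage, data = d2)
print(coef(fit_year))
print(coef(fit_mileage))

# Pearson correlation between the two predictors (a quick multicollinearity check).
r <- cor(d2$Year, d2$Mileage)
print(r)
```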

Nicely done. Now, if you really want to go overboard, you could run a heteroscedasticity diagnostic. A priori, I would assume that the age and mileage distributions are rather differently shaped, so heteroscedasticity is a concern. The Glejser test and Spearman's rank correlation test were the two general tests back in the day. I don't know if things have advanced at all. R may have an easy built-in function for them.
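
Both checks can be approximated in base R. What follows is a rough sketch rather than a packaged test: the data frame `d2` and its column names are reconstructed from the post's listings, and `abs_res`, `glejser`, `sp_year`, and `sp_mileage` are illustrative names. The Glejser-style step regresses the absolute residuals on the predictors, and the Spearman step rank-correlates the absolute residuals with each predictor using `cor.test()`. With only 11 listings, neither check is likely to have much power.

```r
# Listings transcribed from the post (asking price in USD, mileage in miles).
d2 <- data.frame(
  Asking  = c(7650, 7900, 14500, 8500, 13700, 7400, 5500, 11500, 7200, 11950, 9600),
  Year    = c(2002, 2001, 2006, 2005, 2007, 2004, 1996, 2005, 2002, 2008, 2005),
  Mileage = c(25000, 54000, 3700, 54000, 3658, 60000, 23000, 12000, 40000, 29000, 39000)
)
fit2 <- lm(Asking ~ Year + Mileage, data = d2)
abs_res <- abs(resid(fit2))

# Glejser-style check: do the absolute residuals vary systematically with the predictors?
glejser <- lm(abs_res ~ Year + Mileage, data = d2)
print(summary(glejser)$coefficients)

# Spearman rank correlation between absolute residuals and each predictor.
# exact = FALSE because the data contain ties, so an exact p-value is unavailable.
sp_year    <- cor.test(abs_res, d2$Year, method = "spearman", exact = FALSE)
sp_mileage <- cor.test(abs_res, d2$Mileage, method = "spearman", exact = FALSE)
print(sp_year)
print(sp_mileage)
```

Significant slopes in the auxiliary regression, or a significant rank correlation, would point toward non-constant residual variance.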

I am somewhat surprised that the drop is so great. Motorcycles are rather easier to fix than cars. Not least because you can bring them into the house and dismantle them in the living room.

I am sure Mrs Guesswork would not mind at all.