Before this week’s Scottish Parliament election I said I’d produce a forecast post-mortem, no matter the result.
The tl;dr version of the post-mortem? Uh, not good.
The most obvious point at which to start that post-mortem is the seat forecast. The table below shows the final forecast seat tally from Thursday morning. The total seat error was 17 seats, more than half of which came from under-estimating the Conservative seat tally.
| Party | Forecast seats | 90% interval | Actual seats |
| --- | --- | --- | --- |
| Liberal Democrat | 5 | (2, 8) | 5 |
The final seat tallies were within the 90% forecast interval for all parties save the Scottish Conservatives. The probability of the SNP winning fewer than 65 seats had increased to 8% in the final forecast on Thursday morning. This was substantially higher than the figure used in a Tuesday morning press release (just 1%), and it was something I overlooked on Thursday morning.
Error in predicting the list vote
The seat forecasts above depend on the vote forecasts. Under the additional member system, modelling the list vote matters more for seat forecasts than modelling the constituency vote. Table 2 therefore shows the forecast list vote for each party, alongside the polling trend on the last day,1 and the actual results.
The forecast list vote share is based on the tendency of polls to exaggerate change since the last election. Last year’s general election provided a striking example of this tendency: the polls drastically exaggerated the improvement in Labour’s vote share.
| Party | Poll trend (90% interval) | Forecast (90% interval) | Actual |
| --- | --- | --- | --- |
| SNP | (39.5, 45.2) | (39.2, 46.2) | 41.7 |
| Labour | (16, 22.3) | (17.3, 23.7) | 19.1 |
| Conservative | (16.7, 23.1) | (14.9, 21.3) | 22.9 |
| Liberal Democrat | (4.8, 7.6) | (4.2, 7.7) | 5.2 |
| Green | (6.7, 10.4) | (5.6, 9.7) | 6.6 |
| UKIP | (2.3, 4.1) | (1.5, 3.9) | 2.0 |
| | (0.9, 1.9) | (1.6, 3.7) | |
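The discounting at work here can be sketched as a simple shrinkage rule. This is only an illustration, not the actual forecast model: the function name, the example shares, and the reversion factor `k` are all my own inventions.

```python
# Illustrative shrinkage rule (not the actual forecast model): discount
# poll-implied change by pulling it part-way back to the last election's result.
def revert_to_last(poll_share, last_share, k=0.5):
    """Forecast = last result plus a fraction k of the polled change.

    k is a hypothetical reversion factor: k = 1 trusts the polls fully,
    k = 0 predicts no change at all from the last election.
    """
    return last_share + k * (poll_share - last_share)

# Hypothetical party polling at 22% that took 14% on the previous list vote:
print(revert_to_last(22.0, 14.0, k=0.5))  # 18.0 -- half the polled gain is kept
```

A rule like this helps when polls overstate change, but it systematically under-calls any genuine surge or collapse, which is exactly the failure mode seen with the Conservatives and Labour here.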
Unfortunately, the effect of applying this forecast model was to discount the improvement in the vote share of the Scottish Conservatives, and the decline of Scottish Labour. Although the forecast was more accurate than the final poll trend in predicting the vote share of the Liberal Democrats, Greens, and UKIP, this is limited consolation. The mean absolute error of the predictions for the top seven parties was larger than the mean absolute error of predictions based purely on the polling trend (1.36 versus 1.29).
The forecast model didn’t just affect the mean prediction: it also affected the width of the confidence interval. The 90% forecast intervals are all wider than the 90% intervals which result from the poll-pooling model. Unfortunately, these 90% forecast intervals failed to encompass the vote share of the Scottish Conservatives.
Error in predicting the distribution of the constituency vote
The forecast model has two principal parts. One part deals with forecasting national vote share — which, as I’ve just shown, didn’t exactly work as planned.
A second part deals with forecasting vote share in individual constituencies. Although I dislike the emphasis on constituency contests, they do provide a useful test of the model’s ability to get the distribution of parties’ vote share right.
A useful benchmark is provided by uniform national swing. I’ve therefore taken the parties’ 2011 performance, and added on the change in their share of the constituency vote between 2011 and 2016.
In doing this, I’m giving UNS sight of the 2016 result. In order to make it a fair contest, I’ll therefore work out on average how much my forecast was short of each party’s share of the constituency vote, and add this on to each constituency forecast.
When I do this, I find that the mean absolute error for UNS was 4.32, whereas the mean absolute error for my forecast model was 4.79. In other words, the extra effort that went into modelling constituency variation didn’t produce a more accurate picture of what was happening in each constituency than a simple exercise in addition and subtraction.
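The benchmark comparison can be sketched as follows. All national and constituency shares here are made up for illustration, and the function names are mine, not the model’s:

```python
# Sketch of the UNS benchmark: a party's 2011 constituency share plus the
# national 2011 -> 2016 change in its constituency vote. Numbers invented.
def uniform_national_swing(share_2011, national_2011, national_2016):
    """2011 constituency share plus the national change in vote share."""
    return share_2011 + (national_2016 - national_2011)

def mean_absolute_error(preds, actuals):
    return sum(abs(p - a) for p, a in zip(preds, actuals)) / len(preds)

# Hypothetical party: 40% nationally in 2011, 45% in 2016 (a +5 swing).
constituencies_2011 = [35.0, 42.0, 50.0]
uns_preds = [uniform_national_swing(s, 40.0, 45.0) for s in constituencies_2011]
print(uns_preds)  # [40.0, 47.0, 55.0]

actual_2016 = [41.0, 46.0, 53.0]
print(mean_absolute_error(uns_preds, actual_2016))
```

The same mean-absolute-error comparison, run over the real constituency results, is what produces the 4.32 versus 4.79 figures above.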
Not all of the errors described above were within my power to change. The two mistakes which stand out most clearly concern the over-confidence of the model, and the distribution of parties’ vote shares.
I think the model was over-confident because the reversion-to-last-election model is too simple. It has some nice properties: you can, for example, apply it to all parties symmetrically, so that vote shares stay within 100 percent. It remains, however, a gussied-up regression model, prone to all the usual problems of over-fitting.
I’m not sure whether leave-one-out cross-validation would generate much wider forecast intervals, but it’s definitely something I’ll be looking at.
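As a rough illustration of what that cross-validation might look like: hold out each past election in turn, fit on the rest, and record the held-out error. All the data below is invented, and a one-parameter least-squares fit stands in for the real estimation.

```python
# Rough sketch of leave-one-out cross-validation for a forecasting rule.
# Each "election" pairs a poll-implied change with the actual change in
# vote share; the data and the fitting rule are invented for illustration.
def fit_k(train):
    # Closed-form least-squares k for: actual change = k * polled change.
    num = sum(poll * actual for poll, actual in train)
    den = sum(poll * poll for poll, _ in train)
    return num / den

def loo_errors(elections):
    """Hold out each election in turn, fit on the rest, record the error."""
    errors = []
    for i, (poll, actual) in enumerate(elections):
        train = elections[:i] + elections[i + 1:]
        k = fit_k(train)
        errors.append(abs(k * poll - actual))
    return errors

# The spread of these held-out errors hints at how wide the forecast
# intervals should have been -- typically wider than in-sample fit suggests.
elections = [(8.0, 5.0), (4.0, 3.5), (6.0, 3.0), (10.0, 7.0)]
print(loo_errors(elections))
```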
I think the model did a poor job at predicting the geographic distribution of parties’ votes because I just didn’t have the time to add all of the necessary constituency-level predictors. In the 2015 forecasting model, we were able to do a road-test on 2010 BES data, identify useful predictors at constituency level, and use those predictors in a complex model drawing on a relatively large number of constituency-level observations. That wasn’t possible here. I didn’t have any data from 2011, and so I opted to start with a few obvious predictors (GE2015 and SP2011 vote share) and expand as I went. In the end, the lengthy process of variable testing that I had envisaged just didn’t happen — and with limited media buy-in, I wasn’t inclined to spend much more time on an election that seemed to interest very few.
To use a guid Scots expression, I’m fairly scunnered with election forecasting. It’s doubly fortunate that I’d already sworn off election forecasting (though I might be doing some on-the-night analysis), and that the next major set of elections is only in 2020. Until then, I’ll stick to analysing events that have already happened…
- This trend results from a model which pools polls. This model is based on work by Simon Jackman: see his article “Pooling the polls over an election campaign” for an introduction. The pooling model used here differs from Jackman’s in two respects: I used an additive log-ratio transform to ensure that the sum of parties’ vote shares remained 100%, and I dispensed with house effects. The additive log-ratio transform is necessary to keep the estimates logically possible; the house effects were ditched because (particularly at the start of the year) there were relatively few polls, with large gaps between them, which made house effects very difficult to estimate.↩
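For the curious, the additive log-ratio transform mentioned in the footnote can be sketched as follows. The transform itself is standard in compositional data analysis; the vote shares below are invented.

```python
import math

# Additive log-ratio (alr) transform: model vote shares as logs of ratios
# to a reference party, so the inverse always returns shares summing to 1.
def alr(shares):
    """Log-ratios of each share to the last (reference) share."""
    ref = shares[-1]
    return [math.log(s / ref) for s in shares[:-1]]

def alr_inverse(coords):
    """Map unconstrained alr coordinates back to shares on the simplex."""
    expanded = [math.exp(c) for c in coords] + [1.0]
    total = sum(expanded)
    return [e / total for e in expanded]

# Invented shares for four parties; the round trip recovers them.
shares = [0.45, 0.22, 0.19, 0.14]
roundtrip = alr_inverse(alr(shares))
print(all(abs(a - b) < 1e-9 for a, b in zip(shares, roundtrip)))  # True

# Any real-valued coordinates map back to logically possible shares:
print(abs(sum(alr_inverse([0.3, -0.5, 1.2])) - 1.0) < 1e-12)  # True
```

The practical payoff is that the pooling model can put a normal random walk on the alr coordinates without ever producing negative shares or shares summing to more than 100%.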