Executive summary

In this report I describe how I generated a set of notional results for the 2021 Scottish Parliament election.

These notional results give the count of constituency and list votes that would have been won by the six largest parties had the election been fought on the boundaries proposed by Boundaries Scotland.

These notional results were produced in part by multilevel regression and post-stratification (MRP). MRP is a technique which uses large national samples to generate estimates of opinion at the local level. It does this by combining sample information with information from the census on how many people of different demographic types live in different local areas.

The estimates have been scaled so that the notional results match exactly the 2021 results if the constituencies are unchanged. The estimates also preserve the total count of votes across constituencies. To give a simple example: if there are two constituencies A and B which exchange parcels of land, the sum of each party’s votes in A and B under the notional results is the same as the sum of those votes in the actual 2021 results.

The main conclusion is that under the proposed boundaries the Green party would have won one more seat, and the Labour party one fewer seat, than they did in fact win in 2021.

This report is divided into six parts.

I begin by describing the building blocks of the exercise: census data for output areas and the Scottish Election Study.
Second, I describe the post-stratification frame, which stores how many people of different types live in different areas.
Third, I use the post-stratification frame to show the index of change between old and new constituencies.
Fourth, I describe the model I estimate, and show how some of the demographic characteristics are associated with the probability of voting for different parties (or not at all).
Fifth, I describe the notional results and the aggregate pattern of seats.
Finally, I compare these notional results to alternative notional results from Ballot Box Scotland.

Building blocks

Output areas

The modelling I do is based on output areas.

Output areas are the smallest areas for which Scotland’s Census produces statistics.

There are over 46,000 output areas, with an average population of around 120 people.

Output areas don’t have names, only codes.

An example output area is S00145472, which is plotted in Figure 1.

Figure 1: Plot showing the boundaries of output area S00145472 (dashed red line) and the population weighted centroid (red star)

This output area is a residential area in Gilmerton, in the south of Edinburgh. The houses there are recently built. At the time of the 2022 Census, the population was 153 individuals, of whom 122 were of voting age.

Output areas are characterised by two things: their boundaries, and their population weighted centroid.

The boundary of an output area is not hard to understand. It shows where and how far the area extends. There are complications to do with shorelines, and how detailed these boundary lines can be, but they’re not relevant here.

The centroid of any flat shape is the point upon which the shape would balance perfectly, if the weight of the shape were the same at all points.

The population weighted centroid is the point at which the shape would balance perfectly if we located each person on the shape and gave each of them equal weight.
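As a minimal sketch of this definition: if we knew where each person lived, the population weighted centroid would simply be the average of their positions. The coordinates below are invented for illustration and are not census data; the function name is likewise hypothetical.

```python
def population_weighted_centroid(people):
    """Return the mean (x, y) position of a list of (x, y) person locations."""
    n = len(people)
    x = sum(px for px, _ in people) / n
    y = sum(py for _, py in people) / n
    return x, y

# Invented coordinates (metres on a local grid): three people clustered
# in one corner of an area and one person much further away.
people = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (110.0, 110.0)]
centroid = population_weighted_centroid(people)
print(centroid)
```

The single distant person pulls the centroid away from the cluster, which is why the population weighted centroid can sit some way from the geometric centre of the shape.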

The population weighted centroid for S00145472 is shown in the plot with a star. This is the best information we have about where most people in that output area are. There is no lower level source of official information about where people in this area live.

For this reason, I’ll be mapping output areas to existing and proposed parliamentary constituencies on the basis of their population weighted centroids. Some output areas might technically intersect constituency boundaries, but

any intersections might be the result of coarsely drawn boundaries;
the proportion of people living in those intersections is likely to be negligible; and
there is no better official source which could resolve this issue.
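The centroid-based assignment described above can be sketched as a point-in-polygon test. This is an illustrative sketch only: the two rectangular "constituencies" and the centroid are invented, and a real workflow would more likely rely on a GIS library than on a hand-written ray-casting test.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if point lies inside polygon (a list of (x, y) vertices)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def assign_constituency(centroid, constituencies):
    """Return the name of the first constituency polygon containing the centroid."""
    for name, polygon in constituencies.items():
        if point_in_polygon(centroid, polygon):
            return name
    return None

# Invented geometry: two adjacent rectangles sharing the boundary x = 5.
constituencies = {
    "West": [(0, 0), (5, 0), (5, 10), (0, 10)],
    "East": [(5, 0), (10, 0), (10, 10), (5, 10)],
}
# An output area may straddle x = 5, but only its centroid decides the assignment.
assigned = assign_constituency((6.0, 4.0), constituencies)
print(assigned)
```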

Scottish Election Study

The Scottish Election Study (SES) is an academic study of voting behaviour in Scottish elections. It has different components: an election study with a panel element, surveying people before and after elections, and a periodic opinion monitor, which samples a cross-section of people at irregular points through the year.

The 2021 election study was fielded in two waves. The pre-election wave was fielded between 8th April and 5th May; the post-election wave, between 13th and 28th May.

Because I am modelling how people voted, I focus on the 3,355 individuals who responded to the second wave, and in particular the 3,254 individuals who reported either a list vote or a constituency vote, and whose Scottish Parliament constituency we know.

There are lots of variables in the SES which could potentially be used to model individuals’ voting behaviour. In this report, I model voting behaviour using the following individual characteristics:

age;
sex;
whether or not they have a degree;
whether or not they belong to an ethnic minority;
whether the house in which they live is owned or rented;
whether they are religious;
whether their national identity is Scottish above all other national identities.

I chose these individual characteristics because:

they feature in the SES and in the Census in roughly similar form;
they feature in the 2011 census microdata which I use (see below);
they are predictive of voting behaviour;
they describe broad categories of people.

There are variables which I could have included which would ordinarily be useful in modelling voting behaviour but which don’t meet some of the criteria above.

One variable which is not present in the SES and the Census in the same form is social grade. The SES contains information on approximated social grade (whether the respondent belongs to grades AB, C1, C2 or DE). Whilst the 2011 census included information on approximated social grade, no such calculation has been performed for the 2022 census. The census does include information on standard occupational codings, which can be used to construct a class measurement, but I’ve not done so. In any case, approximated social grade is built upon other variables I do use in the analysis, such as housing tenure and educational qualifications.

One variable which is present in the SES and in the Census, but which is not present in the 2011 census microdata is sexual orientation. Sexual orientation is an important predictor of vote choice, but there was no question on sexual orientation in the 2011 census, and so it’s difficult to know how reports of sexual orientation are associated with other demographic variables.

Finally, there are some variables – or versions of variables – which are potentially predictive, but where the numbers of individuals involved are too small to make reliable inferences. Whilst it would be possible to investigate whether individuals from different ethnic minority backgrounds vote differently, only a small proportion of the Scottish population comes from an ethnic minority background, and the numbers involved are smaller when we disaggregate by specific ethnic minority group. I’ve therefore collapsed some variables like ethnicity and religion into dichotomies.

Post-stratification frame

I described output areas because information on these output areas allows me to create a post-stratification frame.

You can think of a post-stratification frame as a spreadsheet with a row for each combination of person characteristics, which records how many people with that set of characteristics live in each area.

For example: a post-stratification frame might record how many (i) men are (ii) aged between 60 and 64, are (iii) from an ethnic minority, (iv) have a university degree, are (v) religious, (vi) don’t identify primarily as Scottish, and (vii) live in rented accommodation.

Obviously, the smaller the areas and the more numerous and variegated the characteristics, the smaller these counts will be.

Constructing a post-stratification frame is an involved process. Although the Census tells us how many people in a given area are religious, and how many people in a given area live in rented accommodation, it doesn’t tell us how many people are religious renters. To use terms from statistics: the Census (generally) tells us about marginal distributions, but not the joint distribution.

To create a joint distribution, I use microdata from the 2011 census. This gives information on a five percent sample of 2011 census returns, or just over a quarter of a million individuals. With such a large volume of individual returns, we can work out the association between being religious and being a renter.

I use this data to create initial weights for each combination of characteristics. I then repeatedly multiply or divide these weights so that the sum of weights for each characteristic, taken one at a time, matches the marginal distribution reported in the Census. For example: I might have four proportions:

religious renters;
religious owners;
irreligious renters;
irreligious owners;

and I might multiply or divide the weights in the first two categories to match the Census reported proportion of religious people in the relevant area. I might also multiply or divide the weights in the first and third categories to match the Census reported proportion of renters in the relevant area. Through repeated processes of multiplication and division – in statistical terms, “iterative proportional fitting” or “raking” – I can start with an arbitrary joint distribution and get something that matches what I know about each particular area.
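The raking step for the four-category example can be sketched as follows. This is a minimal illustration rather than the code used for the report: the starting proportions and the local margins are invented, and the function name is hypothetical.

```python
def rake(weights, row_targets, col_targets, iterations=50):
    """Iterative proportional fitting on a two-way table of weights.

    weights: list of rows; row_targets and col_targets: desired marginal totals.
    """
    w = [row[:] for row in weights]
    for _ in range(iterations):
        # Scale each row so that its total matches the row target.
        for i, target in enumerate(row_targets):
            total = sum(w[i])
            w[i] = [v * target / total for v in w[i]]
        # Scale each column so that its total matches the column target.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in w)
            for row in w:
                row[j] *= target / total
    return w

# Rows: religious, irreligious. Columns: renters, owners.
seed = [[0.15, 0.35], [0.20, 0.30]]  # invented joint proportions (stand-in for microdata)
religion_margin = [0.40, 0.60]       # invented local shares: religious, irreligious
tenure_margin = [0.30, 0.70]         # invented local shares: renters, owners
fitted = rake(seed, religion_margin, tenure_margin)
for row in fitted:
    print([round(v, 4) for v in row])
```

The fitted table matches both local margins while keeping the odds ratio between religion and tenure found in the starting table, which is why the microdata are useful even though they describe 2011.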

Additionally, where the Census does report associations, as it does when it releases tables like “Religion by sex by age” or “Ethnic group by age”, I can ensure that the combination of those characteristics is matched almost exactly at the local level.

In describing post-stratification frames, I have spoken about areas generically. For this report, I’ve constructed a post-stratification frame defined on intersections between old and new constituencies. There are 114 such intersections. This means that the post-stratification frame is defined by:

area [114 intersections of old and new constituencies]
age [14 distinct categories]
sex [2 distinct categories]
membership of an ethnic minority [2 distinct categories]
university degree [2 distinct categories]
religiosity [2 distinct categories]
national identity [2 distinct categories]
housing tenure [2 distinct categories]

This means that the post-stratification frame has just over 80,000 non-zero rows in it. After removing combinations with absolutely no people, the average (median) count in each cell / unique combination of attributes is fourteen.
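For context, the dimensions listed above put an upper bound on the number of rows the frame could contain if every combination were populated. The short calculation below uses only the category counts given in the list; the dictionary keys are just labels.

```python
from math import prod

# Number of categories for each dimension of the post-stratification frame.
categories = {
    "area": 114,
    "age": 14,
    "sex": 2,
    "ethnic minority": 2,
    "degree": 2,
    "religiosity": 2,
    "national identity": 2,
    "housing tenure": 2,
}

max_rows = prod(categories.values())               # every possible combination
rows_per_area = max_rows // categories["area"]     # demographic cells within one area
print(max_rows, rows_per_area)
```

The gap between this upper bound and the count of non-zero rows reflects combinations of area and characteristics in which nobody is estimated to live.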

This post-stratification frame covers the voting age population rather than the voting eligible population. There are individuals in the voting age population who are not eligible to vote because they are imprisoned for sentences longer than twelve months or because they are not a citizen of a qualifying country. The difference between the voting age population and the voting eligible population is much smaller in Scotland than in England because of less restrictive rules on the franchise.

In my opinion, this is the most granular post-stratification frame that can reasonably be constructed for this purpose. Whilst it would be desirable to model voting behaviour at a lower level, the SES only records information on each respondent’s 2022 parliamentary constituency. Although it records information on postcode sector, postcode sectors can correspond to multiple output areas and can span (current and proposed) constituencies.

The degree of change between old and new constituencies

Because we need to count how many people live in the intersections between old and new constituencies, we can calculate how much each new constituency has changed relative to its closest predecessor.

I define the index of change as one minus the share of the new constituency’s population which lived in the old constituency with the greatest population overlap; that is, one minus the number of people in the largest overlap divided by the new constituency’s population.

I illustrate this with Edinburgh Southern. Figure 2 shows the proposed boundary for Edinburgh Southern and the boundaries of its four predecessor constituencies.

Figure 2: Plot showing the proposed boundary for Edinburgh Southern and boundaries of predecessor constituencies.

The figure shows that the new Edinburgh Southern has shifted southwards, taking in the easternmost part of Edinburgh Pentlands, and the southern fringe of Edinburgh Eastern, which had previously snaked round to include the Gilmerton area. The new Edinburgh Southern includes a tiny portion of Edinburgh Central, although this may be due to inaccuracies in the map data.

Table 1: Proportion in the new Edinburgh Southern coming from four old constituencies

Old constituency	Population from this constituency	Share of new constituency population (%)
Edinburgh Central	138	0.2
Edinburgh Eastern	24657	37.9
Edinburgh Pentlands	7765	11.9
Edinburgh Southern	32485	49.9

Table 1 shows the count of individuals coming from each of the four predecessor constituencies. The largest single contributor is the old Edinburgh Southern constituency, which contributes almost half of the new constituency population. The index of change is therefore \(1 - \frac{49.9}{100} = 0.501\), or about 50 percent, which is the scale on which Table 2 reports the index.
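The same calculation can be written as a short function applied to the counts in Table 1. The function name is illustrative; the counts are those reported in the table.

```python
def index_of_change(overlap_counts):
    """One minus the largest predecessor's share of the new constituency's population."""
    total = sum(overlap_counts.values())
    return 1 - max(overlap_counts.values()) / total

# Population of the new Edinburgh Southern by old constituency (Table 1).
edinburgh_southern = {
    "Edinburgh Central": 138,
    "Edinburgh Eastern": 24657,
    "Edinburgh Pentlands": 7765,
    "Edinburgh Southern": 32485,
}

index = index_of_change(edinburgh_southern)
print(round(100 * index, 2))  # expressed in percent
```

Working from the unrounded counts rather than the rounded 49.9 percent gives a value of just over 50 percent.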

Table 2: Index of change in the ten constituencies with the largest change

New constituency	New constituency code	Index of change (%)
Glasgow Central	47	58.74355
Edinburgh Eastern, Musselburgh and Tranent	33	55.74137
Edinburgh Southern	38	50.05765
Edinburgh Northern	36	46.75669
Glasgow Kelvin and Maryhill	49	44.89727
Glasgow Cathcart and Pollok	46	41.12151
Glasgow Easterhouse and Springburn	48	40.07663
Edinburgh North Eastern and Leith	34	35.39741
Glasgow Baillieston and Shettleston	45	35.34289
Renfrewshire North and Cardonald	66	35.25863

We can calculate this for all new constituencies. Table 2 shows the value of this index for the ten constituencies which have seen the greatest change by this metric. Although it is natural to focus on areas which have seen the most change, 46 constituencies have seen negligible change (index values less than one percent).

Modelling

I estimate two models: one model of list vote, and one model of constituency vote.

In the model of list votes, the outcome variable is a categorical variable with eight possible values (in alphabetical order: Alba, Conservative, Did not vote, Green, Labour, Liberal Democrat, SNP, Other).

In the model of constituency vote, the outcome variable is the same, except that there are seven possible values because Alba did not stand constituency candidates.

I model these options because these were the only parties that won more than one percent of the vote, and because these are the only parties chosen by more than a dozen respondents in the Scottish Election Study.

I model these outcomes using different predictor variables. I include all of the demographic variables which feature in the post-stratification frame. These variables are modelled as fixed effects, except that age is modelled as though it were a continuous variable using a spline. I also include constituency random intercepts and constituency level predictors. The constituency level predictors for the model of list votes include the shares of the voting age population who voted for the Conservatives, Labour, the Liberal Democrats and the Greens. The constituency level predictors for the model of constituency vote also include dummy variables which record whether a Green candidate or any other candidates stood for election in that constituency.

The statistical model I use is a multinomial logit model, estimated by Bayesian methods with the brms package.

Although the primary purpose of the model is to generate predictions, it is helpful to understand the predictions implied by the model when we change respondent characteristics. For example: we might “give” a respondent a degree and ask how that changes the predicted probability of voting for the SNP.
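To illustrate what “giving” a respondent a degree means in a multinomial logit model, the sketch below computes predicted probabilities with and without a degree effect and takes the difference. The coefficients are invented purely for illustration and are not estimates from the model in this report; the reduced set of outcomes and the helper names are also assumptions.

```python
from math import exp

def softmax(scores):
    """Convert linear predictor scores into probabilities that sum to one."""
    m = max(scores.values())
    exps = {k: exp(v - m) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}

def linear_scores(intercepts, degree_effects, has_degree):
    """Linear predictor for each outcome, adding the degree effect if applicable."""
    return {k: intercepts[k] + (degree_effects[k] if has_degree else 0.0)
            for k in intercepts}

# Invented coefficients on the logit scale (SNP is the reference outcome).
intercepts = {"SNP": 0.0, "Lab": -0.5, "Cons": -0.4, "Green": -2.0, "DNV": -0.2}
degree_effects = {"SNP": 0.0, "Lab": 0.1, "Cons": -0.1, "Green": 0.8, "DNV": -0.6}

p_without = softmax(linear_scores(intercepts, degree_effects, has_degree=False))
p_with = softmax(linear_scores(intercepts, degree_effects, has_degree=True))
change = {k: p_with[k] - p_without[k] for k in intercepts}
for outcome, delta in change.items():
    print(outcome, round(delta, 3))
```

Across all outcomes, including not voting, the changes sum to zero; they only fail to do so when one outcome is left out of the display.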

Figure 3: Plot showing changes in the probability of voting for different parties in the list vote when respondent characteristics change. Changes in probabilities need not sum to zero because abstention is omitted.

Figure 3 shows the changes in the predicted probabilities for discrete changes in respondent characteristics. People with degrees are much more likely to vote for the Greens. Ethnic minorities are less likely to vote for Labour or the Conservatives. Scottish identifiers are much more likely to vote for the SNP. Men are more likely to vote for the Conservatives, and less likely to vote for the SNP. Respondents who are religious are more likely to vote for the Conservatives, but renters are much less so.

Post-stratification of model predictions

With the two models estimated in the previous section, we can make predictions for each cell of the post-stratification frame. We can add these predictions up for each of the 114 intersections. We can adjust these predictions so that when we aggregate from intersections to old constituencies, the results match exactly. Then we reaggregate from intersections to new constituencies to get notional results.
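The “adding up” step is the post-stratification itself: each cell’s predicted probabilities are multiplied by the number of people in the cell and summed within each intersection. A minimal sketch, with an invented three-row frame, invented probabilities and hypothetical intersection labels:

```python
from collections import defaultdict

# Each row: (intersection label, people in the cell, predicted probability by outcome).
# All values are invented for illustration.
frame = [
    ("A-1", 20, {"SNP": 0.40, "Lab": 0.20, "DNV": 0.40}),
    ("A-1", 10, {"SNP": 0.20, "Lab": 0.50, "DNV": 0.30}),
    ("B-1", 30, {"SNP": 0.50, "Lab": 0.10, "DNV": 0.40}),
]

def poststratify(frame):
    """Sum count x probability over the cells of each intersection."""
    totals = defaultdict(lambda: defaultdict(float))
    for area, count, probs in frame:
        for outcome, p in probs.items():
            totals[area][outcome] += count * p
    return {area: dict(t) for area, t in totals.items()}

expected = poststratify(frame)
for area, votes in expected.items():
    print(area, {k: round(v, 1) for k, v in votes.items()})
```

Because the probabilities in each cell sum to one, the expected counts in each intersection sum to its voting age population.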

I adjust these predictions in two ways. First, I add shifts on the logit scale. This reduces, but does not eliminate, the difference between predicted and actual results. I then re-weight by multiplying so that the predicted and actual results match almost exactly, up to rounding error.
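The second, multiplicative step can be sketched as below for a single party; the logit-scale shifts are not shown. The constituency labels, predicted counts and actual totals are all invented, and the function names are hypothetical.

```python
# Predicted votes for one party in each intersection, keyed by (old, new) constituency.
# Invented numbers for two old constituencies exchanging territory.
predicted = {
    ("A", "A_new"): 900.0,
    ("A", "B_new"): 100.0,
    ("B", "A_new"): 200.0,
    ("B", "B_new"): 800.0,
}
actual_old = {"A": 1100.0, "B": 950.0}  # invented actual totals on old boundaries

def rescale_to_old(predicted, actual_old):
    """Multiply each intersection so that old-constituency totals match the actual results."""
    pred_old = {}
    for (old, _), votes in predicted.items():
        pred_old[old] = pred_old.get(old, 0.0) + votes
    return {key: votes * actual_old[key[0]] / pred_old[key[0]]
            for key, votes in predicted.items()}

def aggregate_to_new(scaled):
    """Sum rescaled intersection counts within each new constituency."""
    out = {}
    for (_, new), votes in scaled.items():
        out[new] = out.get(new, 0.0) + votes
    return out

scaled = rescale_to_old(predicted, actual_old)
notional = aggregate_to_new(scaled)
print(notional)
```

The party’s notional votes across the two new constituencies sum to its actual votes across the two old ones, which is the preservation property described in the executive summary.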

Adjusting the results for Reform

The above procedure gives estimates for seven parties, but not Reform. In the 2021 election, Reform won a fifth of one percent in the list vote. In the 2026 election, Reform is likely to do much better. I therefore proportionally reallocate Reform votes between new constituencies. For each intersection, I divide the “Other” vote between “Reform” and “all others” on the basis of the Reform share of the “Other” vote in the parent 2021 constituency. I subtract the Reform portion from the “Other” tally. I then reaggregate to the 2026 constituency level. This assumes that the ratio between Reform and all other parties in the “Other” category is the same in every part of an “old” constituency. Reform can be more popular in some parts of a constituency, but only if those other parties are also more popular there. This implicit assumption is neither intuitive nor particularly plausible, but there is no better way of allocating Reform votes between constituencies.
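A minimal sketch of the split for one intersection is given below. The vote counts are invented, the function name is hypothetical, and the sketch assumes that the parent constituency’s “Other” total includes its Reform votes.

```python
def split_other(other_in_intersection, reform_parent, other_parent):
    """Split an intersection's 'Other' votes into (Reform, all others).

    reform_parent / other_parent is the Reform share of the 'Other' vote in the
    parent 2021 constituency; other_parent is assumed to include Reform votes.
    """
    share = reform_parent / other_parent if other_parent else 0.0
    reform = other_in_intersection * share
    return reform, other_in_intersection - reform

# Invented example: the parent constituency had 1,000 'Other' votes, 40 of them
# for Reform, and this intersection is estimated to contain 250 'Other' votes.
reform, remaining_other = split_other(250.0, reform_parent=40.0, other_parent=1000.0)
print(round(reform, 1), round(remaining_other, 1))
```

Every intersection of the same parent constituency receives the same Reform share of its “Other” vote, which is the assumption discussed above.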

Results

The output of this procedure is a set of notional results for (i) the constituency vote and (ii) the list vote for the seven largest parties. Table 3 shows the result for Renfrewshire North and Cardonald.

Table 3: Notional results for Renfrewshire North and Cardonald

Contest	Alba	Cons	Greens	Lab	LDem	Other	Reform	SNP	DNV
List	768	8419	2582	9201	858	1273	54	17159	26647
Constituency	NA	7108	630	11230	878	456	NA	19794	26860

We see a slight discrepancy in the numbers who did not vote: 213 more people are recorded as not voting in the constituency contest than in the list contest, implying that some individuals only cast a valid vote in the list contest. That’s possible, and the discrepancy is just within the range of invalid votes.

Table 4: Notional results for each party

Party	Constituency	(chg.)	Regional	(chg.)	Total	(chg.)
SNP	63	(+1)	1	(-1)	64	(+0)
Cons	5	(+0)	26	(+0)	31	(+0)
Lab	1	(-1)	20	(+0)	21	(-1)
Greens	0	(+0)	9	(+1)	9	(+1)
LDem	4	(+0)	0	(+0)	4	(+0)
Alba	0	(+0)	0	(+0)	0	(+0)
Other	0	(+0)	0	(+0)	0	(+0)
Reform	0	(+0)	0	(+0)	0	(+0)

The main point of comparison, however, must be the overall seat tallies (i) on the actual results, and (ii) with these notional results. These are shown in Table 4. Labour “lose” one seat, failing to win the redrawn Edinburgh Southern. The Greens win a list seat in the South of Scotland. Further detail is given in Table 5, which breaks down seat allocations by region.

Table 5: Notional seat entitlements for each party by region. Figure before the plus sign gives constituency seats won; figure after the plus sign gives list seats won.

Region	Cons	Greens	LDem	Lab	SNP
Central and Lothians West	0 + 3	0 + 1	0 + 0	0 + 3	9 + 0
Edinburgh and Lothians East	0 + 3	0 + 2	1 + 0	0 + 2	8 + 0
Glasgow	0 + 2	0 + 1	0 + 0	0 + 4	8 + 0
Highlands and Islands	0 + 4	0 + 1	2 + 0	0 + 1	6 + 1
Mid Scotland and Fife	0 + 4	0 + 1	1 + 0	0 + 2	8 + 0
North East Scotland	1 + 4	0 + 1	0 + 0	0 + 2	9 + 0
South Scotland	3 + 3	0 + 1	0 + 0	0 + 3	7 + 0
West Scotland	1 + 3	0 + 1	0 + 0	1 + 3	8 + 0

Comparison with alternative estimates

I have based these notional estimates on a model of voter behaviour estimated on individual responses to a survey taken shortly after the 2021 Scottish Parliament election. This is not the only way to create notional constituency results, and indeed it has not been the dominant way. Most notional results have been put together using information from local elections, since results from local elections are compiled at a more granular level than are results at parliament level.

For the new Scottish Parliament boundaries, Ballot Box Scotland has produced notional results which are based on polling district level results collected directly from local councils.

Creating notional results based on local election results makes a great deal of sense. Local elections often show similar patterns of party support to national elections. Working with local election data also has the advantage that analysts are working with counts of actual behaviour rather than counts of reported behaviour. In the specific case of local election results collated at the polling district level, this data is very granular.

Having said that, I think there are several reasons why working from individual level data is preferable.

First, local elections are held under a different electoral system. It seems plausible that the distribution of party support might be similar when comparing the distribution of first preferences under the single transferable vote and list votes. However, this is a big and unverifiable assumption. It’s also much less plausible that the distribution of party support is similar when comparing the distribution of first preferences under the single transferable vote and constituency votes.

Second, many local elections didn’t feature Alba, which means that they’re uninformative about the distribution of Alba support. Modelling Alba support correctly isn’t consequential for Alba’s seat tally, which remains at zero under any plausible redistricting, but is important insofar as Alba may draw votes away from other parties.

Third, local elections do feature relevant independent candidates, which can’t be said of the Scottish Parliament elections. What is worse, these independent candidates do not draw evenly from other parties, but are disproportionately likely to appear and win votes in areas which are more Conservative, as judged by the results of Scottish Parliament elections. This means that in some cases areas will appear as though they were devoid of Conservative support, when actually that support is masked by strong support for independents.

Fourth, local elections were held a year after the Scottish Parliament election. Whilst geographic patterns of party support are unlikely to have changed within the course of a year, it seems preferable to use 2021 data to work out what would have happened if the 2021 election had been held on 2026 boundaries.

These are reasons why in principle we should prefer notional results based on contemporaneous individual data from the election rather than notional results based on data from separate elections held a year later under a different electoral system. In practice, the two sets of notional results are extremely similar.

Figure 4: Comparison of notional results between this report (horizontal axis) and Ballot Box Scotland (vertical axis). Values plotted are counts of votes cast; solid lines give the best fitting linear regression line.

Figure 4 plots counts from this report against notional counts from Ballot Box Scotland for the five largest parties. The line in each panel is the best fitting linear regression line, the equation for which is reported at the top of the panel. There is, to two significant figures, a one to one correspondence between the two sets of notionals. The high association is artificial in the sense that the association is based on all constituencies, including constituencies with minimal change. However, when we remove these constituencies the correlation across parties’ list votes ranges between 0.939 (Greens) and 0.971 (Conservatives), and the mean absolute difference in list votes ranges between 261 votes (Liberal Democrats) and 542 (Conservatives).
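The two summary measures used here, computed after dropping constituencies with negligible change, can be sketched as follows. The vote counts and index values are invented for illustration and are not taken from either set of notional results; the one percent cut-off mirrors the definition of negligible change used earlier.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def mean_abs_diff(xs, ys):
    """Mean absolute difference between paired values."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

# Invented rows: (index of change in percent, votes in set one, votes in set two).
constituencies = [
    (0.0, 9000.0, 9000.0),    # unchanged constituency, dropped below
    (58.7, 8400.0, 8100.0),
    (50.1, 9100.0, 9500.0),
    (35.4, 7600.0, 7400.0),
    (12.0, 10250.0, 10600.0),
]

changed = [(a, b) for idx, a, b in constituencies if idx >= 1.0]
set_one = [a for a, _ in changed]
set_two = [b for _, b in changed]
r = pearson(set_one, set_two)
mad = mean_abs_diff(set_one, set_two)
print(round(r, 3), mad)
```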

High associations can be consistent with differences in seat tallies if a small number of seats are decided by a handful of votes, but there is only one constituency where the two sets of notional results disagree regarding the constituency winner. On the basis of my notional results, the SNP would have won Edinburgh Southern rather than Labour. This is because, as shown in Figure 2 and Table 1, the new Edinburgh Southern seat draws upon Edinburgh Eastern. In Edinburgh Eastern a majority of voters (52%) cast their constituency vote for the SNP’s candidate, Ash Regan. Whilst both these notional estimates and the notionals from Ballot Box Scotland show that the notional constituency race in Edinburgh Southern would have been closer as a result of the influx of voters from Edinburgh Eastern, in my model this influx is sufficient to tip the seat.

Although the constituency winners are almost everywhere the same, there is one further difference in the list seats. Ballot Box Scotland have the Greens winning two list seats in Glasgow, whereas I have them at just one list seat. This is not driven by differences in the estimates of the Green vote: my estimates have the Greens winning 94 more votes than in the Ballot Box Scotland estimates. Rather, the difference in seats is caused by the votes won by Labour. In my notional results, Labour wins more votes, and crucially wins enough votes that its vote share is twice that of the Conservatives or the Greens. This means that when seats are allocated proportionally, Labour gets two bites at the cake for every one bite taken by either the Conservatives or the Greens. Although systems of proportional representation reduce the incidence of sharp discontinuities, they do not remove them – and if Labour, across all of Glasgow, had won 368 fewer votes the Greens would have won an extra seat.
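The discontinuity can be illustrated with a sketch of regional seat allocation, assuming the d’Hondt rule in which each party’s list vote is divided by one plus the seats it has already won (constituency seats included) and seven list seats are awarded per region. The vote totals below are invented to reproduce the pattern and are not the notional Glasgow results; the function name is hypothetical.

```python
def allocate_list_seats(list_votes, constituency_seats, n_list_seats=7):
    """Award list seats one at a time to the party with the highest d'Hondt quotient."""
    seats = dict(constituency_seats)          # running total of seats per party
    list_won = {party: 0 for party in list_votes}
    for _ in range(n_list_seats):
        winner = max(list_votes,
                     key=lambda p: list_votes[p] / (seats.get(p, 0) + 1))
        seats[winner] = seats.get(winner, 0) + 1
        list_won[winner] += 1
    return list_won

# Invented regional list votes; the SNP holds all eight constituency seats.
constituency_seats = {"SNP": 8}
votes = {"SNP": 133000, "Lab": 74000, "Cons": 40000, "Green": 36000}
base = allocate_list_seats(votes, constituency_seats)

# Same region with Labour a little below twice the Green vote.
fewer_labour = dict(votes, Lab=71000)
alt = allocate_list_seats(fewer_labour, constituency_seats)
print(base)
print(alt)
```

In the first case Labour’s quotient for a fourth seat (its vote divided by four) narrowly beats the Greens’ quotient for a second seat (their vote divided by two); in the second it falls just short, and the final seat changes hands.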

Conclusions

The full set of notional results is set out in two accompanying CSV files, which contain the full counts for constituencies as recorded by their name and number in the original Boundary Commission shapefiles.

The constituency results are here

The list results are here

The code used to generate these estimates will be made publicly available on GitHub.