UNS will be more broken in 2015 than it usually is

April 6, 2014

Uniform national swing is one of those psephological tricks that shouldn’t work, but does. There’s not much reason to believe that the local swing towards a party will reflect the national swing towards that party.

Yet UNS has a tremendous record. Suppose that we take two adjacent elections, and ‘de-mean’ each constituency result — that is, subtract the national average.
If uniform national swing is right, then these de-meaned constituency results should be very closely related. (Contrariwise, if the demeaned constituency results were very different, that would mean that lots of local swings had interfered with the pattern seen in the previous election).

That’s exactly what we find. Taking Danny Dorling’s historical archive of election results, the pairwise correlation of demeaned Conservative constituency results across adjacent twentieth-century elections is astonishingly high — between 0.97 and 0.99. You ordinarily don’t get correlations of that magnitude in social science.
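The check itself is easy to sketch. Here is a toy Python version with invented constituency shares (the real analysis uses Dorling’s archive); the point is that if swing is near-uniform, the demeaned results from adjacent elections line up almost perfectly:

```python
# De-mean constituency results from two adjacent elections and correlate
# them. Vote shares below are invented for illustration only.

def demean(shares):
    mean = sum(shares) / len(shares)
    return [s - mean for s in shares]

def correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Conservative share in five hypothetical constituencies, two elections running.
election_1 = [0.55, 0.40, 0.32, 0.48, 0.25]
election_2 = [0.52, 0.36, 0.30, 0.44, 0.23]  # roughly a uniform 3-point swing

r = correlation(demean(election_1), demean(election_2))
print(round(r, 3))  # close to 1 when the swing is near-uniform
```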

But in 2015, UNS is going to face severe problems because of the collapse of the Liberal Democrats. In 2010, the Lib Dems polled 23% of the vote, six percentage points behind Labour. Currently, the UK Polling Report average has them on 10%, implying a swing of thirteen percentage points away from the party.

The problem is that in seventy-one constituencies across the country, the LibDems had a vote share of less than 13%. In those seats, UNS therefore implies negative vote shares.

You might say that this doesn’t particularly matter, since UNS is usually used for predicting seat shares, and the Lib Dems were never going to win these seats in any event. But it does still matter, because in these seats other parties cannot gain as much from the Liberal Democrats as they do on average across the country.
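The arithmetic is easy to spell out. A toy Python sketch, using the national figures above but invented constituency shares:

```python
# UNS adds the national swing to every constituency's previous share.
# With a 13-point fall, any 2010 Lib Dem share below 13% goes negative.

national_2010 = 23.0  # Lib Dem national share in 2010 (percentage points)
national_now = 10.0   # current polling average
swing = national_now - national_2010  # -13 points

# Invented 2010 Lib Dem constituency shares:
shares_2010 = [34.0, 21.5, 12.0, 8.5, 5.0]
predicted = [s + swing for s in shares_2010]
print(predicted)  # the last three entries are negative: UNS "breaks" there
```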

Now, the LibDems might recover in the polls — and it’s hard to see them falling much further. If they recover to 14.3%, as Steve Fisher suggests, then UNS will only be ‘broken’ (in the sense of predicting negative vote shares) in the eleven constituencies where the LibDems polled less than 8.7%. But the 2015 election will nonetheless prove a more difficult test for UNS than some recent elections.

Slides for my talk today at Exeter Q-Step event

March 20, 2014

I don’t know how intelligible these will be if you’re not attending the talk — but here are my slides on forecasting the 2015 General Election.

I don’t include a prediction of my own, because I’m still working on the code to learn about constituency offsets from individual poll responses — in particular, how to set the priors on the offsets so that unusual poll sequences don’t give implausible estimates.

So instead I’ve borrowed from Steve Fisher’s honest-to-god forecast, and UK Polling Report’s UNS-based nowcast.

All of this is really joint work with Nick Vivyan and Ben Lauderdale — except that they don’t bear any responsibility for any hideous mistakes I’ve made.

Pooling the polls in Stan: a bleg

March 4, 2014

I’ve been experimenting with Stan recently. I’ve been working in Stan rather than Jags because I’m doing some work on pooling the polls properly — that is, where levels of support are drawn from a multivariate normal distribution jury-rigged so that draws sum to one. That jury-rigging requires a singular covariance matrix, which can’t be inverted to give the precision matrix that Jags needs. (And no, pseudo-inverses don’t work).

Whilst experimenting with Stan, I’ve found out a number of things about simplexes, and about just how picky Stan is about its covariance matrices being positive definite — but I had thought I was getting somewhere.

Until today. I thought I should start simple, and try to get a simple univariate poll-pooler going. I busted out the Jackman BASS code, and wrote up a quick implementation in Jags, omitting the house effects.

Now I find that I can’t replicate that in Stan. I can get a model which samples — but it just gives me back my initial values. If I don’t supply initial values, the model won’t initialize. I’ve eliminated the obvious suspects — confusing precision and variance, for example, or dealing with Stan’s inability to handle inputs that are part-missing.

My code (for both Jags and Stan) is on GitHub. Any ideas?

Update: Ben Lauderdale kindly pointed out that Stan uses the standard deviation for the normal, rather than the variance, so I was off by an order of magnitude. He also took a red pen to the code, meaning that the resulting much cleaner code (which is entirely to Ben’s credit if it works efficiently, and entirely my fault insofar as it doesn’t) is now on GitHub.
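For anyone hitting the same wall: Jags’s dnorm is parameterized by the precision (tau = 1/sigma²), while Stan’s normal takes the standard deviation, so the translation is sigma = 1/sqrt(tau). A toy Python check of how far apart the two readings of the same number are (figures invented):

```python
import math

# Jags: dnorm(mu, tau), where tau = precision = 1 / variance.
# Stan: normal(mu, sigma), where sigma = standard deviation.
tau = 400.0                   # e.g. a poll with sd 0.05 has tau = 1 / 0.05**2
sigma = 1.0 / math.sqrt(tau)  # correct translation: 0.05
wrong = 1.0 / tau             # passing the variance instead: 0.0025

# A sampler handed a scale 20x too small can look "stuck" at its
# initial values, which is roughly the symptom described above.
print(sigma, wrong)
```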

How unusual is Yanukovych’s ouster?

February 22, 2014

Assume that Viktor Yanukovych is no longer Ukrainian president. His ouster — if you’ll permit me that unlovely American term — is unusual, since Ukraine seems to enjoy some of the characteristics of a multiparty democracy, and multiparty democracies don’t usually replace leaders through non-constitutionally sanctioned means in response to popular protest — or at least, I had usually assumed so.

I wanted therefore to check whether there are many other examples of countries which approximate democracies which have also had irregular leadership transitions. The Archigos dataset records leadership transitions from 1875 to 2004, and includes two codes for “irregular” leader replacement:

  • Leader lost power as a result of domestic popular protest with foreign support
  • Leader lost power as a result of domestic popular protest without foreign support

We can match this [R code] with Polity data on the democratic character of a regime, where a score of 10 indicates a fully democratic regime.
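The matching step looks roughly like this (the real code linked above is in R; the Python below uses invented rows and my own guesses at column names, not the actual Archigos or Polity variables):

```python
# Match irregular exits from an Archigos-style table with Polity scores
# by country-year. All rows and field names here are invented.

archigos = [
    {"country": "A", "year": 1998, "exit": "irregular: popular protest"},
    {"country": "B", "year": 2001, "exit": "regular"},
    {"country": "C", "year": 1990, "exit": "irregular: popular protest"},
]
polity = {("A", 1998): 8, ("B", 2001): 10, ("C", 1990): -3}

matched = [
    {**row, "polity": polity[(row["country"], row["year"])]}
    for row in archigos
    if row["exit"].startswith("irregular")
]
print(matched)  # two irregular exits, each with its Polity score attached
```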

Here’s the plot of irregular transitions, with Ukraine added on the assumption that its Polity score in 2012 is a reasonable proxy for its score in 2014.

[Figure: irregular leader transitions plotted against Polity score]

I’m not an expert on leadership transitions or coups, or on Polity scores — and you should really pay attention to those who are. But hopefully this can identify a class of relevant comparators.

Update: Here’s a version taking the Polity score from the preceding year, in case Polity measures for year x reflect the status averaged over year x rather than status at the beginning of the year.

[Figure: irregular leader transitions plotted against lagged Polity score]

Italian polling update, February 2014

February 20, 2014

This is the first polling update in a couple of months. In the intervening period, my hard drive failed in a bad way. I lost data on a number of projects, including my work on poll-pooling. The smoothed estimates below are no longer based on historical house biases generated from the study of the 2013 election, but are based instead on the assumption that polling companies are on average unbiased.

This update is also the first to feature the Nuovo Centro-Destra. Since we have no information on their support for much of the period, the model extrapolates backwards, giving a random-walk-in-reverse. Apologies if that makes it difficult to interpret the part of the graph that’s actually interesting.

Graphs below. Click on the links to show trend-lines for different parties.

Forza Italia | PD | M5S | Lega | SEL | NCD

Infrequently asked questions

Where do these polling figures come from? Here and here.

What are these trend lines? They’re estimates from a model which treats latent party support as something which evolves smoothly over time, and is made manifest through particular potentially biased polling snapshots.

How are the effects of the different polling companies identified? By assuming that on average, polling companies are unbiased.
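To see what that identifying assumption buys, here’s a toy Python version (polls invented, and far simpler than the actual model, which estimates house effects jointly with a smooth trend): treating each company’s house effect as its deviation from the cross-company average forces the effects to sum to zero.

```python
# Toy identification of house effects: each company's effect is its mean
# deviation from the grand mean across companies, so effects sum to ~0.
# Poll figures are invented for illustration.

polls = {
    "HouseA": [31.0, 32.0, 31.5],
    "HouseB": [29.0, 28.5, 29.5],
    "HouseC": [30.0, 30.5, 29.5],
}

house_means = {h: sum(v) / len(v) for h, v in polls.items()}
grand_mean = sum(house_means.values()) / len(house_means)
house_effects = {h: m - grand_mean for h, m in house_means.items()}

print(house_effects)  # deviations from the shared level; they sum to ~0
```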

Media moguls and their influence

December 13, 2013

I have a new paper accepted for publication in the European Journal of Communication.

A user-friendly write-up is available at my department’s newish blog.

I’ll limit myself to posting the replication materials here.

These are taken from the Sweave file I used to write the article. That article then had to be bowdlerized, since the EJC only takes submissions in Word format. This notwithstanding, most or all of the content in the final article should also be reproducible using these files.

Drop a comment if there’s something about the code or data that’s not clear.

British headlines: 18% less informative than their American cousins

November 29, 2013

My SO works at a newspaper — a pretty good one, in fact.

She writes many excellent articles — but, like every British journalist, she has a particular penchant for punning headlines.

Unfortunately, many of the fun headlines don’t make it into the paper — not because they’re particularly scabrous (though some of the funny ones certainly were so — and I still can’t believe that “long march” made it into a story about house prices in Chinatown). Rather, they get jettisoned because they don’t work well if you’re trying to engage in a little bit of search engine optimization (SEO). Google, for all its tremendous achievements in information retrieval, is very bad at understanding puns.

I mention this because I’ve come across a rare instance where the British penchant for punning has complicated my life.

I’m currently working on a project looking at the representation of constituency opinion in Parliament. One of our objectives involves examining the distribution of parliamentary attention — whether MPs from constituencies very concerned by immigration talk more about immigration than MPs from constituencies that are more relaxed about the issue.

To do that, I’ve been relying on the excellent datasets made available from the UK Policy Agendas Project. In particular, I’ve been exploring the possibility of using their hand-coded data to engage in automated coding of parliamentary questions.

One of their data-sets features headlines from the Times. Coincidentally, one of the easier-to-use packages in automated coding of texts (RTextTools) features a data-set with headlines from the New York Times. Both data-sets use similar topic codes, although the UK team has dropped a couple of codes.

How well does automated topic coding work on these two sets of newspaper headlines?

With the New York Times data (3104 headlines over ten years, divided into a 2600-headline training set and a 400-headline test set), automated topic coding works well. 56.8% of the 400 test headlines put into the classifier were classified correctly. That’s pretty amazing considering the large number of categories (27) and the limited training data.

How do things fare when we turn to the (London) Times (6571 headlines over ten years, divided into a 6131-headline training set and an 871-headline test set)? Unfortunately, despite having much more in the way of training data, only 46.6% of articles were classified correctly.
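For the curious, the shape of the exercise (though not the actual RTextTools pipeline, which is R, nor its algorithms) is roughly: bag-of-words features plus a supervised classifier. A self-contained toy version with naive Bayes and invented headlines:

```python
from collections import Counter, defaultdict
import math

def tokenize(headline):
    return headline.lower().split()

def train(examples):
    # examples: list of (headline, topic). Returns per-topic word counts,
    # topic counts, and vocabulary for a multinomial naive Bayes classifier.
    word_counts = defaultdict(Counter)
    topic_counts = Counter()
    vocab = set()
    for text, topic in examples:
        topic_counts[topic] += 1
        for w in tokenize(text):
            word_counts[topic][w] += 1
            vocab.add(w)
    return word_counts, topic_counts, vocab

def classify(text, word_counts, topic_counts, vocab):
    total = sum(topic_counts.values())
    best, best_score = None, -math.inf
    for topic, n in topic_counts.items():
        score = math.log(n / total)  # log prior
        denom = sum(word_counts[topic].values()) + len(vocab)
        for w in tokenize(text):
            # Laplace smoothing so unseen words don't zero out the score.
            score += math.log((word_counts[topic][w] + 1) / denom)
        if score > best_score:
            best, best_score = topic, score
    return best

# Invented training headlines with topic codes:
train_set = [
    ("interest rates rise again", "economy"),
    ("bank lending falls", "economy"),
    ("hospital waiting lists grow", "health"),
    ("new vaccine trial begins", "health"),
]
model = train(train_set)
print(classify("rates fall at the bank", *model))  # "economy"
```

A punning headline defeats exactly this kind of model: the words that carry the joke rarely co-occur with the topic’s training vocabulary.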

Looks like those puns are not just bad for SEO, they’re also bad for the text-as-data movement…

Update: Mark Liberman suggests (convincingly, IMHO) that the difference is due to headline length.

Italian polling update, November 2013

November 8, 2013

Graphs below. Click on the links to show trend-lines for different parties.

PDL | PD | M5S | Lega | SEL | Scelta Civica

Infrequently asked questions

Where do these polling figures come from? Here and here.

What are these trend lines? They’re estimates from a model which treats latent party support as something which evolves smoothly over time, and is made manifest through particular potentially biased polling snapshots.

Why are some trend lines way above the polls? Evidence from the last elections showed polling companies consistently under/over-estimated some parties. These biases are included in the model.

Italian polling update, October 2013

October 10, 2013

Graphs below. Click on the links to show trend-lines for different parties.

PDL | PD | M5S | Lega | SEL | Scelta Civica

Infrequently asked questions

Where do these polling figures come from? Here and here.

What are these trend lines? They’re estimates from a model which treats latent party support as something which evolves smoothly over time, and is made manifest through particular potentially biased polling snapshots.

Why are some trend lines way above the polls? Evidence from the last elections showed polling companies consistently under/over-estimated some parties. These biases are included in the model.

What would Berlusconi gain from bringing down the government?

September 18, 2013

Tonight a Senate committee will decide whether Berlusconi must be expelled from the Senate. Berlusconi and members of his party have repeatedly threatened to withdraw from the government if Berlusconi is expelled. How credible is this threat? What would happen if the threat was carried out?

There are a number of possible scenarios:

  • First, either Berlusconi capitulates, and orders the PDL to remain in the government, or the PDL withdraws from the government.
  • If the PDL withdraws, then either the government falls immediately, or the government tries to soldier on, daring other parties to call a vote of confidence.
  • If/when the government falls, then either another government forms, or we go to fresh elections.

Another government might form if there are enough defections from the PDL and the M5S. This is probably the only way a new government could form without fresh elections.

Let’s consider the path to fresh elections. Perhaps because of Berlusconi’s remarkable ability to come from behind and out-perform the polls, many people consider new elections to be advantageous for the PDL. This, I think, is misguided, partly because the electoral system requires a party to have a substantial lead in order to win a majority in the Senate.

I tried to calculate the outcome in terms of Senate seats if the election were held tomorrow, using

  1. the current levels of party support from the pooling-the-polls model;
  2. the parties’ levels of support in each region at the last election;
  3. the parties’ levels of support nation-wide at the last election; and
  4. a uniform national swing equal to the difference between (1) and (3).

By adding or subtracting this uniform national swing to (2), we can estimate parties’ vote shares in each region. With certain assumptions about parties’ coalitions, and about support in the other regions (TAA, VdA, abroad), we can then estimate their seat shares. Specifically, I assume that in the event of a coalition collapse, the PD and SEL kiss and make up.
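A stripped-down sketch of steps (2)–(4) in Python (all shares invented; the seat allocation here is simple largest-remainder for one region, standing in for the real Senate rules with their regional majority bonuses, which this deliberately omits):

```python
# Shift each party's regional share by the national swing, floor at zero,
# and allocate a region's seats by largest remainder. Figures invented.

def project_region(regional_shares, national_swing, seats):
    # regional_shares: party -> regional vote share at the last election (%).
    # national_swing: party -> national change since then (points).
    projected = {p: max(s + national_swing.get(p, 0.0), 0.0)
                 for p, s in regional_shares.items()}
    total = sum(projected.values())
    quotas = {p: seats * v / total for p, v in projected.items()}
    allocation = {p: int(q) for p, q in quotas.items()}
    leftover = seats - sum(allocation.values())
    # Hand remaining seats to the largest fractional remainders.
    by_remainder = sorted(quotas, key=lambda p: quotas[p] - int(quotas[p]),
                          reverse=True)
    for p in by_remainder[:leftover]:
        allocation[p] += 1
    return allocation

shares = {"PDL": 32.0, "PD": 30.0, "M5S": 25.0, "Lega": 13.0}
swing = {"PDL": 2.0, "PD": -1.0, "M5S": -3.0, "Lega": 1.0}
print(project_region(shares, swing, 8))  # eight seats, fully allocated
```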
If the election were held tomorrow, then with the parties’ support as it stands, the most likely outcome would be as follows:

  • PDL – 117 seats
  • PD – 79 seats
  • M5S – 50 seats
  • SEL – 19 seats
  • Lega Nord – 16 seats
  • Fratelli d’Italia – 10 seats
  • Lista Monti Senato – 9 seats
  • Destra – 1 seat
  • Other centre-left – 7 seats
  • Other centre-right – 6 seats

The PDL+Lega+FdI+Destra+Others coalition therefore reaches 150 seats, eight short of a majority. It could conceivably form a coalition with the centrists, but this would be difficult. Obviously these results are just indicative of the kind of results that might be reached: I wouldn’t place any great significance on one or two seats either way.

All this is by way of saying that the pay-off for the right at the terminal node of this extended game is not great.

Admittedly, there are pay-offs other than seats. Berlusconi might wish just to upset the apple-cart. But, just as we shouldn’t assume that the left would ‘steamroll’ the right, we shouldn’t assume that Berlusconi threatening fresh elections is necessarily a trump card.
