MRP and opinion polling: a short introduction

In the build-up to the general election on 12th December, you may have heard pollsters and reporters talking about a new-fangled statistical technique called MRP. You might even have heard that this stands for Multilevel Regression and Poststratification, and thought: that sounds complicated. Well, you’re right: it is. Nevertheless, I’ll do my best to explain in broad terms why it’s being used and how it works, along with a discussion of how accurate it may prove to be.

Before we get to MRP itself, it’s worth considering the challenges inherent in opinion polling.

First, you have to ensure you’re speaking to the right people. You want your sample to reflect the makeup of the UK: containing representative proportions across age bands, regions, education levels, and so on. You might also want to ensure your sample reflects recent voting patterns, most obviously from the 2016 referendum and 2017 general election.

Next, you need to work out how to sample your respondents – online, over the phone, some other way, or a mixture – and how to word the question. You may also need to model how likely each individual is to vote, perhaps based on their claimed certainty and their previous voting habits. Over time, pollsters have each developed and refined their own approaches here.
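
To illustrate the voting-likelihood point, here’s a toy sketch of turnout weighting – the 0-10 certainty scale, and the idea of reading it directly as a probability, are my own simplifications for illustration rather than any pollster’s actual method:

```python
# Toy turnout weighting: each respondent's stated intention counts in
# proportion to their estimated chance of actually voting.
# The certainty scale and its mapping to a probability are invented here.

respondents = [
    # (party intention, self-rated 0-10 certainty of voting)
    ("Con", 10), ("Lab", 10), ("Lab", 5), ("Con", 8), ("Lab", 3),
]

def turnout_prob(certainty):
    # Crude assumption: read the 0-10 certainty scale as a probability.
    return certainty / 10

weights = {}
for party, certainty in respondents:
    weights[party] = weights.get(party, 0) + turnout_prob(certainty)

total = sum(weights.values())
for party, w in sorted(weights.items()):
    print(f"{party}: {w / total:.0%} of the likely vote")
# Raw intentions split 40%-60% Con-Lab, but weighting by turnout
# likelihood moves this to 50%-50%.
```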

If you were just aiming to understand overall vote share, that might be enough. However, what makes a general election more complex than a straightforward referendum is the way vote share translates into seats. Under a first-past-the-post system, in which each of the 650 constituencies elects a single representative, voter concentration matters. In the 2015 general election, for example, UKIP’s 3.9 million votes (12.6% of the UK share) translated into just a single seat, while the SNP’s 1.5 million (4.7%) earned them 56. So forecasting the outcome of an election is not as simple as looking at each party’s overall share of the vote.

It’s also worth noting that running 650 separate polls – one per constituency – isn’t feasible. Even at 1,000 respondents per constituency, the margin of error for each would be around 3%, leaving many races too close to call; and sampling 650,000 people would in any case be far too difficult, expensive and time-consuming. What’s required instead is a way to turn a nationwide poll into constituency-level results, based on the makeup of each constituency. This is where MRP comes in.
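
To make that 3% figure concrete, the margin of error follows directly from the sample size via the standard formula for a proportion. A quick sketch (nothing here is specific to any pollster’s method):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a proportion p estimated from n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

# A 1,000-person poll at 50% support: roughly +/-3.1 points, so a
# 52%-48% race in a single constituency is too close to call.
print(f"{margin_of_error(1000):.1%}")  # ~3.1%
```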

Examination of a polling dataset will reveal that demographics and recent voting patterns can, to an extent, predict voting this time round. The first part of MRP, the multilevel regression, involves constructing a model that predicts voting intention for any combination of these variables. For example, it may find that those aged 60-64 who voted Leave in 2016 and Conservative in 2017 have an 80% likelihood of voting Conservative. (This is a simplification; in reality, an MRP model will include far more than just these inputs.)
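
One reason the regression is ‘multilevel’ is that many of these combinations contain only a handful of respondents, so the model lets small subgroups borrow strength from the wider picture rather than taking their raw percentages at face value. Here’s a deliberately crude sketch of that shrinkage idea – real MRP models use full multilevel (typically Bayesian) logistic regression, and every figure below is invented:

```python
# Crude sketch of partial pooling, the idea behind "multilevel": cells
# with few respondents are shrunk towards the national average rather
# than trusted at face value. Every figure here is invented.

national_rate = 0.42   # assumed national Conservative vote intention
prior_weight = 50      # how many "pseudo-respondents" the national rate is worth

# voter type -> (respondents in cell, of whom intending to vote Conservative)
cells = {
    "60-64 / Leave 2016 / Con 2017":  (400, 320),  # big cell, raw 80%
    "18-24 / Remain 2016 / Lab 2017": (250, 20),   # big cell, raw 8%
    "60-64 / Leave 2016 / Lab 2017":  (12, 7),     # tiny cell, raw 58% is noisy
}

for name, (n, k) in cells.items():
    raw = k / n
    pooled = (k + prior_weight * national_rate) / (n + prior_weight)
    print(f"{name}: raw {raw:.0%} -> pooled {pooled:.0%} (n={n})")
# The big cells barely move; the 12-person cell is pulled well towards 42%.
```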

The second part, the poststratification, is based on constituency-level estimates of the actual proportion of the population in each subgroup. For example, we may know that 3% of a constituency’s population are aged 60-64, voted Leave in 2016 and Conservative in 2017; by combining this with the 80% above – and doing the same for all the other voter types – we can forecast the total votes within the constituency.
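
A minimal sketch of that combination step, assuming we already have the model’s predicted Conservative probability for each voter type alongside census-style estimates of each type’s share of the constituency (all figures invented, and only three of what would really be hundreds of types):

```python
# Minimal poststratification: weight each voter type's predicted
# Conservative probability by that type's share of the constituency.
# All shares and probabilities are invented for illustration.

shares = {   # proportion of the constituency in each voter type
    "60-64 / Leave 2016 / Con 2017":  0.03,
    "18-24 / Remain 2016 / Lab 2017": 0.05,
    "35-44 / Leave 2016 / Lab 2017":  0.04,
    # ...in reality, enough types to cover the whole electorate
}
p_con = {    # the regression model's predicted P(vote Conservative)
    "60-64 / Leave 2016 / Con 2017":  0.80,
    "18-24 / Remain 2016 / Lab 2017": 0.08,
    "35-44 / Leave 2016 / Lab 2017":  0.35,
}

covered = sum(shares.values())
forecast = sum(shares[t] * p_con[t] for t in shares) / covered
print(f"Forecast Conservative share across the types listed: {forecast:.0%}")
# 0.03*0.80 + 0.05*0.08 + 0.04*0.35 = 0.042, over 12% coverage -> 35%
```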

In 2017, YouGov’s MRP model showed a hung Parliament, even while most other pollsters were forecasting a Conservative majority. This time round, their most recent model suggests the Conservatives will win around 339 seats, for a small majority. How confident can we be in this result?

The first thing to note is that a fair few of these seats are close marginals, with just a percentage point or two in them, and YouGov themselves give a margin of error of 311-367 Conservative seats – quite the range, from a hung parliament to a decent majority. Another reason the results may be out is the possibility of a late swing in either direction: something happening after the poll is complete that makes significant numbers of voters change their minds. (In fact, YouGov’s previous MRP poll, last month, had the Conservatives on 359, suggesting the race has tightened since.) It’s also worth bearing in mind that MRP doesn’t take tactical voting into account, which could affect close races this time round. Finally, polls can themselves affect results: if voters feel a Conservative victory is assured, they may vote differently from how they would if they felt the result was in the balance.

But there are other reasons why the numbers might be out. Although YouGov conducted over 100,000 interviews, they will still have had to make judgement calls around sampling method, question wording and voting-likelihood modelling, alongside the further statistical assumptions an MRP model requires. For all pollsters’ expertise and experience, every election is different, and even small errors in these judgements can lead to very different seat totals.

Pollsters don’t, at the moment, have the greatest of reputations. In part, this is because they got a couple of recent, prominent, all-or-nothing forecasts – Brexit and Trump – wrong. And while those of us versed in quantitative research can point out that in those cases the polls were almost correct in percentage terms, with results well within margins of error, it’s understandable that the layperson comes away with the less nuanced view that polling can’t be trusted.

So, if the polls prove inaccurate again on the 12th, statisticians will no doubt pore over the intricacies of the MRP model and debate its suitability for this kind of forecasting, while the general public will be left with their opinion of the industry further diminished. Either way, soon enough, we’ll find out.

Joe