Tuesday, July 16, 2024

About statistics in opinion polls

Over at r/fivethirtyeight, there is an explanation of statistics in poll data. There will be a quiz.
So I feel like some people have been using the concept of the "margin of error" in polling the wrong way. Namely, some people have started to treat any result within the margin of error as functionally equivalent, as if Trump+3 and Biden+3 were the same result when the margin of error is 3.46.

Now I honestly think this is a totally understandable mistake to make, both because American statistics education isn't great and because unhelpful terms like "statistical tie" give people the wrong impression.
What the margin of error actually allows us to do is estimate the probability distribution of the true values, that is to say, what the "actual number" is likely to be. To illustrate this, I've created two visualizations:

Here is the probability of the "True Numbers" if Biden led 40-37


And here is the probability of the "True Numbers" if Trump led 40-37


Notice the substantial difference between these distributions. The overlapping areas represent the chance that the candidate who's behind in the poll might actually be leading in reality. The non-overlapping areas show the likelihood that the poll leader is truly ahead.

In both polls the overlapping area is about 30%. This means that saying "Trump+3 and Biden+3 are both within the 3.46% margin of error, so they're basically 50/50 in both polls" is incorrect.
A more accurate interpretation would be: If the poll shows Biden+3, there's about a 70% chance Biden is truly ahead. If it shows Trump+3, there's only about a 30% chance Biden is actually leading. This demonstrates how even small leads within the margin of error can still be quite meaningful.
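
For anyone who wants to play with the arithmetic, here is a minimal sketch of one way to turn a poll lead into a "chance the leader is truly ahead" figure. It is not necessarily how the visualizations above were produced. It assumes a simple random sample of 800 respondents (a hypothetical size, picked because it gives a 95% margin of error of about 3.46%), a normal approximation, and no prior information about the race. The function names are made up for illustration. Under these assumptions the figure comes out somewhat above 80%, higher than the roughly 70% quoted above, because the exact number depends on how the uncertainty in the lead is modeled. The qualitative point is the same either way: a 3-point lead is not a coin flip.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for one candidate's share in a simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)

def prob_leader_ahead(p_lead, p_trail, n):
    """Normal-approximation chance that the poll leader is truly ahead.

    Both shares come from the same sample, so the standard error of the
    lead uses the multinomial formula rather than treating them as independent.
    """
    lead = p_lead - p_trail
    se_lead = math.sqrt((p_lead + p_trail - lead ** 2) / n)
    z = lead / se_lead
    # Standard normal CDF written with the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n = 800  # assumed sample size, not stated in the post
moe = margin_of_error(n)
p_ahead = prob_leader_ahead(0.40, 0.37, n)

print(f"margin of error: {moe:.2%}")
print(f"chance the 40-37 leader is truly ahead: {p_ahead:.0%}")
print(f"chance the trailing candidate is truly ahead: {1 - p_ahead:.0%}")
```

Swapping which candidate is at 40 and which is at 37 just swaps the two probabilities, which is the asymmetry between the Biden+3 and Trump+3 polls.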
A peanut in the gallery commented:
A more accurate interpretation would be: If the poll shows Biden+3, there’s about a 70% chance Biden is truly ahead. If it shows Trump+3, there’s only about a 30% chance Biden is actually leading. This demonstrates how even small leads within the margin of error can still be quite meaningful.

Yeah, but only if the sample population is reflective of the total population. One of the biggest issues with political polling is actually getting a representative sample, since we don’t know with 100% certainty what population will actually show up on Election Day. I suppose that’s a bit pedantic, but the margin of error doesn’t really account for an unrepresentative sample, which is more likely to be the main source of error in political polling.
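
The commenter's point can be sketched with some made-up numbers. The example below assumes a hypothetical electorate split into two equal groups that support candidate A at 60% and 40%, and a poll whose respondents over-represent the first group. None of these figures come from a real poll, and the helper name is invented for illustration. The margin of error shrinks as the sample grows, but the gap caused by the skewed sample stays put.

```python
import math

def weighted_share(group_support, group_mix):
    """Overall support for candidate A given per-group support and group weights."""
    return sum(s * w for s, w in zip(group_support, group_mix))

# All values below are hypothetical, chosen only to illustrate the effect.
group_support = [0.60, 0.40]   # support for candidate A within each group
electorate_mix = [0.50, 0.50]  # each group's share of people who actually vote
sample_mix = [0.60, 0.40]      # each group's share of poll respondents

true_share = weighted_share(group_support, electorate_mix)
polled_share = weighted_share(group_support, sample_mix)
bias = polled_share - true_share

print(f"true share {true_share:.1%}, expected poll share {polled_share:.1%}")
for n in (800, 3200, 12800):
    moe = 1.96 * math.sqrt(0.25 / n)  # 95% margin of error at p = 0.5
    print(f"n={n}: margin of error {moe:.2%}, sampling bias {bias:.2%}")
```

With these numbers the poll is off by about two points no matter how many people are surveyed, and at the largest sample size that bias is bigger than the reported margin of error.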

Got it? Me neither. Quiz time!

The election is a horse race.
