The Science of an Upset

By Kathryn Schiller Wurster

Donald Trump won the presidency last night, taking the Electoral College despite what appears to be a narrow Clinton win in the popular vote. The results surprised nearly everyone in the media and polling world, almost all of whom had predicted a wide margin of victory for Hillary Clinton. Even Nate Silver’s blog FiveThirtyEight, which has earned a reputation for exquisite number crunching, gave Clinton much better odds throughout most of the race, with its final forecast at 70/30 Clinton over Trump the day before the election.

But all the number crunchers depend on polls and statistical methods that aren’t reliable and now seem remarkably old-fashioned. A Nature article examined this problem in mid-October and blamed the decline of landlines, the rise of mobile phones, “shy voter” behavior, and unreliable online polls. At one time, calling people on the phone and asking them questions may have been the best way to learn their opinions and predict their likely behavior. But this election has just proved that it doesn’t always work. The UK saw a similar upset against the pollsters’ predictions in the Brexit vote.

The problem is that what people say on the phone is driven by many other factors, especially when the candidates and poll questions are controversial. Conducting phone surveys today also relies on an increasingly outdated mode of social interaction, which likely biases the samples. Online polls have biases of their own; they also depend on people answering honestly and on reaching a representative sample. In the end, it is clear that asking questions of a small subset of people cannot be relied on to give us a real picture of what likely voters will actually do.
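One reason small samples mislead: the margin of error a poll quotes captures only random sampling noise, which shrinks with the square root of the sample size, and says nothing about systematic problems like shy voters or an unrepresentative sample. A minimal sketch of the standard calculation (illustrative figures only, not drawn from any actual poll):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Sampling margin of error for a simple random sample of size n.

    p is the assumed true proportion (0.5 is the worst case);
    z is the critical value for the confidence level (1.96 ~ 95%).
    """
    return z * math.sqrt(p * (1 - p) / n)

# A typical national poll of ~1,000 respondents:
print(f"{margin_of_error(1000):.1%}")   # about ±3.1 points

# Quadrupling the sample only halves the error:
print(f"{margin_of_error(4000):.1%}")   # about ±1.5 points
```

Note that a ±3 point margin is already wider than the popular-vote gap in this election, and the formula assumes a perfectly random sample; any systematic bias in who answers, or in what they admit, comes on top of it.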

At the same time, we have more data streams about people, and correlations to their behavior, than ever before. Advertisers can target microgroups based on incredibly detailed demographics. Each of us leaves vast trails of data everywhere we go; those trails could be mined to answer all the questions pollsters ask (and likely much more). Social network analysis should be able to tell us who the influencers are and measure their impact on outcomes.

Now we need a team of statisticians and big data analysts and marketing gurus to look back at trends in data from a wide range of sources in the lead-up to the election. We need a forensic investigator to try to find correlations and trends that we missed along the way and connect the dots that led us here. The margins were narrow, so it may be that – for now – the degree of uncertainty we have to accept is still greater than the margin of error in the actual results. But we should be able to do better than this.