Title: Bias and excess variance in election polling: a not-so-hidden Markov model
Abstract: With historic misses in the 2016 and 2020 US Presidential elections, interest in measuring polling errors has increased. The most common method for measuring directional errors and non-sampling excess variability during a postmortem for an election is by assessing the difference between the poll result and election result for polls conducted within a few days of the day of the election. Analysing such polling error data is notoriously difficult, with typical models being extremely sensitive to the time between the poll and the election. We leverage hidden Markov models traditionally used for election forecasting to flexibly capture time-varying preferences and treat the election result as a peek at the typically hidden Markovian process. Our results are much less sensitive to the choice of time window, avoid conflating shifting preferences with polling error, and are more interpretable despite a highly flexible model. We demonstrate these results with data on polls from the 2004 through 2020 US Presidential elections and 1992 through 2020 US Senate elections, concluding that previously reported estimates of bias in Presidential elections were too extreme by 10%, estimated bias in Senatorial elections was too extreme by 25%, and excess variability estimates were also too large.
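As an illustration only, the conventional window-based summary that the abstract describes as the most common method (poll result minus election result, for polls close to election day) might be sketched as below. This is not the authors' hidden Markov model. The `Poll` fields, the function name `window_error_summary`, the simple-random-sampling variance formula, and all numbers are assumptions made up for the example.

```python
from dataclasses import dataclass
from statistics import mean, variance


@dataclass
class Poll:
    share: float       # hypothetical: candidate's share in the poll, in [0, 1]
    sample_size: int   # hypothetical: number of respondents
    days_before: int   # hypothetical: days between the poll and election day


def window_error_summary(polls, result, window_days):
    """Return (bias, excess variance) for polls taken within window_days."""
    recent = [p for p in polls if p.days_before <= window_days]
    if len(recent) < 2:
        raise ValueError("need at least two polls inside the window")
    # Directional error of each poll: poll share minus the election result.
    errors = [p.share - result for p in recent]
    bias = mean(errors)
    # Variance expected from simple random sampling alone (an assumption),
    # averaged over the polls in the window.
    sampling_var = mean(p.share * (1 - p.share) / p.sample_size for p in recent)
    # Spread of the errors beyond sampling; can be negative with few polls.
    excess_var = variance(errors) - sampling_var
    return bias, excess_var


# Made-up polls and result, purely for illustration.
polls = [Poll(0.52, 1000, 2), Poll(0.54, 800, 5), Poll(0.50, 1200, 20)]
result = 0.51

bias_7, excess_7 = window_error_summary(polls, result, window_days=7)
bias_30, excess_30 = window_error_summary(polls, result, window_days=30)
print(bias_7, bias_30)
```

With these invented numbers the estimated bias changes when the window grows from 7 to 30 days, which is the kind of window sensitivity the abstract says the hidden Markov approach is meant to reduce.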
Award ID(s):
2046880
PAR ID:
10593294
Author(s) / Creator(s):
;
Publisher / Repository:
Oxford University Press UK
Date Published:
Journal Name:
Journal of the Royal Statistical Society Series A: Statistics in Society
Volume:
188
Issue:
2
ISSN:
0964-1998
Page Range / eLocation ID:
566 to 582
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Scarano, Stephen; Vasudevan, Vijayalakshmi; Samory, Mattia; Yang, Kai-Cheng; Yang, JungHwan; Grabowicz, Przemyslaw A (Ed.)
    Social media platforms allow users to create polls to gather public opinion on diverse topics. However, we know little about what such polls are used for and how reliable they are, especially in significant contexts like elections. Focusing on the 2020 presidential elections in the U.S., this study shows that outcomes of election polls on Twitter deviate from election results despite their prevalence. Leveraging demographic inference and statistical analysis, we find that Twitter polls are disproportionately authored by male Republicans and exhibit a large bias towards candidate Donald Trump in comparison to mainstream polls. We investigate potential sources of biased outcomes from the point of view of inauthentic, automated, and counter-normative behavior. Using social media experiments and interviews with poll authors, we identify inconsistencies between public vote counts and those privately visible to poll authors, with the gap potentially attributable to purchased votes. We find that election polls tend to be more biased, contain more questionable votes, and attract more bots before the election day than after. We highlight and compare key factors contributing to biased poll outcomes. Finally, we identify instances of polls spreading voter fraud conspiracy theories and estimate that a couple of thousand such polls were posted in 2020. The study discusses the implications of biased election polls in the context of transparency and accountability of social media platforms. 
  2. Polls posted on social media can provide information about public opinion on a variety of issues from business decisions to support for presidential election candidates. However, it is largely unknown whether the information provided by social polls is useful or not. To enhance our understanding of social polls, we examine nearly two thousand Twitter polls gauging support for U.S. presidential candidates during the 2016 and 2020 election campaigns. First, we describe the prevalence of social polls. Second, we characterize social polls in terms of the engagement they elicit and the response options they present. Third, leveraging machine learning models, we infer and describe several characteristics, including demographics and political leanings, of the users who author and interact with social polls. Finally, we study the relationship between social poll results, their attributes, and the characteristics of users interacting with them. Our findings suggest how and to what extent polling on Twitter is biased in terms of content, authorship, and audience. The 2016 and 2020 polls were predominantly crafted by older males and manifested a pronounced bias favoring candidate Donald Trump, whereas traditional surveys favored Democratic candidates. We further identify and explore the potential reasons for such biases and discuss their repercussions. 
  3. While many instructors are aware of the Literary Digest 1936 poll as an example of biased sampling methods, this article details potential further explorations for the Digest’s 1924-1936 quadrennial U.S. presidential election polls. Potential activities range from lessons in data acquisition, cleaning, and validation, to basic data literacy and visualization skills, to exploring one or more methods of adjustment to account for bias based on information collected at that time. Students can also compare how those methods would have performed. One option could be to give introductory students a first look at the idea of “sampling adjustment” and how this principle can be used to account for difficulties in modern polling, but the context is rich in other opportunities that can be discussed at various times in the course or in more advanced sampling courses. 
  4. Presidential elections can be forecast using information from political and economic conditions, polls, and a statistical model of changes in public opinion over time. However, these “knowns” about how to make a good presidential election forecast come with many unknowns due to the challenges of evaluating forecast calibration and communication. We highlight how incentives may shape forecasts, and particularly forecast uncertainty, in light of calibration challenges. We illustrate these challenges in creating, communicating, and evaluating election predictions, using the Economist and Fivethirtyeight forecasts of the 2020 election as examples, and offer recommendations for forecasters and scholars.
  5. The prevalence and spread of online misinformation during the 2020 US presidential election served to perpetuate a false belief in widespread election fraud. Though much research has focused on how social media platforms connected people to election-related rumors and conspiracy theories, less is known about the search engine pathways that linked users to news content with the potential to undermine trust in elections. In this paper, we present novel data related to the content of political headlines during the 2020 US election period. We scraped over 800,000 headlines from Google's search engine results pages (SERP) in response to 20 election-related keywords—10 general (e.g., "Ballots") and 10 conspiratorial (e.g., "Voter fraud")—when searched from 20 cities across 16 states. We present results from qualitative coding of 5,600 headlines focused on the prevalence of delegitimizing information. Our results reveal that videos (as compared to stories, search results, and advertisements) are the most problematic in terms of exposing users to delegitimizing headlines. We also illustrate how headline content varies when searching from a swing state, adopting a conspiratorial search keyword, or reading from media domains with higher political bias. We conclude with policy recommendations on data transparency that allow researchers to continue to monitor search engines during elections. 