<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>The Swing Voter Paradox: Electoral Politics in a Nationalized Era</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>2021</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10392337</idno>
					<idno type="doi"></idno>
					<title level='j'>Ph.D. Dissertation, Department of Government, Harvard University</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Shiro Kuriwaki</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
<abstract><ab><![CDATA[With each successive election since at least 1994, congressional elections in the United States have transitioned toward nationalized two-party government. Fewer voters split their tickets between parties for President and Congress. Regional blocs and incumbency voting, a key feature of U.S. elections in the latter 20th century, appear to have given way to strong party discipline among candidates and nationalized partisanship among voters. Observers of modern American politics are therefore tempted to write off the importance of the swing voter, defined here as a voter who is indifferent between the two parties and thus likely to split their ticket or switch their party support.   By assembling data from historical elections (1950 -- 2020), surveys (2008 -- 2018), and cast vote records (2010 -- 2018), and by developing statistical methods to analyze such data, I argue that although swing voters comprise a smaller portion of the electorate, each swing voter is disproportionately decisive in modern American politics, a phenomenon I call the swing voter paradox. Historical comparisons across Congressional, state executive, and state legislative elections confirm the decline in aggregate measures of ticket splitting suggested in past work. But the same indicator has not declined nearly as much in county legislative or county sheriff elections (Chapter 1). Ticket splitters and party switchers tend to be voters with low news interest and moderate ideology. Consistent with a spatial voting model with valence, voters also become ticket splitters when incumbents run (Chapter 2). I then provide one of the first direct measures of ticket splitting in state and local offices using cast vote records. I find that ticket splitting is more prevalent in state and local elections (Chapter 3). This is surprising given the conventional wisdom that party labels serve as heuristics and down-ballot elections are low-information environments.   
A major barrier for existing studies of the swing voter lies in the measurement from incomplete electoral data. Traditional methods struggle to extract information about subgroups from large surveys or cast vote records, because of small subgroup samples, multi-dimensional data, and systematic missingness. I therefore develop a procedure for reweighting surveys to small areas through expanding poststratification targets (Chapter 4), and a clustering algorithm for survey or ballot data with multiple offices to extract interpretable voting blocs (Chapter 5). I provide open-source software to implement both methods.  These findings challenge a common characterization of modern American politics as one dominated by rigidly polarized parties and partisans. The picture that emerges instead is one where swing voters are rare but can dramatically decide the party in power, and where no single demographic group is a swing voter. Instead of entrenching elections into red states and blue states, nationalization may heighten the role of the persuadable voter.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>Ten years ago, in the spring of 2011, I was taking my first statistics class, my first Introduction to American Politics lecture, and my first political science seminar. I owe all that I have learned since that semester, including this dissertation, to the people who have helped me and enriched my life along the way.</p><p>My main dissertation advisors were formidable. From Steve Ansolabehere, with whom I took my first American Politics class at Harvard in my first semester, I learned how to formulate long-term research while being excited to come to work every day.</p><p>From Kosuke Imai, I learned to push myself just the right amount to meet higher standards, and the importance of helping fellow students while learning immensely from them in turn. From Jim Snyder, I learned how to spot an interesting research question and turn it into a paper, and to appreciate hours of data collection in search of a single statistic. From all three of them, I learned how to think clearly and write elegantly.</p><p>Over the course of my time at Harvard, I also benefited from the advice of many other faculty and teachers, including Matt Blackwell, Amy Catalinac, Ryan Enos, Andrew Hall, Gary King, Dan Levy, Xiao-Li Meng, Susan Pharr, Jon Rogowski, Maya Sen, Dan Smith, Teddy Svoronos, Dustin Tingley, and Ista Zahn. Even in their busiest moments, their comments led to one discovery after another.</p><p>One of the best parts of graduate school in the Government Department was the time spent with amazing peers. I learned immensely from Pam Ban, Peter Bucchianeri, Dan De Kadt, Connor Huff, Chris Lucas, Michael Morse, Daniel Moskowitz, and Melissa Sands, who all guided my path as I entered the program. 
Those who entered at the same time as or after me were also sources of inspiration and fun, including Jake Brown, Chris Chaky, Angelo Dagonel, Naoki Egami, Shusei Eshima, Max Goplerud, Jaclyn Kaslovsky, Chris Kenny, Hanno Hilbig, Michael Olson, Yon Soo Park, Hunter Rendleman, Albert Rivero, Tyler Simko, Diana Stanescu, and Michael Zoorob. Outside of the department I had the good fortune of knowing Alexander Agadjanian and Shun Yamaya. I am especially thankful to Andy Stone and Soichiro Yamauchi, whose friendship kept me going on more days than they probably realize. Conversations with both of them lightened any day. In addition, two of the chapters in this dissertation would not have been possible without Soichiro's substantial help.</p><p>In my personal and pre-graduate school life, I would first like to thank Susan Fiske, my undergraduate advisor and one of my academic heroes. Christina Davis, who taught that first political science seminar and wrote my first letter of recommendation, has been a generous advisor. I am grateful to Youngsuk Chi for believing in my future in academia and my life in the United States. After graduation I was lucky to join the Analyst Institute, where I began learning from Jonathan Robinson, Josh Rosmarin, Aaron Strauss, Annie Wang, and others. Special appreciation goes to Yi Yang for her warm support, as well as to Jacky Cheng and Chris Cochran, with whom I have effectively lived for nearly a decade. I credit Chris for many of the habits that helped me stay healthy and happy through graduate school.</p><p>I have failed to list many others who have helped me through this process. My final and deepest thanks go to my family, especially my mother. Her commitment to my education and personal happiness came with sacrifices to her time and career; I cannot overstate how much that has shaped my path into graduate school and beyond.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Introduction</head><p>Changes in the vote choice of a small subset of the electorate can dramatically swing election results, making voters who deviate in any way from a straight party ticket a perennial interest for social scientists and political campaigns alike. These volatile voters are often called swing voters. Understanding the prevalence and characteristics of swing voters is a cornerstone of understanding why strategic parties and re-election-seeking politicians take the positions they do. This dissertation collects data from multiple sources (historical election results, survey data, and cast vote records) to provide a systematic and in-depth accounting of the prevalence and voting patterns of swing voters.<note place="foot" n="1">Data and code to replicate some of this analysis are provided in <ref type="bibr">Kuriwaki (2021c)</ref>.</note> Of particular interest is whether (or in which eras and elections) the groups of voters I identify as swing voters are consequential for who wins an election.</p><p>In this introductory chapter, I provide a definition of the swing voter and a theoretical rationale for how that latent concept relates to observable behavior such as ticket splitting and party switching. Without a clear definition, "swing voter" is an elusive term that social science disciplines and journalistic coverage define in differing ways. For example, some include decisiveness as part of the definition of the swing voter, while in other definitions a swing voter is an easily persuadable voter regardless of whether they are pivotal. 
I define the swing voter in a spatial voting framework where voters choose candidates by comparing the utility they derive from candidates, whose positions are defined on a left-right scale.</p><p>The notion of swing voters has been a focus in the study of electoral behavior and in theoretical models of political economy <ref type="bibr">(Burden and Kimball 2002;</ref><ref type="bibr">Persson and Tabellini 2000, ch.8)</ref>. Indeed, I argue that this simple framework is appropriate because its definitions are clear, widely applicable, and illuminating in their own right.</p><p>The framework justifies the use of ticket splitting and party switching as the key behaviors I study empirically in the rest of this dissertation. Being a swing voter is a latent quality, but these two voting patterns are logical implications of being a swing voter in a general spatial voting framework.</p><p>This formalization clarifies how my definition differs from other ways scholars have defined the swing voter, as well as where the logic overlaps. One of the most recent formalizations of the swing voter is by <ref type="bibr">Mayer (2008)</ref>, who focused on identifying the swing voter in the context of Presidential elections. His definition relies on a survey instrument often called the feeling thermometer, where respondents indicate how favorable or unfavorable they feel towards each presidential candidate. Mayer operationalizes swing voters as voters who give exactly or approximately the same rating to both candidates, and argues that other related measures, such as party switching, being undecided, or being an independent, are less desirable, mainly for reasons of measurement and interpretability. 
In a similar spirit, <ref type="bibr">Hillygus and Shields (2009)</ref> investigated the characteristics of the persuadable voter, operationalized as a voter who is a political independent or who holds issue positions that conflict with their party's platform.</p><p>While the general strategy of this body of work is reasonable, taken literally its generalizability is limited. For example, both studies focus on a single office, and their theoretical frameworks provide only suggestive guidance about modeling voting across multiple offices. The feeling thermometer is also of limited applicability as a measurement strategy: most political surveys measure vote choice and partisanship, but few consistently include a feeling thermometer. And while, as Mayer argues, other measures such as being a moderate or being undecided have their own measurement challenges, a black-or-white verdict about whether these are correct ways to measure the swing voter makes it difficult to aggregate evidence across multiple classic studies, including V.O. Key's final work on floating voters <ref type="bibr">(Key 1966)</ref>.</p><p>In what follows, I provide a definition and measurement strategy for swing voters in a simple spatial voting framework. The general idea of the definition is consistent with what scholars such as <ref type="bibr">Mayer (2008)</ref> have outlined, i.e., that swing voters are generally indifferent between the two alternatives. The spatial voting framework posits a left-right ideological spectrum and voters who make choices based on a cardinal measure of utility. This utility framework is in fact quite similar to the feeling thermometer measure, so there is less conflict between my definition and Mayer's than might seem at first. The spatial voting framework can also flexibly incorporate factors other than ideology. 
Here I consider two: a valence term and uncertainty.</p><p>In sum, the spatial voting framework turns out to be a simple and fruitful model for operationalizing these concepts and for justifying ticket splitting as a reasonable measure of the swing voter. As with all models, the goal is not to fully predict behavior or to describe the psychological process by which voters make their choices. Yet the model is illuminating because it logically shows how being a swing voter relates to ticket splitting and party switching. Perhaps more useful, it also shows that a voter's propensity to split a ticket is increasing in the ratio of two well-known factors in electoral politics: a candidate's valence advantage relative to the spatial distance, or polarization, between the two candidates.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Notation and Setup</head><p>The classic model of vote choice serves as a fundamental building block in defining a swing voter. Voters are indexed by $i$ and have ideal points $z_i$ on a left-right continuum. Candidates are members of a party $a \in \{l, r\}$ and are indexed by their office $j$. They have two attributes: their policy position $x^a_j$ and valence $v^a_j$. Valence is any attribute of which all voters prefer more to less; in the US context, it may include incumbency and relevant experience for the job. I consider the case where candidates are nationalized, so that $x^a_j = x^a_{j'}$ for all $a$, $j$, and $j'$. In other words, candidates of the same party share the same spatial position but may still differ in valence.</p><p>This setup is non-strategic and non-dynamic. The main finding of valence models is not new <ref type="bibr">(Ansolabehere and Snyder 2000;</ref><ref type="bibr">Groseclose 2007</ref>), but this description applies its insights to the case of ticket splitting.</p><p>There is only one player: the voter $i$, whose ideal point is $z_i \in \mathbb{R}$ (also written $x^*_i$ below). Voters make binary choices between candidates in $J$ offices. Each candidate belongs to a party $a \in \{l, r\}$ and runs for one office $j$. Candidates have two fixed attributes: their policy position $x^a_j \in \mathbb{R}$ and valence $v^a_j \in \mathbb{R}$. Both are fixed, for example by constraints in the primary election or national politics.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Definition</head><p>Following convention, I define swing voters as voters for whom $U_i(z_i, x^l) \approx U_i(z_i, x^r)$.</p><p>In other words, these are voters who are largely indifferent between the two parties. Indifference could also be defined with respect to particular candidates, such that a voter derives equal utility from both candidates (with their valence included). But formal models of the swing voter typically reserve such non-spatial attributes for potential shocks. These can include characteristics specific to each candidate or the distributional consequences of a particular party winning power, all of which induce variation and uncertainty in election outcomes.</p><p>By this definition I distinguish between a swing voter and a pivotal voter. Some articles use "swing voter" to mean a voter who is both indifferent between the two choices and pivotal, in that they are the median voter and cast the deciding vote <ref type="bibr">(Pesendorfer and Feddersen 1996)</ref>. The swing voter in the definition I adopt need not be pivotal, and examining how swing and pivotality interact often leads to important insights (see e.g. <ref type="bibr">Enns and Wohlfarth 2013</ref>, in the case of the Supreme Court). In Chapters 1 and 2, I study how often the group of swing voters can be pivotal.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Model of Vote Choice</head><p>Each outcome $y_{ij} \in \{l, r\}$ denotes the party choice of voter $i$ in office $j$. Voters prefer candidates with closer policy positions, but they may also prefer more valence to less. Therefore in contest $j$ a voter derives a quadratic utility from each party $a$, with random measurement error $\epsilon_{ija}$:</p><p>$U_i(x^a_j, v^a_j) = -(z_i - x^a_j)^2 + \theta_i v^a_j + \epsilon_{ija}$</p><p>The key parameter of interest is $\theta \in \mathbb{R}$, which indicates the weight voters place on valence relative to party. If $\theta = 0$, voters ignore valence and vote purely by party. If $\theta &gt; 0$, some voters may defect from their party allegiance to vote for a high-quality candidate. For simplicity $\theta$ could be held constant across voters, but I let it vary by $i$ (written $\theta_i$) because doing so is needed to identify the model (see the end of this section).</p><p>Decision Rule. For tractability, let both errors have a Normal distribution with the same mean, and let the variance of the difference of the two distributions be 1, i.e.,</p><p>$(\epsilon_{ijr} - \epsilon_{ijl}) \sim \text{Normal}(0, 1)$.</p><p>Then voter $i$ votes for the Democrat (candidate $l$) in office $j$ if $U_i(x^l_j, v^l_j) &gt; U_i(x^r_j, v^r_j)$, which occurs with probability</p><p>$\Pr(y_{ij} = l) = \Phi\left((z_i - x^r_j)^2 - (z_i - x^l_j)^2 + \theta_i (v^l_j - v^r_j)\right)$,</p><p>where $\Phi(\cdot)$ is the cumulative distribution function of a standard Normal distribution. In other words, a voter's choice for a particular election depends on the spatial differential $\Delta x_j \equiv x^l_j - x^r_j$ combined with the cutpoint $\kappa_j \equiv (x^l_j + x^r_j)/2$, and on the valence differential $\Delta v_j \equiv v^l_j - v^r_j$:</p><p>$\Pr(y_{ij} = l) = \Phi\left(2\Delta x_j (z_i - \kappa_j) + \theta_i \Delta v_j\right)$ (2)</p><p>Valence differentials can therefore lead voters to split their ticket, which in this model corresponds to choosing a candidate that would not have been chosen based on the spatial cutpoint $\kappa_j$ alone. The impact of valence is determined by its size relative to the spatial differential. To see this, consider the one-dimensional case and solve for $\Pr(y_{ij} = l) &gt; 0.5$. 
Rearranging terms and assuming $x^l_j &lt; x^r_j$ without loss of generality, voter $i$ votes for $l$ if</p><p>$z_i &lt; \kappa_j + \dfrac{\theta_i (v^l_j - v^r_j)}{2(x^r_j - x^l_j)}$.</p><p>The intuition for this result is that considering valence moves the cutpoint in $l$'s favor from the original ($\kappa_j$) by an additive factor that is increasing in $\theta_i$ and $(v^l_j - v^r_j)$, and decreasing in the positive difference $(x^r_j - x^l_j)$. The figure below sketches an example where the voter chooses $l$ only because of $l$'s valence advantage, with the contribution of valence highlighted in blue.</p><p>[Figure: voter types $x^*_i$ that vote for $l$ with valence and without valence, relative to the candidate position $x^r_j$.]</p></div>
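<div xmlns="http://www.tei-c.org/ns/1.0"><p>The decision rule above can be checked numerically. The following is a minimal Python sketch, not code from this dissertation: the function names are hypothetical, it assumes the one-dimensional case with $x^l_j$ left of $x^r_j$, and the parameter values are illustrative rather than estimates. It computes $\Pr(y_{ij} = l)$ both from the deterministic parts of the two quadratic utilities and from the closed form $\Phi(2\Delta x_j (z_i - \kappa_j) + \theta_i \Delta v_j)$, and it computes the valence-shifted cutpoint.</p><eg><![CDATA[
```python
import math


def normal_cdf(t: float) -> float:
    """Standard Normal CDF Phi(t), computed with the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def prob_vote_left(z, x_l, x_r, v_l, v_r, theta):
    """Pr(y_ij = l) via the closed form Phi(2 * dx * (z - kappa) + theta * dv)."""
    dx = x_l - x_r             # spatial differential, Delta x_j
    kappa = (x_l + x_r) / 2.0  # spatial cutpoint, kappa_j
    dv = v_l - v_r             # valence differential, Delta v_j
    return normal_cdf(2.0 * dx * (z - kappa) + theta * dv)


def prob_vote_left_from_utility(z, x_l, x_r, v_l, v_r, theta):
    """Same probability from the deterministic parts of the quadratic utilities.

    With (eps_r - eps_l) ~ Normal(0, 1), Pr(U_l > U_r) = Phi(u_l - u_r).
    """
    u_l = -(z - x_l) ** 2 + theta * v_l
    u_r = -(z - x_r) ** 2 + theta * v_r
    return normal_cdf(u_l - u_r)


def shifted_cutpoint(x_l, x_r, v_l, v_r, theta):
    """Ideal point below which a voter is more likely than not to choose l.

    Assumes x_l < x_r. The shift beyond kappa grows with theta * (v_l - v_r)
    and shrinks with the distance (x_r - x_l) between the candidates.
    """
    kappa = (x_l + x_r) / 2.0
    return kappa + theta * (v_l - v_r) / (2.0 * (x_r - x_l))


if __name__ == "__main__":
    # Illustrative values only: l has a one-unit valence advantage, theta = 0.5.
    print(shifted_cutpoint(-1.0, 1.0, 1.0, 0.0, 0.5))  # 0.125
    print(shifted_cutpoint(-2.0, 2.0, 1.0, 0.0, 0.5))  # 0.0625
    # A voter at z = 0.1 is right of kappa = 0 but left of the shifted cutpoint,
    # so the valence advantage tips them toward l.
    print(prob_vote_left(0.1, -1.0, 1.0, 1.0, 0.0, 0.5) > 0.5)  # True
```
]]></eg><p>Doubling the distance between the candidates halves the valence-induced shift in the cutpoint, which is the sense in which ticket splitting toward $l$ is increasing in the ratio of the valence advantage to polarization.</p></div>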
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Valence as Clarity of Ideal Points</head><p>We can also consider how uncertainty factors into this decision by treating the candidate position $x^a_j$ as a random variable. For simplicity, consider the one-dimensional case and let $\mathbb{E}(x^a_j) = \mu^a_j$ and $\mathrm{Var}(x^a_j) = \eta^a_j &gt; 0$.</p><p>Then the voter's expected utility from party $a$, taking the expectation over the candidate's position, becomes</p><p>$\mathbb{E}\left[U_i(x^a_j, v^a_j)\right] = -(z_i - \mu^a_j)^2 - \eta^a_j + \theta_i v^a_j + \epsilon_{ija}$.</p><p>Therefore, a large variance $\eta^a_j$ can be interpreted as having the opposite effect of high valence $\theta v^a_j$. In other words, high uncertainty about a candidate's policy position effectively lowers that candidate's valence.</p><p>Connection with Ticket Splitting and Party Switching. Thus far, I have limited the exposition to the case where a voter chooses candidates for a single office, rather than voting on a long ballot. The theoretical results make the extension straightforward if we fix the national, top-of-the-ticket office to be polarized and contested by candidates equally matched in valence. In essence, vote choice is a function of the relative difference between candidates, not offices. With similar logic, we can capture party switching over time in this model by treating offices at time 0 and time 1 as two choices the same voter makes.</p><p>In the figure below, we show three cases: the first case repeats the single-office case above, which can be thought of as a local office. In the second case, we introduce a "national" office $j = 1$ where candidates are equally separated and assumed to have equal valence.</p><p>In the ticket splitting case, the voter's ideology $x^*_i$ remains the same across offices, because the voter is voting for the two offices on the same ballot. As before, the tick mark is the cutpoint between candidates ($\kappa_j$), and the blue region indicates the interval $\left(\kappa_j,\ \kappa_j + \frac{\theta(v^l_j - v^r_j)}{2(x^r_j - x^l_j)}\right)$, the values of $x^*_i$ for which the voter votes for the left candidate only because of the valence advantage. 
</p><p>[Figure: panels for the Local Office ($j = 2$) alone, and for the National Office ($j = 1$) alongside the Local Office ($j = 2$).]</p><p>The figure reinforces the point that ticket splitting toward candidate $a$ is increasing in the valence advantage of candidate $a$ and decreasing in the spatial distance (polarization) between the two candidates, i.e., increasing in their convergence.</p><p>To model party switching, we can replace the diagrams above with offices at two different time periods. An additional moving piece in the party switching model is that the voter's ideal point itself may shift over time. This adds a degree of complication, but it can still be thought of in the same framework.</p><p>Candidates who run for local offices may differ from those in national elections because there is less information about them in the media. Recall that uncertainty about candidate $b$'s position effectively enters the decision rule as negative valence for candidate $b$. This implies that if candidate $a$ is better known than candidate $b$ in a local office (perhaps through news coverage, advertising, or incumbency), ticket splitting is also drawn toward candidate $a$. All this can happen even if voters weigh party equally across offices ($\theta$ is constant across offices) and candidates' positions are nationalized ($x^l_j = x^l_{j'}$ for all $j, j'$, and likewise for $r$).</p><p>Identification. Although I do not attempt to estimate the parameters of the model from the data in this dissertation, it is worth noting how that would be possible. On its own, this model is not identified; an additional constraint is needed to distinguish the contribution of the valence differential from that of the spatial cutpoint. To see why, following <ref type="bibr">Bailey and Maltzman (2008)</ref>, consider adding and subtracting some arbitrary $\bar{\theta}\Delta v_j$ on the right-hand side of equation 2. 
Then we could rewrite the equation as $\Phi\left(2\Delta x_j (z_i - \tilde{\kappa}_j) + \tilde{\theta}_i \Delta v_j\right)$, where $\tilde{\kappa}_j = \kappa_j + \bar{\theta}\Delta v_j / \beta_j$ with $\beta_j \equiv -2\Delta x_j$, and $\tilde{\theta}_i = \theta_i - \bar{\theta}$, making it indistinguishable from the original equation despite having a different coefficient on $\Delta v_j$. Bailey and Maltzman get around this problem by adding Members of Congress to the dataset and assuming that their equivalent of $\theta$ is 0 for those individuals. In my case, I can take the voters who choose the party lever (and therefore do not consider candidate-specific valence) and set $\theta_i = 0$ for those individuals.</p></div>
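<div xmlns="http://www.tei-c.org/ns/1.0"><p>Both algebraic results in this section can be verified with a short Python sketch; the function names are hypothetical and the numbers are illustrative. The first part compares the closed-form expected utility under position uncertainty with a Monte Carlo average. Drawing candidate positions from a Normal distribution is an assumption of the simulation only, since the closed form needs just the mean $\mu^a_j$ and variance $\eta^a_j$. The second part shows that absorbing an arbitrary $\bar{\theta}\Delta v_j$ into the cutpoint leaves the argument of $\Phi$ unchanged for every voter, which is the identification problem described above.</p><eg><![CDATA[
```python
import math
import random


def expected_utility(z, mu, eta, theta, v):
    """Closed form E[-(z - x)^2] + theta * v when E(x) = mu and Var(x) = eta."""
    return -(z - mu) ** 2 - eta + theta * v


def simulated_utility(z, mu, eta, theta, v, draws=100_000, seed=2021):
    """Monte Carlo check of the closed form.

    Drawing x from a Normal distribution is an assumption of this sketch only;
    the closed form depends only on the mean and variance of x.
    """
    rng = random.Random(seed)
    sd = math.sqrt(eta)
    loss = sum((z - rng.gauss(mu, sd)) ** 2 for _ in range(draws)) / draws
    return -loss + theta * v


def choice_index(z, dx, kappa, theta, dv):
    """Argument of Phi in Pr(y_ij = l) = Phi(2 * dx * (z - kappa) + theta * dv)."""
    return 2.0 * dx * (z - kappa) + theta * dv


def reparameterize(dx, kappa, theta_i, theta_bar, dv):
    """Absorb an arbitrary theta_bar * dv into the cutpoint.

    With beta_j = -2 * dx: kappa_tilde = kappa + theta_bar * dv / beta_j
    and theta_tilde = theta_i - theta_bar.
    """
    beta = -2.0 * dx
    return kappa + theta_bar * dv / beta, theta_i - theta_bar


if __name__ == "__main__":
    # Closed form versus simulation for one voter and one candidate.
    print(expected_utility(0.5, 0.0, 0.25, 0.4, 1.0))
    print(simulated_utility(0.5, 0.0, 0.25, 0.4, 1.0))
    # Raising eta by 0.25 costs as much as lowering valence by 0.25 / theta.
    print(expected_utility(0.5, 0.0, 0.5, 0.4, 1.0),
          expected_utility(0.5, 0.0, 0.25, 0.4, 1.0 - 0.25 / 0.4))
    # Two different (kappa, theta) pairs give the same index for every voter.
    kt, tt = reparameterize(-2.0, 0.0, 0.5, 0.3, 1.0)
    for z in (-1.0, 0.0, 0.2, 1.5):
        print(choice_index(z, -2.0, 0.0, 0.5, 1.0), choice_index(z, -2.0, kt, tt, 1.0))
```
]]></eg><p>Because any $\bar{\theta}$ works in the reparameterization, fixing $\theta_i = 0$ for a known subset of voters (such as those who pull the party lever) pins down the cutpoint and breaks the equivalence.</p></div>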
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Operationalization</head><p>We cannot observe a voter's utility for each alternative, let alone the specific component of the utility that concerns the party bundle. At best, we observe the revealed preference of the voter. Scholars have used two measures in particular: party switching and ticket splitting. Both measures involve the same voter voting for both parties, either across time, across offices, or both.</p><p>Party Switching: Surveys and panels can ask respondents to recall their party choice in two different elections, often spaced across time. This corresponds to the common definition of swing districts. V.O. Key's seminal study of electoral change focuses on this operationalization as well. Through analyzing a series of national surveys, <ref type="bibr">Key (1966)</ref> showed how voters transitioned between the Republican and Democratic presidential candidates during 1940-1960, finding that as many as 15 to 20 percent of voters in this era switched parties to align their vote with their policy views on key issues of the day. This is both an observable implication and an operationalization of being a swing voter: if the same person switches parties, that suggests the voter derives similar utility from both parties in general.</p><p>Ticket Splitting: The pair of elections to be compared can also be across offices within the same election. The American federalist system abounds with direct elections for many offices, the majority of which are partisan elections (i.e., candidates have party labels on the ballot) held in the same general election cycle. <ref type="bibr">Beck et al. (1992)</ref> examine ticket splitting in state executive offices and <ref type="bibr">Burden and Kimball (2002)</ref> examine ticket splitting in US House elections. 
But as noted by <ref type="bibr">Burden and Helmke (2009)</ref>, a general definition of ticket splitting applies to what others call vote switching.</p><p>Persuadability is another operationalization of the swing voter that conforms with one way the term is used colloquially. In this definition, a voter is a swing voter if they would be one of the first voters to switch their vote under the counterfactual that some factor, like candidate valence, candidate ideology, or national information shocks, changed.</p><p>Unfortunately, this counterfactual quantity is difficult to measure reliably. Methods for estimating heterogeneous treatment effects have been applied successfully to turnout <ref type="bibr">(Imai and Strauss 2011)</ref>, but less so to candidate vote choice, perhaps because effects are noisy and experimental samples for vote choice are smaller than those in turnout experiments.</p><p><ref type="bibr">Coppock, Hill, and Vavreck (2020)</ref> analyze a large collection of survey experiments to examine heterogeneity and subgroup effects, and report little heterogeneity in persuadability. However, their settings focus on the 2016 Presidential election and therefore do not test variation in candidate attributes. Other machine learning methods may be able to reliably identify subgroups with high treatment effects, but this is an evolving methodological field <ref type="bibr">(K&#252;nzel et al. 2019</ref>) and beyond the scope of most of the evidence presented in this dissertation.</p><p>[...] differed from the Presidential vote share by 2 to 3 percentage points, the proportion of ticket splitting this implies was large enough to have reversed party control in Congress. I also find that statewide executive and state legislative elections follow the same trend as Congress. However, gubernatorial and countywide elections do not show the same trend, or have larger discrepancies from the Presidential vote. This suggests that the swing voter is not a single bloc, but varies by office and candidate. 
<note place="foot" n="*">I thank Jim Snyder, Gary Jacobson, Carl Klarner, Steven Rogers, Michael Zoorob, Chris Warshaw, and Justin de Benedictis-Kessner for sharing their electoral data, most of it unpublished or before public release. I also thank those at Daily Kos for their estimates of Presidential vote share by legislative district, which enable many of these comparisons.</note></p><p>Every state with a 2016 U.S. Senate election produced a "straight-ticket" result with respect to the Presidential vote. In all states where a Republican candidate won the U.S. Senate race, Donald Trump, the Republican, won; and in all states where a Democratic candidate won the U.S. Senate race, Hillary Clinton, the Democrat, won. The 2020 elections were similar: except for Senator Susan Collins (R-ME) winning reelection, no state with a U.S. Senate election that year delivered a split result between President and Senate. Earlier, <ref type="bibr">Jacobson (2015)</ref> noted that "[w]ith little fanfare, the electoral advantage enjoyed by U.S. representatives has fallen over the past several elections to levels not seen since the 1950s," a decline in the personal vote that is consistent with congressional election results increasingly mirroring Presidential support.</p><p>Observers took this as evidence that ticket splitting was less consequential, that allegiance to national parties now dominated electoral behavior, and that differences across candidates and offices were increasingly minor <ref type="bibr">(Stein 2016;</ref><ref type="bibr">Enten 2016)</ref>. The pattern continued in the 2018 and 2020 elections <ref type="bibr">(Skelley 2018;</ref><ref type="bibr">Rakich and Best 2020)</ref>. Moreover, the decline in ticket splitting has coincided with increasing polarization between the parties' voting behavior in Congress. 
Putting these trends together, one might expect highly disciplined political parties and strongly partisan voters across all levels of government (Drutman 2018).</p><p>In this chapter I document and interrogate these trends over time through the lens of the swing voter. Disentangling the contribution of parties, candidates, and voters from a set of election results is a source of numerous debates in political science <ref type="bibr">(Fiorina and Abrams 2008;</ref><ref type="bibr">Ansolabehere, Snyder, and Stewart 2001)</ref>. Large electoral trends that decide which parties and politicians win elections and set policies should be traceable to specific blocs of voters. But simple statistics often obscure this interpretation. For example, the statistic that "no states split their Senate-presidential vote for the first time ever" <ref type="bibr">(Stein 2016</ref>) is only loosely indicative of the prevalence of ticket splitting in the electorate, because it accounts only for who won each election, masking the margins.</p><p>Another potential pitfall when inferring voting behavior from these Congressional election returns is generalizing across offices too easily. The United States features a "long ballot" <ref type="bibr">(Key 1963)</ref>, where offices ranging from President to Sheriff to County Council are up for election at once, often with candidates affiliated with a major party. Have ticket splitting rates in these state and local elections also declined? Because of the limited availability of state and local election data, only recently have scholars tracked rates across the long ballot.</p><p>Through my analysis, I argue for two modifications to this common interpretation of modern elections. First, while ticket splitting has indeed likely declined, this does not mean that ticket splitting has become electorally irrelevant. 
In fact, in the modern era of nationalized Congressional elections, ticket splitters have even more disproportionate influence over party control. I label this the swing voter paradox because one might think that as swing voters decline, electoral results become more predictable. The basic idea is that ticket splitting has declined while the parties have become evenly balanced, and, as we see in Chapter 2, ticket splitting is not concentrated in any particular district. Unstable majorities have been highlighted in past work as well, but my focus on the low rates of ticket splitting does not hinge on the argument (as in Fiorina 2017) that a large portion of voters are moderate.</p><p>Second, I collect electoral data on offices other than Congress and show varying degrees of decline. If swing voters are a single group that is indifferent between the two parties, one might expect ticket splitting to decline in state and local offices as well. I do find a decline in ticket splitting rates in offices such as Governor and state legislative elections, but the drop is not as precipitous. Moreover, the evidence is inconclusive in county offices such as Sheriff and county council. Comparing electoral results across offices is valuable because it can begin to separate a secular trend that applies to all voters from aspects particular to the candidates and the districts they run in (for a similar design, see <ref type="bibr">Ansolabehere and Snyder 2002)</ref>. 
Under the spatial voting framework with valence considerations, I would expect that whether a voter casts a split ticket is not something inherent to the voter, but something that varies across offices and candidates.</p><p>The goal of this chapter is to be broad and to represent electoral behavior with a common metric that can be compared across different offices and different decades.</p><p>On the other hand, the simple measure has its flaws. I use the difference in a pair of vote shares to measure the prevalence of swing voters, which is a classic ecological inference problem. In the two-party case I focus on here, the difference in two vote shares almost certainly underestimates the proportion of voters who split their ticket in the population, because splitters in opposite directions (for example, Democratic for President and Republican for Congress, and the reverse) cancel out in the net difference. Nevertheless, a simple measure allows for a more transparent comparison across different datasets. Moreover, because the direction of mismeasurement is largely known, the evidence is sufficient to make an important observation in support of the paradox I propose: even by a conservative estimate, the proportion of ticket splitters in Congressional elections in the modern era has been sufficient to reverse the party control of both chambers of Congress.</p></div>
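<div xmlns="http://www.tei-c.org/ns/1.0"><p>The lower-bound logic can be made concrete with a small hypothetical electorate. The Python sketch below assumes that every voter casts a two-party vote in both the Presidential and the Congressional race (no roll-off and no third parties); the ballot counts are invented for illustration and are not drawn from any dataset used in this chapter.</p><ab type="code"><![CDATA[
```python
# Hypothetical electorate of 100 voters, keyed by (Presidential vote, Congressional vote).
# Assumes every voter casts a two-party vote in both races (no roll-off, no third parties).
ballots = {
    ("D", "D"): 48,  # straight Democratic
    ("R", "R"): 45,  # straight Republican
    ("D", "R"): 4,   # Democrat for President, Republican for Congress
    ("R", "D"): 3,   # Republican for President, Democrat for Congress
}
total = sum(ballots.values())

# Democratic two-party vote share in each race
dem_pres = (ballots[("D", "D")] + ballots[("D", "R")]) / total
dem_cong = (ballots[("D", "D")] + ballots[("R", "D")]) / total

# Aggregate measure: absolute difference between the two vote shares
vote_share_diff = abs(dem_cong - dem_pres)

# Individual-level quantity: share of voters who split their ticket in either direction
split_share = (ballots[("D", "R")] + ballots[("R", "D")]) / total

print(f"vote share difference:     {vote_share_diff:.2f}")  # 0.01
print(f"share of ticket splitters: {split_share:.2f}")      # 0.07
```
]]></ab><p>The four splitters in one direction offset three of the splitters in the other, so the vote share difference reports 1 point of divergence while 7 percent of voters actually split. Under these assumptions the difference equals the absolute net flow between the two kinds of splitters, so it can never exceed the true splitting rate.</p></div>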
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1">Data and Measures</head><p>To provide a broad picture of ticket splitting, I collected and pooled together existing datasets to cover as wide a set of offices and years as possible. Historical election returns come from <ref type="bibr">Ansolabehere and Snyder (2002)</ref> and <ref type="bibr">Eggers et al. (2015)</ref>. State legislative elections are provided by <ref type="bibr">Rogers (2021)</ref> and <ref type="bibr">Klarner (2021)</ref>, with Presidential vote at the state legislative district computed by Rogers or the organization Daily Kos. For county Sheriff elections, I rely on a dataset of historical Sheriff elections collected by <ref type="bibr">Zoorob (2020)</ref>. For county legislative elections, I rely on an ongoing data collection effort by de Benedictis-Kessner and Warshaw, which collects municipal and county election results from 2000 to 2018, including those published in de Benedictis-Kessner and Warshaw (2020). I do not use municipal elections in this analysis because most of these offices are nominally nonpartisan: their candidates do not participate in a party primary to appear on the general election ballot, and party affiliation is not listed on the ballot. For any election for office j in constituency i, I compute the simple vote share difference from the Presidential vote,</p><formula>\mathrm{Diff}_{ij} = \left| \frac{D_{ij}}{D_{ij} + R_{ij}} - \frac{D_{iP}}{D_{iP} + R_{iP}} \right|</formula><p>where D_ij indicates the number of votes for the Democratic candidate in constituency i for office j, R_ij indicates the number of votes for the Republican candidate, and P denotes the Presidential race in the same constituency. The overall goal of this section is to provide good enough measures for as wide a historical range as possible.</p><p>Throughout this section, I use the Presidential vote as the reference office because the President-Vice President ticket is the only office elected by the entire country. It is also conveniently comparable: it is always contested by the two parties, giving each voter in every state the same two choices. 
Because a goal of the analysis is to compare different congressional, state, and local offices with each other, I fix the reference category in all of these pairwise comparisons. There are exceptions to this general pattern: strong third-party candidates in 1960, 1968, and 1992 make the two-party Presidential vote share in those states harder to interpret.</p><p>High values of the Vote Share Difference indicate that a candidate ran ahead of or behind the party's Presidential candidate in that same district. The lowest value of 0 indicates that the Democratic and Republican candidates received exactly the same two-party vote shares as their respective Presidential candidates. For an example of this measure in the literature, see <ref type="bibr">Darr, Hitt, and Dunaway (2018)</ref>. <ref type="bibr">Moskowitz (2020)</ref> shows that this measure correlates highly with actual ticket splitting rates estimated from ballots.<ref type="foot">foot_0</ref> </p><p>Of course, even if it is correlated with the quantity of interest, the vote share difference suffers from an ecological inference problem in measuring the degree of ticket splitting or party switching. The three simplified cases in Figure <ref type="figure">1</ref> illustrate this problem. The subsequent chapters improve upon this vote share difference by using individual-level data such as surveys and cast vote records. The simple vote share difference is nevertheless useful for historical comparisons where survey data do not exist. Moreover, in the two-party case, the direction of the measurement error is known: the difference understates the share of ticket splitters. Even this lower bound is sufficient to demonstrate that there are enough ticket splitting voters to reverse the party control in modern Senate and House elections. 
Because there is no Presidential election in a midterm year, and therefore no single pair of candidates that the nation as a whole votes on, I use the Presidential election result from two years prior in the same constituency. This introduces some complications in interpretation. Voters loyal to the President's party may stay home in a midterm year, hurting that party's Senate candidate in a way that does not reflect an increase in ticket splitters in the population. That is at least partially why the average difference is higher in midterm elections than in presidential elections.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.2">The Rise and Decline of Ticket Splitting in Congress</head><p>[Figure note: Each line represents a different way to measure Independents. The gray line is a wider measure that includes Independents who lean toward a particular party. The black lines estimate the proportion of "pure" Independents who do not lean toward a particular party. The dotted line restricts the sample to the turnout electorate. The trends do not correspond to the rise and fall of ticket splitting in Congressional elections. Source: Pew Research Center (2015), <ref type="bibr">ANES Cumulative File (1948-2016)</ref>, using the variable VCF0305; for 2020, the preliminary release of the time series dataset.]</p><p>of traditional parties <ref type="bibr">(Klar and Krupnikov 2016)</ref>. A better measure that isolates indifference to the parties on the voter side may be the share of Independents who do not lean toward either party.</p><p>The next important period in the history of the two parties was the 1980 election. <ref type="bibr">Lee (2016)</ref> traces the polarization of the parties to this period, and specifically to this election, in which Republicans unexpectedly won a majority of the U.S. Senate. At this time, the Democrats still appeared to have a solid grasp on the U.S. House. But once control of a chamber was in play, Lee argues, party leaders had additional incentive to differentiate their platforms from the other party. Congressional Republicans therefore increased their use of messaging bills and Congressional hardball instead of emphasizing consensus building. The strategy also required the entire caucus to stick to the party line. The legislative dynamics that Lee traces fit the trends in electoral results rather well. As Congressional candidates repositioned themselves to align more closely with their national party, the spatial voting framework would predict that voters become less likely to split their tickets.</p><p>Similar trends show up in the rise and fall of the incumbency advantage in the U.S. House. 
Various estimates suggest that by the 1980s, incumbents gained an additional 8-10 percentage points above and beyond normal party allegiance <ref type="bibr">(Levitt and Wolfram 1997)</ref>, that this increase was similar in state executive and state legislative offices <ref type="bibr">(Ansolabehere and Snyder 2002)</ref>, and that the advantage slowly declined thereafter. The incumbency advantage is closely tied to ticket splitting because it often involves some voters defecting from their general party identification to vote for an incumbent who belongs to the other party <ref type="bibr">(Jacobson 2015)</ref>. It is therefore hard to say whether the declining incumbency advantage is a cause of the decline in ticket splitting. A declining incumbency advantage is often a manifestation of declining ticket splitting, which can be seen in the overall trends, in the theoretical model outlined in the introduction, and in the results in the rest of this dissertation.</p><p>Other work points to the realignment of the former Confederate states, increased centralization of power in the federal government, and the nationalization of the media as contributing factors to the decline in ticket splitting and the nationalization of politics <ref type="bibr">(Hopkins 2018)</ref>. Testing each of these factors is beyond the scope of this chapter on electoral politics. From the broad view of Congressional elections reviewed here so far, a change in party brands appears to be an important factor. This trend </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.3">The Closeness of Elections</head><p>For the rest of the analysis, it is useful to set benchmarks for how large a swing vote must be to be politically important. I propose one benchmark: the magnitude of uniform swing it takes to reverse which party wins the most seats in the legislature.</p><p>That is, the "minimum swing" &#948; for a given U.S. House election is given by</p><formula>\delta = -\min\Big\{ d \ge 0 : \sum_{s=1}^{S} \mathbf{1}\{ V_s - d > 0.5 \} \le S/2 \Big\}</formula><p>where s &#8712; {1, ..., S} indexes seats and V_s indicates the two-party vote share in seat s of the party that ultimately won the majority of seats in the election. Therefore, &#948; = -0.01 indicates that a 1 percentage point uniform swing against the majority party would have cost that party its majority. Another way to think of this value is as the majority party's lead in the tipping-point district, measured as its two-party vote share there minus 50 percent.</p><p>Only a third of U.S. Senate seats are up for election in a given general election, so the formula for swing in the Senate takes into account the number of seats that the winning party already holds and is not defending. The seats required for a majority also depend on the party of the Vice President, who casts a tie-breaking vote. Because every marginal seat matters for the majority, independents are coded as members of the party with which they caucus.<ref type="foot">foot_2</ref> Uncontested elections, and seats in California and Washington where two candidates of the same party advance from the primary, are coded as safe seats for that party.</p><p>This measure of swing has several advantages over more readily available measures of competitiveness. Typically, analysts use a party's seat share or the popular vote to track how close a party is to capturing a majority. But seat share may mask heterogeneity in the vote shares of pivotal seats. And the popular vote is ambiguous because an electoral system's swing ratio varies by context. 
The ratio has ranged from 2 to 3 in the modern era <ref type="bibr">(Linzer 2012)</ref>, meaning that a 1 percentage point increase in the popular vote can translate into roughly a 2 to 3 percentage point increase in seat share. In this application, we care about seat swings directly. Moreover, uncontested seats and seats contested by two candidates of the same party make the computation of the popular vote ambiguous. The historical comparison makes the distinctiveness of the modern era clear. From the early 2000s to 2020, control of both chambers could have been reversed by less than a percentage point swing against the majority party. One would need to go back to the 1950s to find a time when so small a swing would have been decisive for legislative control. The historical trends highlight two prior turning points where deepening Democratic control reversed: the Republican takeover of the Senate in 1980, and the Gingrich Revolution of 1994, which ushered in a two-decade stretch of near-continuous Republican control of the U.S. House. <ref type="bibr">Lee (2016)</ref> argues that once majority control was in play, Republicans shifted their party strategy towards distinguishing the party from the Democratic party.</p><p>[Figure note: Each point represents the smallest uniform swing against the party that eventually won control of the chamber necessary to reverse the majority party. In 2020, for example, a 1 percentage point uniform swing against the Democratic Party would have reversed its majorities in both the House and the Senate. In 1990, in contrast, it would have required a 9 point swing for the House and a 7 point swing for the Senate to reverse the Democratic majority. In 1966, the Democratic Senate majority held 67 seats and only 20 were up for election; therefore, no amount of swing would have been sufficient to vote Democrats out of their majority.]</p><p>Combining the findings on the prevalence of the swing voter and the closeness of party control produces Figure <ref type="figure">1</ref>.5. 
This is the main relationship in the swing voter paradox: elections with fewer swing voters are the closest ones. The relationship is stronger for U.S. House elections, where all seats are up for election each cycle. Interpretation is more complex for the Senate, where roughly two thirds of the seats making up the majority party's seat share in any given year are not up for election, and so are not reflected in the ticket splitting estimates on the horizontal axis (which use only that year's 33 or so elections).</p><p>The slope coefficient in the U.S. House is 0.68 with a robust standard error of 0.14. This implies that every one percentage point decrease in the prevalence of swing voters is associated with a 0.68 percentage point decrease in the winning party's seat share, making the party's victory more precarious.</p><p>[Figure note: The horizontal axis shows the values used in Figure <ref type="figure">1</ref>.2 and tracks the degree of net ticket splitting from the President that year. The vertical axis shows the share of seats the majority party won in the election. Slope coefficients show coefficients and robust standard errors of the bivariate relationship. The two measures roughly correlate, especially for the House, where all seats are up for re-election. Somewhat paradoxically, the decline of ticket splitting and the increase in party polarization have occurred as any swing voter has become more likely to be pivotal.]</p><p>These statistics of minimum swing provide useful context for the subsequent analysis of swing voters because they establish concretely how electorally important their votes are. Swing voters who comprise three percent of the electorate would not have been electorally important in the U.S. Senate election of 1966, where the Democratic majority was so large that no amount of swing could have removed it. 
But the same three percent constituency is more decisive in a year like 2020, when it could have made the difference between a Democratic trifecta and a Republican trifecta. (The tipping-point state in the 2020 Presidential election was either Wisconsin or Pennsylvania, where Biden won by less than a percentage point of the two-party vote.)</p></div>
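<div xmlns="http://www.tei-c.org/ns/1.0"><p>The minimum swing can be computed by ranking the majority party's seat-level vote shares and reading off the tipping-point seat. The Python sketch below is one way to do this under stated assumptions, not the author's code: the hypothetical argument vote_shares holds the majority party's two-party share in each contested seat; seats that are uncontested, contested by two candidates of the same party, or not up for election are passed in through the hypothetical seats_held count and treated as safe; and seats_needed is the majority threshold (218 in a 435-seat House; in the Senate, 50 when the Vice President belongs to the majority party and 51 otherwise).</p><ab type="code"><![CDATA[
```python
from typing import Optional, Sequence


def minimum_swing(vote_shares: Sequence[float], seats_needed: int,
                  seats_held: int = 0) -> Optional[float]:
    """Return delta, the negative of the smallest uniform swing against the
    majority party that would cost it its majority.

    vote_shares  -- majority party's two-party vote share in each contested seat
    seats_needed -- seats required for a majority (e.g. 218 in the House)
    seats_held   -- seats the party keeps regardless of swing (not up, or safe)
    Returns None when no amount of swing can remove the majority.
    """
    # Contested seats the party must still win to keep its majority
    must_win = seats_needed - seats_held
    if must_win <= 0:
        return None  # majority is locked in regardless of the vote
    if must_win > len(vote_shares):
        raise ValueError("party does not hold a majority under these inputs")

    # The tipping-point seat is the must_win-th strongest seat for the party
    tipping_share = sorted(vote_shares, reverse=True)[must_win - 1]
    # A swing of (tipping_share - 0.5) against the party ties that seat at 50%,
    # leaving the party one seat short; delta is reported as a negative number
    return 0.5 - tipping_share


# Hypothetical five-seat chamber in which three seats form a majority
print(round(minimum_swing([0.60, 0.55, 0.51, 0.45, 0.40], seats_needed=3), 3))  # -0.01

# Hypothetical Senate-style case: 50 seats not up, one more needed for 51
print(round(minimum_swing([0.52, 0.58], seats_needed=51, seats_held=50), 3))  # -0.08
```
]]></ab><p>Sorting once and indexing the tipping-point seat is equivalent to searching for the smallest uniform swing that drops the party below the threshold, and it makes the "lead in the tipping-point district" interpretation explicit.</p></div>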
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.4">Patterns in State and Local Offices</head><p>Almost all past work on ticket splitting has focused on comparisons of Congressional races with the Presidential race. This is not sufficient evidence to attribute the decline in ticket splitting to changes in voters' preferences toward the generic parties, which is why comparison across multiple levels of government is important.</p><p>If the generic swing voter has declined, we would expect ticket splitting and party switching to decline similarly across all offices, because a single voter often votes for multiple partisan offices on the same ballot. On the other hand, if patterns differ by office even in the same state and the same election, this suggests that declines in ticket splitting may be particular to the office, and perhaps shaped by the ideology and valence of the candidates who run in those elections.</p><p>Data limitations have prevented even simple comparisons below the statewide level of Governor, however. In this section, I walk through each set of offices outlined in Table <ref type="table">1</ref>.1, retaining the Presidential race as the reference category for the vote share difference.</p><p>Moving to state and local offices changes the interpretation of ticket splitting in one important respect: these offices do not negotiate over the same legislation as the President. Because Congress and the President negotiate over the same legislation, voters may prefer to balance their vote, ensuring that no party gains a trifecta <ref type="bibr">(Fiorina 1992</ref>). Concerns about weak, unaccountable party government (American Political Science Association 1950) are also largely predicated on this common legislative setting. These concerns are not at play, for example, when a voter is voting for a Governor and a President. 
However, it is precisely this lack of a legislative connection that makes nationalization and party polarization of particular concern <ref type="bibr">(Hopkins 2018)</ref>. If voters cast their ballots for a single party regardless of the particularities of each level of government, this is one (though certainly not a sufficient) indication that partisan voting occurs without consideration of important qualifications.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Governors and Statewide Executives</head><p>Figure <ref type="figure">1</ref>.6 begins by comparing statewide executive offices. Past research shows that, compared to Senators, slightly more Governors belong to the opposite party of the Presidential candidate their state supported <ref type="bibr">(Sievert and McKee 2018)</ref>. The data in this project allow a broad comparison of all executive offices. Some care is necessary to provide an informative comparison of the rates found here with those of Congress, because election cycles vary. Statewide executive elections occur every four years in almost all states. About two thirds of these states hold elections in midterm years, about a third in Presidential years, and five states (Kentucky, Louisiana, Mississippi, New Jersey, and Virginia) in odd years. Therefore, the difference between the year-by-year averages of Governor and Congress in Figure <ref type="figure">1</ref>.2 may be due to the comparison of different states. To avoid this confounding, for each year I subset the Congressional statistics to the states that hold a Governor (or other statewide executive) election that year. Sometimes a state has no Senate election that year, so I use the U.S. House values as well. I generate a Congressional average baseline by upweighting the U.S. Senate values by the number of Congressional Districts the state has that year, as a rough measure of population.</p><p>Figure <ref type="figure">1</ref>.6 shows that statewide executive offices follow the same trend as Congress, but that in the modern era the office of Governor has been more resistant to it. In 2016, the average Democratic vote share in a Governor race (weighted roughly by a state's population size) differed from the state's Presidential vote share by 6 percentage points, while the corresponding difference for Congressional races in those same states was about 3.5 percentage points. 
In 2018, with a different set of states holding Governor elections, the rates were 4.5 percentage points and 4 percentage points, respectively. This may in part be because Governors in most states are elected in midterm years and therefore face different electorates. Governors may also be able to free themselves from national party discipline as executives of their own states, dealing with a different set of issues.</p><p>The remaining statewide executive offices shown in Figure <ref type="figure">1</ref>.6 include races for Attorney General, Secretary of State, Auditor, and State Treasurer. In the "Lottery of the Long Ballot," <ref type="bibr">Key (1963)</ref> expressed concern that separate elections for these offices would produce state executive cabinets with members of different parties and dampen accountability. At least in the modern era, these offices tend to align even more closely with the national party lean of the state.</p></div>
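<div xmlns="http://www.tei-c.org/ns/1.0"><p>One reading of the Congressional baseline described above pools each House race with weight one and gives each state's Senate race a weight equal to that state's number of Congressional Districts, so that a Senate race counts roughly as much as the state's whole House delegation. The Python sketch below implements that reading only; the StateYear container and its field names are hypothetical, and the exact weighting used for Figure 1.6 may differ.</p><ab type="code"><![CDATA[
```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StateYear:
    """Vote share differences for one state in one election year (hypothetical layout)."""
    n_districts: int                                        # Congressional Districts that year
    house_diffs: List[float] = field(default_factory=list)  # one entry per House race
    senate_diff: Optional[float] = None                     # None if no Senate race that year


def congressional_baseline(states: List[StateYear]) -> float:
    """Weighted average Congressional vote share difference across the states
    that hold a statewide executive election in the same year."""
    total = 0.0
    weight = 0.0
    for st in states:
        # Each House race counts once
        total += sum(st.house_diffs)
        weight += len(st.house_diffs)
        # A Senate race, if present, is upweighted by the number of districts,
        # a rough proxy for state population
        if st.senate_diff is not None:
            total += st.n_districts * st.senate_diff
            weight += st.n_districts
    return total / weight


# Hypothetical year with two states: one with a Senate race, one without
states = [
    StateYear(n_districts=2, house_diffs=[0.02, 0.04], senate_diff=0.06),
    StateYear(n_districts=1, house_diffs=[0.10]),
]
print(round(congressional_baseline(states), 3))  # 0.056
```
]]></ab></div>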
<div xmlns="http://www.tei-c.org/ns/1.0"><head>State Legislative Offices</head><p>Ticket splitting in the average state legislative election appears to have undergone a similar, if slightly more modest, decline. Figure <ref type="figure">1</ref>.7 shows the same average statistic for the years available in data provided by Steven Rogers.</p><p>Prior work would lead us to predict that state legislative elections in the modern era should move in similar ways as the national offices. <ref type="bibr">Rogers (2016)</ref> shows that partisan outcomes in state houses rise and fall with the electoral results of the U.S. House, and that the electoral fortunes of state legislators are shaped by approval of the President and his party. On the other hand, other work focuses on how some state legislators may not be completely aligned with their party's positions in Congress, perhaps to adapt to their districts' preferences <ref type="bibr">(Shor and McCarty 2011;</ref><ref type="bibr">Erikson, Wright, and McIver 1993;</ref><ref type="bibr">Polborn and Snyder 2017)</ref>. </p><p>[Figure note: Data from <ref type="bibr">Klarner (2021)</ref> and <ref type="bibr">Rogers (2021)</ref>, with Presidential vote shares in the state legislative districts originating from <ref type="bibr">Rogers (2000)</ref>, <ref type="bibr">NCEC (2004, 2008)</ref>, and <ref type="bibr">Daily Kos (2012, 2016)</ref>.]</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>County Offices</head><p>The decline of ticket splitting is not nearly as apparent in local offices such as county councils and county sheriffs, at least in the set of elections analyzed here. For the last set of results, Figure <ref type="figure">1</ref>.8 shows differences for county council and county sheriff elections in Presidential years.</p><p>Unlike the prior analyses, which covered most states, these cover only a hundred or so counties each year and so are not readily comparable with the figures for other federal and state offices. A better comparison holds the set of counties constant by recomputing differences in congressional elections for the same counties. The dotted lines in Figure <ref type="figure">1</ref>.8 show the average difference between the Presidential vote share and the U.S. House vote share in the same set of counties used in the county office average for that year. Any apparent trend, or lack of one, may be due to the particular set of counties we observe here. That said, the data covered here represent one of the best opportunities to date to examine this question. They open the possibility that while voters are less likely to split their tickets in congressional races, the same partisan voter is willing to split their ticket in local offices on the same ballot, a proposition I test more thoroughly in Chapter 3.</p></div>
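<div xmlns="http://www.tei-c.org/ns/1.0"><p>Holding the set of counties constant amounts to restricting the Congressional comparison to the counties that also appear in the county office data for the same year. The Python sketch below assumes, hypothetically, that both sets of differences are already keyed by a county identifier such as a FIPS code; in practice House results would first have to be aggregated to counties, a step this sketch does not attempt. The function name matched_comparison is illustrative only.</p><ab type="code"><![CDATA[
```python
from statistics import mean
from typing import Dict, Tuple


def matched_comparison(county_office_diffs: Dict[str, float],
                       house_diffs: Dict[str, float]) -> Tuple[float, float]:
    """Return (county office average, House average) over the same counties.

    Both inputs map a county identifier to a vote share difference from the
    Presidential vote in that county for a single election year.
    """
    # Keep only counties observed in both sources so the comparison is like-for-like
    shared = [c for c in county_office_diffs if c in house_diffs]
    if not shared:
        raise ValueError("no counties in common")
    county_avg = mean(county_office_diffs[c] for c in shared)
    house_avg = mean(house_diffs[c] for c in shared)
    return county_avg, house_avg


# Hypothetical example: county "C" has House data but no county office race, so it is dropped
office = {"A": 0.10, "B": 0.08}
house = {"A": 0.03, "B": 0.05, "C": 0.20}
county_avg, house_avg = matched_comparison(office, house)
print(round(county_avg, 3), round(house_avg, 3))  # 0.09 0.04
```
]]></ab><p>Without the restriction, county "C" would pull the House average up to about 0.09 and make the two offices look more alike than they are in the counties actually observed.</p></div>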
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.5">Regional Differences</head><p>In addition to office, regional and state-specific differences may also help explain these trends. <ref type="bibr">Mayhew (1986)</ref> showed varying levels of strength in party organization and machine politics across states. Strong party states may have been able to run and elect a party slate that differed in ideology from the party platforms in Presidential elections. Regions such as the former Confederate South may also have been unique in resisting the convergence of state party platforms to national ones.</p><p>We can investigate the extent to which state and regional differences explain variation in ticket splitting through the analysis of statewide elections, where we have a time series of over 70 years. Figure <ref type="figure">1</ref>.9 shows the trends for each state separately.</p><p>There are typically one to five statewide elections in each state in each general election. To reduce the variance from idiosyncratic races, I therefore take the simple average across statewide offices and across years within a decade. Each state-decade observation, then, contains an average of 14 elections.</p><p>There are no clear regional differences in the statewide trends of the difference from the Presidential vote. Figure <ref type="figure">1</ref>.9 shows the trends of all 50 states in light gray over the course of these decades; most are roughly in line with the aggregate trend.</p><p>There are only two states where trends in divergence have bucked the rest of the country: West Virginia and Maine. West Virginia started out with a close correspondence between the statewide and Presidential vote, but gradually diverged, with Democrats winning statewide in 2010 while the same voters voted for Republican presidential candidates. 
The same can be said of Maine, although independent candidates (who are excluded from the vote share calculation) complicate this picture. Overall, almost all states appear to have experienced a convergence toward the Presidential vote, a story consistent with theories of nationalization.</p><p>The South in particular is a region that stands out in Figure <ref type="figure">1</ref>.9 as well as in the scholarship on realignment <ref type="bibr">(Hopkins 2018;</ref><ref type="bibr">Aldrich 2000)</ref>. In Figure <ref type="figure">1</ref>.10 I show the difference between the South and non-South more clearly with the same statewide data.</p><p>Here I group the differences by every pair of elections that use the same Presidential election (for example, 2016 and 2018), and compute differences separately by office.</p><p>I compute the difference by regressing an election's average difference on a binary variable separating Southern from non-Southern states, and show the regression coefficient on the binary variable. I use two operationalizations of the South: one with core Southern states, including those that diverged so far from the national Democratic party that they supported third-party Presidential candidates, and a more expansive definition including more former Confederate states listed in the figure.</p><p>The differences show that the South drove the aggregate trends before the 1980s, but this is no longer the case in the modern era. As we saw with Mississippi and South Carolina, by the 2000s Southern states tended to have lower divergence than non-Southern states across all statewide offices. In other words, there appears to be more party loyalty in statewide Congressional and executive elections in the South than in the non-South.</p></div>
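<div xmlns="http://www.tei-c.org/ns/1.0"><p>Regressing an outcome on a single binary indicator with an intercept returns a slope equal to the difference in group means, which is why the coefficient in Figure 1.10 can be read directly as the South minus non-South gap. The Python sketch below checks this with invented state-level numbers; it computes the bivariate OLS slope in plain Python rather than with any particular statistics package, and it omits standard errors.</p><ab type="code"><![CDATA[
```python
from statistics import mean
from typing import List


def ols_slope(y: List[float], x: List[float]) -> float:
    """Slope from a bivariate OLS regression of y on x with an intercept."""
    x_bar, y_bar = mean(x), mean(y)
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var = sum((xi - x_bar) ** 2 for xi in x)
    return cov / var


# Hypothetical average vote share differences for five states in one election pair
diffs = [0.10, 0.12, 0.04, 0.06, 0.05]
is_south = [True, True, False, False, False]

# Regression coefficient on a 0/1 South indicator
south_dummy = [1.0 if s else 0.0 for s in is_south]
slope = ols_slope(diffs, south_dummy)

# The same quantity computed as a difference in group means
gap = (mean(d for d, s in zip(diffs, is_south) if s)
       - mean(d for d, s in zip(diffs, is_south) if not s))

print(round(slope, 3), round(gap, 3))  # 0.06 0.06
```
]]></ab><p>A positive coefficient means Southern states diverged more from the Presidential vote than non-Southern states in that election pair; a negative one, as in the 2000s, means they diverged less.</p></div>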
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.6">Limitations</head><p>There are at least three limitations to the evidence I have provided so far in making inferences about the prevalence of swing voters, ticket splitters, and party switchers. First, electoral data at the local level are far from complete. Electoral data before 2000 are still rare, and only recently have scholars pursued the collection of state and local election data even in the modern era. Continued data collection, including through new methods for scraping election results from historical newspapers, is needed. A final class of limitations to the analysis is that the statistic used here, the absolute difference between two vote shares, is unsatisfying because it masks the true amount of ticket splitting in the electorate. Fortunately, this class of limitations is one that I overcome in the subsequent chapters. In Chapter 2 I use recent survey data that not only allow me to avoid the ecological inference problem, but also allow me to avoid the assumption of uniform swing. In Chapter 3 I use an untapped source of electoral data, cast vote records, which avoid the ecological inference problem and allow me to measure vote choice without error. Both findings advance the understanding of the electoral power of the swing voter and add detail to the findings in this chapter, which instead provided relatively thin but historically and geographically wide-reaching coverage of the elected offices in the U.S. federal system.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.7">Conclusion</head><p>The main implication of this chapter is to establish this new equilibrium, which hews closely to the idealized notion of two-party government but is a break from most of the past century. In this politics, fewer voters appear persuadable, but electoral results vary widely. This electoral landscape has several implications for the strategy of campaigns that seek to gain power, although many are not as clear-cut as they may appear at first. The dearth of swing voters might naturally imply that campaigns should focus instead on mobilizing their base to turn out. In a 50-50 election, every voter matters. But if the two parties have become starkly polarized, partisans are less persuadable than the sliver of swing voters. It is also the case that persuading a swing voter to switch parties is twice as efficient for the electoral margin as persuading a core supporter to turn out to vote: a switched vote adds one vote to the campaign's candidate and subtracts one from the opponent, a net gain of two, while a new voter adds only one.</p><p>One reason it may not be obvious to characterize this regime as a two-party system is that, contrary to theories of spatial convergence in two-party elections, parties do not appear to have converged their platforms to the median voter. Recent research suggests that this is not a coincidence: minority parties have an incentive to distinguish themselves from the ruling party by exaggerating their differences, especially when power is equally balanced <ref type="bibr">(Lee 2016</ref>). These dynamics may in fact paint a normatively troubling picture of unproductive governance <ref type="bibr">(Kustov et al. 2021)</ref>, or what Drutman (2020) calls a "Two-Party Doom Loop," in which two polarized parties engage in a protracted back-and-forth of control without convergence. 
</p><p>Using a large survey dataset with samples in every Congressional District allows me to test two hypotheses from the previous chapters: that the swing vote can be explained by a spatial voting model with valence, and that the share of swing voters in each district is often large enough to flip the Congressional election results. I find that Independents, white voters, and voters who do not frequently follow the news are more likely to be swing voters. In a panel regression of congressional districts, I also find that candidates who become incumbents net more votes through increases in ticket splitting from out-partisans. The district-specific levels of ticket splitters were large enough in 2018 to reverse party control.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Introduction</head><p>The first chapter of this dissertation showed a decline in the aggregate rate of ticket splitting starting around 1980. It also made the case that, even as the proportion of swing voters declined, a switch in their vote choice was still likely to be pivotal, if not more so. Because it used a simple difference in aggregates to measure ticket splitting and the assumption of uniform swing to measure pivotality, however, the chapter did not definitively answer the seemingly straightforward question of how many swing voters there are in each legislative district.</p><p>How common are swing voters, why do they swing, and when are swing voters decisive? This chapter provides systematic answers to these questions for the U.S. House.</p><p>Conventional wisdom holds that swing voters have all but disappeared.</p><p>My measurement approach does confirm that swing voters are a small portion of the electorate, about 3 to 5 percent. This level is within the margin of error in a typical Hall 2015). Candidate visibility is in fact one of the key predictors of ticket splitting found in early work <ref type="bibr">(Burden and Kimball 1998;</ref><ref type="bibr">Beck et al. 1992</ref>). Hall and Thompson (<ref type="formula">2018</ref>) is one exception, which tests the effect of US House candidate extremism on differential turnout at the congressional district level. However, the authors do not make small area estimation adjustments in their analysis of survey outcomes.</p><p>After reporting descriptive statistics from the measurement strategy, I conduct three sets of analyses to understand the drivers of the swing vote in the modern era.</p><p>To estimate the causal effect of candidate quality, a key parameter in theoretical models, I employ a panel regression. 
Together, the evidence identifies swing voters as a relatively small subset of the population who respond to candidate quality and who may be decisive, especially in districts where partisan strength is closely balanced.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Models and Methods</head><p>The formalization of the concept of the swing voter in the Introduction justifies treating ticket splitters and party switchers as reasonable approximations of the swing voter. We cannot measure latent preferences for a generic set of party candidates with survey or behavioral data, but voters who are indifferent are likely to split their ticket (within a single ballot) or switch their party (across two ballots). The spatial model also clarified how candidate-level characteristics move the cutpoint at which a voter splits their ticket. Therefore, an ideal setting in which to study this individual choice is one with sufficient survey data and large variation in candidates across districts and across time. U.S. House elections, which are held every two years, provide such a setting. The CCES is a desirable dataset for several measurement reasons beyond its sample size. First, it records the congressional district (CD) in which each voter is registered, which is crucial for creating CD-level estimates. Second, its survey questions on the House vote present the full name of the specific U.S. House candidate with their party affiliation, and measure vote choice both before and after the election in two waves, which makes for a more reliable measure of the House vote than a generic ballot question. Third, the CCES validates each respondent's turnout by matching the personal information of each respondent to state voter files. Because about 20-30 percent of survey respondents misreport their turnout (Ansolabehere and Hersh 2012), I limit all my survey analysis to respondents whose turnout was validated. Of course, the measure of vote choice is still self-reported and contains some degree of measurement error.</p></div>
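<div xmlns="http://www.tei-c.org/ns/1.0"><p>The operationalization above can be written as a small classification rule. Only validated voters are kept. A House vote that differs from the concurrent presidential vote counts as a ticket split, and one that differs from the previous presidential vote counts as a party switch. This is a minimal sketch: the record layout and field names are hypothetical and are not the CCES variable names, and third-party or missing votes are simply excluded, which is one of several reasonable choices.</p><ab type="code"><![CDATA[
```python
from dataclasses import dataclass
from typing import Optional

TWO_PARTY = {"D", "R"}


@dataclass
class Respondent:
    """Hypothetical survey record; field names are illustrative, not CCES variables."""
    validated_voter: bool                  # turnout confirmed by a voter-file match
    house_vote: Optional[str]              # "D", "R", another party, or None
    pres_vote: Optional[str] = None        # concurrent presidential vote (presidential years)
    prior_pres_vote: Optional[str] = None  # previous presidential vote (midterms)


def is_swing(r: Respondent) -> Optional[bool]:
    """True for a ticket splitter or party switcher, False for a consistent partisan,
    None if the respondent is excluded from the analysis."""
    if not r.validated_voter or r.house_vote not in TWO_PARTY:
        return None
    # Compare with the concurrent presidential vote if there is one (ticket
    # splitting); otherwise with the previous presidential vote (party switching).
    reference = r.pres_vote if r.pres_vote is not None else r.prior_pres_vote
    if reference not in TWO_PARTY:
        return None
    return reference != r.house_vote


sample = [
    Respondent(True, house_vote="D", pres_vote="R"),        # ticket splitter
    Respondent(True, house_vote="D", pres_vote="D"),        # straight ticket
    Respondent(True, house_vote="D", prior_pres_vote="R"),  # midterm party switcher
    Respondent(False, house_vote="D", pres_vote="R"),       # turnout not validated
]
flags = [is_swing(r) for r in sample]
kept = [f for f in flags if f is not None]
print(flags, sum(kept) / len(kept))
```
]]></ab></div>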
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.1">Survey Data and MRP Estimates</head><p>Even surveys as large as the CCES are prone to selection bias at smaller geographies, so I adjust CD-level estimates by Multilevel Regression and Post-stratification (MRP) to reduce the mean squared error of district-specific samples <ref type="bibr">(Warshaw and Rodden 2012)</ref>. The CCES sample contains about 60 to 120 voting respondents for each congressional district each year. Both bias and variance are issues. The particular sample from a district may not be representative of the district: as I elaborate in Chapter 4, the CCES is not designed to be representative of particular districts, and its weights are calibrated only to the state. And even if it were a conditionally unbiased sample, the estimator would suffer from large variance. Some district samples, in fact, contain zero Trump-to-Democrat vote switchers. This is not surprising given that the expected size of this population is about 2.5 percent, but it is not plausible that there were actually zero switchers in the entire district.</p><p>The MRP specification used here is a standard one, with the notable addition that I stratify on the incumbency status of each district's candidate on top of standard demographic predictors.</p></div>
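<div xmlns="http://www.tei-c.org/ns/1.0"><p>The post-stratification half of MRP is simple once cell-level predictions exist: a district estimate is the population-weighted average of the predicted swing probabilities across the demographic cells in that district. The sketch below assumes the multilevel regression has already been fitted (that step is not shown), and the district labels, cell names, probabilities, and counts are all hypothetical.</p><ab type="code"><![CDATA[
```python
from collections import defaultdict
from typing import Dict, Tuple

# A post-stratification cell: (district, demographic cell label). Hypothetical keys.
Cell = Tuple[str, str]


def poststratify(pred: Dict[Cell, float], counts: Dict[Cell, int]) -> Dict[str, float]:
    """Weight cell-level predicted swing probabilities by cell population counts.

    pred   -- predicted probability of being a swing voter in each cell, assumed to
              come from an already-fitted multilevel regression (not shown here)
    counts -- number of people in each cell, e.g. from census tables
    """
    num: Dict[str, float] = defaultdict(float)
    den: Dict[str, float] = defaultdict(float)
    for cell, n in counts.items():
        district, _ = cell
        num[district] += n * pred[cell]
        den[district] += n
    return {d: num[d] / den[d] for d in num if den[d] > 0}


pred = {("XX-01", "white_noncollege"): 0.06, ("XX-01", "other"): 0.02,
        ("XX-02", "white_noncollege"): 0.06, ("XX-02", "other"): 0.02}
counts = {("XX-01", "white_noncollege"): 300, ("XX-01", "other"): 100,
          ("XX-02", "white_noncollege"): 100, ("XX-02", "other"): 300}
est = poststratify(pred, counts)
print(est)
```
]]></ab><p>The two hypothetical districts share identical cell-level probabilities but differ in composition, so their post-stratified estimates differ (5 versus 3 percent). This compositional adjustment is what a raw district mean from a non-representative sample would miss.</p></div>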
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Predictive Variables</head><p>Before estimating poststratified district estimates, we can first examine the es- where swing is whether or not the CCES respondent splits their ticket or switches their vote, newsint is a common CCES variable that asks voters how often they follow the news, and pid7 is a seven-point partisan self-identification variable. In this item, voters are first asked whether they identify as a Democrat, Republican, or Independent. Democrats and Republicans are further asked whether they are "strong" or "not very strong" partisans, and Independents are further asked whether they lean Republican, lean Democrat, or lean toward neither party. These patterns highlight four main predictors of being a swing voter. First, respondents who do not identify strongly with either party are more likely to swing, consistent with the spatial model. Second, whites are more likely to be swing voters than other racial groups, while Blacks are consistently straight-ticket voters. This is consistent with the established finding that Blacks are more likely to be steadfast Democrats. Black voters who vote for a Republican presidential candidate are more likely to split their ticket for a Democratic House candidate, but these voters are relatively rare. Third, those who do not follow the news most of the time are more likely to be swing voters. Each of these differences by race and news interest constitutes a 2-5 percentage point difference, which is large enough to be decisive.</p><p>While the estimates provided here do control for variables that scholars have found to be relevant to ticket splitting, the estimates should not be taken as causal. For example, low interest in Congressional politics could be correlated with identifying as an Independent, with not having a college degree, and with the propensity to switch parties. 
I address omitted confounders in the subsequent analyses of candidate quality.</p></div>
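<div xmlns="http://www.tei-c.org/ns/1.0"><p>Because the spatial model predicts that weak identifiers are the most likely to swing, the seven-point party identification scale is often folded into a strength-of-partisanship measure alongside a separate direction indicator. The sketch below assumes the common 1-7 coding (1 = Strong Democrat, 4 = Independent, 7 = Strong Republican) and treats any other code, such as a not-sure response, as missing. It is an illustrative recode, not necessarily the coding used in the models above.</p><ab type="code"><![CDATA[
```python
from typing import Optional

# Assumed coding of the seven-point party ID item: 1 = Strong Democrat,
# 2 = Not very strong Democrat, 3 = Lean Democrat, 4 = Independent,
# 5 = Lean Republican, 6 = Not very strong Republican, 7 = Strong Republican.
STRENGTH_LABELS = {0: "pure independent", 1: "leaner", 2: "weak partisan", 3: "strong partisan"}


def partisan_strength(pid7: int) -> Optional[int]:
    """Fold the scale around its midpoint: 0 (pure Independent) up to 3 (strong partisan)."""
    if not 1 <= pid7 <= 7:
        return None  # codes outside 1-7 are treated as missing
    return abs(pid7 - 4)


def party_direction(pid7: int) -> Optional[str]:
    """Party the respondent identifies with or leans toward, if any."""
    if not 1 <= pid7 <= 7:
        return None
    if pid7 < 4:
        return "D"
    if pid7 > 4:
        return "R"
    return "none"


for code in range(1, 9):
    s = partisan_strength(code)
    print(code, party_direction(code), STRENGTH_LABELS.get(s) if s is not None else None)
```
]]></ab></div>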
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4">The Prevalence and Geographic Distribution of Swing Voters</head><p>Estimating the prevalence of ticket splitters or party switchers for each Congressional district in each election requires additional modeling adjustments.</p><p>Because a given Congressional district often has only about 60 respondents and the national rate of ticket splitting can be in the low single digits, there is a decent probability that no ticket splitters are sampled into a Congressional district subset by chance alone. In such cases, no amount of weighting will move the point estimate of ticket splitting in that district from 0, even though there is likely a small percentage of ticket splitters in the population. To avoid this problem, survey researchers have used partial pooling regressions such as random effects models to smooth out small-sample idiosyncrasies <ref type="bibr">(Gelman and Little 1997;</ref><ref type="bibr">Ghitza and Gelman 2013)</ref>. I estimate an outcome model similar to the one in the previous section, but I remove variables that cannot be post-stratified (such as news interest) and add district-level variables, such as past presidential vote share and the incumbency status of the House candidate, that help partial pooling. I investigate the variability in these Congressional district level MRP models in Chapter 2.</p><p>Descriptive statistics in Table <ref type="table">2</ref>.1 show that about 6 percent of a general election electorate are swing voters under my definition: 3 percent who vote both for a Republican President and a Democratic House candidate, and 3 percent who vote for a Democratic President and a Republican House candidate. For the average district, MRP estimates that about 10 percent of the consecutive electorate are swing voters. These numbers average across years, but each year typically has a national swing. 
In 2018, the national tide advantaged Democrats: 1.4 percent were Clinton voters who then switched to a Republican House candidate, and 2.6 percent were Trump voters who voted for a Democrat. Analysis of turnout and vote choice by <ref type="bibr">Ghitza (2019)</ref> suggests that in 2018, this vote switching was the critical piece that best explained the change in seat control. In 2014, more Obama voters defected to Republican House candidates than did Romney voters to Democratic House candidates. In addition, <ref type="bibr">Ghitza (2019)</ref> identifies 2014 as an election year where turnout differentials also made a difference.</p><p>The turnout differentials also advantaged Republicans, with more 2012 Obama voters than 2012 Romney voters not turning out to vote.</p><p>Each year's geographic distribution of swing voters is shown in Figure <ref type="figure">2</ref>.2. These choropleth maps show both geographic and year-to-year variation in our main outcome measure of interest. In a given year, the range of swing voters is rather limited.</p><p>For example, in 2018 fewer than 5 percent of voters in every district switched to Republicans. Table <ref type="table">2</ref>.2 lists the Members of Congress who represent the districts with the highest amount of swing in 2018. Some members of this list are expected: they include well-known moderates such as Dan Lipinski (IL-03, more conservative than 85% of his Democratic colleagues in the House by NOMINATE), Collin Peterson (MN-07, the third most conservative Democratic member), and Paul Cook (CA-08, more liberal than 80% of his Republican colleagues). However, some of the other members who draw large proportions of ticket splitters are at the extreme ends of their party.</p><p>The map, combined with the standard deviations presented in Table <ref type="table">2</ref>.1, shows that the cross-district variation in swing voters is not as high as one might think. 
In Table <ref type="table">2</ref>.1 the standard deviation in the district-level proportion of swing voters is about 0.03 for all three proxies, whereas the standard deviation for two-party vote share is five times larger. Although political observers classify districts and states as swing or safe, the difference between those types of districts is only on the order of a percentage point or two.</p><p>The range of values estimated in Figure <ref type="figure">2</ref>.2 may also strike most observers as exceedingly small. They are indeed an order of magnitude smaller than Key's estimates from the 1940s and 1950s, although Key compared switching between presidential elections four years apart. My estimates are in line with those from multiple other sources; for example, Jacobson (2019) finds similar numbers using a collection of polls. That the typical rate of individual-level swing is on the order of 5 to 10 percent is consistent with the narrative of increasing party loyalty and nationalization.</p><p>How do we square this small number with sizable seat swings? In 2018, for example, the year with the lowest level of individual-level swing in the MRP estimates, the largest number of congressional districts changed party control. A common explanation is differential turnout. 2018 saw a historic surge in turnout, so it could be that these new voters gave Democrats their critical votes, while voters who voted in both 2016 and 2018 did not change their preference. This explanation, however, is not supported by the data in 2018. <ref type="bibr">Ghitza (2019)</ref> estimates that about 90 percent of the total vote margin in 2018 was due to persuasion, not turnout.<ref type="foot">foot_3</ref> That analysis estimates that the gains in vote margin from drop-off and surge voters effectively offset each other, at least nationally. 
Historically, midterms have had a pro-Republican turnout bias because Democratic voters stay home.</p><p>In 2018, the midterm surge had a pro-Democratic lean, and so the turnout effects of the two groups cancelled out in the final vote margin. In contrast, the same analysis finds that differential turnout was likely a decisive factor in 2014.</p><p>I argue instead that the association between individual-level swing and seat-level swing depends on whether swing voters in each district are pivotal. When a district is lopsided, with straight-ticket partisans of one party, even a large bloc of swing voters that comprises 20 percent of the electorate may not be enough. But in a battleground where each party holds 47 percent of the electorate, even a small group that comprises 2 percent of the electorate is decisive. For example, in 2016 Donald Trump won Michigan with a vote margin of two-tenths of a percent, Wisconsin with five-tenths of a percent, and Pennsylvania with seven-tenths of a percent, and consequently won the Electoral College. In Chapter 1, I showed that control over Congress was similarly close in the modern era. The Presidency, the U.S. House, and the U.S. Senate have slightly different dynamics (the U.S. House has more districts, so its control hinges less on a single district, and the U.S. Senate puts only a third of its seats up for election in each cycle), but control of each has been close and has changed hands rapidly.</p><p>In Section 2.6 I quantify pivotality by taking the observed vote share and computing a simple hypothetical vote share had swing voters switched back their vote as a bloc. Under this definition, swing voters were probably decisive in 125 of the 435 congressional districts, especially in suburban ones.</p></div>
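<div xmlns="http://www.tei-c.org/ns/1.0"><p>Two calculations make the small-sample problem and its remedy concrete. First, with a swing rate of about 2.5 percent and about 60 respondents, the chance of sampling no swing voters at all is roughly one in five. Second, partial pooling pulls a noisy district estimate toward a broader average. The MRP models in the text do this with random effects; the conjugate Beta-Binomial shrinkage below is a simpler, swapped-in technique that shows the same logic, and its prior mean and prior strength are illustrative assumptions.</p><ab type="code"><![CDATA[
```python
def prob_zero_sampled(rate: float, n: int) -> float:
    """Probability that n independent draws contain no swing voters at all."""
    return (1.0 - rate) ** n


def shrink_rate(successes: int, trials: int, prior_mean: float, prior_strength: float) -> float:
    """Posterior mean of a district's swing rate under a Beta prior.

    The Beta(prior_mean * prior_strength, (1 - prior_mean) * prior_strength) prior
    acts like prior_strength pseudo-respondents at the broader average rate, so small
    district samples are pulled toward prior_mean and large ones much less so.
    """
    a = prior_mean * prior_strength
    b = (1.0 - prior_mean) * prior_strength
    return (successes + a) / (trials + a + b)


# Chance that a 60-person district sample contains no switchers at a 2.5% rate.
print(round(prob_zero_sampled(0.025, 60), 3))
# Zero switchers observed: small sample vs. a hypothetical larger sample.
print(shrink_rate(0, 60, prior_mean=0.025, prior_strength=40))
print(shrink_rate(0, 600, prior_mean=0.025, prior_strength=40))
```
]]></ab><p>With a prior centered on 2.5 percent and worth 40 pseudo-respondents, a district sample of 60 with zero observed switchers yields an estimate of 1 percent rather than 0, while a sample of 600 with zero switchers stays much closer to 0, because the district's own data then dominate.</p></div>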
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.5">The Role of Candidate Incumbency</head><p>In a given year, the estimated prevalence of swing voters in US House districts ap- I estimate the contribution of incumbency to ticket splitting with a fixed effects approach. While the predictive model in the previous section suggests that a candidate's incumbency leads to more ticket splitting, unmeasured confounding between the type of voters in a district and the tendency for incumbents to stay in office in that district may bias the estimates. One way to account for this possible confounding is to group data at the constituency (district, indexed by $c$) and year (indexed by $t$) level, and estimate the following two-way fixed effects model:</p><p><formula notation="TeX">S^{l}_{ct} = \beta I^{l}_{ct} + \gamma_c + \gamma_t + \varepsilon_{ct}</formula></p><p>where $S^{l}_{ct}$ is the percentage of voters in the district that split their ticket for candidate $l$ as measured by MRP, $I^{l}_{ct}$ is a binary treatment variable for whether the Democratic ($l$) candidate is an incumbent, $\gamma_c$ and $\gamma_t$ are constituency and time fixed effects, and $\varepsilon_{ct}$ is an error term.</p><p>The least squares estimator for $\beta$ identifies the one-shot effect of the Democratic candidate becoming an incumbent on splitting, controlling for time-invariant characteristics of the district and for shocks common to all districts in a given year.</p><p>The two-way fixed effects estimator is equivalent to a weighted average of difference-in-differences estimators that each estimate the Average Treatment Effect on the Treated (ATT) from a matched set of control units, but where each match is not optimal (Imai and Kim 2020). 
To improve the quality of the matches, I also estimate a matched difference-in-differences estimator that matches on pre-treatment trends in covariates <ref type="bibr">(Imai and Kim 2019)</ref>.</p><p>The current dataset covers all contested congressional districts, each observed in four cycles (2012, 2014, 2016, and 2018). When a district is uncontested, the observation is set as missing. When estimating each treated unit's match, I also include the missingness patterns as a matching criterion, as well as lagged values of presidential vote share and of the treatment indicator.</p><p>Table <ref type="table">2</ref>.3 shows the main coefficients of the panel regressions. Across specifications, being a House incumbent is associated with an increase of about 2 percentage points in the share of ticket splitting. This estimate is on the same order of magnitude as the predictive models estimated in the previous section. The columns labelled 2FE show the two-way fixed effects estimate, and the columns labelled PM show the PanelMatch estimate. Using sets matched on pre-treatment covariates generates similar-sized estimates, except when three pre-treatment periods are used to generate matches. This is not surprising because there are only four time periods in the data, and so the effective sample size for such a match is small.</p></div>
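<div xmlns="http://www.tei-c.org/ns/1.0"><p>For a balanced panel, the least squares estimate of beta in the two-way fixed effects model equals the slope from regressing the doubly demeaned outcome on the doubly demeaned treatment (subtracting district means and period means and adding back the grand mean). The sketch below applies that within transformation to simulated data. It assumes no missing district-years, unlike the actual panel, where uncontested races are missing, and it does not reproduce the PanelMatch estimator.</p><ab type="code"><![CDATA[
```python
import numpy as np


def demean_two_way(x: np.ndarray) -> np.ndarray:
    """Remove district (row) and period (column) means, adding back the grand mean."""
    return x - x.mean(axis=1, keepdims=True) - x.mean(axis=0, keepdims=True) + x.mean()


def twfe_beta(y: np.ndarray, d: np.ndarray) -> float:
    """Two-way fixed effects slope for a balanced (districts x periods) panel.

    y -- outcome, e.g. share of ticket splitters for the Democratic candidate
    d -- binary treatment, e.g. 1 if the Democratic candidate is an incumbent
    """
    y_w, d_w = demean_two_way(y), demean_two_way(d)
    return float((d_w * y_w).sum() / (d_w * d_w).sum())


# Simulated panel: 200 districts observed in 4 cycles, true effect of 2 points.
rng = np.random.default_rng(0)
n_districts, n_periods = 200, 4
district_effect = rng.normal(0.0, 2.0, size=(n_districts, 1))
period_effect = rng.normal(0.0, 1.0, size=(1, n_periods))
# Incumbency is more likely where baseline splitting is high (confounding).
p_treat = 1.0 / (1.0 + np.exp(-district_effect))
treated = (rng.random((n_districts, n_periods)) < p_treat).astype(float)
splitting = (2.0 * treated + district_effect + period_effect
             + rng.normal(0.0, 0.5, size=(n_districts, n_periods)))

naive = np.polyfit(treated.ravel(), splitting.ravel(), 1)[0]
print(f"pooled OLS slope: {naive:.2f}, two-way FE slope: {twfe_beta(splitting, treated):.2f}")
```
]]></ab><p>In this simulation, incumbency is more likely in districts with high baseline splitting, so a pooled regression that ignores district effects tends to overstate the effect, while the fixed effects estimate lands close to the true 2 points.</p></div>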
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.6">Hypothetical Vote Margins</head><p>The district estimates of the proportion of a President's supporters who vote for the opposing party in the U.S. House are a useful representation because they are on the same scale as actual election outcomes. This allows us to consider hypothetical election outcomes had vote switchers not switched their vote. By providing district-specific estimates, moreover, I avoid the uniform swing assumption used in Chapter 1.</p><p>As a concrete example, consider a congressional district where the Democrat narrowly won by 4 percentage points, i.e. the two-party vote share was 52 to 48. Suppose that 3 percent of those who voted were crossover voters in favor of the eventual winner (the Democrat), and 1 percent of those who voted were crossover voters in favor of the eventual loser (the Republican). Now consider the hypothetical where these friendly crossover voters had not crossed over and had stayed with their previous vote for the Republican party. Shifting that mass to the other side, the two-party vote share is now 49 to 51 and the Democrat loses by 2 points. Next suppose that the unfriendly crossover voters did not switch either. Then the vote share is 50 to 50 and the race is tied. It is common knowledge in political campaigns that persuasion is worth "twice" as much as mobilization because a change in vote choice has double the effect on the margin.</p><p>In Figure <ref type="figure">2</ref>.3 I align pairwise comparisons of the actual win margin and the hypothetical margin for each of the 390 contested 2018 House districts. Each comparison is depicted with an arrow. The starting point of the arrow is the win margin observed in the election result, and the end point of the arrow is the hypothetical win margin under one of two counterfactuals. Formally, let $M_{0i} \in (0, 1]$ be the observed win margin for district $i$. 
Then, using the survey data, we estimate the proportion $\psi_i \in [0, 1]$ of friendly crossover voters in district $i$. Separately, we estimate the mass $\tilde{\psi}_i$ of "unfriendly" crossover voters. I then define the hypothetical margin, still for the same candidate, as</p><p><formula notation="TeX">M_{1i} = M_{0i} - 2\psi_i</formula></p><p>when the counterfactual is that only friendly crossover voters swing back, and as</p><p><formula notation="TeX">\tilde{M}_{1i} = M_{0i} - 2\psi_i + 2\tilde{\psi}_i</formula></p><p>for the counterfactual that both types of crossover voters switch back to their 2016 vote. Figure <ref type="figure">2</ref>.3 shows the same value of $M_{0i}$ but draws the line to either $M_{1i}$ or $\tilde{M}_{1i}$.</p><p>Both hypotheticals are important in their own ways. The first indicates the worst-case scenario in which the winner's coalition asymmetrically defected. The second indicates a polarized case where no one defects. However, because elections tend to have roughly uniform swing that advantages one party over the other, a symmetrical decline in both types of switching may be less plausible.</p><p>The crossover voter is pivotal when the arrow reaches or crosses 0. The first key implication of this first hypothetical is that although 1-3 percent of vote swing may at first seem small, it can more than explain the consequential seat swings in Congress as a whole (30 flips here). The second implication is that whether or not swing voters are consequential depends not only on their size in the population but also on their spatial position in the district. For example, in suburban districts win margins were sufficiently low to begin with (i.e., the districts were probably more competitive) that small numbers of crossover voters were more likely to be decisive.</p><p>The second panel in Figure <ref type="figure">2</ref>.3 shows a similar, if more moderate, picture. 
Because friendly and unfriendly crossover voters cancel out, the arrows are always shorter than in the top panel and can point in different directions. Interestingly, Democratic candidates across the board tended to win thanks to vote switchers, whereas winning Republican candidates tended to win despite vote switchers.</p><p>These figures treat $\psi_i$ as fixed, but there is of course uncertainty around these estimates. To account for this, I approximate confidence intervals around my estimate in the following way. I take the standard error around $\psi_i$ implied by the MRP credible intervals, multiply it by $\sqrt{2}$ and then by the 0.9 quantile of the standard Normal distribution, $\Phi^{-1}(0.9) \approx 1.28$, to estimate the 80 percent (as is standard for Bayesian models) confidence interval around $2\psi_i$. I take the observed win margin as constant. In Figure <ref type="figure">2</ref>.4, I drop the observed margin and simply plot $M_{1i}$ with those confidence intervals. If the confidence intervals cross 0, I fail to reject the null hypothesis that $M_{1i} &lt; 0$ at $\alpha = 0.20$, implying (loosely) that swing voters were likely pivotal in that district. Naturally, these intervals yield larger estimates of pivotality.</p></div>
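<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two counterfactual margins and the approximate interval described above reduce to a few lines of arithmetic. The sketch below follows the recipe in the text (scaling the standard error of psi_i by the square root of 2 and by the 0.9 Normal quantile for an 80 percent interval) and reproduces the 52-48 worked example. Function and argument names are illustrative, and the standard error in the usage line is a made-up value.</p><ab type="code"><![CDATA[
```python
from statistics import NormalDist
from typing import Tuple


def hypothetical_margins(m0: float, psi: float, psi_tilde: float) -> Tuple[float, float]:
    """Counterfactual win margins for the observed winner.

    m0        -- observed win margin (e.g. 0.04 for a 52-48 race)
    psi       -- share of voters who crossed over to the winner ("friendly")
    psi_tilde -- share of voters who crossed over to the loser ("unfriendly")
    Each voter who switches back moves the margin by two units: one vote lost by
    one side and one vote gained by the other.
    """
    m1 = m0 - 2 * psi                        # only friendly crossovers switch back
    m1_tilde = m0 - 2 * psi + 2 * psi_tilde  # both types switch back
    return m1, m1_tilde


def is_pivotal(m1: float) -> bool:
    """Crossover voters are pivotal if the counterfactual margin reaches or crosses 0."""
    return m1 <= 0


def approx_interval(m0: float, psi: float, se_psi: float,
                    level: float = 0.80) -> Tuple[float, float]:
    """Approximate interval for M1 with half-width se_psi * sqrt(2) * z, as in the text."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.28 when level = 0.80
    half_width = se_psi * 2 ** 0.5 * z
    m1 = m0 - 2 * psi
    return m1 - half_width, m1 + half_width


# Worked example from the text: Democrat wins 52-48, 3% friendly, 1% unfriendly crossovers.
m1, m1_tilde = hypothetical_margins(0.04, 0.03, 0.01)
print(round(m1, 4), round(m1_tilde, 4), is_pivotal(m1))
print(approx_interval(0.04, 0.03, se_psi=0.005))  # se_psi is a made-up value
```
]]></ab></div>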
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.7">Conclusion</head><p>A systematic examination of survey data, translated into quantities directly comparable with district vote shares, leads to a richer picture of how the swing voter fits into nationalized electoral politics. The first set of predictive models shows that ticket splitters tend to have low news interest and low educational attainment, but are distributed across the urban-rural divide. Through a set of panel regressions, I also showed that both a candidate component and a voter component explain whether or not someone is a swing voter. These findings accord with the spatial voting model that I used initially to justify the operationalization of the swing voter in survey data.</p><p>There are two main areas for methodological improvement in this approach. First, more covariates can be accounted for in the post-stratification and pooling to stabilize estimates. I show in Chapter 4 that modeling a synthetic population I create a database from voting machines that reveals the vote choices of 6.6 million voters for all offices on the long ballot, and I design a clustering algorithm tailored to such ballot data. In contrast to ticket splitting rates of 5 to 7 percent in U.S. House races, about 15 to 20 percent of voters split their ticket in a modal Sheriff race. Even in nationalized politics, a fraction of voters still cross party lines to vote for the more experienced candidate in state and local elections.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Introduction</head><p>The nationalization of voter behavior in recent decades is thought to have shifted the electoral landscape, changing the conventional wisdom about the U.S. electorate.</p><p>More people vote for the same party's candidates in races for President and the U.S. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Ticket Splitting in a Nationalized Era</head><p>In </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.1">Limitations to Existing Studies</head><p>To find how often individual voters split their ticket and why, existing studies almost exclusively rely on two types of data: aggregated election returns and survey samples measuring self-reported vote choice. The use of election returns dates back at least to V.O. Key's chapter in his treatise on state politics, "The Lottery of the Long Ballot" <ref type="bibr">(Key 1963, ch.7</ref>), and continues in recent work <ref type="bibr">(Trounstine 2018)</ref>.</p><p>While comparisons of election returns in different districts can provide a sense of the direction of an "incumbency effect," for example, they do not reveal individual voting patterns and can severely underestimate the prevalence of ticket splitting. As I showed in Figure <ref type="figure">1</ref>, the difference in vote shares is often a lower bound for the total number of ticket splitters. In the past few decades, social scientists have developed ecological inference methods to estimate the actual amount of ticket splitting from office-level aggregate data. These methods estimate the joint voting probabilities that best comport with aggregated returns, subject to modeling assumptions <ref type="bibr">(King 1997;</ref><ref type="bibr">Wakefield 2004;</ref><ref type="bibr">Greiner and Quinn 2009)</ref>. <ref type="bibr">Burden and Kimball (2002)</ref>, for example, were among the first to apply ecological inference techniques to election returns to estimate the degree of ticket splitting between votes for U.S. House and President.</p><p>However sophisticated these methods are, they are inherently model-based. Their output may be biased, and the opportunity to quantify the direction of that bias is rare.</p><p>Surveys circumvent the aggregation problem by sampling enough individuals and prompting them to self-report their vote choice. 
In this paper I propose to analyze cast vote records, a dataset that is free of these measurement problems. Before describing the data, however, I first outline possible explanations for why voters would split their ticket.</p></div>
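<div xmlns="http://www.tei-c.org/ns/1.0"><p>To see why ecological inference must lean on modeling assumptions, consider the deterministic method of bounds that King's (1997) approach starts from. Assume a single precinct in which the same voters cast both a presidential and a House vote, with only two parties. The aggregate shares alone then bound the share of Democratic presidential voters who also vote Democratic for the House, and hence the share of split tickets. The function names and the example shares below are illustrative.</p><ab type="code"><![CDATA[
```python
from typing import Tuple


def beta_d_bounds(x: float, t: float) -> Tuple[float, float]:
    """Bounds on beta_D, the share of Democratic presidential voters who also vote
    Democratic for the House, from one precinct's aggregate shares.

    x -- Democratic share of the presidential vote (strictly between 0 and 1)
    t -- Democratic share of the House vote
    Uses the identity t = x * beta_D + (1 - x) * beta_R with both rates in [0, 1].
    """
    if not 0.0 < x < 1.0:
        raise ValueError("x must be strictly between 0 and 1")
    lower = max(0.0, (t - (1.0 - x)) / x)
    upper = min(1.0, t / x)
    return lower, upper


def split_share_bounds(x: float, t: float) -> Tuple[float, float]:
    """Bounds on the overall share of split tickets, x + t - 2 * x * beta_D."""
    lo_beta, hi_beta = beta_d_bounds(x, t)
    # Splitting falls as beta_D rises, so the upper bound on beta_D gives the minimum.
    return x + t - 2 * x * hi_beta, x + t - 2 * x * lo_beta


print(split_share_bounds(0.48, 0.50))
```
]]></ab><p>With a 48 percent Democratic presidential share and a 50 percent Democratic House share, the aggregates alone allow anywhere from 2 to 98 percent of ballots to be split. The lower bound is the familiar absolute difference in vote shares, and everything above it must come from modeling assumptions.</p></div>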
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.2">Potential Explanations for Why Voters Split their Ticket</head><p>Analysts typically interpret the lack of ticket splitting as a measure of the nationalization and polarization of voter preferences, but the literature on ticket splitting has long shown that voters split their ticket for many other reasons. I will ultimately focus on a valence advantage explanation, which provides some of the clearest theoretical predictions.</p><p>A spatial voting framework clarifies the logic of existing explanations. In a canonical spatial model, citizens choose the candidate whose policy position is closest to them on an ideological spectrum. Voting on the long ballot is akin to a citizen in these models making a series of choices between candidates of two parties. If all Republican candidates for these choices held identical policy positions and all Democratic candidates held another set of identical positions, a voter would cast a straight-ticket vote.</p><p>This setting mimics a completely nationalized politics: co-partisan candidates for local and national office run on identical platforms and voters vote accordingly for a party slate.</p><p>There are at least three classes of explanations for why a voter might split their ticket. First, voters could cross party lines in races where the candidates are less polarized. This is the simplest explanation because it still presumes a single issue dimension. A second explanation posits that state and local politics are contested over different issue dimensions. For example, <ref type="bibr">Oliver, Ha, and Callen (2012)</ref> document how local politics revolves around contests over land use, economic development, and other issues specific to the locality. And recent survey evidence shows that voters' preferences over those policy debates often do not align with their partisanship <ref type="bibr">(Jensen et al. 2019</ref>). 
If local elections feature candidates with differing views on the environment, for example, while candidates for national office all take the same position on the environment, environmentally conscious voters would vote for the same party in national races but then defect from that party allegiance in local races to vote for the pro-environment candidate <ref type="bibr">(Besley and Coate 2008)</ref>.</p><p>The third reason voters may split their ticket, and the one I focus on most in this paper, is that voters care about valence and one candidate has a valence advantage.</p><p>Valence is an attribute of which all voters prefer more to less, such as candidate competence, effort, or experience for the job. The main insight from spatial models with valence is that ticket splitting for a candidate increases with that candidate's valence advantage relative to the spatial distance between the candidates. I show these results formally in the Introduction of this dissertation (equation 3). Furthermore, in this setup, more certainty around a candidate's policy position effectively constitutes a valence advantage as well. In this sense, candidate visibility, or salience, is also captured within the concept of valence. profile offices also lends support for the model that the higher quality candidate nets more by voters splitting their ticket. <ref type="bibr">Beck et al. (1992)</ref> showed that highly visible candidates draw more split-ticket voting, and <ref type="bibr">Burden and Kimball (2002)</ref> showed that congressional candidates with larger campaign expenditures appear to compel more voters to split their ticket.</p><p>Information is crucial in these valence-based accounts. A candidate's valence advantage cannot factor into voters' decisions unless that information reaches voters before they vote, for example through campaigns and press coverage. In all but three U.S. 
states, ballots do not contain anything other than the candidate's name and party.<ref type="foot">foot_7</ref> </p><p>That is why theories of information processing may lead one to predict more straight-ticket voting in low-salience elections <ref type="bibr">(Darr, Hitt, and Dunaway 2018;</ref><ref type="bibr">Moskowitz 2020)</ref>. <ref type="bibr">Peterson (2017)</ref> showed systematically that the lack of candidate-specific information increases the likelihood that voters vote a straight ticket.</p><p>In this paper, I primarily test the valence hypothesis for theoretical clarity, as well as to show how ticket splitting could occur even when candidates have, as conventional wisdom goes, nationalized. Of course, multiple issue dimensions may be at work as well. But spatial models with multiple issue dimensions are notoriously intractable, so their theoretical predictions are less clear. And while moderation may be a factor, valence is an explanation that is theoretically plausible even if all candidates of the same party are polarized, as an account of nationalization would stipulate.<ref type="foot">foot_8</ref> </p></div>
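<div xmlns="http://www.tei-c.org/ns/1.0"><p>The valence logic can be illustrated with the textbook one-dimensional model. The sketch below assumes quadratic loss, so a voter at x evaluates a candidate at position p with valence v as -(x - p)**2 + v. It is an illustration under that assumption, not a reproduction of equation 3 in the Introduction. Solving for the indifferent voter shows the cutpoint moving by the valence gap divided by twice the distance between the candidates. Under quadratic loss, uncertainty (variance) about a candidate's position also lowers expected utility exactly like negative valence.</p><ab type="code"><![CDATA[
```python
def cutpoint(d: float, r: float, v_d: float = 0.0, v_r: float = 0.0,
             var_d: float = 0.0, var_r: float = 0.0) -> float:
    """Ideal point of the voter indifferent between a Democrat at d and a Republican at r.

    Assumes quadratic loss: expected utility from candidate j is
    -(x - p_j)**2 - var_j + v_j, where var_j is the voter's uncertainty about p_j.
    Voters whose ideal point lies below the cutpoint prefer the Democrat.
    """
    if not d < r:
        raise ValueError("expected the Democrat to the left of the Republican (d < r)")
    net_valence = (v_d - var_d) - (v_r - var_r)
    return (d + r) / 2 + net_valence / (2 * (r - d))


def vote(x: float, d: float, r: float, **valence: float) -> str:
    """Party chosen by a voter at x; exact ties go to the Republican for simplicity."""
    return "D" if x < cutpoint(d, r, **valence) else "R"


# Same party positions in both races; only the local Democrat has a valence edge.
voter = 0.05
national = vote(voter, d=-1.0, r=1.0)        # cutpoint at 0
local = vote(voter, d=-1.0, r=1.0, v_d=0.4)  # cutpoint shifts to 0.1
print(national, local)  # a split ticket if the two differ
```
]]></ab><p>Because the shift equals the valence gap divided by twice the distance between the candidates, the same valence edge moves more voters when the candidates are close together, which is the sense in which splitting depends on valence relative to spatial distance.</p></div>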
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Data, Methods, and Case</head><p>Despite the centrality of straight ticket voting to the discussion of nationalization and the incumbency advantage, past work has struggled to measure this individual-level behavior in the offices that stand to change most drastically if politics were to nationalize. What is needed is an approach that drills down into each individual's long ballot, observing the entire series of a voter's choices. Fortunately, cast vote records do precisely that.<ref type="foot">foot_9</ref> I first show how these records compare to traditional data sources. I then introduce the cast vote records in South Carolina as the main dataset in this paper.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.1">Cast Vote Records</head><p>Cast vote records are complete readouts from voting machines. Ballots in the U.S.</p><p>[Table: properties of major election datasets. Rows: Covers all offices on the ballot? Personally identifiable information? Contains precinct identifier?]</p><p>Note: Each column lists the properties of a type of major election dataset.</p><p>Two earlier studies have also used cast vote records to describe voting patterns at a level of granularity that surveys could not achieve. <ref type="bibr">Gerber and Lewis (2004)</ref> standardized ballot images from Los Angeles County in the 1992 general election and estimated voters' ideal points from their choices in statewide ballot referendums. Later, <ref type="bibr">Herron and Lewis (2007)</ref> standardized ballot images from ten Florida counties in the 2000 presidential election to estimate the partisanship of Ralph Nader voters based on their votes in down-ballot partisan contests. Both studies use cast vote records primarily to estimate latent preferences, while this paper describes and models the full set of votes that would underlie such a summary measure.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.2">Processing South Carolina Cast Vote Records</head><p>In this study I use records from 6.6 million voters across five general elections in the same state, the largest collection of cast vote records to date. Counties vary widely in how they administer elections, but South Carolina offers a rare opportunity to study the long ballot because it runs a centralized and transparent election administration.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.3">State and Local Elections and Politics in South Carolina</head><p>South Carolina is comparable to other states in the number of elected offices on each voter's general election ballot. The state has 7 congressional districts, 124 state house districts, 46 state senate districts, and 46 counties, each with a county council often elected through single-member districts. Statewide offices such as Attorney General, Secretary of State, Commissioner of Agriculture, and Superintendent of Education are elected in conjunction with the Governor's race in midterm years. Countywide elections include the partisan offices of sheriff, county clerk, treasurer, and probate court judge.</p><p>Existing research suggests that state legislatures, which in South Carolina deliberate on issues including education spending, environmental regulation, and abortion, are as polarized as Congress.<ref type="foot">foot_11</ref> However, other offices tend to focus on administrative matters <ref type="bibr">(Oliver, Ha, and Callen 2012)</ref>. County councils are legislative bodies that often discuss transportation infrastructure, public facilities, and sales taxes. Sheriffs are the chief law enforcement officers and manage county jails, auditors calculate millage rates, treasurers collect taxes and oversee their disbursement to other jurisdictions, the clerk of court manages court dockets and the collection of fines and fees, and coroners perform independent investigations of deaths. In the judicial branch, circuit solicitors (known as district attorneys in other states) serve as the chief prosecutors in their judicial circuits, and probate court judges have jurisdiction over civil matters such as estate inheritance. Despite their administrative functions, all of these offices are filled through partisan general elections. 
Almost all candidates run as Republicans or Democrats and must win a party primary before the general election.</p><p>While South Carolina is a solidly Republican state in national elections, Democrats win seats at considerable rates on the long ballot. For example, just over half of all countywide executive offices elected on a partisan ballot in 2016-2018 were won by Democrats. The available survey evidence suggests that straight ticket voting in South Carolina is comparable to the national average. Among the 4,512 respondents in the CCES between 2010 and 2018 who voted for major party candidates in the state, 93 percent voted for the same party for President and U.S. House, 91 percent for U.S. Senate and Governor, and 92 percent for U.S. House and Governor. All three numbers are within one percentage point of their respective national averages (n = 318,346).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.4">Additional Candidate Attributes</head><p>After measuring the prevalence of ticket splitting across the long ballot, I combine this information with data about the candidates in each race. Using campaign filing reports available online and archived versions of county websites, I record the incumbency status of each candidate in my dataset. Beyond incumbency, systematic information about both winning and losing candidates in local elections is sparse. I therefore collect additional data from two sources, media coverage and campaign finance, which measure other aspects of valence: media coverage proxies for name recognition, and the amount of campaign contributions a candidate raises proxies for candidate effort and quality <ref type="bibr">(Prat, Puglisi, and Snyder 2010)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Party Loyalty on the Long Ballot</head><p>A descriptive analysis of the pattern of votes can start to rule out several hypotheses. If candidates and voters were thoroughly nationalized, straight ticket rates should be equally high in every office. And if voters chose candidates based on party and valence but information about a candidate's valence attributes was harder to come by in state and local races, straight ticket rates should be higher in state and local offices. I present three sets of analyses: straight ticket voting rates at the voter level, the office level, and finally an analysis of the principal dimensions of vote choice.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.1">Voter-Level Straight Ticket Voting</head><p>Throughout this analysis, I refer to straight-ticket voting as the act of choosing candidates of the same party for all partisan offices under consideration. There are two subtleties to this operationalization. First, uncontested races do not offer voters a real choice between voting a straight or a split ticket, so I limit my analysis to contested races. Throughout, I use contested to mean that the contest features both a Democratic and a Republican candidate. For example, a contest featuring a Republican, a Green Party candidate, and a Libertarian still counts as uncontested. In the discussion, I consider the implications of this restriction when generalizing to voter behavior in other districts.</p><p>Second, South Carolina is one of the few states in which voters have an option to explicitly cast a straight ticket. A voter can either click through the entire touchscreen ballot or select the "Straight Ticket Party Option" that appears as the first question on every ballot (see Appendix A.1 for an example). To avoid confusion of terms, I refer to this latter option as using the party lever, a slightly dated phrase originating from the era when voters pulled a physical lever on a voting machine to the same effect.<ref type="foot">foot_12</ref> Pulling the lever for a particular party auto-fills the voter's ballot to select that party's candidates for every applicable contest. These selections can be reversed contest by contest before the ballot is cast, and in my dataset slightly below 3 percent of voters who use the Republican or Democratic party lever later switch their vote in a contested race.</p><p>Appendix Table <ref type="table">A</ref>.4 shows the prevalence of straight ticket voting in my entire dataset. The number of contested races on a voter's ballot ranges from 1 to 12. 
I compute the proportion of straight ticket voters using only those contested choices, and show this proportion as well as the general distribution of party loyalty. In the modal case of a ballot with 5 contested races, 77 percent of voters are straight ticket voters. The proportion drops to between 60 and 70 percent among voters who happen to face a ballot with more contested races. These numbers arguably inflate the proportion of full straight ticket voting because they include voters who pulled the party lever and likely gave less consideration to each office. Among the half of the electorate that did not use the party lever, the prevalence of straight ticket voters is about 10 to 30 percentage points lower than in the full sample.</p></div>
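<div xmlns="http://www.tei-c.org/ns/1.0"><p>The voter-level operationalization above can be sketched in a few lines of Python. This is an illustrative sketch, not the code used for this dissertation: it assumes each ballot is stored as a mapping from office to a coded choice ("D", "R", "O" for other, or "A" for abstain), together with the set of offices on that ballot that featured both a Democrat and a Republican. Uncontested offices are ignored, and, as an additional assumption, an abstention or third-party vote in a contested race counts as a departure from a straight ticket. The function names and toy ballots are hypothetical.</p><ab type="code">
```python
from collections import Counter


def is_straight_ticket(choices, contested):
    """True if every contested choice on the ballot is for one major party.

    choices: dict mapping office -> "D", "R", "O" (other) or "A" (abstain)
    contested: set of offices with both a Democratic and a Republican candidate
    """
    # Offices missing from `choices` are treated as abstentions.
    parties = {choices.get(office, "A") for office in contested}
    return parties == {"D"} or parties == {"R"}


def straight_rate_by_n_contested(ballots):
    """Proportion of straight-ticket voters, grouped by number of contested races."""
    totals, straight = Counter(), Counter()
    for choices, contested in ballots:
        n = len(contested)
        if n == 0:
            continue  # no real choice to vote straight or split
        totals[n] += 1
        straight[n] += is_straight_ticket(choices, contested)
    return {n: straight[n] / totals[n] for n in sorted(totals)}


# Hypothetical toy ballots: (choices, contested offices)
ballots = [
    ({"Pres": "R", "House": "R", "Sheriff": "D"}, {"Pres", "House", "Sheriff"}),
    ({"Pres": "D", "House": "D", "Sheriff": "D"}, {"Pres", "House", "Sheriff"}),
    ({"Pres": "R", "House": "R", "Sheriff": "R"}, {"Pres", "House"}),
]
print(straight_rate_by_n_contested(ballots))  # {2: 1.0, 3: 0.5}
```
</ab><p>In the third toy ballot the sheriff race is uncontested, so the Republican sheriff vote is ignored. Restricting the input to voters who did not use the party lever would mirror the comparison in the text, assuming a lever indicator is available for each record.</p></div>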
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.2">Party Loyalty by Office</head><p>Are defections from a straight party ticket more prevalent in some offices than others? As a first-order description, Figure <ref type="figure">3</ref>.2 sets the office at the top of each election's ballot as the reference category and shows the overall rate of split-ticket voting by office.</p><p>In red are the rates of voting among "Republican" voters: those who voted for Romney, Trump, or the gubernatorial candidates Haley or McMaster, depending on the year. In blue are "Democratic" voters who voted for Obama, Clinton, or the gubernatorial candidates Sheheen or Smith. For example, the figure shows that among voters who voted for a Republican presidential or gubernatorial candidate in a congressional district where the House race was contested, only 4 percent split their ticket, that is, voted for the Democrat.</p><p>Because Figure <ref type="figure">3</ref>.2 does not make within-person comparisons, I use a tailored clustering algorithm to summarize the data into interpretable prototypes of voting patterns while still leveraging the full distribution of voting patterns that the cast vote records reveal. Clustering is an attractive approach for several reasons. Like ideal point estimation, clustering efficiently incorporates data in which the same individual makes multiple choices, and it can do so even when there is no information about the individual other than their choices (as in cast vote records).<ref type="foot">foot_13</ref> In contrast, a simple comparison of ticket splitting rates by office treats the votes for each office separately, without linking choices made by the same voter.</p><p>In this clustering method, the user picks the number of clusters to divide the voters into, and a fast Expectation Maximization (EM) algorithm identifies the set of cluster assignments that best fits the data. Formally, I posit that each voter \(i\) belongs to one of \(K\) clusters, but that cluster membership \(Z_i\) is unobserved. 
Instead, we observe only a vector of \(J\) choices \(Y_i = [Y_{i1}, \ldots, Y_{iJ}]\) for each voter: a straight ticket, split ticket, or abstention for each of the \(J\) offices on the long ballot. In its simplest form, a clustering algorithm uses only the matrix \(Y\) and a simple model of vote choice to estimate two quantities: the overall prevalence of cluster \(k\),</p><formula>\(\pi_k = \Pr(Z_i = k),\)</formula><p>and the probability that a member of cluster \(k\) makes choice \(\ell\) in office \(j\),</p><formula>\(\mu_{jk\ell} = \Pr(Y_{ij} = \ell \mid Z_i = k).\)</formula><p>The model of vote choice underlying this representation is that votes for different offices are independent of each other within each cluster, i.e.,</p><formula>\(\Pr(Y_i \mid Z_i = k) = \prod_{j=1}^{J} \Pr(Y_{ij} \mid Z_i = k) = \prod_{j=1}^{J} \mu_{jkY_{ij}}.\)</formula><p>This also serves as the identification assumption for estimating the parameters in the clustering algorithm.</p><p>The algorithm is tailored to three features of the ballot data, with formal derivations in Chapters 5 and C. First, to account for abstentions and third-party votes, outcomes are allowed to be unordered categorical variables. Second, unlike a canonical clustering model, the method accounts for the fact that uncontested races offer voters a limited pool of candidates; it incorporates these varying choice sets through an independence of irrelevant alternatives assumption. Third, to handle over a million votes, the algorithm uses a C++ backend to perform internal calculations quickly. Both the number of clusters (\(K\)) and the substantive interpretation of each cluster are determined by the user. Although this leaves room for some ambiguity when implementing the clustering algorithm, one does not need to commit to the view that there exists a single correct number of clusters in the data; one cluster can often be divided into two slightly more homogeneous clusters. To provide some guidance, I cluster the same data with values of \(K\) between 2 and 10 and compute the BIC fit statistic. I ultimately present results with \(K = 4\) because that is where the fit statistics start to level off in 2016 (Appendix A.3). 
The BIC statistic uses the observed log likelihood that the EM algorithm tries to maximize, penalized by the number of parameters it is asked to estimate. I then label each cluster according to the estimated values of the vote choice parameters \(\mu\).</p><p>In 2016, a bare majority of both Republican and Democratic voters (as inferred from their presidential vote choice) are solid partisans, voting for the same party up and down the ticket. In 2012, only about 40 percent of the electorate is classified as solid partisans. Even in 2016, this group is not large enough to deliver an election-winning majority for a particular candidate: Romney and Trump voters comprised only about 55 percent of the state's electorate, so a bare majority of them amounts to well under half of all voters.</p><p>The second largest cluster consists of voters who vote for the same party in congressional races but are more likely to split their ticket in state elections. This pattern is particularly noticeable among Republican voters: for example, 5 percent of Trump voters in cluster 2 split their ticket in the U.S. House, but 15 to 50 percent of them split their ticket for the Democrat in their votes for the state legislature, sheriff, and county council. This cluster comprises about 35 percent of both Trump and Clinton voters, which makes it large enough to be pivotal even in a statewide race. 
The third and fourth largest clusters vary in their voting patterns by year and party.</p><p>Many of these groups primarily abstain after voting for President, while the fourth cluster among Clinton voters appears to consist of solid Republicans who broke away from their party only in the race for President.</p><p>Finally, the clustering differentiates between ticket splitting in particular offices.</p><p>Among 2012 Obama voters, for example, two of the four clusters had substantial probabilities of splitting their ticket, but while the second largest cluster was most likely to split in the vote for State House, the third largest cluster was more likely to split in the race for Sheriff.</p><p>Therefore, despite one line of reasoning that would predict straight ticket voting to be more prevalent in down-ballot races where candidate-specific information is scarce, election day voters defect from their national party loyalty in these races as much as, if not more than, they do in congressional races. These cast vote records reveal patterns of voting behavior that have not been possible to measure in existing surveys and election returns.</p><p>To summarize, Figures 3.2 and 3.3 exhibit the same general pattern. First, voters do not vote straight ticket more often in state and county offices than they do for Congress. Most straight ticket rates in state legislative and county executive offices are lower than those for Congress, especially for sheriff and county council races. For example, while 94 percent of top-of-the-ticket Republicans voted for the Republican congressional candidate, only 77 percent of them in contested sheriff elections voted for the Republican sheriff candidate. Accordingly, ticket splitting is more prevalent for the majority of these state and county offices. Roll-off is also higher further down the ballot: typically around 1 percent in congressional elections and 2 to 4 percent in down-ballot races.</p></div>
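<div xmlns="http://www.tei-c.org/ns/1.0"><p>The clustering step described in this section can be illustrated with a plain latent class model fitted by EM. This is a generic sketch under simplifying assumptions, not the tailored algorithm used in this chapter: it treats every office as offering the same set of L coded outcomes (so it omits the independence of irrelevant alternatives adjustment for varying choice sets), runs in pure NumPy rather than a C++ backend, and uses a fixed number of iterations instead of a convergence check. It estimates the prevalences pi and the choice probabilities mu under the within-cluster independence assumption, and the helper bic computes the fit statistic used to compare values of K. All function names are hypothetical.</p><ab type="code">
```python
import numpy as np


def _log_joint(onehot, pi, mu):
    # log Pr(Y_i, Z_i = k) = log pi_k + sum_j log mu[k, j, Y_ij]
    return np.log(pi) + np.einsum("njl,kjl->nk", onehot, np.log(mu))


def latent_class_em(Y, K, L, n_iter=200, seed=0):
    """Fit a latent class model with unordered categorical outcomes by EM.

    Y: (N, J) integer array with Y[i, j] in {0, ..., L-1}, e.g.
       0 = straight ticket, 1 = split ticket, 2 = abstain for office j.
    Votes are assumed independent across offices within a cluster.
    Returns prevalences pi (K,), choice probabilities mu (K, J, L),
    and the observed-data log likelihood at the fitted values.
    """
    rng = np.random.default_rng(seed)
    onehot = np.eye(L)[Y]                                  # (N, J, L) indicators
    pi = np.full(K, 1.0 / K)
    mu = rng.dirichlet(np.ones(L), size=(K, Y.shape[1]))   # (K, J, L)
    for _ in range(n_iter):
        # E-step: posterior cluster membership probabilities
        log_joint = _log_joint(onehot, pi, mu)
        log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
        resp = np.exp(log_joint - log_norm)
        # M-step: closed-form updates (tiny constant avoids log(0))
        pi = resp.mean(axis=0)
        counts = np.einsum("nk,njl->kjl", resp, onehot) + 1e-10
        mu = counts / counts.sum(axis=2, keepdims=True)
    log_lik = np.logaddexp.reduce(_log_joint(onehot, pi, mu), axis=1).sum()
    return pi, mu, log_lik


def bic(log_lik, N, K, J, L):
    """BIC = -2 log L + p log N with p = (K - 1) + K * J * (L - 1)."""
    n_params = (K - 1) + K * J * (L - 1)
    return -2.0 * log_lik + n_params * np.log(N)
```
</ab><p>In practice one would fit each K from 2 to 10, compute bic for each fit, and look for where the statistic levels off, as in Appendix A.3. Because EM only finds a local optimum, running several random starts (different seed values) and keeping the fit with the highest log likelihood is a common safeguard.</p></div>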
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5">Incumbency and Split-Ticket Voting</head><p>The analyses so far show clear variation in the proportion of straight ticket voting.</p><p>What, then, explains that variation? Past work on local politics and ticket splitting suggests that incumbency is a natural factor to inspect. Incumbency is the primary indicator that proxies for, or at least correlates with, the whole bundle of valence attributes. Some of its component attributes, such as name familiarity and campaign salience, can be measured through newspaper coverage and campaign finance reports.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5.1">Difference in Means</head><p>Access to the full ballots allows some straightforward calculations of how much ticket splitting is explained by incumbency. I first count the fraction of split ticket votes that were cast for the incumbent. I subset the data to six offices with enough contested races across districts, and further subset to contests that featured an incumbent running for re-election against a major party challenger. After these restrictions, 566,232 split ticket votes (as defined in Figure <ref type="figure">3</ref>.2) cast by 495,138 voters remain. Among these split ticket choices, where the voter could choose between the incumbent and the challenger, a clear majority of 69 percent went to the incumbent.</p><p>Open-seat contests serve as an additional useful comparison because incumbency is not at play. In Table <ref type="table">3</ref>.2, I show straight ticket voting rates separated by the presence of an incumbent and the party affiliation of the incumbent.</p><p>If voters value the qualities associated with incumbency, we would expect to see the most same-party votes when (i) the incumbent is of the same party as the voter's top-of-the-ticket choice, and fewer same-party votes when (ii) voting for the same party entails voting against the incumbent. Finally, the rate of straight ticket voting when (iii) there is no incumbent should be lower than in case (i) but higher than in case (ii).</p><p>Consistent with those expectations, in all six offices covered in Table <ref type="table">3</ref>.2, straight ticket voting is highest when doing so coincides with voting for the incumbent. Among contested U.S. House races, 94 percent of voters whose party choice at the top of the ticket aligned with the party affiliation of their U.S. House incumbent voted for that incumbent (column (i)). 
But when they did not align (column (ii)), only 87 percent of these voters voted straight ticket, indicating that most of the rest split their ticket to vote for the incumbent. The rate in open seats, where no incumbent exists (column (iii)), tends to fall between the two values for all offices. If voters did not value the qualities associated with incumbency, all three proportions for each office would have been the same. Instead, we see sizable differences.</p></div>
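<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two calculations in this subsection, the share of split ticket votes going to the incumbent and the straight ticket rate in each of the three incumbency cases of Table 3.2, reduce to simple grouped proportions. The sketch below assumes, hypothetically, that each contested vote is stored as a tuple of the voter's top-of-the-ticket party, the party chosen in the race, and the incumbent's party (None for an open seat). How abstentions enter the denominators of Table 3.2 is not stated here; in this sketch any vote that does not match the top-of-the-ticket party, including an abstention, counts against a straight ticket. Function names and the toy data are illustrative.</p><ab type="code">
```python
from collections import defaultdict


def incumbency_case(top_party, incumbent_party):
    """Map a contested race to the three columns of Table 3.2."""
    if incumbent_party is None:
        return "iii_open_seat"
    if incumbent_party == top_party:
        return "i_incumbent_same_party"
    return "ii_incumbent_other_party"


def straight_rates_by_case(votes):
    """votes: iterable of (top_party, vote_party, incumbent_party) tuples."""
    n, same = defaultdict(int), defaultdict(int)
    for top, vote, incumbent in votes:
        case = incumbency_case(top, incumbent)
        n[case] += 1
        same[case] += vote == top
    return {case: same[case] / n[case] for case in sorted(n)}


def split_share_to_incumbent(votes):
    """Among major-party split ticket votes in races with an incumbent,
    the share cast for the incumbent's party."""
    splits = [
        (vote, incumbent)
        for top, vote, incumbent in votes
        if incumbent is not None and vote in ("D", "R") and vote != top
    ]
    if not splits:
        return float("nan")
    return sum(vote == incumbent for vote, incumbent in splits) / len(splits)


# Hypothetical toy data: (top-of-ticket party, vote in race, incumbent party)
votes = [
    ("R", "R", "R"), ("R", "D", "R"), ("D", "D", "D"),  # case (i)
    ("R", "R", "D"), ("R", "D", "D"), ("D", "R", "R"),  # case (ii)
    ("D", "D", None), ("D", "R", None),                 # case (iii)
]
print(straight_rates_by_case(votes))
print(split_share_to_incumbent(votes))
```
</ab><p>Applied to the actual records, these are the quantities reported in the text: 69 percent of split ticket votes going to the incumbent and, for the U.S. House, straight ticket rates of 94 percent in case (i) and 87 percent in case (ii).</p></div>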
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5.2">Contributions of valence controlling for partisanship</head><p>An individual-level regression allows for a more controlled comparison, modeling vote choice after matching individuals with similar revealed preferences in national offices. For each voter \(i\) making a choice for race \(j\) on their ballot, ticket splitting can be modeled with a linear probability model of the form    These estimated effects of incumbency in state and local races persist for several offices even after controlling for newspaper coverage and campaign fundraising, with multivariate regression results presented in Appendix A.3. These findings suggest that incumbency is not merely a proxy for name familiarity and news coverage. Although the multivariate regressions cannot pinpoint the specific mechanism at play, they suggest that voters can both perceive and care about the range of factors that originate from experience on the job and other reputational advantages.</p><p>In summary, these analyses show that the majority of ticket splitting in state and local offices is cast in favor of the incumbent, even though the ballots do not include incumbency or any other information about the candidate. The deviations from straight ticket voting in the previous section are not arbitrary, but systematically benefit candidates who have relevant experience as incumbents, are better known, and raise more campaign funds from voters.</p></div>
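<div xmlns="http://www.tei-c.org/ns/1.0"><p>The exact specification of the linear probability model is not reproduced in this section, so the sketch below should be read as one plausible form rather than the model estimated here. It assumes a 0/1 split ticket outcome for each voter-race pair, race-level regressors such as an indicator that the other party's candidate is the incumbent, and one intercept for each voter's pattern of votes in national offices, which is one way to implement matching individuals with similar revealed preferences. The function name and the simulated data are hypothetical.</p><ab type="code">
```python
import numpy as np


def lpm_with_fixed_effects(y, X, groups):
    """OLS linear probability model with one intercept per group.

    y: (n,) 0/1 indicators for a split ticket vote in race j by voter i
    X: (n, p) race-level regressors, e.g. an opposite-party-incumbent dummy
    groups: (n,) codes for each voter's national-office vote profile
    Returns the p coefficients on X (group intercepts are absorbed).
    """
    _, group_index = np.unique(groups, return_inverse=True)
    dummies = np.eye(group_index.max() + 1)[group_index]  # fixed effects
    design = np.column_stack([X, dummies])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[: X.shape[1]]


# Simulated example: the true effect of facing an opposite-party incumbent is 0.15.
rng = np.random.default_rng(0)
n = 20_000
groups = rng.integers(0, 4, size=n)
opp_incumbent = rng.integers(0, 2, size=n).astype(float)
p_split = np.array([0.05, 0.10, 0.20, 0.30])[groups] + 0.15 * opp_incumbent
y = (rng.random(n) > 1.0 - p_split).astype(float)  # Bernoulli(p_split) draws
beta = lpm_with_fixed_effects(y, opp_incumbent[:, None], groups)
print(beta)
```
</ab><p>With real ballots, each voter contributes several races, so standard errors would typically be clustered by voter; that step is omitted from this sketch.</p></div>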
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.6">Generalizability</head><p>One limitation of these findings is that they examine contested races in a single state. In Appendix A.4, I analyze a smaller set of cast vote records from two other states, Maryland and Florida, and find similar patterns of higher ticket splitting in state and local offices. Beyond that, several considerations suggest that the main implications here generalize to most contexts in contemporary American politics.</p><p>South Carolina state legislative seats are among the least contested in the country. For example, according to Ballotpedia, only 30 percent of state legislative districts were contested in the 2012 general election, placing the state's competitiveness index ahead of only Georgia and Massachusetts. It is reasonable to expect, however, that split ticket rates would be higher if parties contested more districts. Districts with no challenger tend to be those where the disadvantaged party's chance of victory is slim to begin with <ref type="bibr">(Rogers 2015)</ref>. Therefore, voters who value the correlates of incumbency should be even more likely to cross party lines if a disadvantaged party were to enter a lopsided race.</p><p>When extending to other states, the findings here suggest that ticket splitting would be less prevalent in down-ballot races where candidate-specific information is sparse, two-party competition is high, and the incumbency advantage is weak. One might worry that South Carolina is an outlier in this regard: an uncompetitive state consisting of lopsided districts. But as <ref type="bibr">Fraga and Hersh (2018)</ref> showed using congressional, statewide, and state legislative elections, it is rare for any single voter to reside in that sort of enclave. South Carolina is no exception. The long ballot and the high degree of district overlap in the U.S. 
electoral system all but ensure that most voters' long ballots feature competitive contests as well as uncompetitive ones.</p><p>Another concern when predicting patterns in other states is South Carolina's history as a Southern state, where Democrats such as Strom Thurmond switched to the Republican Party in a massive realignment in the 1960s and 1970s <ref type="bibr">(Mickey 2015;</ref><ref type="bibr">Key 1948</ref>, also documented in Table <ref type="table">A</ref>.2). It is likely that some of the pattern here is driven by older voters who, as in the election of Dwight Eisenhower, voted for Republican national candidates but Democratic state and local candidates.</p><p>On the other hand, realignment is a common feature of a two-party system, and many U.S. states outside the South experienced realignments, if not to the same degree. Moreover, the logic of the incumbency advantage and valence does not rely on such massive realignments.</p><p>I now turn to the more speculative question of whether the dynamics of split ticket voting in state and local races documented here will eventually disappear in an era of increasing nationalization. Most studies describe nationalization as largely a top-down process <ref type="bibr">(Abramowitz and Saunders 1998;</ref><ref type="bibr">Aldrich 2000)</ref>. Electoral returns over the past 40 years also show a consistent trend toward Republican dominance in the state. But this realignment did not affect all levels of elections at once, nor did it progress at the same rate (Appendix A.2). Within the set of elections in my dataset, I do find an uptick in the rates of straight ticket voting over time (Appendix A.3), but how these rates would change beyond the sample depends on the future party alignment in national politics, which is beyond the scope of this paper. 
My results do suggest that the incumbency advantage is still a significant force in state and local elections and may delay the tide of nationalization that has swept congressional and gubernatorial elections.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.7">Conclusion</head><p>The picture of the electorate that emerges from these analyses is one that votes largely along party lines, but still with important variation across offices and by candidates' incumbency status. A new dataset that provides an unprecedented view into voters' choices showed that about seven out of ten voters in South Carolina vote a complete straight ticket, and in any given office, about eight to nine out of ten voters vote for the same party as they do for President or Governor. Ticket splitting is especially prevalent in sheriff contests, and state and local contests are more varied in their level of split-ticket voting.</p><p>Should we consider the statistic that 80 percent of Trump (Clinton) voters voted for a Republican (Democratic) county sheriff candidate to be a large or small number? On the one hand, as I have suggested, this is smaller than what we would expect from a fully nationalized politics. A district in which 20 percent of voters are up for grabs is by most measures a volatile one. On the other hand, there are also good grounds to interpret party loyalty of this degree as too high. The traditional view of local politics has been that it is devoid of partisanship altogether, with recent work updating that view <ref type="bibr">(Tausanovitch and Warshaw 2014;</ref><ref type="bibr">Bucchianeri 2020</ref>). A study of close elections between Democratic and Republican sheriffs finds that party control does not cause changes in how sheriffs implement policy <ref type="bibr">(Thompson 2019)</ref>, suggesting that partisan splits in voters' support for that office are disconnected from the policy preferences of the candidates.</p><p>Regardless of one's interpretation, the findings presented here would not have been obvious without empirical investigation. 
Extending the nationalization literature into state and local politics, one might have predicted that the rate of straight ticket voting would be equally high for all pairs of offices. And extending theories of partisanship and political communication, one might have predicted that, if anything, straight ticket voting rates would be higher in down-ballot races than in national ones because voters would have less candidate-specific information to inform their choices for county council than for the U.S. Senate.</p><p>The first part of the empirical findings does not support these predictions, at least for the average voter and the average election. The second part starts to reveal why. Incumbents systematically netted more votes from party defection than challengers or candidates in open races (Table <ref type="table">3</ref>.2), even after controlling for a voter's own party allegiance (Figure <ref type="figure">3</ref>.5). This pattern is consistent with a model of elections in which nationalized partisan candidates for state and local offices have different levels of experience or quality. In other words, even if we assume that nationalization has so thoroughly polarized candidates that even candidates for state and local offices follow the national ideological platforms of their respective parties <ref type="bibr">(Hopkins 2018;</ref><ref type="bibr">Shor and McCarty 2011)</ref>, some voters know and care enough about nonideological aspects of the candidates to split their ticket.</p><p>By constructing the first database of cast vote records spanning an entire state across multiple elections, this study has overcome some common challenges researchers face in studying vote choice for state and local office. Cast vote records will prove valuable for understanding electoral behavior more widely. 
In addition to ticket splitting, they allow researchers to study vote choice in party primaries, ballot measures, and elections for nonpartisan offices such as school boards. Existing studies of these three types of elections are limited by the same sort of measurement problems that surveys and election returns have for studying ticket splitting, and therefore can benefit from wider use of cast vote records.</p><p>The findings of this paper are not without their limitations. Cast vote records reveal how people vote in state and local offices, but they reveal much less about the demographic characteristics of those voters. And partly because of this lack of survey or demographic evidence, it is difficult to disentangle the potential mechanisms underlying the findings, i.e., a valence advantage, candidate moderation, or multidimensional voting. Future research that combines cast vote records with precinct-level data could help distinguish more carefully the processes through which voters form their preferences for state and local offices. This paper does, on the other hand, establish some baseline expectations for state and local elections in a nationalized politics. On election day, U.S. voters must make a series of choices with limited information beyond party labels. But after an accumulation of campaign outreach, media coverage, and information acquired through everyday observation, a considerable number of vot-</p><p>them further by about 1 to 2 percentage points (to 5.9 points).</p><p>* I acknowledge the support of NSF Grant 1926424 and thank Steve Ansolabehere, Andrew Gelman, Yair Ghitza, Lauren Kennedy, Jonathan Robinson, and especially Soichiro Yamauchi for numerous discussions on the findings related to this chapter. I thank Douglas Rivers, Eddie Mertz, and Brandon Bertelsen for sharing the summary statistics from YouGov's database to enable extensions.</p><p>Surveys continue to be the main way through which scholars study electoral behavior. 
As survey samples have grown larger, social scientists have turned to estimating quantities for particular subgroups of the data <ref type="bibr">(Broockman and Skovron 2018;</ref><ref type="bibr">Kalla and Porter 2020;</ref><ref type="bibr">Hertel-Fernandez, Mildenberger, and Stokes 2019)</ref>. At the same time, the political survey community has also become more conscious of selection bias and unrepresentativeness in these data. Accurate subgroup estimates of voter behavior at the state and legislative district level are crucial for studies of electoral politics like the one I explore in this dissertation.</p><p>As ubiquitous as pollsters' survey weights are, however, the construction of weights is still an open discussion in social science research, and little guidance exists. The observation that "survey weighting is a mess... the construction of weighting itself is an uncodified process" <ref type="bibr">(Gelman 2007</ref>) still rings true. The situation has undoubtedly improved, with pollsters documenting their own complex weighting processes <ref type="bibr">(Ansolabehere and Rivers 2013)</ref> and review texts that connect weighting methods in a single framework <ref type="bibr">(Caughey et al. 2020)</ref>. However, many of the methods are still out of reach for applied researchers who want to adjust existing weights to their own subgroups of interest.</p><p>Moreover, the suitability of a set of weights is not a black-and-white issue. Because much of the validity of an estimated weight depends on the quality of imperfect data, applied researchers need an empirical accounting of how well a set of survey weights adjusts a sample to various geographies.</p><p>Methods for reweighting, and the estimation of the synthetic population targets required to enable such weighting, have wide applications in modern survey research. 
For example, it is crucial for improving Multilevel Regression and Poststratification (MRP) models as well as non-MRP estimates. MRP combines the traditional study of shrinkage and partial pooling that is mostly concerned with variance reduc-tion with standard infrastructure for poststratification weighting that is mostly concerned with bias reduction <ref type="bibr">(Gelman and Little 1997)</ref>. Much of the recent research on MRP has exclusively focused on the former: improving the model that induces partial pooling in the outcome. It has held constant the poststratification table constant, often with off-the-shelf Census datasets that do not include the variables pollsters typically use. Improving the estimation procedure for population targets for subnational geographic units can benefit both traditional weighting and any MRP model.</p><p>Here I conduct a validation and outline an open-source method that encompasses all these aspects through a concrete example, the Cooperative Congressional Election Study (CCES). I discuss the sampling and small area problem in the CCES for congressional districts, where the survey sample for each district is only around 50 respondents. Specifically, I propose a workflow to construct a poststratification target that approximates the joint distribution of six standard variables: age group, sex, education, race, turnout, and congressional district (which are nested in states). A multinomial logit model with simultaneous calibration properties implemented by <ref type="bibr">Yamauchi and Kuriwaki (2021)</ref> allows this joint estimation, which is more scalable and has better theoretical guarantees than existing attempts for using synthetic distributions in MRP <ref type="bibr">(Leemann and Wasserfallen 2017;</ref><ref type="bibr">Ghitza and Steitz 2020)</ref>. 
I then show how such a reweighting improves the estimates of vote choice at the congressional district level in the 2016 CCES, while off-the-shelf poststratification with only a few demographic variables does not noticeably improve the aggregate error of estimates relative to a simple raw average. Partial pooling alone, which precedes the postratificaiton step in MRP, also does not improve the overall accuracy of the estimates. The proposed workflow is open-source and draws from datasets that can be downloaded by a user-friendly interface <ref type="bibr">(Kuriwaki 2021a)</ref>. In an extension, I show how non-public data such as party registration statistics in voterfiles can be used to further </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">The Rise of Online Surveys and Calibration Weighting</head><p>As typical survey sample sizes have grown through technological innovation, one might think that survey researchers are no longer befuddled by small sample problems. This is not so, for two main reasons. First, even with large datasets, scholars have turned to estimating population quantities for ever smaller subnational geographies. Table <ref type="table">4</ref>.1 compares the sample sizes of two common datasets at the national, state, and sub-state congressional district levels. Even in a survey like the CCES, which is an order of magnitude larger than the typical national poll, there are fewer than a hundred observations from a given congressional district (which represents more than half a million people). Second, as sample sizes have grown, response rates have plummeted, raising the danger that the survey samples we do collect are less representative <ref type="bibr">(Meng 2018)</ref>.</p><p>This work contributes to recent literatures in political science and survey statistics that have arisen to keep up with the technical realities of polling. Three bodies of work are particularly relevant: applied examinations of calibration weighting, recent statistical innovations in estimating calibration weights, and the existing small area estimation literature in political science, which has largely focused on MRP.</p><p>An overview of weighting methods appears in <ref type="bibr">Caughey et al. (2020)</ref>. They identify target estimation as a challenging task, for which "how best to approach the problem is still an open question and a subject of ongoing research." I provide one such extension by estimating synthetic population data for a turnout electorate. I also focus on small area estimation, a topic Caughey et al. discuss only in passing and leave for further research. 
A concrete description of poststratification in the CCES is given in <ref type="bibr">Ansolabehere and Rivers (2013)</ref>. However, their benchmarks against election results stop at the state level, where survey samples are large (about 1,000 respondents) and the weights specifically target demographic distributions at that level. In this chapter, I investigate smaller geographic areas to which the pre-computed weights are not adjusted.</p><p>Target estimation and calibration weighting are a broad field, featuring classic studies that have enabled now-standard tools such as rake weighting <ref type="bibr">(Deming and Stephan 1940)</ref>. But statistical methods in this area are continuously evolving, seeking to improve the stability of estimated weights and to add more calibration constraints to an approximation of the propensity score model. Contrary to the canonical model of inverse probability weighting typically associated with survey weighting, "survey weights are not in general equal to inverse probabilities of selection" <ref type="bibr">(Gelman 2007</ref>).</p><p>Instead, the population distribution that the weights target must itself be estimated, through a series of statistical imputation methods <ref type="bibr">(Caughey et al. 2020</ref>). Because many of these constraints are not observed in practice, there is room for improved modeling <ref type="bibr">(Ben-Michael, Feller, and Rothstein 2020;</ref><ref type="bibr">Zubizarreta 2015;</ref><ref type="bibr">Imai and Ratkovic 2014)</ref>. This chapter draws on the insights that have recently emerged from this statistical research, summarized most recently by <ref type="bibr">Chattopadhyay, Hase, and Zubizarreta (2020)</ref>. 
The central idea is that calibration estimation is an approximation to the true propensity model, and that bias-variance trade-offs exist in choosing an optimal set of weights.</p><p>Finally, MRP is an increasingly common method for survey inference for small subgroups, especially in political science <ref type="bibr">(Lax and Phillips 2009;</ref><ref type="bibr">Warshaw and Rodden 2012;</ref><ref type="bibr">Buttice and Highton 2013)</ref>. While MRP is a general procedure that covers many of the practical issues in subgroup analysis, it is important to remember that it is essentially "a modification of the conventional poststratification estimator" <ref type="bibr">(Caughey et al. 2020, p.70</ref>) and that its main innovation, the Multilevel Regression, does not directly address concerns about nonresponse bias. As I clarify in the next section, the multilevel regression stage of MRP uses a shrinkage method to deal with the high variance of small samples, but the identification assumption that validates this step is distinct from that of representativeness. Put another way, if the poststratification stage of MRP is biased or insufficient, so is MRP. MRP is also a data-intensive method. Practically all of the numerous studies that validate MRP or improve it with machine learning methods innovate on the regression model and do not vary the poststratification dataset.</p><p>Even those that do <ref type="bibr">(Leemann and Wasserfallen 2017)</ref> propose fairly simple methods for extending target areas, either assuming away the ecological inference problem or applying iterated proportional fitting.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Methodological Foundations of Calibration Weighting</head><p>The fundamental problem in survey inference, as well as the general idea behind most survey adjustment methods, is shown in […] of the data in this population is observed, and a sample is drawn through an unobserved selection mechanism to make inferences. The binary selection variable $S_i$ is 1 if individual $i$ ends up in the survey sample and 0 otherwise, and $\mathcal{S}$ denotes the set $\{i : S_i = 1\}$. I use $n = \sum_{i=1}^{N} S_i$ to denote the sample size. Suppose that the primary subgroup of interest is a subnational geography, such as a congressional district, which I denote with the random variable $A_i \in \{1, \ldots, J\}$.</p><p>The subgroups need not be geographic; they can just as easily be demographic subgroups or the interaction of the two. Geographic subgroups are of particular interest in political science research, and they are subgroups for which estimates can be validated against official election results.</p><p>We denote the area of interest by $j$. The quantity of interest, the population average $\mu_j$ in area $j$, is</p><p><formula notation="TeX">\mu_j = \frac{1}{N_j} \sum_{i=1}^{N} \mathbf{1}(A_i = j)\, Y_i,</formula></p><p>where</p><p><formula notation="TeX">N_j = \sum_{i=1}^{N} \mathbf{1}(A_i = j)</formula></p><p>is the population size for area $j$. To introduce the calibration (or poststratification) weights that are now the norm in online surveys, it is useful to begin with the classic inverse probability model. Without a simple random sample, the sample average is no longer an unbiased estimator of the population mean, but if the selection probability of every individual is known, inverse probability weighting renders the estimator unbiased. This is the core of most survey weighting approaches, as well as the source of the power of propensity score weighting in causal inference <ref type="bibr">(Dehejia and Wahba 1999)</ref>. 
The standard correction weight $w_i$ is then proportional to the inverse of the selection probability $\pi_i = \Pr(S_i = 1 \mid X_i, A_i)$:</p><p><formula notation="TeX">w_i \propto \frac{1}{\pi_i}.</formula></p><p>In practice, the weight is normalized by multiplying it by a constant so that $w$ has mean 1.</p><p>A common practical question is whether a weight built for a national survey estimating a national population remains valid when applied to a subset of the survey to estimate that subgroup's population. In the ideal situation where the propensity score is known, the answer is yes. To see this, first consider why the weighted proportion of the entire sample is consistent for the population mean $\mu$. Because $Y_i$ is fixed in the finite population setting and $\pi_i$ is fixed once observed,</p><p><formula notation="TeX">\mathbb{E}\left[\frac{1}{n} \sum_{i=1}^{N} S_i w_i Y_i\right] = \frac{1}{n} \sum_{i=1}^{N} \Pr(S_i = 1 \mid X_i, A_i)\, w_i Y_i,</formula></p><p>after which the product $\Pr(S_i = 1 \mid X_i, A_i)\, w_i$ cancels to a constant and the expression reduces to $\mu$. The constant rescaling of the weights, combined with the denominator $1/n$, ensures the correct scaling. While a ratio estimator like this is not unbiased in general, it is asymptotically unbiased; in practice the bias is often small, and the estimator converges as data accumulate. If we target the subgroup quantity in similar fashion, we simply keep the conditioning on $A$, so that</p><p><formula notation="TeX">\mathbb{E}\left[\frac{1}{n_j} \sum_{i=1}^{N} \mathbf{1}(A_i = j)\, S_i w_i Y_i\right]</formula></p><p>will also equal $\mu_j$. Here $n_j = \sum_{i=1}^{N} \mathbf{1}(A_i = j, S_i = 1)$ is the survey sample size for area $j$.</p><p>However, modeling the selection probability is difficult. In the causal inference setting, propensity scores estimated through, for example, a logit regression do not guarantee balance in the particular sample <ref type="bibr">(Imai and Ratkovic 2014)</ref>. 
In the survey setting, units with $S_i = 0$ are unobserved, so running such a regression is impossible.</p><p>That is why survey weights computed after a survey is fielded are estimated through calibration methods <ref type="bibr">(Zubizarreta 2015;</ref><ref type="bibr">Chattopadhyay, Hase, and Zubizarreta 2020)</ref>.</p><p>Calibration methods, in one form or another, compute a vector of weights that meets a user-defined balancing constraint comparing the target population and the sample. Constraints can handle joint distributions through a distinct metric <ref type="bibr">(Hainmueller 2012)</ref>, but in surveys the only feasible constraints are moment conditions on observable covariates. That is, we find a vector of weights such that $\frac{1}{n} \sum_{i \in \mathcal{S}} w_i X_i = \frac{1}{N} \sum_{i=1}^{N} X_i$, where the right-hand side is observed in the Census and other larger datasets. Because $X$ covers multiple categorical covariates such as age group, education, and race, it is convenient to index all possible joint combinations of the covariate levels and call them cells. Specifically, $C_i$ is a deterministic function of the covariate vector of respondent $i$ and returns a number in $\{1, \ldots, C\}$.</p><p>Poststratification weighting, rake weighting, and iterated proportional fitting all fall under this broad umbrella of calibration methods, because they follow the same pattern <ref type="bibr">(Caughey et al. 2020)</ref>. Because online surveys based on river samples cannot produce design weights, most weights that researchers encounter and can model are some form of calibration weight.</p><p>Calibration weights implicitly model the propensity score, but the inputs to calibration are almost always insufficient in the survey setting. Population moments that serve as balancing conditions may not be observable for the covariates that matter most in the propensity score. 
Despite the theoretical elegance of calibrating a survey to population constraints, in practice most population constraints are measured with error, date from several years ago, or come from yet another survey.</p><p>Conditions for a given geography may be even harder to obtain. For example, to calibrate the survey to the population distribution of religion, the CCES uses the national breakdown of religion reported in Pew's religion survey. However, Pew does not report breakdowns of religion by state, so the CCES cannot generate state-level calibration weights that apply those constraints. These issues do not even touch on estimation error due to functional form assumptions.</p><p>Once we frame survey weighting as a causal inference problem for observational data <ref type="bibr">(Kuriwaki and Yamauchi 2021)</ref>, the conditions that a calibration weight must satisfy to make the resulting area estimates unbiased are clear. The same qualifications to the validity of propensity scores apply here. Applied to each area separately: […]</p></div>
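<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the subgroup argument concrete, the following Python sketch simulates a toy finite population in which the selection probability $\pi_i$ is known and depends only on a binary covariate $x$. It forms $w_i \propto 1/\pi_i$, rescales the weights to mean 1, and takes the ratio (Hájek) form of the weighted mean within each area. The function names and all data-generating values are hypothetical, chosen only for illustration; in an online opt-in panel $\pi_i$ is not known, which is why the calibration methods discussed in this section are needed.</p>
<eg>
```python
import random


def ipw_area_mean(y, area, pi, sampled, j):
    """Weighted mean of y in area j, with weights proportional to 1 / pi."""
    idx = [i for i, s in enumerate(sampled) if s]
    # Inverse-probability weights for sampled units, rescaled to mean 1.
    raw_w = [1.0 / pi[i] for i in idx]
    scale = len(idx) / sum(raw_w)
    w = {i: scale * v for i, v in zip(idx, raw_w)}
    # Keep only sampled units in area j; the constant `scale` cancels in this ratio.
    in_j = [i for i in idx if area[i] == j]
    return sum(w[i] * y[i] for i in in_j) / sum(w[i] for i in in_j)


def simulate(seed=1, n_per_area=20000, n_areas=3):
    """Toy finite population in which units with x = 1 are over-sampled."""
    rng = random.Random(seed)
    y, area, pi, sampled = [], [], [], []
    for j in range(n_areas):
        for _ in range(n_per_area):
            x = 1 if 0.3 > rng.random() else 0
            p_y = 0.3 + 0.4 * x + 0.1 * j   # outcome depends on x and on the area
            p_s = 0.2 if x == 1 else 0.05   # known selection probability pi_i
            y.append(1 if p_y > rng.random() else 0)
            area.append(j)
            pi.append(p_s)
            sampled.append(p_s > rng.random())
    results = {}
    for j in range(n_areas):
        pop = [y[i] for i in range(len(y)) if area[i] == j]
        samp = [y[i] for i in range(len(y)) if area[i] == j and sampled[i]]
        results[j] = (
            sum(pop) / len(pop),                      # true area mean mu_j
            sum(samp) / len(samp),                    # unweighted sample mean
            ipw_area_mean(y, area, pi, sampled, j),   # weighted estimate
        )
    return results


for j, (mu, raw, ipw) in simulate().items():
    print(f"area {j}: mu_j={mu:.3f} raw={raw:.3f} weighted={ipw:.3f}")
```
</eg><p>Because the over-sampled group also has a higher outcome rate, the unweighted area means are pulled upward, while the inverse-probability estimates should track each $\mu_j$ closely; the national rescaling constant plays no role once the estimator is restricted to a single area.</p></div>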
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Existing Data</head><p>[…] Note that the 2016 CCES does balance on statewide political races in one stage of the process, so it can be considered calibrated to the state level (as well as the national level) but not to the CD level.</p><p>For the later discussion of results, it is important that the CCES computes weights for a sample that has already been matched to a target distribution. This sort of pruning of respondents based on the pollster's sampling frame is often a crucial first step in online opt-in panels <ref type="bibr">(Rivers 2007)</ref>. The six Southern states that record race on the voterfile are North Carolina, South Carolina, Georgia, Florida, Alabama, and Louisiana.</p><p>The states that record party on the voterfile are Alaska, Arizona, California, Colorado, Connecticut, Delaware, Florida, Iowa, Idaho, Kansas, Kentucky, Louisiana, Massachusetts, Maryland, Maine, North Carolina, Nebraska, New Hampshire, New Jersey, New Mexico, Nevada, New York, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Dakota, Utah, West Virginia, and Wyoming.</p><p>[…] the Secretaries of State that maintain voter rolls do not collect information on education. Racial identification is collected as part of voter registration in only six Southern states, and party registration is required in only about 30. Voterfile vendors therefore use survey and commercial data to merge or impute these variables, which may introduce additional error.</p><p>A practical resource for weighting surveys to small areas must take these data limitations into account. In the next section, I outline a workflow that estimates a reasonable poststratification table from existing, publicly available data. Such a poststratification table can be used directly for weighting, or combined with a partial pooling model that imputes the outcome in each cell, as in MRP.</p></div>
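<div xmlns="http://www.tei-c.org/ns/1.0"><p>When a synthetic table has at least one respondent in every cell, it can be used directly for weighting. The following Python sketch computes cell-based poststratification weights, $w_i = (N_c / N) / (n_c / n)$ for a respondent in cell $c$. The helper name <code>poststrat_weights</code> and the representation of cells as hashable labels (for example, tuples of age group, sex, education, race, turnout, and CD) are assumptions for illustration. It deliberately refuses to run when a population cell has no respondents, which is the case in which a model for the outcome is needed instead.</p>
<eg>
```python
from collections import Counter


def poststrat_weights(sample_cells, pop_counts):
    """Cell-based poststratification weights w_i = (N_c / N) / (n_c / n).

    sample_cells: the cell label of each respondent, in sample order.
    pop_counts: {cell label: N_c} from the synthetic target table.
    Every respondent's cell must appear in pop_counts.
    """
    n = len(sample_cells)
    N = sum(pop_counts.values())
    n_c = Counter(sample_cells)
    empty = [c for c in pop_counts if n_c[c] == 0]
    if empty:
        # Weighting alone cannot fill these cells; an outcome model is needed.
        raise ValueError(f"no respondents in cells {empty}")
    return [(pop_counts[c] / N) / (n_c[c] / n) for c in sample_cells]


# Hypothetical two-cell example: cell "a" is over-represented in the sample.
w = poststrat_weights(["a", "a", "a", "b"], {"a": 50, "b": 50})
print([round(v, 3) for v in w])
```
</eg><p>Because each weight is the ratio of a cell's population share to its sample share, the weighted sample shares reproduce the population shares exactly, and the weights average to 1 whenever every population cell is represented in the sample.</p></div>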
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4">Synthetic Estimation of Poststratification Targets</head><p>I propose the following procedure to construct a poststratification target that approximates the joint distribution of age group, sex, education, race, turnout, and congressional district (which are nested in states). We start with the following datasets:</p><p>&#8226; The CCES survey data, which include the outcome $Y$ as well as all covariates $X$ and $A$ whose population distributions are drawn from the other datasets.</p><p>&#8226; The ACS estimates of population sizes at the congressional district level. At this level of geography, the ACS does not provide a full joint distribution, so we must rely on two separate tables: one that records the population counts of [age x sex x race x CD] and another that records the population counts of [age x sex x educ x CD].</p><p>The main challenge, therefore, is that the ACS gives only three-way distributions of demographics, while poststratification requires a single, fully joint table; moreover, the ACS includes no data on party or turnout.</p><p>(1) Fit a balancing multinomial logit, bmlogit <ref type="bibr">(Yamauchi and Kuriwaki 2021)</ref>, predicting four categories of education from race, age, and sex in the CCES data, with the balancing constraint that within each CD, the estimated marginal proportions of education match the education margins reported in the separate ACS table.</p><p>(2) […]</p><p>(3) Fit a logit model (again with bmlogit) predicting a binary indicator for turnout in the CCES data, using the voterfile match indicator supplied by Catalist (included in the public CCES dataset). The population constraint is the turnout rate among the voting age population, which can be computed as the ratio of total votes cast to the ACS estimate of the voting age population. 
The process can falter when at least one set of survey data has at least one cell with zero observations, so here I use a simple specification of: […]</p><p>(5) (optional) If party registration data are available, then in the states that record party registration, repeat the same process, where the outcome in the multinomial regression is whether the CCES respondent is registered as a Democrat, a Republican, or anything else.</p><p>(6) Poststratify the survey estimates of the outcome to the resulting synthetic table.</p><p>If sample sizes for the resulting cells are too small, fit a regression model for the outcome, such as a multilevel model as in MRP.</p><p>Here, the balancing multinomial logit imposes a powerful population constraint. While a regular multinomial logit can fit the same sort of predictive model as in <ref type="bibr">Kastellec et al. (2015)</ref>, it is likely to simply propagate any bias due to unrepresentativeness into the resulting estimates. <ref type="bibr">Yamauchi and Kuriwaki (2021)</ref> implement software that estimates the multinomial regression as a constrained optimization problem, with the additional constraint that the marginal distribution of the estimated outcome must match a user-supplied population target. Users can set a tolerance value to control how strictly the constraint is enforced relative to the best-fitting model in the microdata.</p><p>Estimation of population targets has a rich literature of its own, and the approach I propose here is simple relative to approaches that use proprietary data or software. For example, the CCES itself uses a sampling frame constructed by YouGov that also relies largely on the ACS (Ansolabehere, Schaffner, and Luks 2017). To this table, YouGov adds turnout estimates from the CPS and religion from Pew. However, this sampling frame is proprietary to YouGov, and it is calibrated only to the state level. 
<ref type="bibr">Ghitza and Steitz (2020)</ref> use the state-level ACS microdata to estimate populations for individual census tracts, while correcting for representativeness through a type of rake weighting. In contrast, an attractive feature of the proposed model is that it imposes a balancing constraint simultaneously with parameter estimation and uses publicly available data and summary statistics.</p><p>The resulting estimates do not come for free. To overcome the small sample problem, the outcome model partially pools observations from multiple CDs and uses those parameter estimates to predict the outcomes in a single CD. However, this requires the assumption that the demographic predictors and the CD random intercept are sufficient to model the variation across districts in the relationship between the outcome and the predictors <ref type="bibr">(Si 2020)</ref>. This is a classic bias-variance tradeoff: pooling across districts induces bias, but fitting the model to specific districts alone suffers from large variance or even demographic strata with zero observations. Another limitation of poststratification, including MRP, is that it cannot balance on important variables whose population distribution is unknown. This is an important omission for political surveys, because partisanship is strongly predictive of vote choice, but partisan self-identification, the most commonly used measure of partisanship, is measured only in surveys themselves. Two other related measures of partisanship are vote choice in the previous election and party registration where it is available. Incorporating lagged vote would require specific adjustments for voters who did not participate in the previous election. Incorporating party registration is a promising approach, especially for the CCES, which includes Catalist's matched voter registration for every respondent. 
In this chapter, I use summary statistics of party registration in the 2016 election provided by YouGov and show that incorporating them indeed improves estimates. However, it is unclear how to extrapolate this calibration to states where party registration is not recorded. This limitation is shared by virtually all methods for poststratification and is a topic for future work.</p></div>
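<div xmlns="http://www.tei-c.org/ns/1.0"><p>bmlogit imposes the population constraint during estimation. As a simpler, swapped-in illustration of what the turnout constraint in step (3) requires, the following Python sketch takes turnout logits for the cells of one CD from an already-fitted, unconstrained model (assumed given) and finds, by bisection, a single intercept shift so that the implied number of voters equals total votes cast. This post-hoc "logit shift" is not the bmlogit estimator, and the function names <code>shift_to_margin</code> and <code>synthetic_voters</code> are hypothetical.</p>
<eg>
```python
import math


def expit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def shift_to_margin(logits, pop, target, tol=1e-10):
    """Common intercept shift d such that the population-weighted mean of
    expit(logit + d) equals `target` (a share strictly between 0 and 1)."""
    total = sum(pop)

    def margin(d):
        return sum(n * expit(eta + d) for eta, n in zip(logits, pop)) / total

    lo, hi = -50.0, 50.0
    # margin(d) increases monotonically in d, so bisection finds the unique root.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if margin(mid) > target:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def synthetic_voters(logits, pop, votes_cast):
    """Split each cell's voting age population into expected voters so that
    the district total equals votes_cast (the turnout constraint of step 3)."""
    target = votes_cast / sum(pop)
    d = shift_to_margin(logits, pop, target)
    return [n * expit(eta + d) for eta, n in zip(logits, pop)]


# Hypothetical CD with three cells: model logits, voting age population, votes cast.
voters = synthetic_voters([-1.0, 0.0, 1.5], [1000, 3000, 2000], votes_cast=3300)
print([round(v, 1) for v in voters], round(sum(voters), 1))
```
</eg><p>Multiplying each cell's voting age population by its shifted turnout probability splits the cell into voters and non-voters, matching the district total while preserving the model's ordering of turnout rates across cells.</p></div>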
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.5">Empirical Assessment: Existing Weights</head><p>I test these strategies on the problem of measuring Donald Trump's share of the two-party vote in the 2016 election. Because congressional districts cut across the administrative units used to report election results in complex ways, some care is needed to compute the ground truth against which all subsequent estimates are compared. I use values computed by Daily Kos (Daily Kos 2021).</p><p>The CCES is a survey of voting age adults, while the population of interest is those who voted. When estimating the outcome of interest, therefore, I subset the CCES to respondents who meet all three of the following criteria:</p><p>1. Those who responded to the post-election wave (82 percent);</p><p>2. Those who self-reported voting for either Donald Trump or Hillary Clinton after the election (76 percent of those who took the post-election wave); and</p><p>3. Those who matched to Catalist's voterfile as having cast a ballot in the 2016 general election (56 percent).</p><p>This leaves a total of $n = 28{,}462$ respondents, or 44 percent of the available 2016 CCES. When fitting multinomial models to construct the population target, I use all 64,000 respondents to match the coverage of the ACS.</p><p>I first assess how the standard weights included in the CCES perform at the subgroup level. Figure <ref type="figure">4</ref>.2 compares these standard weighted estimates with the population values. I plot the raw proportions and weighted proportions side by side and compute standard errors by the standard formula</p><p><formula notation="TeX">\widehat{\mathrm{se}}(\hat{Y}_j) = \sqrt{\frac{\hat{Y}_j \left(1 - \hat{Y}_j\right)}{n^{\mathrm{eff}}_j}},</formula></p><p>where $\hat{Y}_j$ is the proportion estimator (either the simple average or the weighted average) for the area of interest and $n^{\mathrm{eff}}_j$ is the effective sample size. 
For the unweighted case the effective sample size equals the sample size ($n^{\mathrm{eff}}_j = n_j$), but for the weighted proportion it is computed with the Kish design effect correction:</p><p><formula notation="TeX">n^{\mathrm{eff}}_j = \frac{\left(\sum_{i \in \mathcal{S}_j} w_i\right)^2}{\sum_{i \in \mathcal{S}_j} w_i^2},</formula></p><p>where $\mathcal{S}_j$ is the set of respondents in area $j$. This effective sample size can be rewritten as $n_j / (1 + \widehat{\mathrm{cv}}^2_j)$, where $\widehat{\mathrm{cv}}^2_j$ is the variance of the weights (with divisor $n_j$) divided by their squared mean, so it decreases as the weights become more variable.</p><p>The state-level estimates in Panel (A) show the power of weighting. Without weights, only 23 states have 95 percent confidence intervals that include the actual result; with weights, all but one (California) of the state estimates do. The root mean squared error (RMSE) of the set of estimates improves nearly threefold. This is of course not surprising, given that the weights were calibrated to statewide election returns. As for the national popular vote, the weights give an estimate of 49.5 percent, while the raw average gives 45.8 percent (Trump's two-party popular vote was 48.9 percent).</p></div>
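<div xmlns="http://www.tei-c.org/ns/1.0"><p>The standard error and the Kish effective sample size described in this section take only a few lines to compute. The following Python sketch (function names hypothetical) implements both and also checks the equivalent form $n / (1 + \mathrm{cv}^2)$, with the variance of the weights computed using divisor $n$.</p>
<eg>
```python
import math


def kish_n_eff(w):
    """Kish effective sample size: (sum of weights)^2 / (sum of squared weights)."""
    return sum(w) ** 2 / sum(v * v for v in w)


def kish_n_eff_from_cv(w):
    """Equivalent form n / (1 + cv^2), where cv^2 = var(w) / mean(w)^2 (divisor n)."""
    n = len(w)
    mean = sum(w) / n
    var = sum((v - mean) ** 2 for v in w) / n
    return n / (1.0 + var / mean ** 2)


def weighted_prop_se(y, w):
    """Weighted proportion of a 0/1 outcome and its standard error sqrt(p(1 - p) / n_eff)."""
    p = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return p, math.sqrt(p * (1.0 - p) / kish_n_eff(w))


# Equal weights leave n_eff at n; unequal weights shrink it.
print(kish_n_eff([1, 1, 1, 1]), kish_n_eff([1, 3]), kish_n_eff_from_cv([1, 3]))
```
</eg><p>For example, with $\hat{Y}_j$ near 0.5 and $n^{\mathrm{eff}}_j \approx 40$, the formula gives a standard error of about 0.079, roughly the 8 percentage point CD-level standard errors that result from weighting in this chapter.</p></div>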
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In the congressional district level estimates of Figure <ref type="figure">4</ref>.2 Panel (B), the national and state weights fail to reduce aggregate error. The average CD has a raw estimate that is off by 7 percentage points, compared to 6 points at the state level. But while weighting dramatically improves the estimates at the state level, applying the same weights at the CD level does not improve, and instead worsens, the average deviation from the district vote share. More CDs have weighted 90 percent confidence intervals that include the true value than unweighted ones (80 percent as opposed to 76 percent). But this is likely because weighting increased the standard errors of the estimates from 6 to 8 percentage points (equation 4.6). The RMSE and average deviation, which do not take into account the standard error around each estimate, get worse after weighting.</p><p>The finding in Figure <ref type="figure">4</ref>.2 is not surprising in the sense that the CCES weights were not designed to match the congressional district level, whereas they were calibrated to the state level. It is, if anything, impressive that the aggregate error is limited to around 10 percentage points when each district has an effective sample size of only about $n^{\mathrm{eff}}_j \approx 40$ in the weighted case and no explicit adjustment is made to weight to the turnout electorate. In any case, the theoretical result in equation 4.3 does not appear to hold in these data. Understanding calibration methods as an approximation to the propensity score likely explains why: it suggests that the balancing constraints used to construct the weights were not sufficient to render selection ignorable in every district. The next question is whether we can create poststratification tables to design better weights.</p></div>
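<div xmlns="http://www.tei-c.org/ns/1.0"><p>The comparison above turns on three summaries computed across districts: RMSE, average absolute deviation, and the share of 90 percent intervals that cover the true vote share. A minimal Python sketch (function name hypothetical; normal-approximation intervals assumed) shows why coverage can improve even as RMSE worsens: widening the standard errors raises coverage without changing any point estimate.</p>
<eg>
```python
import math
from statistics import NormalDist


def error_summary(est, se, truth, level=0.90):
    """RMSE, mean absolute deviation, and interval coverage across areas."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.645 when level = 0.90
    dev = [e - t for e, t in zip(est, truth)]
    rmse = math.sqrt(sum(d * d for d in dev) / len(dev))
    mean_abs_dev = sum(abs(d) for d in dev) / len(dev)
    # An interval covers the truth when |estimate - truth| is at most z * se.
    covered = [z * s >= abs(d) for d, s in zip(dev, se)]
    return {"rmse": rmse, "mean_abs_dev": mean_abs_dev,
            "coverage": sum(covered) / len(covered)}


# Same hypothetical point estimates, narrow versus wide standard errors.
print(error_summary([0.5, 0.6], [0.01, 0.01], [0.5, 0.5]))
print(error_summary([0.5, 0.6], [0.10, 0.10], [0.5, 0.5]))
```
</eg></div>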
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.6">Empirical Application: Proposed Poststratification</head><p>There is no obvious way to visualize the result of a six-way cross-tabulation, but Figure <ref type="figure">4</ref>.3 is one representation of the values resulting from the target estimation procedure. The procedure produces cell counts $N_{cj}$ for area $j$, where $c \in \{1, \ldots, C\}$ indexes the multi-way table of categorical demographic variables. In this instance, $c$ indexes the combination of age group (5 levels), sex (2 levels), education (4 levels), race (4 levels), and turnout (2 levels), so $C = 5 \times 2 \times 4 \times 4 \times 2 = 320$. Groupings were chosen to match the levels of the ACS variables, and levels were collapsed so that every state had at least one CCES observation at each level. As the annotated point in the figure illustrates, each poststratification cell comprises around 0 to 2 percent of the estimated electorate. The estimated size is, of course, a function of the size of the group in the population. One interesting comparison is between the voting and non-voting groups. Some CDs have relatively high levels of Hispanic representation, while in other CDs Hispanics comprise a relatively large share of the non-voting population. Non-citizens are included in the non-voting (voting age) population, which may explain the high numbers in Texas.</p><p>We cannot validate each of these estimates of the population quantities, given that the procedure estimates a joint distribution that none of the population datasets can provide (Table <ref type="table">4</ref>.2). The method guarantees, however, that these estimates of the joint distribution match all population marginals. 
Moreover, instead of assuming that the covariates are independent and taking the product of the marginals, I use individual-level survey data to help learn the joint distribution.</p><p>Weighting the outcome to this target population requires survey sample estimates of the outcome for each of the $C \cdot J$ cells. For each cell $cj$, denote the average of the outcome in the cell as $\bar{Y}_{cj}$. When cells are so fine that $n_{cj} = 0$ for some cells, we fit an outcome regression with a shrinkage property to estimate these values, denoted $\hat{Y}_{cj}$. As Figure <ref type="figure">4</ref>.1 shows, the poststratification estimator and the MRP estimator are simply sums of these cell estimates, weighted by the estimated population size of each cell:</p><p><formula notation="TeX">\mu^{\mathrm{PS}}_j = \frac{\sum_{c=1}^{C} N_{cj} \bar{Y}_{cj}}{\sum_{c=1}^{C} N_{cj}}, \qquad \mu^{\mathrm{MRP}}_j = \frac{\sum_{c=1}^{C} N_{cj} \hat{Y}_{cj}}{\sum_{c=1}^{C} N_{cj}},</formula></p><p>where $\mu^{\mathrm{PS}}_j$ denotes the poststratification estimator for area $j$ and $\mu^{\mathrm{MRP}}_j$ the MRP estimator.</p><p>Compared to the direct, small-sample estimators of Figure <ref type="figure">4</ref>.2, the smoothed estimators in Figure <ref type="figure">4</ref>.4 feature tighter credible intervals and modest reductions in the discrepancy from the true vote share. The first estimates show that simply partially pooling the survey data by congressional district does not improve the aggregate error in this case. Likewise, poststratifying on a three-way demographic table yields no clear reduction in aggregate error. The estimates improve only in the final panel, when a four-way table is modeled so that surveys can be reweighted to the joint distribution of race, education, age, and sex by congressional district, within an estimated electorate instead of all voting age adults.</p></div>
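<div xmlns="http://www.tei-c.org/ns/1.0"><p>The poststratification and MRP estimators differ only in what is plugged in for each cell. The following Python sketch computes both from a target table of $N_{cj}$ and respondent-level records. The shrinkage rule <code>(sum_y + k * prior_mean) / (n + k)</code> is a hypothetical pseudo-count stand-in for the multilevel regression used in MRP, included only to show where model-based cell estimates enter; with <code>k = 0</code> it reduces to the cell means $\bar{Y}_{cj}$ and requires every cell to be non-empty.</p>
<eg>
```python
from collections import defaultdict


def cell_sums(records):
    """records: iterable of (cell, district, y). Returns {(cell, district): [sum_y, n]}."""
    acc = defaultdict(lambda: [0.0, 0])
    for c, j, y in records:
        acc[(c, j)][0] += y
        acc[(c, j)][1] += 1
    return acc


def shrunk_estimate(sum_y, n, prior_mean, k):
    """Hypothetical stand-in for a multilevel model: k pseudo-observations at prior_mean."""
    return (sum_y + k * prior_mean) / (n + k)


def poststratify(target, records, prior_mean, k=0.0):
    """Area estimates sum_c N_cj * theta_cj / sum_c N_cj.

    target: {(cell, district): N_cj} from the synthetic table.
    With k = 0, theta_cj is the raw cell mean (poststratification);
    with k > 0, empty or sparse cells borrow strength from prior_mean.
    """
    stats = cell_sums(records)
    num, den = defaultdict(float), defaultdict(float)
    for (c, j), n_pop in target.items():
        sum_y, n = stats.get((c, j), (0.0, 0))
        if n == 0 and k == 0:
            raise ValueError(f"empty cell {(c, j)}; use k > 0 or a regression model")
        theta = shrunk_estimate(sum_y, n, prior_mean, k)
        num[j] += n_pop * theta
        den[j] += n_pop
    return {j: num[j] / den[j] for j in num}


# Hypothetical one-district example with two cells.
target = {("a", "d1"): 30, ("b", "d1"): 70}
records = [("a", "d1", 1), ("a", "d1", 0), ("b", "d1", 1)]
print(poststratify(target, records, prior_mean=0.5))
print(poststratify(target, records, prior_mean=0.5, k=1.0))
```
</eg><p>In both cases the cell estimates are reweighted to the same target table; improving the target table, as this chapter proposes, changes the weights $N_{cj}$ rather than the cell estimates.</p></div>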
<div xmlns="http://www.tei-c.org/ns/1.0"><p>A common extension, which we consider only at the end of this chapter, is to add an area-level continuous variable, such as prior vote share in the district, to the outcome model.</p><p>This covariate has been shown to improve the overall accuracy of predicting electoral outcomes, perhaps more so than individual demographic variables <ref type="bibr">(Hanretty, Lauderdale, and Vivyan 2016)</ref>. While all our models here would likely benefit from this addition, we do not show results for it here because such covariates do not contribute to poststratification. This can be gleaned from the fact that when district-level vote share is simply included in the model, its joint distribution with the other demographic variables is not known. The aggregate predictor improves the outcome model and partial pooling, that is, the estimates of $\hat{Y}_{cj}$, but not the poststratification <ref type="bibr">(Kuriwaki and Yamauchi 2021)</ref>. As previously noted, a growing literature already tests various machine learning models in this part of MRP (Bisbee 2019; <ref type="bibr">Goplerud et al. 2018;</ref><ref type="bibr">Ornstein 2020)</ref>, while variation in poststratification targets has been relatively unexplored.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.7">Extensions by Modeling Party Registration</head><p>Extending the synthetic table to include party registration is a simple repetition of the modeling procedure, but it may require statistics that are not readily available.</p><p>To further test the idea that modeling relevant covariates in the poststratification can improve the accuracy of estimates, I used currently non-public data to complete optional step (5) and add one more dimension to the table. YouGov maintains a curated database of the voterfile used for its own weighting, and provided a subset of their […]</p><p>Only certain states record party registration on their voterfile (Table <ref type="table">4</ref>.2), so for this application I chose the following seven party registration states that cover a variety of regions and population sizes: Arizona, Florida, Iowa, Maine, North Carolina, New York, and Oregon.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 4.5: Benefits of Additionally Modeling Poststratification Targets</head><p>Note: A comparison of modeled estimates using the same survey data from seven party registration states: Arizona, Florida, Iowa, Maine, North Carolina, New York, and Oregon. The final model (6) uses a synthetic population target that also includes party registration breakdowns jointly with other demographic variables.</p><p>Error bars show 90 percent credible intervals from 2,000 MCMC samples.</p><p>In order to provide a valid comparison of methods, I recompute the standard MRP and direct estimates in those same states so that the underlying data are held constant. Models (1) and (2) repeat the finding from Figure <ref type="figure">4</ref>.2 that weights calibrated with coarser constraints may not help, and can even hurt, direct subgroup estimates.</p><p>Models (3)-(6) are all MRP models, following the analysis with all states. Model (3) applies minimal partial pooling without any demographic poststratification, as in the first model of Figure <ref type="figure">4</ref>.4. The estimates from model (3) do not differ in aggregate from the raw averages. This is one hint that much of the improvement due to MRP comes from the final poststratification stage rather than the first, outcome modeling stage.</p><p>Models (4)-(6) vary the underlying target populations in increasing complexity.</p><p>Model (4) is a simple baseline in which, as in Figure <ref type="figure">4</ref>.4, only the ACS table measuring [age x sex x education x turnout x CD] is used. There is no apparent improvement in the aggregate error with this simple MRP. 
We only start to see improvements in model (<ref type="formula">5</ref>), which uses the proposed workflow of this chapter: it creates a synthetic table of four demographic variables and models turnout, both through our balancing multinomial logit model. This reduces the root mean square error relative to the raw average, but only by a tenth of a percentage point or so.</p><p>The most noticeable improvement comes from model (6), which finally incorporates the party registration breakdown in the electorate. This model, again, ensures that the weighted proportions of registered Democrats and registered Republicans in the survey sample match those reported by the voterfile, for each congressional district.</p><p>The aggregate error decreases by about 2 percentage points compared to the raw average or the partially pooled estimators. It decreases by another 1.5 percentage points, to 4.5 percentage points, after aggregate vote share is included as a continuous, area-level variable in the outcome model. The strength of the party registration variable is unsurprising given that the outcome of interest is voting for a Republican candidate.</p></div>
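<div xmlns="http://www.tei-c.org/ns/1.0"><p>Model (6) matches registration shares through the synthetic table built by the balancing multinomial logit model. As a much simpler stand-in that shows only the target being matched, the sketch below computes cell weights so that the weighted shares of each registration category among respondents in one district equal the voterfile shares. This is ordinary one-margin poststratification weighting, not the chapter's procedure, and the function names, labels, and numbers are hypothetical.</p><ab type="code"><![CDATA[
```python
from collections import Counter


def registration_weights(sample_party, target_shares):
    """Cell weights that match voterfile registration shares in one district.

    sample_party: registration label of each respondent in the district
    target_shares: label -> share of that label in the district voterfile
    Every sample label must be a key in target_shares.
    (Hypothetical helper; a one-margin poststratification stand-in.)
    """
    n = len(sample_party)
    counts = Counter(sample_party)
    missing = set(target_shares) - set(counts)
    if missing:
        # An empty cell cannot be reweighted up to a positive target share.
        raise ValueError(f"no respondents in categories: {sorted(missing)}")
    # Weight for a cell = target share / sample share of that cell.
    cell_weight = {p: target_shares[p] / (counts[p] / n) for p in counts}
    return [cell_weight[p] for p in sample_party]


def weighted_share(labels, weights, category):
    """Weighted share of respondents whose label equals `category`."""
    hit = sum(w for lab, w in zip(labels, weights) if lab == category)
    return hit / sum(weights)


party = ["DEM", "DEM", "DEM", "REP", "OTH"]
w = registration_weights(party, {"DEM": 0.4, "REP": 0.4, "OTH": 0.2})
```
]]></ab><p>Here registered Democrats are overrepresented in the sample (3 of 5) relative to the hypothetical voterfile (40 percent), so each receives a weight below one, while the single registered Republican is weighted up.</p></div>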
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.8">Modeling Aggregate Covariates</head><p>A final extension I consider is the inclusion of aggregate predictors in the estimation of the partially pooled estimates $Y_c$. Although this is not the focus of the methodological innovations in this chapter, because it relates neither to poststratification nor to partial pooling (in the random effect sense), this sort of predictor has been shown to make a notable improvement in MRP estimates <ref type="bibr">(Hanretty, Lauderdale, and Vivyan 2016)</ref> [...] There are improvements across the board, with even simple outcome modeling nearing the accuracy of the most complex model. The marginal benefits of modeling different poststratification tables appear almost to have been wiped out by the large [...] And again, the extent to which this predictor dwarfs the gains in target estimation is a broader question for survey modeling. Because vote share is continuous rather than categorical, it enters the outcome model as a fixed effect, so its added value comes neither from poststratification per se nor from partial pooling.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.9">Conclusion</head><p>This chapter proposed a framework to improve the target estimation for geographic </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Introduction</head><p>Finding and labeling voting blocs is ubiquitous in election analysis. Theories of political behavior, especially those explaining electoral change, cluster voters into interpretable prototypes and assign them labels such as core and periphery, or standpatters and floating voters <ref type="bibr">(Campbell 1960;</ref><ref type="bibr">Hill and Kriesi 2001;</ref><ref type="bibr">Key 1966;</ref><ref type="bibr">Smidt 2017)</ref>. Almost instinctively, political consultants, journalists, and election observers latch on to labels such as the "soccer mom" or the "white working class" to construct narratives about voting behavior, even if the label may not have a uniform definition or may not be the best statistical predictor <ref type="bibr">(Carroll 1999;</ref><ref type="bibr">Cohn 2019a;</ref><ref type="bibr">Carnes and Lupu 2020)</ref>.</p><p>In particular, a recurring voting bloc in modern accounts of the US electorate is the "swing voter": a pivotal (and perhaps dwindling) group of voters who are, to a first approximation, indifferent between the two parties and are therefore considered persuadable.</p><p>But existing approaches to this grouping exercise in political science rely either exclusively on pre-defined groupings or on a series of comparisons between votes in pairs of offices. The former risks not fully leveraging the information contained in the data, and the latter becomes intractable with high-dimensional, large-N datasets that have an exceedingly large number of possible voting patterns.</p><p>In this chapter I offer an alternative framework: a clustering algorithm that summarizes complex individual-level voting data into interpretable blocs using a probabilistic [...] this model to voting data,<ref type="foot">foot_17</ref> perhaps due to concerns about interpretability and lack of substantive theory. My proposed approach has three methodological features on this point. 
First, by using a clustering algorithm rather than the more standard regression approach, I can properly leverage the information contained in the same voter's repeated vote choices between Republicans, Democrats, and abstention (for example) across multiple offices. Second, it embraces the principle of unsupervised learning more than ideal point models do. This entails targeting the parameters of a simple model that best fit the data, instead of modeling the behavior of known or presumed voting blocs. Third, my statistical approach is grounded in a probabilistic model of political behavior <ref type="bibr">(Ahlquist and Breunig 2012)</ref>, instead of simply grouping observations that are close on a particular distance metric <ref type="bibr">(M&#252;llner 2013)</ref>. Analysts still pick the number of clusters to estimate, and must use substantive prior knowledge to guide the interpretation of each cluster. In summary, the main virtue of the clustering approach is that it is a principled framework to leverage the information in high-dimensional voting data.</p><p>In the remainder of this paper, I describe the clustering approach and what it can reveal about swing voters in American politics. In the models and methods section, I set up the model and show how I derive an EM algorithm to estimate the parameters, implemented in an open-source program, clusterCVR <ref type="bibr">(Kuriwaki 2021b</ref>). In the process, I highlight the assumptions and implications of this statistical approach. I then estimate these parameters and use a visualization that presents the estimates in a more interpretable fashion. In particular, I find in my two applications that swing voters form a clear voter type, even though the office-specific patterns of how they swing vary by the type of office and the experience of the candidate.  <ref type="bibr">(Imai and Tingley 2012)</ref>. 
Clustering is even more widely used in fields such as psychology and marketing <ref type="bibr">(Fiske et al. 2002;</ref><ref type="bibr">Wedel and Kamakura 2000)</ref>. But clustering methods are still less common than regression-based methods in political science, so I start with the logic of the basic model, which I implement in an open-source R package, clusterCVR, and then respond to commonly recognized limitations of clustering as a method to analyze political behavior.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Models and Methods</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.1">Main Logic of the Clustering Model</head><p>Instead of defining clusters of voters by predictors of vote choice such as race and education, this paper starts with the case in which we observe only the votes themselves. Let $\mathbf{Y}$ be the $N \times J$ vote matrix of voters $i \in \{1, \dots, N\}$ voting in offices $j \in \{1, \dots, J\}$. Each vote $Y_{ij}$ takes on a discrete, unordered categorical value $\ell \in \{0, \dots, L\}$.</p><p>In this paper, we focus on vote data where we have coded each vote as [...] and the reality that some offices in some districts are not contested by a major party further increase the number of considerations. Focusing on a pair of offices (e.g., the President and US House, as in <ref type="bibr">Burden and Kimball (2002)</ref>) effectively discards valuable information, while enumerating each potential voting pattern <ref type="bibr">(Beck et al. 1992</ref>) reduces interpretability.</p><p>To address these issues, the clustering approach assumes that each voter $i$ belongs to one of $K$ "clusters", or latent groupings. We denote this membership as a random variable, $Z_i$, and index clusters by $k \in \{1, \dots, K\}$. Importantly, although different individuals may belong to different clusters, each individual belongs to the same cluster for every office on their ballot.</p><p>It then posits the following model of vote choice that incorporates our two key parameters of interest. 
First, we denote the prevalence of cluster $k$ in the population by $\pi_k$, where $\pi$ is a $K$-length vector of proportions that sums to 1 ($\sum_{k=1}^{K} \pi_k = 1$), so that</p><p>$$Z_i \sim \text{Categorical}(\pi). \tag{5.1}$$</p><p>Next, to characterize each cluster, $\mu_{jk}$ represents the latent propensity for any member of cluster $k$ to vote for a particular option in office $j$, so for a given cluster and given office,</p><p>$$Y_{ij} \mid Z_i = k \sim \text{Categorical}(\mu_{jk}), \qquad \Pr(Y_{ij} = \ell \mid Z_i = k) = \mu_{jk\ell}. \tag{5.2}$$</p><p>For example, a political campaign may be interested in the size of the swing voter bloc and how likely that bloc is to split their ticket for a particular candidate in office $j$. In this case, if we had estimated two clusters, with the first cluster capturing staunch partisans and the second capturing potential swing voters, we would want to know the quantities $\pi_2$ (the size of the bloc) and $\mu_{j2,\text{split}}$.</p><p>This modeling choice maps to a theoretical notion that ticket splitting is a probabilistic function of being a swing voter. This stands in contrast to existing approaches, which are more deterministic. Although splitting one's ticket may be a sufficient indicator of being a swing voter, it is not a necessary condition: a swing voter who is indifferent between the two parties should have roughly equal probability of choosing either candidate, and so may still cast a straight ticket in a given election <ref type="bibr">(Larcinese, Snyder, and Testa 2013)</ref>.</p></div>
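<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal simulation sketch of the generative model in (5.1) and (5.2), assuming NumPy. The array `mu` is indexed as `mu[k, j, l]` (cluster, office, option), a reordering of $\mu_{jk\ell}$ chosen for convenience; the function name, option coding, and parameter values are hypothetical and only illustrate the structure of the model.</p><ab type="code"><![CDATA[
```python
import numpy as np


def simulate_votes(pi, mu, n_voters, seed=0):
    """Draw an N x J vote matrix from the finite mixture in (5.1)-(5.2).

    pi: length-K cluster proportions (sums to 1)
    mu: K x J x (L + 1) array with mu[k, j, l] = Pr(Y_ij = l | Z_i = k)
    Returns the vote matrix Y and the latent cluster labels z.
    """
    rng = np.random.default_rng(seed)
    n_clusters, n_offices, n_options = mu.shape
    # (5.1): each voter's latent cluster Z_i ~ Categorical(pi)
    z = rng.choice(n_clusters, size=n_voters, p=pi)
    Y = np.empty((n_voters, n_offices), dtype=int)
    for i in range(n_voters):
        for j in range(n_offices):
            # (5.2): votes are drawn independently across offices given the cluster
            Y[i, j] = rng.choice(n_options, p=mu[z[i], j])
    return Y, z


# Hypothetical example: options 0 = abstain, 1 = Democrat, 2 = Republican.
# Cluster 0 votes a straight Democratic ticket; cluster 1 is indifferent.
mu = np.array([
    [[0.02, 0.96, 0.02]] * 3,
    [[0.10, 0.45, 0.45]] * 3,
])
Y, z = simulate_votes(pi=np.array([0.7, 0.3]), mu=mu, n_voters=500)
```
]]></ab><p>Even though votes are independent within each cluster, the simulated columns of `Y` are correlated across offices because voters in cluster 0 vote Democratic everywhere, which is the averaging-over-clusters point made in the estimation section.</p></div>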
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.2">Estimation Strategy</head><p>Our goal is to estimate the unobserved parameters $\pi$ and $\mu$ that are most consistent with the data that we do observe, i.e., the vote choice matrix $\mathbf{Y}$. To do so, we must specify the full data-generating model as a function of the data and parameters. Once we assume that the probability of a particular vote is independent across offices within the same cluster, we can express the likelihood of voter $i$'s votes as a product of $J$ factors:</p><p>$$\Pr(Y_{i1}, \dots, Y_{iJ} \mid Z_i = k, \mu) = \prod_{j=1}^{J} \prod_{\ell=0}^{L} \mu_{jk\ell}^{\mathbb{1}\{Y_{ij} = \ell\}},$$</p><p>which is similar to a standard multinomial regression except that we observe $J$ data points for each voter instead of one, and that we do not observe the conditioning variable $Z_i$. The independence assumption may at first seem unrealistic: a voter's propensity to vote for a Democrat in one office is surely correlated with their propensity to vote for a Democrat in the next. Note, however, that we assume independence only within a cluster. Because the observed-data likelihood averages over clusters, $\sum_{k=1}^{K} \pi_k \Pr(Y_{i1}, \dots, Y_{iJ} \mid Z_i = k, \mu)$, the model still allows for dependence across offices.</p><p>I derive an Expectation Maximization (EM) algorithm to quickly estimate the parameters (details left to the Appendix). Because clusters are latent, traditional Max[...]ilar likelihood function as clustering, but instead of setting the probability $\mu$ as the quantity of interest, they posit that decisions are made according to a one-dimensional spatial voting model and estimate voting preferences on a continuum, rather than as separate blocs. The clustering approach can be thought of as trading away a parsimonious one-dimensional model for a more flexible approach to classify individuals that does not rely on a spatial model of vote choice. Users can choose the number of clusters to estimate, and the clusters are not restricted to be placed on a particular coordinate space.</p></div>
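<div xmlns="http://www.tei-c.org/ns/1.0"><p>The appendix derivation is not reproduced here, but a generic EM iteration for this kind of latent class model can be sketched as follows, assuming NumPy and the `mu[k, j, l]` indexing. The E-step computes each voter's posterior cluster probabilities from the factorized likelihood described above, and the M-step updates $\pi$ and $\mu$ as responsibility-weighted proportions. This is a textbook version without covariates or varying choice sets, not the clusterCVR implementation; the function name, random starting values, and stopping rule are assumptions.</p><ab type="code"><![CDATA[
```python
import numpy as np


def em_latent_class(Y, n_clusters, n_options, n_iter=500, tol=1e-8, seed=0):
    """EM for a finite mixture of independent categorical votes (no covariates).

    Y: N x J integer matrix with entries in {0, ..., n_options - 1}
    Returns (pi, mu, resp, loglik) with mu[k, j, l] = Pr(Y_ij = l | Z_i = k),
    resp[i, k] = Pr(Z_i = k | votes of voter i) from the last E-step, and the
    observed-data log-likelihood evaluated in that E-step.
    """
    rng = np.random.default_rng(seed)
    _, n_offices = Y.shape
    onehot = np.eye(n_options)[Y]                      # N x J x options indicators
    pi = np.full(n_clusters, 1.0 / n_clusters)
    mu = rng.dirichlet(np.ones(n_options), size=(n_clusters, n_offices))
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log pi_k + sum_j log mu[k, j, Y_ij], normalized over k
        log_joint = np.einsum("njl,kjl->nk", onehot, np.log(mu)) + np.log(pi)
        top = log_joint.max(axis=1, keepdims=True)     # log-sum-exp for stability
        log_marg = top + np.log(np.exp(log_joint - top).sum(axis=1, keepdims=True))
        resp = np.exp(log_joint - log_marg)
        ll = float(log_marg.sum())
        # M-step: responsibility-weighted proportions (tiny constant avoids log(0))
        pi = resp.mean(axis=0)
        counts = np.einsum("nk,njl->kjl", resp, onehot) + 1e-10
        mu = counts / counts.sum(axis=2, keepdims=True)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi, mu, resp, ll
```
]]></ab><p>Because cluster labels are arbitrary, estimates from different starting values agree only up to a relabeling of clusters; in practice one would run several starts and keep the fit with the highest log-likelihood.</p></div>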
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.3">Additional Features of the Clustering Model</head><p>So far, the clustering model discussed here follows the canonical model of finite mixtures for categorical outcomes. Several additional features are relevant for analyzing vote data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Respondent Level Covariates</head><p>It is natural to posit that certain demographic groups are systematically more likely to be in particular clusters. Clustering models can incorporate such auxiliary data about the respondents in a straightforward manner by modeling the cluster assignment as a function of covariates. Suppose we have an indicator for whether each voter is an ideological moderate. The spatial voting model would predict that this indicator correlates positively with assignment to a cluster that tends to have high rates of ticket splitting.</p><p>Respondent-level covariates like these can be incorporated into the EM algorithm's M-step by regressing the expected cluster assignments on covariates in what is essentially a weighted multinomial logit model. Formally, we replace (5.1) with</p><p>$$Z_i \sim \text{Categorical}(\pi_i), \qquad \pi_{ik} = \frac{\exp(\tilde{X}_i^{\top} \gamma_k)}{\sum_{k'=1}^{K} \exp(\tilde{X}_i^{\top} \gamma_{k'})},$$</p><p>where $X$ is an $N \times P$ matrix, $\tilde{X}_i$ is row $i$ of $X$ with a leading 1 for the intercept, and each $\gamma_k$ collects $P + 1$ coefficients (an intercept and $P$ slopes), with $\gamma_1 = 0$ for identification. As this shows, $\pi$ now becomes a matrix with $N$ rows. When summarizing the data in subsequent analyses, I take the average mixing proportion for each cluster as an aggregate measure. A model with and without such covariates should produce roughly similar cluster assignments because we still infer these from the votes. But incorporating covariates can help stabilize the algorithm, and the values of the coefficients $\gamma$ provide useful substantive information for interpreting cluster membership. The ECM algorithm implemented by <ref type="bibr">Yamauchi (2021)</ref> makes this step fast enough to be repeated at each iteration of the main EM loop.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Varying Choice Sets</head><p>Many elections for state and local offices are uncontested, which means that a voter still makes a choice, but from a limited menu of options.</p><p>These different settings require modeling varying choice sets <ref type="bibr">(Yamamoto 2014)</ref>. 
While existing discrete clustering models rule out this possibility and therefore require analysts to drop data that include varying choice sets, I model these choice sets separately, with an independence of irrelevant alternatives (IIA) assumption to share information and parameter values across observations.</p><p>The added complication is that the vote choice probability must now be modeled as a function of data that vary by choice set. Let $\mathcal{Y}_{ij}$ denote the set of values that are available to voter $i$ in office $j$. Such information is clear from the candidate filings in that district, and so is directly observed. We then posit that the choice probability is generated from a ratio that is relative to the available choices for a given voter, as in a standard multinomial logit. Formally, we parameterize equation (5.2) as</p><p>$$\Pr(Y_{ij} = \ell \mid Z_i = k) = \frac{\exp(\psi_{jk\ell})}{\sum_{m \in \mathcal{Y}_{ij}} \exp(\psi_{jkm})}, \qquad \ell \in \mathcal{Y}_{ij}, \quad \psi_{jk0} = 0,$$</p><p>where $\psi_{jk\ell}$ is a scalar that represents the intensity of preference for option $\ell \in \{1, 2\}$ relative to $\ell = 0$ (abstention). Because the MLE has no closed-form solution, I numerically optimize the likelihood with varying choice sets. The functions and derivations are provided in Appendix C.</p><p>The IIA assumption allows us to estimate the same parameter across voters in slightly different electoral contexts, but it can be a substantial assumption to add. The multinomial probit does not explicitly make this assumption, but replaces it with a distributional assumption on the errors. In general, the validity of IIA is difficult to test because each respondent's rank-ordered preferences are not observed in the cases relevant here. <ref type="bibr">Yamamoto (2014)</ref> shows how one can model varying choice sets through a choice-set-specific intercept and thereby relax this assumption. Although the method in this chapter does not implement this modeling, the two approaches are parallel. 
Yamamoto's method estimates separate effects for each choice set, whereas this method partitions voters into clusters and estimates separate choice probabilities within each cluster.</p></div>
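<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal sketch of varying-choice-set probabilities for one voter, office, and cluster, using only the Python standard library. The dictionary `psi` of preference intensities relative to abstention and the option codes (0 = abstain, 1 = Democrat, 2 = Republican) are hypothetical. The example shows the IIA property at work: the odds of option 1 against abstention are the same whether or not option 2 is on the ballot.</p><ab type="code"><![CDATA[
```python
import math


def choice_probs(psi, available):
    """Multinomial-logit probabilities over the options on one voter's ballot.

    psi: dict option -> preference intensity relative to abstention (option 0)
    available: set of option codes on this ballot; must include 0 (abstain)
    """
    # Abstention is the reference category, so its score is fixed at 0.
    scores = {opt: (0.0 if opt == 0 else psi[opt]) for opt in available}
    denom = sum(math.exp(s) for s in scores.values())
    return {opt: math.exp(s) / denom for opt, s in scores.items()}


psi = {1: 1.0, 2: 0.5}                       # hypothetical cluster-level values
contested = choice_probs(psi, {0, 1, 2})     # both major parties on the ballot
uncontested = choice_probs(psi, {0, 1})      # no Republican filed
```
]]></ab><p>Under IIA, the probability that would have gone to the missing Republican option is redistributed between abstention and the Democrat in proportion to their existing shares, which is what lets the same parameters be shared across contested and uncontested races.</p></div>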
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.4">Limitations and Issues of Interpretation</head><p>Before moving to empirical analyses, the implications and limitations of this methodological approach are worth some comment. While clustering algorithms are widely used in fields including computer science, marketing, and psychology, this is not the case in political science or economics. It is important to consider why.</p><p>First, this method can only handle datasets in which (i) the outcomes are categorical and (ii) they come from roughly the same choice set across offices. For example, it would not be possible to analyze a set of variables that includes both vote choice and numerical responses. Similarly, the model cannot handle ballot data where some variables are partisan offices (Republican, Democrat) while others are nonpartisan offices or referendums (Yes, No), unless the analyst is willing to assume that voting "Yes" on a particular referendum represents the same underlying event as voting for a Republican or Democrat. For such cases of mixed ballots that violate (ii), an ideal point model will be more appropriate because it targets a one-dimensional preference estimate and maps different votes to a single space.</p><p>For cases that violate (i), one must turn to other clustering methods, such as k-means for datasets with only continuous outcomes, and to more involved models for a mix of continuous and categorical outcomes.</p><p>On a more important theoretical point, clustering algorithms almost always require the user to pre-determine the number of clusters $K$ to model from the data. Therefore, one might worry that substantive findings may change wildly with the number of clusters. 
There do exist methods to pick the optimal number of clusters based on measures of model fit <ref type="bibr">(Fraley and Raftery 1998)</ref>, essentially functions of the observed likelihood that penalize the number of parameters to guard against overfitting.</p><p>But we do not need to believe that there is one "correct" number of clusters for clustering analyses to be useful. As <ref type="bibr">Broockman (2016, p. 207)</ref> argues, voters' preferences are likely formed by hundreds of small issue "dimensions", even though each one may not incrementally improve model fit.</p><p>Whether one models 2 clusters or 3 clusters from the data is not a claim that 2 or 3 dimensions are enough to explain voting behavior. Instead, this method can be thought of as a principled way to summarize information and characterize prototypical voting patterns at the user's chosen level of granularity. Substantive theory, rather than only a statistical information criterion, should guide the choice of the number of clusters.</p><p>A related concern about unsupervised learning methods is that the interpretation of each cluster is arbitrary. Examining the correlation of covariates with estimated cluster assignments is a useful way to uncover some interpretation. But analysts must also bring their own substantive knowledge for this clustering algorithm to be useful. Indeed, model output should not be interpreted as anything more than a summary of the data based on a simple probabilistic model of vote choice. Because there is rarely a single "correct" number of clusters, it is reasonable to pick the number of clusters that reveals clusters whose estimated parameters $\mu$ match the theoretical quantity of interest. In my applications, the main quantities of interest are the size and voting patterns of swing voters, which the model parameters directly target.</p><p>Of course, one must start somewhere. 
One reasonable initial choice is $K = 2$, which is the simplest case and also parallels many theories such as the black-white model or core and periphery. Alternatively, one can set the number of clusters equal to the number of response options. This allows the data to cluster into homogeneous response-specific clusters, if that is the underlying pattern. In the context of the core vs. swing model, one might posit that voters can be partitioned into swing voters who split their ticket regardless of the office, abstention voters who undervote regardless of the office, and so on. This is a useful null hypothesis that I test on the data, and ultimately reject.</p></div>
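<div xmlns="http://www.tei-c.org/ns/1.0"><p>For analysts who want an information criterion as one input alongside substantive theory, the sketch below computes the BIC in the larger-is-better form, $2 \log L - m \log N$, used in Fraley and Raftery's model-based clustering. The parameter count assumes the basic model of (5.1) and (5.2) without covariates or varying choice sets: $K - 1$ free mixing proportions plus $L$ free probabilities for each of the $K \times J$ cluster-office pairs. The function names are hypothetical, and the log-likelihood would come from a fitted model.</p><ab type="code"><![CDATA[
```python
import math


def n_free_params(n_clusters, n_offices, n_options):
    """Free parameters in the basic latent class model (no covariates):
    (K - 1) mixing proportions + K * J * (n_options - 1) vote probabilities,
    where n_options = L + 1."""
    return (n_clusters - 1) + n_clusters * n_offices * (n_options - 1)


def bic(log_lik, n_clusters, n_offices, n_options, n_voters):
    """BIC in the 2 log L - m log N form; larger values indicate better fit."""
    m = n_free_params(n_clusters, n_offices, n_options)
    return 2.0 * log_lik - m * math.log(n_voters)
```
]]></ab><p>Comparing this value across $K$ for the same data gives one statistical reference point; as argued above, it should not by itself dictate the number of clusters.</p></div>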
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">Application to Cast Vote Records</head><p>The clustering algorithm is well suited to glean patterns from large datasets of anonymous ballots. I first illustrate the insights from the clustering approach by analyzing ballots from the Florida 2000 election, which were originally analyzed in an ideal point framework.</p><p>Both political scientists and election administrators use ballot data to understand voter behavior and ballot design, but the high-dimensional and large-N nature of these datasets makes analysis challenging. The estimated parameters offer a straightforward summary of a vast amount of [...] voters exhibit different patterns of ticket splitting. Cluster 3, whose members are more likely to split their ticket than to stick with their Senate choice, comprises 28 percent of Nader voters but only 11 percent of Bush voters and 8 percent of Gore voters. These findings are roughly consistent with the original paper's findings based on an IRT model, namely that Nader voters were not in fact predominantly "left" of Gore, and so it is not clear whether Nader handed the Presidency to Bush by running. For example, Nader voters supported the Democratic Senate candidate only 60-40.</p><p>The picture of the electorate that emerges from these analyses is one in which 60 percent were party loyalists and about 5 percent rolled off, but where a quarter of the vote can be considered a reasonable swing bloc. Further, the clustering model's parameter estimates in both cluster 2 and cluster 3 help refute the null hypothesis that each voter type votes straight regardless of the office. While a majority of voters were consistently straight partisans, the rest voted differently, particularly across Congressional, state, and local offices. Across all states, we find that the majority of the electorate are straight ticket voters, who are all but certain to vote for their party's candidates. 
But the proportion of these voters ranges from close to 95 percent in Minnesota to nearly 60 percent in Massachusetts. The remaining two clusters appear to contain various types of ticket splitters.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.4">Application to Survey Data</head><p>Interestingly, the pattern of ticket splitting is state- and office-specific. Figure <ref type="figure">5</ref>.3 shows that the states where the core partisan bloc is smallest are Massachusetts, Maryland, and Texas. In all three, the Governor vote stands out in the second cluster, which approaches 30 percent of voters in Massachusetts. As the table of candidates in [...] The Governor-specific ticket splitting found in Massachusetts and Texas in 2018 is not found in 2014, even though the same Republican candidate was on the ballot.</p><p>In 2014, few voters were classifiable as swing, but in 2018 Baker netted more votes from Democrats and won 67-33, while Abbott suffered from Republican defections (but still won re-election, 56-43). In Michigan, the composition of straight and split ticket voting blocs did not change significantly.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.5">Conclusion</head><p>In this chapter, I introduced a clustering model for election observers to draw insights from large-N, high-dimensional data. I outlined a simple model of vote choice for partisan races on the long ballot, and offered some guidance on how data analysts should interpret the model estimates of latent quantities.</p><p>In an application to the crucial vote in Palm Beach County, Florida, the clustering method found that strong partisans formed a majority (although not a supermajority), that about 20-30 percent of voters were potential swing voters who split disproportionately in state and local offices, and that about 8 percent were rolloff voters who still voted for their members of Congress. In the second application, to survey data in 2018, the method reveals that about 80-90 percent of voters who identify with one party are straight ticket voters, but popular Governors and some popular Senators draw voters across the party line, effectively forming blocs that deliver their re-election.</p><p>The statistical model used here can benefit from several additions in the future. First, one can include choice-specific covariates such as candidate ideology, incumbency, and gender directly in the estimation. These coefficients are widely analyzed in multinomial logit models of consumer choice, but a faster algorithm to solve such models must be derived to incorporate them into an EM algorithm for large datasets. 
Modeling district-specific characteristics as random effects, by positing that they are drawn from a common distribution, is another possible feature to add to the modeling process, although estimation of such models in multinomial regression also remains an active area of statistics research <ref type="bibr">(Linderman, Johnson, and Adams 2015)</ref>.</p><p>I have shown here that a clustering model can be applied straightforwardly to illuminate patterns and identify groups in complex voting data. As I have discussed, these tools should not be a substitute for substantive theorizing and interpretation, but they can facilitate discovery, provide a more principled measurement of size and voting propensity, and improve theory building by providing data-based guidance.</p><p>in each touchscreen to the voter, only the chosen candidate's name appears in the log. I merge the party affiliation to each name chosen and, given the purposes of this study.</p><p>One of the more difficult tasks in data processing is to determine which races were available on which voters' ballots, and whether or not each race was contested. The combination of different legislative, school, and special purpose districts leads to a proliferation of different ballot styles (i.e., the layout of which contests appear for a given voter). Each entry in the logs contains a precinct identifier as well as a ballot style identifier unique to each precinct. Although there are around 2,000 to 2,200 precincts in each general election, there are at least 5,000 different ballot styles.</p><p>3. To infer the layout of each style, I aggregate the individual logs from the bottom up. For each precinct and ballot style combination, I tabulate the votes cast for each candidate. When working with contests for offices that held elections for only a subset of voters, I record that an office did not exist for a given precinct-ballot style if no voter in that set cast a vote for the office. 
This way, I distinguish abstentions from the absence of the contest.</p><p>One side effect of this procedure is that absentee votes are not counted, because the voting machine codes them with a virtual precinct at the county level, thereby effectively erasing information about the precinct of the absentee ballot. Until 2018, South Carolina voters had to be over 65 or have an "excuse" for not being able to vote on election day to apply for an absentee ballot. In the five general elections studied here, 17.7 percent of the 8.4 million ballots cast were absentee ballots.</p><p>4. I then aggregate votes at the district level, and declare a race contested if votes for both a Republican and a Democrat exist.</p></div>
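<div xmlns="http://www.tei-c.org/ns/1.0"><p>Steps 3 and 4 can be sketched with the Python standard library, assuming the logs have already been flattened into one record per recorded vote with a precinct, a ballot style, a district-specific office identifier, and the party of the chosen candidate. The record format, the party codes "REP" and "DEM", and the function names are hypothetical, not the format of the South Carolina files.</p><ab type="code"><![CDATA[
```python
from collections import defaultdict


def infer_layouts(votes):
    """Step 3: offices that appear on each (precinct, ballot style) layout.

    votes: iterable of (precinct, style, office, party) records, one per vote.
    An office counts as present on a layout if at least one voter with that
    layout cast a vote in it; otherwise it is treated as absent, not abstained.
    """
    layout = defaultdict(set)
    for precinct, style, office, _party in votes:
        layout[(precinct, style)].add(office)
    return dict(layout)


def contested_offices(votes):
    """Step 4: offices (districts) with votes for both a Republican and a Democrat."""
    parties = defaultdict(set)
    for _precinct, _style, office, party in votes:
        parties[office].add(party)
    return {office for office, seen in parties.items() if {"REP", "DEM"} <= seen}


votes = [
    ("P1", "S1", "US House 1", "REP"),
    ("P1", "S1", "US House 1", "DEM"),
    ("P1", "S1", "State House 5", "REP"),
    ("P1", "S2", "US House 1", "DEM"),
]
layouts = infer_layouts(votes)
contested = contested_offices(votes)
```
]]></ab><p>A voter whose layout includes an office but who has no vote recorded in it would then be coded as abstaining in that office, while an office missing from the layout is simply not on that voter's ballot.</p></div>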
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.1.2 Data on Newspaper Mentions</head><p>For a measure of name familiarity, I compute the number of state newspaper articles that mention the candidate's name.</p><p>&#8226; I search the 86 newspapers in South Carolina available in NewsLibrary.com.</p><p>&#8226; I use a search window equal to the length of the office's term, ending the day before the election. For example, for U.S. Senate candidates running against each other in the November 6, 2018 election, I search the dates November 5, 2012 to November 5, 2018, and for U.S. House candidates I use a two-year window. I do not include election day to prevent biasing counts towards the eventual winner.</p><p>&#8226; I search the official name on the ballot. In case of middle name or first name initials, I also include a version that removes the initial. For example, for "Nikki R Haley", I search for the term ("Nikki R Haley") OR ("Nikki Haley").</p><p>&#8226; I generally do not restrict to specific election-related or office-related terms, with the following exceptions: County Council member, Sheriff, and Probate Judge searches are further restricted by the county of the office. This measure, then, aims to capture general name recognition, with some filters added to prevent miscounting common names (like "David Smith") as mentions of candidates.</p></div>
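<div xmlns="http://www.tei-c.org/ns/1.0"><p>The window and query rules above can be sketched with the Python standard library. The function names are hypothetical, and the initial-dropping rule (remove any single-letter token, with or without a trailing period) is one simple reading of the description; the example reproduces the dates and query string given in the text.</p><ab type="code"><![CDATA[
```python
import datetime as dt


def search_window(election_day, term_years):
    """Search dates: one full term, ending the day before election day.

    Assumes the end date is not February 29 (the elections here fall in November).
    """
    end = election_day - dt.timedelta(days=1)
    start = end.replace(year=end.year - term_years)
    return start, end


def search_query(ballot_name):
    """Quote the ballot name; if it contains single-letter initials, add an
    alternative with those initials removed."""
    tokens = ballot_name.split()
    kept = [t for t in tokens if len(t.rstrip(".")) > 1]
    if len(kept) == len(tokens):
        return f'("{ballot_name}")'
    alt = " ".join(kept)
    return f'("{ballot_name}") OR ("{alt}")'


senate_start, senate_end = search_window(dt.date(2018, 11, 6), term_years=6)
query = search_query("Nikki R Haley")
```
]]></ab><p>Ending the window the day before the election implements the rule of excluding election-day coverage, which would otherwise favor the eventual winner.</p></div>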
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.1.4 Summary Statistics for the Valence Advantage</head><p>This subsection formalizes how the measure of the valence advantage is constructed.</p><p>Incumbency is a binary variable, taking 1 if the Republican candidate is an incumbent and 0 otherwise. This variable is only used in contested races. Taking the ratio is appropriate because the relevant comparison is between two candidates competing against each other. Taking the log reduces the impact of outliers and allows for both a ratio and difference interpretation. Adding one to each value before taking the log prevents the few candidates that have zero news article hits or report no campaign contributions from causing divide-by-zero errors. Note: Tables show mean, 10th percentile, median, 90th percentile, standard deviation, and sample size for the respective measure by office. Each observation is measured at the contest level. Exponentiated versions approximate the quantity in their original ratio form.</p></div>
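<div xmlns="http://www.tei-c.org/ns/1.0"><p>The transformation described above (a ratio, logged, with one added to each value) can be sketched as a one-line function. The formula itself does not appear in this section, so the orientation here (the Republican candidate's value over the Democratic opponent's, matching the Republican-coded incumbency indicator) and the function name are assumptions.</p><ab type="code"><![CDATA[
```python
import math


def valence_advantage(rep_value, dem_value):
    """Log ratio of a Republican candidate's measure (e.g. newspaper hits or
    campaign contributions) to the Democratic opponent's, adding one to each
    value so that zero counts do not cause division by zero or log(0)."""
    return math.log((rep_value + 1) / (dem_value + 1))


adv = valence_advantage(99, 9)   # smoothed ratio (99 + 1) / (9 + 1) = 10
```
]]></ab><p>Exponentiating the result recovers the smoothed ratio, which is why exponentiated versions of the measure approximate the quantity in its original ratio form.</p></div>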
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.2.2 Elections and Offices Covered</head><p>Table <ref type="table">A</ref>.3 shows the extent of the iVotronic data examined in this article. I show the offices up for election, the number of contested races, and the number of precincts and voters for each general election year. Note: Numbers are from data after pre-processing, detailed in Appendix A.2. The number of voters in each election is shown in the last row, in thousands. Two counties from 2010 and one county from 2012 are missing from the files released by the state election commission.</p><p>incumbency advantage, and Subsection A.3.5 shows estimates of over-time change in straight ticket voting. Note: Each proportion shows the fraction of voters who voted for the same party in all contested races on their ballot, with the number of voters (n) counted in thousands.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.3.1 Distribution of Party Splits</head><p>Each histogram shows the distribution of the fraction of contested races on a voter's ballot in which they voted for their favored party. In all graphs, the axes range from 0 to 100 percent; therefore the height of the rightmost bar corresponds to the "Straight" proportion.</p><p><ref type="table">Table A.5</ref>. <ref type="table">Table A.6</ref> conducts regressions similar to those in Table A.5, but only among voters who did not use the party lever.</p></div>
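<div xmlns="http://www.tei-c.org/ns/1.0"><p>The quantity plotted in each histogram and the "Straight" proportion can be sketched as follows. As an illustration only, the sketch assumes that a voter's favored party is the party they chose in a reference race (here the first column, for example the top of the ticket), that every voter cast a major-party vote in that race, and that skipped races are dropped from the denominator. The function name, data layout, and ballots are hypothetical.</p><ab type="code"><![CDATA[
```python
import numpy as np


def favored_party_share(votes, reference_col=0):
    """Fraction of a voter's major-party votes cast for their favored party.

    votes: one row per voter, one column per contested race, entries "R",
    "D", or "" (race skipped). The favored party is assumed to be the party
    chosen in the reference race, which every voter is assumed to have voted in.
    """
    votes = np.asarray(votes, dtype=str)
    favored = votes[:, reference_col][:, None]   # (n_voters, 1), broadcast across races
    cast = (votes == "R") | (votes == "D")       # races with a major-party vote
    same = cast & (votes == favored)
    return same.sum(axis=1) / cast.sum(axis=1)


# Made-up ballots for three voters and four contested races.
ballots = [
    ["R", "R", "R", "R"],  # straight Republican
    ["D", "D", "", "D"],   # straight Democrat who skipped one race
    ["R", "D", "R", "R"],  # ticket splitter: 3 of 4 votes for the favored party
]
share = favored_party_share(ballots)   # the histogram is built from these values
straight = np.mean(share == 1.0)       # the "Straight" proportion
print(share, straight)
```
]]></ab></div>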
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.3.2 Split Ticket</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.3.5 Over-Time Change in Party Loyalty</head><p>The ballot image data are too limited in scope to fully test whether elections are nationalizing, because only certain offices are up for election at the same time. Moreover, changes in candidates and electorates across elections make it difficult to attribute changes in district-level partisan voting to changes in individual voters' preferences. I therefore examine same-party voting rates between the U.S. House and the State House, both of which are up for election in every general election.</p><p>Note: The reference category is the race for U.S. Senate, where incumbent Ben Cardin won with 65 percent of the vote. "Democrat" and "Republican" in the headers are shorthand for the party vote in the reference category.</p><p>Figure <ref type="figure">5.1</ref> (Palm Beach County, Florida, 2000) shows the cluster analysis results for voting patterns in the 2000 general election in Florida, using only contested statewide or countywide races. Although this election predates the period studied in this paper, this electorate also showed higher levels of ticket splitting in state and local offices.</p><p>We represent this unknown quantity as</p><p>Then the E-step is the posterior probability of cluster membership, weighted by the mixing proportion and normalized over clusters,</p><p>3) E-step: For each voter i, compute the probability that they belong to cluster k. We iterate through these two steps until convergence.</p></div>
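<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two EM steps can be sketched for a generic finite mixture of multinomials (a latent class model). This is an illustration under assumptions, not the estimator in this chapter: it parameterizes each cluster by vote-choice probabilities theta rather than multinomial logit intercepts (with intercepts only and every option available, the two describe the same probabilities), codes offices that are not on a voter's ballot as -1, and uses made-up numbers in the toy example.</p><ab type="code"><![CDATA[
```python
import numpy as np


def e_step(Y, log_pi, log_theta):
    """zeta[i, k]: posterior probability that voter i belongs to cluster k.

    Y: (N, J) integer vote categories, -1 if office j is not on voter i's ballot.
    log_pi: (K,) log mixing proportions.
    log_theta: (J, K, L) log probability of category l in office j for cluster k.
    """
    N, J = Y.shape
    log_joint = np.tile(log_pi, (N, 1))                 # start from log pi_k
    for j in range(J):
        obs = Y[:, j] >= 0                              # skip offices not on the ballot
        log_joint[obs] += log_theta[j][:, Y[obs, j]].T  # add log theta_{j, k, Y_ij}
    log_joint -= log_joint.max(axis=1, keepdims=True)   # stabilize before exp
    zeta = np.exp(log_joint)
    return zeta / zeta.sum(axis=1, keepdims=True)       # normalize over clusters


def m_step(Y, zeta, n_categories, eps=1e-12):
    """Closed-form updates: mixing proportions and zeta-weighted vote shares."""
    J = Y.shape[1]
    K = zeta.shape[1]
    log_pi = np.log(zeta.mean(axis=0))
    log_theta = np.empty((J, K, n_categories))
    for j in range(J):
        obs = Y[:, j] >= 0
        onehot = np.eye(n_categories)[Y[obs, j]]        # (n_obs, L) indicators
        counts = zeta[obs].T @ onehot + eps             # (K, L) weighted counts
        log_theta[j] = np.log(counts / counts.sum(axis=1, keepdims=True))
    return log_pi, log_theta


# Toy example with made-up numbers: K = 2 clusters, J = 2 offices,
# categories 0 = Democrat, 1 = Republican; cluster 0 leans Democratic.
log_theta0 = np.log(np.array([[[0.9, 0.1], [0.1, 0.9]]] * 2))
Y = np.array([[0, 0], [0, 1], [0, -1]])
zeta = e_step(Y, np.log([0.5, 0.5]), log_theta0)
log_pi1, log_theta1 = m_step(Y, zeta, n_categories=2)
print(zeta.round(3))
```
]]></ab></div>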
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C.2.2 Evaluating Convergence</head><p>We evaluate convergence using the observed log-likelihood,</p><p>So the observed log-likelihood is</p></div>
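<div xmlns="http://www.tei-c.org/ns/1.0"><p>One way to compute the observed log-likelihood and a stopping rule is sketched below, using a hypothetical parameterization of a generic mixture of multinomials (log mixing proportions log_pi and per-office, per-cluster category probabilities log_theta). Because EM never decreases the observed log-likelihood, a shrinking increase is a natural stopping signal; the relative tolerance is an illustrative choice. The optional counts argument shows why collapsing to unique profiles leaves the likelihood unchanged: each profile's contribution is multiplied by the number of voters who share it.</p><ab type="code"><![CDATA[
```python
import numpy as np


def observed_loglik(Y, log_pi, log_theta, counts=None):
    """Observed-data log-likelihood: sum_i log sum_k pi_k prod_j theta_{j,k,Y_ij}.

    Y: (N, J) integer categories, -1 where office j is not on the ballot.
    counts: optional (N,) row weights, e.g. n_u after collapsing to unique profiles.
    """
    N, J = Y.shape
    log_joint = np.tile(log_pi, (N, 1))
    for j in range(J):
        obs = Y[:, j] >= 0
        log_joint[obs] += log_theta[j][:, Y[obs, j]].T
    top = log_joint.max(axis=1)
    # log-sum-exp over clusters, computed stably
    per_row = top + np.log(np.exp(log_joint - top[:, None]).sum(axis=1))
    if counts is None:
        counts = np.ones(N)
    return float(np.dot(counts, per_row))


def has_converged(ll_old, ll_new, tol=1e-8):
    """Stop once the change in log-likelihood is small relative to its size."""
    return abs(ll_new - ll_old) <= tol * (abs(ll_old) + tol)


# Made-up one-cluster, one-office example: P(D) = 0.25, P(R) = 0.75.
log_pi = np.log([1.0])
log_theta = np.log(np.array([[[0.25, 0.75]]]))
ll_voters = observed_loglik(np.array([[0], [1], [1]]), log_pi, log_theta)
ll_profiles = observed_loglik(np.array([[0], [1]]), log_pi, log_theta,
                              counts=np.array([1.0, 2.0]))
print(ll_voters, ll_profiles)  # equal: collapsing does not change the likelihood
```
]]></ab></div>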
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C.2.3 Speed-Up by Collapsing to Unique Profiles</head><p>Because the data are discrete, the EM algorithm needs only sufficient statistics. In our setting, the number of unique voting profiles is much smaller than the number of observations, because vote vectors follow systematic patterns and most votes are straight-ticket votes. Therefore, we can re-format the dataset so that each row is a unique voting profile.</p><p>Let u &#8712; {1, ..., U} index the unique voting profiles, and let n<hi rend="subscript">u</hi> be the number of voters with profile u in the data. We reuse the objects Y and &#950;, so that each row now indexes a profile rather than a voter.</p><p>We then repeat the EM algorithm described earlier. In other words, for each k and j, we estimate intercepts by regressing the vector of categorical votes for office j, Y<hi rend="subscript">j</hi>, using the estimates &#950;<hi rend="subscript">k</hi> (multiplied by the profile counts n<hi rend="subscript">u</hi>) as weights. R packages for multinomial logit typically assume IIA when an outcome value is missing and implicitly perform the kind of three-way subsetting shown in equation 5.5.</p><p>Alternatively, we can fit the multinomial logit with varying choice sets by coding the MLE directly.</p><p>In this paper, I opt for this approach because it is considerably faster than using a built-in multinomial package.</p><p>To formalize this, I introduce the indicator m<hi rend="subscript">ij</hi> &#8712; {0, 1} for whether an option is available to individual i in office j. m<hi rend="subscript">ij</hi> is therefore a direct mapping from</p><p>to &#968;<hi rend="subscript">jk1</hi> and &#968;<hi rend="subscript">jk2</hi>:</p><p>It is easier to consider the gradient for a single observation i, because the full gradient is the sum of the individual gradients.</p><p>So generally, the + 1th gradient is</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>For usage in the popular press, see FiveThirtyEight, "Split-Ticket Voting Hit a New Low in</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2018" xml:id="foot_1"><p>Senate and Governor Races" (November 19, 2018). https://perma.cc/9Y75-3J9R.</p></note>
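<div xmlns="http://www.tei-c.org/ns/1.0"><p>The collapsing step and a directly coded MLE with varying choice sets can be sketched as follows. This is a hypothetical illustration, not the dissertation's code: option 0 is assumed to be an always-available baseline with its intercept fixed at zero, unavailable options receive zero probability, the weights are assumed to be n_u times zeta_uk, and a tiny ridge penalty (an addition of this sketch) keeps the Newton steps finite when an option is never chosen. The gradient is accumulated as the sum of per-observation gradients, as described in the text.</p><ab type="code"><![CDATA[
```python
import numpy as np


def collapse_profiles(Y):
    """Unique vote profiles, the profile index of each voter, and counts n_u."""
    profiles, inverse, counts = np.unique(
        Y, axis=0, return_inverse=True, return_counts=True)
    return profiles, inverse.reshape(-1), counts


def fit_masked_mlogit(y, avail, w, n_iter=100, ridge=1e-8, tol=1e-12):
    """Weighted intercept-only multinomial logit with varying choice sets.

    y: (U,) chosen option in {0, ..., L-1}; avail: (U, L) bool availability
    (the m indicators); w: (U,) weights, e.g. n_u * zeta_uk. Option 0 is a
    baseline that is always available, with its intercept fixed at 0.
    """
    U, L = avail.shape
    psi = np.zeros(L)
    onehot = np.eye(L)[y]
    for _ in range(n_iter):
        util = np.where(avail, psi, -np.inf)            # unavailable: probability 0
        util = util - util.max(axis=1, keepdims=True)
        p = np.exp(util)
        p /= p.sum(axis=1, keepdims=True)
        # gradient: sum over rows of w_i * (1{y_i = l} - p_il), free intercepts only
        grad = (w[:, None] * (onehot - p)).sum(axis=0)[1:] - ridge * psi[1:]
        wp = w[:, None] * p[:, 1:]
        hess = wp.T @ p[:, 1:] - np.diag(wp.sum(axis=0)) - ridge * np.eye(L - 1)
        step = np.linalg.solve(hess, grad)              # Newton step
        psi[1:] -= step
        if np.max(np.abs(step)) < tol:
            break
    return psi


# Made-up votes for one office with options 0, 1, 2, all on every ballot.
Y = np.array([[0], [0], [1], [1], [1], [2]])
profiles, inverse, counts = collapse_profiles(Y)    # counts: [2, 3, 1]
w_k = np.ones(len(counts))                          # hypothetical stand-in for zeta_uk
psi = fit_masked_mlogit(profiles[:, 0], np.ones((len(counts), 3), dtype=bool),
                        counts * w_k)
print(psi)  # close to [0, log(3/2), log(1/2)] when every option is available
```
]]></ab></div>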
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_2"><p>Joe Lieberman (CT), Bernie Sanders (VT), Angus King (ME), Harry Byrd Jr. (VA), and Wayne Morse (OR) are coded as Democrats, and James Buckley (NY) as a Republican.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_3"><p>To compute this number, Ghitza uses the full voter file maintained by Catalist and imputes the vote choice of each registrant in</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2016" xml:id="foot_4"><p>(President) and in 2018 (House). Access to the voter file makes it possible to divide the population into three types: presidential drop-off voters (who voted in 2016 but stayed home in 2018), the midterm surge (those who did not vote in 2016 but voted in 2018), and 2016-2018 voters (who voted in both years).</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_5"><p>* I thank Duncan Buell, Jeff Lewis, Stephen Pettigrew, and Charles Stewart for their expertise in cast vote records; Carolyn Abott, Yuki Atsusaka, Peter Buisseret, Justin de Benedictis Kessner, Barry Burden, Hanno Hilbig, David Kimball, Edward Lawson, Jr., Daniel Moskowitz, Socorro Puy, Andrew Stone, Cl&#233;mence Tricaud, Chris Warshaw, Soichiro Yamauchi, Hye Young You, Michael Zoorob, members of the Imai Research Group, and members of the Democracy Policy Lab at Stanford for their comments; and Steve Ansolabehere, Matt Blackwell, Kosuke Imai, and Jim Snyder for their guidance.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_6"><p>The literature on ticket splitting based on these elections is extensive. In American Politics, see <ref type="bibr">Campbell and Miller (1957)</ref>, <ref type="bibr">Beck et al. (1992)</ref>, and <ref type="bibr">Burden and Kimball (2002)</ref>. In Comparative Politics, see <ref type="bibr">Burden and Helmke (2009)</ref> and references therein.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_7"><p>The exceptions, as of 2018, are California, Georgia, and Massachusetts, based on the samples acquired by Ballotpedia. https://perma.cc/8ADA-B5YT.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_8"><p>Testing the moderation hypothesis is also complicated by the fact that candidates' positions are likely endogenous to their valence advantage <ref type="bibr">(Groseclose 2007)</ref>.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_9"><p>Cast vote records are also referred to as ballot image logs.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_10"><p>This rules out analyses of racial voting, an important feature of politics in South Carolina, at least at the individual level. Votes by the same person across separate elections are not linkable either.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_11"><p>According to legislator ideology estimates from <ref type="bibr">Shor and McCarty (2011)</ref>, from 1996 to 2009 the spatial gap between the median Democrat and the median Republican in the South Carolina State House was about as large as the corresponding gap in Congress. Updated data from <ref type="bibr">Shor (2018)</ref> for 2010 to 2016 show that the gap in South Carolina has grown by about 20 percent.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_12"><p>States have gradually discontinued the party lever. In 2018, only Alabama, Indiana (except for at-large races), Kentucky, Oklahoma, Pennsylvania, South Carolina, Texas (until 2019), and Utah used the party lever.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_13"><p>Ideal point models have also been used to analyze voting matrices like this one, but they impose a spatial model of vote choice that may be less appropriate for voters' preferences than for legislators' roll-call votes <ref type="bibr">(Broockman 2016)</ref>. Moreover, ideal point methods often rely on hundreds of votes and have poor convergence properties with fewer votes, which is the setting here.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="9" xml:id="foot_14"><p>This rate is smaller than those reported in other studies of roll-off, which find roll-off of about 5 to 10 percent. However, most of these other studies examine non-partisan elections or ballot measures. Additionally, the values in Figure 3.2 use as their denominator voters who have already voted for a major party at the top of the ticket, and in South Carolina the presence of the party lever likely decreases roll-off in partisan contests.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_15"><p>See https://perma.cc/5P6U-REC9 for a list of publications that use the CCES.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_16"><p>* I thank Kosuke Imai and Soichiro Yamauchi for their guidance and help on this chapter. I also thank Marc Meredith for helpful comments.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_17"><p>Exceptions are a working paper by <ref type="bibr">Dubin and Gerber (1992)</ref>, who analyze ballot propositions, and <ref type="bibr">Hill and Kriesi (2001)</ref>, who apply the finite mixture model to longitudinal public opinion data to test Converse's black-and-white model.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_18"><p>Although I use the common term "varying choice sets," a more accurate name for this setting is a "limited choice set," i.e., one in which one of the options is missing.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_19"><p>In this county, none of the offices I mention were uncontested. I analyze the case of uncontested House races in the next application.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_20"><p>Among Gore voters, I swap the numbering of clusters 3 and 4 so that each cluster's voting pattern resembles its counterpart in the other panels. This departs from the rule of numbering clusters by estimated size, but here the sizes are similar enough that I opt for the gain in comparability.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_21"><p>One implication is that the voters in each panel are now a mix of Nelson and McCollum voters.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_22"><p>Estimating four clusters recovered similar findings for the swing voter bloc, but the estimates were sensitive because samples in certain states are much smaller than in the ballot data example.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_23"><p>"America's Most and Least Popular Governors." Morning Consult Poll, July 25, 2018. https://perma.cc/2XYN-NJZ7</p></note>
		</body>
		</text>
</TEI>
