<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Speaking their language?: Multilingualism in party communication across democracies</title></titleStmt>
			<publicationStmt>
				<publisher>Wiley</publisher>
				<date>04/15/2025</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10636364</idno>
					<idno type="doi">10.1111/ajps.12976</idno>
					<title level='j'>American Journal of Political Science</title>
<idno>0092-5853</idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Taishi Muraoka</author><author>Dahjin Kim</author><author>Christopher Lucas</author><author>Jacob Montgomery</author><author>Margit Tavits</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[<title>Abstract</title> <p>Which parties embrace multilingualism in their communication? Despite growing interest in parties’ multilingualism among normative scholars of deliberative democracy, empirical research has largely overlooked the linguistic aspect of party competition. We leverage large‐scale data on Facebook posts by more than 800 parties in 87 democracies and analyze their day‐to‐day language practices. By so doing, we develop, for the first time, the classification of monolingual and multilingual parties around the world. Moreover, using this novel dataset, we explore what factors are associated with parties’ adoption of multilingualism and how multilingual parties predict the language use of candidates they nominate. Overall, this study provides the most comprehensive picture of parties’ multilingualism in contemporary democracies and sets agendas for future research in the intersection of parties and language representation.</p>]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Linguistic diversity poses unique challenges to democracy. Major language differences among citizens make it more difficult for them to enter into meaningful political dialogues and develop a common public sphere. The failure to create a shared deliberative forum results in limited opportunities for different language groups to exchange their viewpoints and engage in rational persuasion. Eventually, this circumscribed deliberative process can have downstream consequences on economic growth, public goods provision, and even civil conflict <ref type="bibr">(Desmet, Ortu&#241;o-Ort&#237;n and Wacziarg 2012;</ref><ref type="bibr">Liu and Pizzi 2018)</ref>.</p><p>Responding to the deliberative challenges in linguistically diverse democracies, several theorists claim that multilingual parties can be important intermediaries of deliberation among different language groups <ref type="bibr">(Bonotti and Stojanovi&#263; 2022;</ref><ref type="bibr">Stojanovi&#263; and Bonotti 2020</ref>). 1 By operating in multiple languages, these parties create a public sphere in which citizens from different language groups are informed about each other's perspectives, engage in constructive debates, and pursue political projects that achieve shared goals. By so doing, multilingual parties can uphold more inclusive and respectful democratic practices than monolingual parties that represent a single language group and reinforce social cleavages.</p><p>Despite the normative importance of multilingual parties, there is very little empirical understanding of parties' multilingualism in democracies around the world. 
Indeed, prior studies have looked into only one or a few countries to understand the language practices of political parties <ref type="bibr">(Caluwaerts and Reuchamps 2014;</ref><ref type="bibr">De Bres, Rivera Cosme and Remesch 2020;</ref><ref type="bibr">Rubin 2014</ref>).<ref type="foot">foot_0</ref> Thus, <ref type="bibr">Bonotti and Stojanovi&#263; (2022)</ref> lament that "[i]n our survey of the literature on multilingualism and political parties, [...] we were also struck by the relative lack of empirical research on the linguistic dimensions of party life" (p. 480).<note place="foot" n="1">In the SI Section A (pp. 1-2), we discuss how parties' multilingualism is different from other related concepts, such as multiculturalism <ref type="bibr">(Westlake 2018)</ref>, cross-ethnic appeals <ref type="bibr">(Devasher and Gadjanova 2021;</ref><ref type="bibr">Gadjanova 2021)</ref>, and ethnic mobilization <ref type="bibr">(Chandra 2007;</ref><ref type="bibr">Strijbis and Kotnarowski 2015)</ref>.</note> As a result, we do not have answers to even the most fundamental questions about parties' multilingualism. Which parties use multilingual appeals? What explains their adoption of multilingualism? And how do multilingual parties relate to the campaign communications of candidates they recruit? Answering these questions has important implications for how deliberative processes work in different democracies.</p><p>We argue that the limited attention to the linguistic dimension of party competition stems from the lack of appropriate methodological tools. Thus, the primary goal of this article is to resolve this issue by analyzing a novel dataset of parties' communications on Facebook, a widely used platform for parties in much of the world. As social media posts represent parties' direct attempts to communicate with citizens on a daily basis, analyzing these posts enables us to assess in a natural setting how parties choose to communicate in different languages. 
Methodologically, we draw on recent advances in computational models for language detection to generate a dataset on parties' multilingualism in 87 countries. Crucially, our data encompasses not only well-studied multilingual countries, like Canada and Switzerland, but also less-studied ones, such as Malaysia and Lesotho.</p><p>After establishing the monolingual/multilingual classification of more than 800 parties, we analyze what factors are associated with their adoption of multilingualism. In answering this question, we situate our argument within the incentives-constraints model of seat-maximizing parties (Abou-Chadi, Green-Pedersen and Mortensen 2020; <ref type="bibr">Tavits and Potter 2015;</ref><ref type="bibr">Toubeau and Wagner 2016)</ref>. This theoretical framework suggests that parties should have greater incentives to use multilingual appeals as linguistic diversity increases. However, whether this strategy is feasible is conditioned by two types of constraints: (1) institutional constraints, which shape parties' calculus regarding whether they need to appeal across language boundaries and build a broad coalition to gain a seat <ref type="bibr">(Calvo and Hellwig 2011;</ref><ref type="bibr">Cox 1990;</ref><ref type="bibr">Horowitz 1985)</ref>; and (2) ideological constraints, which shape parties' expectations about whether multilingual appeals would reinforce or undermine their ideological brands <ref type="bibr">(Adams and Somer-Topcu 2009;</ref><ref type="bibr">Lupu 2014;</ref><ref type="bibr">Tavits 2007)</ref>.<note place="foot">[...] <ref type="bibr">Millian (2018)</ref> and <ref type="bibr">Ringe (2022)</ref>.</note></p><p>Our theory of institutional constraints suggests that the association of linguistic diversity with parties' adoption of multilingualism is greater under majoritarian electoral systems than under proportional ones. 
This occurs because majoritarian systems necessitate parties to build a broad winning coalition to gain a seat in the district, while proportional systems assure a more proportional translation of votes into seats. This means that as linguistic diversity increases, the extent to which minority language groups become decisive in determining who gains a seat increases at a higher rate under majoritarian systems than proportional systems. As a result, when linguistic diversity is high, the former requires parties to cross language lines and appeal to different language groups more than the latter. Our empirical results generally support these expectations, showing modest evidence that increases in linguistic diversity are more strongly associated with parties' adoption of multilingualism under majoritarian systems than under proportional systems.</p><p>Our argument about the ideological constraints, in turn, suggests that the association between linguistic diversity and parties' adoption of multilingualism should be greater among socially left-leaning parties. This is because these parties do not face any branding problems by accepting multilingualism, as committing to minority inclusion is consistent with their existing ideological brands. By contrast, it is more difficult for socially rightist parties to use multilingual appeals because their supporters may perceive such appeals as a betrayal of the party brand. In line with these expectations, we demonstrate that, as linguistic diversity increases, socially left-leaning parties are more likely to adopt multilingual appeals than their right-leaning counterparts. Moreover, a simple placebo test shows that this pattern does not hold as clearly when focusing on parties' economic left-right ideology.</p><p>We extend our analysis to individual candidates and examine how multilingual parties relate to candidate-level language practices. 
To do so, we analyze the Facebook posts of candidates in lower house elections held between 2020 and 2023 in twelve multilingual countries. We show that candidates of multilingual parties are more likely to embrace multilingualism and use a minority language than candidates nominated by monolingual parties. These findings indicate that parties' multilingualism reflects not symbolic gestures to appeal to different language groups but rather a deeper commitment to enhancing language-based representation.</p><p>In total, this study offers the most comprehensive picture of parties' multilingualism in contemporary democracies: how it manifests within party systems and which institutional and ideological factors shape its adoption. In addition, we advance and validate a novel approach to identifying language use in large collections of political texts. By so doing, we provide a valuable new measure of parties' multilingualism in democracies across the globe, which opens up new avenues of future research at the intersection of language and politics.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Language, deliberation, and multilingual parties</head><p>Like race, ethnicity, and gender, language is a core component of social identity that defines political cleavages <ref type="bibr">(Barrington 2022;</ref><ref type="bibr">Laitin 1989;</ref><ref type="bibr">Marquardt 2022;</ref><ref type="bibr">Medeiros 2017)</ref>.</p><p>Treating language as a part of identity, many empirical studies have examined how language shapes political preferences and behavior <ref type="bibr">(Frye 2015;</ref><ref type="bibr">Lee and P&#233;rez 2014;</ref><ref type="bibr">Ricks 2020;</ref><ref type="bibr">P&#233;rez and Tavits 2019;</ref><ref type="bibr">Z&#225;rate, Quezada-Llianes and Armenta 2024)</ref>.<note place="foot" n="3">P&#233;rez and Tavits (2022) show that language is important for understanding other, non-identity-based political attitudes, including gender equality, environmental protection, and policy priorities.</note> In some countries, such as Belgium and Spain, linguistic divides overlap with other social divisions (e.g., ethnicity). In these cases, language is seen as a primary factor that reinforces cultural differences and political cleavages between groups <ref type="bibr">(Kulyk 2011;</ref><ref type="bibr">Laitin 1989)</ref>. In other countries, such as India, Lesotho, and Malaysia, the use of a certain language (e.g., English) offers socioeconomic and other advantages. In these cases, language introduces cross-cutting social divisions and identities, generating a mosaic of groupings that together map onto political loyalties.</p><p>However, the political importance of language goes beyond its role as an identity marker. In recent decades, normative theorists have focused on the instrumental role of language to channel political deliberation <ref type="bibr">(Lacey 2017;</ref><ref type="bibr">Schmidt 2014;</ref><ref type="bibr">Strani 2020;</ref><ref type="bibr">Young 1990</ref>). 
Their central debate has been whether linguistic diversity hinders effective deliberation in the public sphere. For some scholars, the presence of multiple languages in a political system constitutes an obstacle to the formation of a common public sphere <ref type="bibr">(Addis 2007;</ref><ref type="bibr">Lacey 2017)</ref>. After all, language differences encourage different language groups to engage in political debates within their group boundaries, which reduces the chance that different parts of the system enter into a meaningful dialogue. According to this view, having a common language is a prerequisite for a functioning deliberative democracy.<note place="foot" n="4">Consistent with this claim, some studies find that a lingua franca has a positive effect on intergroup relationships and economic development <ref type="bibr">(Kumove 2022;</ref><ref type="bibr">Liu 2015;</ref><ref type="bibr">Liu and Pizzi 2018)</ref>.</note> By contrast, other theorists advocate multilingualism in democratic deliberation. For example, <ref type="bibr">Strani (2020)</ref> defends multilingualism on the grounds that monolingual public spheres are exclusionary. Allowing only one language in public debates means creating language hierarchies, and this inevitably results in marginalizing minority languages.</p><p>Similarly, <ref type="bibr">Schmidt (2014)</ref> supports multilingual practices in public deliberation because they can promote more inclusive and egalitarian participation among all citizens.<note place="foot" n="5">Specifically, <ref type="bibr">Schmidt (2014)</ref> points out three advantages of multilingual public spheres: (1) the engagement of different language communities results in the most legitimate form of governance (legitimation advantage); (2) the inclusion of all language groups at the deliberation table leads to truly "common" decisions (common good advantage); and (3) the deliberation among different language groups enhances human capacity by requiring us to see the world from others' points of view (human flourishing advantage).</note> For him, forcing citizens to use a common language is to require some of them to change who they are. 
Allowing multilingual interactions in public spheres should achieve more legitimate deliberation outcomes.</p><p>In modern democracies, political parties play critical roles in structuring public deliberation, and multilingual parties become particularly vital to creating multilingual public spheres <ref type="bibr">(Bonotti and Stojanovi&#263; 2022;</ref><ref type="bibr">Stojanovi&#263; and Bonotti 2020</ref>).<ref type="foot">foot_1</ref> First, as primary vehicles of representation, parties shape which languages enter the process of political deliberation. By simultaneously engaging with different language groups, multilingual parties ensure that political debates take place beyond the boundaries set by the linguistic border. Second, parties also become important informational and educational sources for voters. By providing the same information in different languages, multilingual parties enhance equal access to information, provide opportunities to learn about different viewpoints, and enable constructive dialogues across language groups.</p><p>The importance of multilingual parties in sustaining meaningful democratic deliberation in linguistically diverse settings is nicely illustrated by the Belgian party system. Belgium's institutional configurations enable parties to win votes by appealing solely to one language group. As <ref type="bibr">Caluwaerts and Reuchamps (2014)</ref> describe, this reduces their incentives to communicate with citizens in the other language group and erodes lines of communication across language lines. The separation of the party system by language eventually distorts the deliberative capacities of the entire system, leading to political crises characterized by tense communal relations and political instability. 
In this way, the Belgian case illustrates how the presence of mostly monolingual parties, each representing a single language group, may not be sufficient to construct a strong public sphere in multilingual democracies.</p><p>The above discussions suggest that there are extensive normative debates about the relationship between linguistic diversity and deliberation and how multilingual parties play a part in this process. However, empirical investigation of which parties actually adopt multilingualism is limited to a handful of case studies that have analyzed parties' language use based on anecdotes or qualitative reading of parties' election programs <ref type="bibr">(Caluwaerts and Reuchamps 2014;</ref><ref type="bibr">De Bres et al. 2020;</ref><ref type="bibr">Rubin 2014;</ref><ref type="bibr">Stojanovi&#263; and Bonotti 2020)</ref>. This gap between normative theories and empirical research is striking and requires a more systematic assessment of parties' multilingual practices. To do so, we develop a method of detecting multilingual parties by applying computational methods of language detection to parties' social media posts.</p><p>Before proceeding, we reiterate that the goal of this study is not to resolve normative debates surrounding parties' multilingualism. 
For example, there are long-lasting debates about what kind of party system is more desirable for the stability of the state: having parties that cross-cut group boundaries and use more accommodating policies/appeals <ref type="bibr">(Horowitz 1985;</ref><ref type="bibr">Reilly 2002)</ref> or having different parties that represent distinct language groups and adopt a strategy that indirectly encourages linguistic fractionalization <ref type="bibr">(Laitin 1998;</ref><ref type="bibr">Lijphart 1977)</ref>. Or, one might ask whether or when linguistic representation is normatively desirable, and why language-based representation in party communication matters beyond the descriptive and substantive representation of different language groups in parliament. Rather than arbitrate these claims here, our goal is to present some initial empirical patterns about what makes multilingualism in party communication more or less likely, and to provide a novel empirical tool that enables future studies to further explore language-based representation and its consequences.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>When do parties use multilingual communication?</head><p>Once we know which parties use multilingualism, the next question becomes what encourages them to adopt this strategy in the first place. Addressing this question is critical for two reasons. First, it allows us to understand under what conditions multilingual public spheres are likely to emerge. Second, it also sheds new light on broader debates about the relationship between social cleavages and party system <ref type="bibr">(Duverger 1954;</ref><ref type="bibr">Lipset and Rokkan 1967)</ref>.</p><p>We draw on a basic incentives-constraints theory of party strategy, which starts with the recognition that parties are seat maximizers and will follow an electoral strategy that will help them gain more seats. Most fundamentally, the literature assumes that social conditions define salient issues in the party system, which, in turn, influence what strategy is electorally necessary <ref type="bibr">(Abou-Chadi et al. 2020;</ref><ref type="bibr">Tavits and Potter 2015;</ref><ref type="bibr">Toubeau and Wagner 2016)</ref>. In the current context, this means that as linguistic diversity increases, parties should (on average) perceive greater needs to appeal to different language groups by embracing multilingualism. However, how parties respond to underlying social structures is not likely to be uniform both across and within party systems because two factors constrain what is electorally viable for them. First, institutional constraints influence parties' calculus by mechanically determining how votes are translated into seats <ref type="bibr">(Calvo and Hellwig 2011;</ref><ref type="bibr">Cox 1990</ref>). Second, ideological constraints limit the range of actions parties can take without undermining their ideological brands <ref type="bibr">(Adams and Somer-Topcu 2009;</ref><ref type="bibr">Lupu 2014;</ref><ref type="bibr">Tavits 2007</ref>). 
As we explain in greater detail below, seat-maximizing parties respond to increasing linguistic diversity and adopt multilingualism only when, within these two constraints, clear incentives exist to do so.</p><p>To begin, the institutional constraints imposed by electoral systems (how votes are translated into seats) affect the extent to which parties need to make cross-cutting multilingual appeals to win a seat in a district <ref type="bibr">(Horowitz 1985;</ref><ref type="bibr">Reilly 2002)</ref>. Here, we focus on the distinction between two broad categories of electoral systems: majoritarian systems, which require parties to obtain a majority of votes to gain a seat in the district, and proportional systems, which translate votes into seats in a more proportional manner.</p><p>The two electoral systems have differential effects on how minority language groups' size is translated into their electoral strength in each district <ref type="bibr">(Crisp et al. 2018;</ref><ref type="bibr">Huber 2012)</ref>. Under proportional systems, the electoral strength of language minorities grows in proportion to their size. The growing electoral strength of minority groups should, in turn, proportionally increase parties' willingness to use multilingual appeals.</p><p>Under majoritarian systems, however, the relationship between group size and electoral strength becomes more complex. When the size of minority language groups is very small, parties can build a majority coalition and win a seat by ignoring these groups. By contrast, when their size becomes sufficiently large, it is necessary to accommodate minority voters to build a winning coalition. 
This means that as their group size increases, the extent to which minority language groups' votes become pivotal grows at a much higher rate under majoritarian systems than under proportional systems <ref type="bibr">(Westlake 2018)</ref>.</p><p>Consequently, as linguistic diversity increases, parties' incentives to use cross-cutting appeals should increase more steeply under the former than the latter (Horowitz 1985; Reilly 2002).<note place="foot" n="7">Electoral systems also influence the likelihood that minority language parties emerge <ref type="bibr">(Bochsler 2010)</ref>. It is easier to form a minority party under proportional systems than under majoritarian systems <ref type="bibr">(Lijphart 1977;</ref><ref type="bibr">Norris 2008)</ref>. The relative absence of minority-based parties under majoritarian systems gives additional incentives for other parties to use multilingual appeals and mobilize the untapped votes of language minorities.</note> In sum, the constraints induced by electoral systems generate the following expectation:</p><p>Hypothesis 1: Increases in linguistic diversity are more strongly associated with parties' adoption of multilingualism under majoritarian systems than under proportional systems.</p><p>Turning to the ideological constraints, parties' existing ideological brands restrict whether they can use multilingual appeals to maximize their seats. Parties' choices of different appeals are electorally rewarding only if these appeals are consistent with the ideological images that parties have built up among their supporters and the electorate more generally <ref type="bibr">(Adams and Somer-Topcu 2009;</ref><ref type="bibr">Lupu 2014)</ref>. 
When voters perceive that parties deviate from their core brands, they may feel betrayed and punish parties.</p><p>According to <ref type="bibr">Tavits (2007)</ref>, this is especially the case in the domain of principled issues, which are related to voters' core values, beliefs, and group identity.</p><p>We expect that the use of multilingualism would be detrimental to the brand maintenance of socially right-leaning parties, particularly those on the extreme right.</p><p>Since their supporters tend to hold more negative views toward minority groups and cultural diversity <ref type="bibr">(Golder 2016;</ref><ref type="bibr">Inglehart and Norris 2016)</ref>, multilingual appeals could lead to a backlash among the core supporters of the socially rightist parties. In line with this argument, <ref type="bibr">Flores and Coppock (2018)</ref> show that Spanish-language advertisements reduce candidates' electoral support among English-speaking monolingual Americans. Such backlash may be particularly likely when using minority languages on social media because, on these platforms, parties cannot choose their audience. Furthermore, prior work shows that, in anticipation of possible backlash, right-leaning parties tend to place ethnic minority candidates in lower, less visible list positions (Van der Zwan, Lubbers and Eisinga 2019). These same considerations likely incentivize socially right-leaning parties to refrain from using different languages.</p><p>In contrast, socially left-leaning parties are less likely to face the same ideological constraints. This is because they are known for their commitment to social justice and promotion of multiculturalism <ref type="bibr">(Ireland 2004;</ref><ref type="bibr">Westlake 2018)</ref>, and therefore communicating in different languages is unlikely to be seen as a violation of their core principles. 
Indeed, their supporters may even welcome parties' adoption of multilingualism as a positive signal that reinforces their ideological commitment to cultural diversity. 8 In short, the ideological constraints that parties face lead to the following expectation:</p><p>Hypothesis 2: Increases in linguistic diversity are more strongly associated with parties' adoption of multilingualism for socially left-leaning parties than for socially right-leaning ones.</p><p>Note that this hypothesis concerns parties' left-right ideologies on the social dimension, and not on the economic one. We therefore expect that parties' economic ideologies predict their adoption of multilingual appeals less well than their social ideologies do.</p></div>
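<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two hypotheses share the same logical form, which can be restated compactly as an interaction in a generic binary-outcome model. The notation here is an illustrative assumption introduced only for exposition, not necessarily the specification estimated in this article: <formula notation="tex">M_{pc}</formula> indicates whether party <formula notation="tex">p</formula> in country <formula notation="tex">c</formula> is multilingual, <formula notation="tex">D_c</formula> is linguistic diversity, <formula notation="tex">Z_{pc}</formula> is the moderator (a majoritarian-system indicator for Hypothesis 1, a socially left-leaning indicator for Hypothesis 2), and <formula notation="tex">F</formula> is any increasing link function.</p><p><formula notation="tex">\Pr(M_{pc} = 1) = F\left(\beta_0 + \beta_1 D_c + \beta_2 Z_{pc} + \beta_3 D_c Z_{pc}\right)</formula></p><p>On the scale of the linear index, the association of diversity with multilingualism is <formula notation="tex">\beta_1 + \beta_3 Z_{pc}</formula>. Under these assumptions, both hypotheses amount to expecting a positive interaction coefficient, <formula notation="tex">\beta_3 &gt; 0</formula>.</p></div>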
<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8">Therefore, it is reasonable to expect that multilingual parties employ inclusive, rather than divisive, tones in their communications. Analyzing the content of multilingual communication is beyond the scope of this research, but remains an important task for future studies.</note><div xmlns="http://www.tei-c.org/ns/1.0"><head>Measuring parties' multilingualism using social media data</head><p>To examine language use by parties (and candidates), we analyze messages on their official Facebook accounts. Social media data provides arguably the most appropriate tool to analyze multilingual practices for two reasons. First, it provides richer text data on day-to-day elite communication than any other source because most parties are active on social media on a near-daily basis. Other text data that parties produce is ill-suited to understanding their everyday language choice. For example, party websites are a more static representation of communication.<note place="foot" n="9">For a subset of parties in our data, we compare their multilingual practices on social media (de facto multilingualism) and websites (pro forma multilingualism). We find that the two types of multilingualism are positively correlated with r = 0.30. However, we also observe that pro forma multilingualism is more prevalent as many parties have a small web-section in English. This implies that relying on party websites could overestimate the extent to which they use different languages in daily interactions with voters.</note> Similarly, campaign manifestos are issued only when there is an election. Because social media posts reflect parties' day-to-day activities, the sheer volume of text data on social media far exceeds that of website or manifesto data, allowing us to have a much more accurate understanding of how parties balance communication in different languages.</p><p>Second, social media data is easier to collect than other documents that parties produce. For example, the Comparative Manifesto Project (CMP; <ref type="bibr">Volkens et al. 2020)</ref> does not collect party manifestos written in second languages (if any), which makes it impossible to perform the kind of analysis we present here. 
Other sources of policy statements, such as news clippings, are scattered, making it hard to grasp the whole picture of parties' language use <ref type="bibr">(Gadjanova 2021)</ref>. By contrast, we can easily download and analyze the complete data on parties' messages on social media (assuming that we locate the relevant accounts of these parties in the first place).</p><p>In the remainder of this section, we illustrate how to use language detection tools to analyze multilingualism using parties' social media data. We first describe the data we use. Second, we explain the ideas behind the computational methods of language detection. Third, we apply these methods to label languages of individual posts on parties' Facebook pages. Finally, we develop our measure of monolingual and multilingual parties based on these post-level classifications.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Party-level social media data</head><p>We focus on parties and electoral coalitions in 87 countries. Countries were selected if they met at least one of two conditions: (1) being a democratic country<ref type="foot">foot_2</ref><note place="foot" n="10">[...] and consider a country as a democracy if its mean score is above 5.5. We supplement these cases with countries coded as electoral democracies according to Freedom House (2022) in 2022.</note> included in the CMP <ref type="bibr">(Volkens et al. 2020)</ref> or (2) being a democratic country with a population of more than 1 million and Facebook penetration above 20% (Internet World Stats 2021). As a consequence of this coding rule, our dataset encompasses a diverse set of democracies from different regions, with fewer than one-third being advanced Western democracies.<note place="foot" n="11">The data includes 11 countries from Asia, 17 from Latin America, 4 from North Africa/the Middle East, 6 from Sub-Saharan Africa, 2 from the Caribbean, 23 from Eastern Europe, and 24 from Western Europe/North America.</note></p><p>We collected the public Facebook pages of parties and electoral coalitions that received at least 3% of the popular vote or 1% of the seats in lower house elections held after 2016.<note place="foot" n="12">Our data collection started in 2020. For the party-level data, we also collected historical data since 2016. This was more difficult and highly labor-intensive to do for the candidate-level data, which we detail below. As a result, the candidate-level data begins in 2020.</note> To identify the correct Facebook pages, we first checked parties' websites and obtained links to their Facebook pages. If Facebook accounts were not linked on their websites, we searched Google and Facebook by party name. Once we found a page, we confirmed it was a valid one by checking page description, page history, post content, and user engagements.</p><p>While some parties use multiple languages on a single page, other parties set up different pages by language.<note place="foot" n="13">We do not distinguish between the two types of multilingual practice as parties may switch back and forth between these types. For example, as of 2020, the Green Party of Canada had separate Facebook pages in English and French. But, as of 2021, the two pages were merged into one.</note> The former practice is the norm in Canada, except for the Quebec Bloc, while the latter is common in countries like Estonia, Israel, and Switzerland.</p><p>In many cases, parties' websites had links to Facebook pages in different languages. However, we also conducted generic searches on Google and Facebook using party names in the country's official languages and all languages that were used in the websites of parties in the same country.<note place="foot" n="14">Some parties in non-English speaking countries (e.g., Lebanon and the Netherlands) establish Facebook pages in English. This is partly explained by the growing number of immigrants and diaspora voters (see, e.g., DutchNews 2022).</note> In total, we identified the official Facebook pages of more than 900 parties (93% of the parties and coalitions on our original target list).</p><p>In this study, we analyze parties' Facebook posts from 2016 to 2022, which were downloaded through Facebook's CrowdTangle API (CrowdTangle Team 2022). After excluding parties that had fewer than 50 posts, our dataset consists of 843 parties. In total, we analyze around 4 million posts, which together received more than 280 million user engagements.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Computational language detection at scale</head><p>Identifying the language in which texts are written poses several challenges. First, at a conceptual level, there is no universally acceptable way to define what constitutes a distinct "language" relative to a "dialect." All scholars may agree that Chinese and Spanish are distinct languages, but the lines of demarcation are often more subtle, and linguistic researchers do not always reach a consensus. Difficult examples include Croatian and Bosnian, Indonesian and Malay, and Scottish and Irish Gaelic.<note place="foot" n="15">For this reason, there is also no consensus on how many languages are currently spoken around the world or even in many countries.</note> Second, even given common definitions, classifying the language of any particular piece of text is not always straightforward. Only a handful of languages can be determined strictly by the alphabet, and the rest must be determined by words themselves. An algorithm could attempt to identify language based on the words used in a document. However, this would require training a model on a large number of words in every language in the world. Moreover, determining what constitutes a "word" is sometimes difficult without first knowing the language in which the text is written, as many languages (e.g., Chinese) do not separate words with spaces. To address these challenges, we use an ensemble of language detection algorithms. These algorithms build on interdisciplinary academic and industry research (e.g., Google) and are trained on massive amounts of data. The seven algorithms we use in this study are summarized in Table <ref type="table">1</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>[Table 1 about here]</head><p>These algorithms proceed by preprocessing the text into substrings ("n-grams"), which are n-character strings from which the original text can be constructed. For example, the text string "referee" would be processed into 're', 'ef', 'fe', 'er', and 'ee' if n = 2. 16 Representing text as n-grams has the useful property of not requiring any ex ante knowledge about the language; Chinese and English alike can be processed in the same way.</p><p>16 Before doing so, an algorithm may first check to ensure that characters in the text do not belong to one of the few languages with a unique alphabet, but this step only identifies a tiny fraction of the world's languages.</p><p>Since there are also far fewer n-grams than words, this greatly reduces the dimensionality of the representation and enables classification even for short documents. Some approaches, including cld3 (Ooms 2021) and fastText (Joulin, Grave, Bojanowski, Douze, J&#233;gou and Mikolov 2016), further process the text by mapping n-grams, which are sparse features, into embeddings to further reduce the dimensionality of the features. Others, like franc (Csardi, Wormer, Ceglowski, Rideout and Johnson 2021), instead represent each document as a sparse vector of n-gram frequencies, which is analogous to a document-term matrix.</p><p>After preprocessing the data, each algorithm proceeds by applying a pre-trained model to the text. The models were trained on a large corpus of documents written in a variety of languages, but where the language of origin is known. Wikipedia is commonly used since it hosts millions of documents written in dozens of languages. The algorithms we employ use a variety of models for this step. cld2 (Ooms 2020), langdetect (Danilak 2021), and langid (Lui and Baldwin 2012) take a similar, simple approach to classification using a naive Bayes classifier on the n-gram frequencies. 
Other algorithms (e.g., cld3) employ a neural network with a large number of parameters. As part of this step, each method also limits the number of languages and/or dialects it is trained to detect. The algorithms we use detect between 56 (langdetect) and 206 (franc) languages. We provide the complete list for each model in .</p></div>
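<div xmlns="http://www.tei-c.org/ns/1.0"><p>The n-gram preprocessing described above can be sketched in a few lines of Python. This is only a minimal illustration under our own assumptions: the function names char_ngrams and ngram_profile are hypothetical, and none of the seven detection tools is claimed to implement the step in exactly this way.</p>

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Return every overlapping n-character substring of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_profile(text, n=2):
    """Count n-gram frequencies: a sparse representation that needs no
    prior knowledge of the language or of word boundaries."""
    return Counter(char_ngrams(text, n))


bigrams = char_ngrams("referee", n=2)
print(bigrams)               # ['re', 'ef', 'fe', 'er', 're', 'ee']
print(sorted(set(bigrams)))  # ['ee', 'ef', 'er', 'fe', 're']
print(ngram_profile("referee")["re"])  # 2
```

<p>A frequency profile of this kind corresponds to one row of the sparse n-gram frequency representation mentioned in the text; embedding-based tools would map the same n-grams to dense vectors instead.</p></div>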
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Post-level classification</head><p>We asked each language detection tool to detect one language per input text. This yielded seven language labels for each post. 17 For the vast majority of posts, all seven methods give identical labels, while for others they disagree, mainly when one or more detection methods do not cover the relevant language. Disagreements also happen when texts are very short, posts are actually written in two languages, or posts include proper nouns (names, titles, locations, etc.). As we describe in the SI Section C (pp. 6-9), we find that excluding posts with fewer than 125 characters reduces disagreement between methods significantly. 18 Thus, in the subsequent analyses, we use the 125-character threshold.</p><p>17 Before doing so, we did simple preprocessing of post texts (lower-casing letters and removing URLs, emoji, punctuation, hashtags, usernames, and numbers).</p><p>18 Once we exclude the posts with fewer than 125 characters, the methods give the identical language label for 75.7% of the posts, and at least four methods give the same label for 98.5% of the posts. More than 60% of the posts have more than 125 characters.</p><p>In three cases, we merged languages to further improve performance. First, we treat Indonesian and Malay as a single language (Indonesian-Malay). This is because the language detection methods provided inconsistent labeling for posts by Indonesian and Malaysian parties. Since the two languages are mutually intelligible <ref type="bibr">(Adelaar and Himmelmann 2004;</ref><ref type="bibr">Wichmann 2020</ref>), and we can safely assume that Indonesian (Malaysian) parties do not use Malay (Indonesian), collapsing them as a single language gives more reliable classification results. As we detail in the SI Section D (pp. 10-11), human coding of Indonesian and Malaysian parties' posts validates this decision. Second, we found that the methods had difficulty distinguishing Central Kurdish and Persian in posts by several Iraqi parties. Since human coding showed that these posts were actually Central Kurdish and we have no data from Iran itself, we simply collapse these languages into Central Kurdish-Persian. 19</p><p>19 Additional details of the human coding are provided in the SI Section C (pp. 6-9).</p><p>A remaining concern is the posts from Bosnia and Herzegovina, Croatia, Montenegro, and Serbia, where the primary languages are Bosnian, Croatian, Montenegrin, and Serbian (if you view them as separate) or Serbo-Croatian (if you view them all as one language). These "politically divorced" <ref type="bibr">(Laitin 2000)</ref> languages are particularly difficult cases since they are mutually intelligible and share a common grammar and spelling, making them difficult to distinguish from texts alone. As a result, we observed high rates of post-level disagreements between methods. 20 In our main results, we choose to combine them into one Serbo-Croatian language because "[w]hat is clear to everyone <ref type="bibr">[...]</ref> is that all of these languages share a common core, a fact which enables all their speakers to communicate freely with one another" <ref type="bibr">(Alexander 2006, p. xvii)</ref>. To the extent that our argument rests on concerns about creating a common public sphere, this means that while language choice might serve as an important identity signal to voters, it does not impede actual communication and dialogue per se in this setting.</p><p>20 Indeed, if we treat these languages as separate ones, many parties in the region are labeled as multilingual, even in cases where this makes little sense. For instance, nearly half of posts in Croatia were coded as Bosnian by many methods even though Bosniaks make up less than 1% of the population.</p><p>However, in the SI Section E (pp. 
12-14), we also report results where, first, these languages are treated as separate and, second, these cases are removed entirely. All results are essentially unchanged.</p><p>To understand what languages appear in our party-level corpus, we explore the relationship between the proportion of first-language (L1) users and that of parties' Facebook posts written in the corresponding languages. The former is based on Ethnologue <ref type="bibr">(Eberhard, Simons and Fennig 2022)</ref>, 21 and we focus on the three most spoken languages in each country that are used by more than 5% of the population. We also exclude languages that only two or fewer detection methods can detect. To measure the proportion of each language in party communication, we take the average proportion of the posts written in that language across detection methods (if they include the language). 22</p><p>21 See <ref type="url">https://www.ethnologue.com/</ref>.</p><p>22 For example, only five detection methods can detect Armenian. Hence, the estimated proportion of Armenian posts among Armenian parties is computed on these five methods.</p><p>Note that using the census-based proportion of L1 users as a metric of a language community size raises several conceptual issues. To begin, first language maps strongly onto ethnicity and may not fully capture the actual communicative practice of the country, such as the presence of a lingua franca or the possibility that people with different first languages can communicate with each other using a shared secondary language <ref type="bibr">(Laitin 2000;</ref><ref type="bibr">Liu and Pizzi 2018)</ref>. For some of the language groups in our sample (e.g., Guarani in Paraguay and Zulu in South Africa), the number of L1 users is smaller than the total number of speakers who can communicate using another language (e.g., more people overall speak Spanish than Guarani in Paraguay, even though Guarani has the largest community as measured by L1). 
In addition, it is not always clear how people interpret language questions in a census. Some scholars even suggest that choosing a certain native language in the census captures an expression of people's loyalty to an ethnic group and broader political preferences, rather than their communicative competence <ref type="bibr">(Arel 2002;</ref><ref type="bibr">Kulyk 2018)</ref>. Consequently, the number of L1 users may reflect not just how many people use the language but also how many people strongly identify themselves with that language and care about it <ref type="bibr">(Kulyk 2011)</ref>.</p><p>With these limitations in mind, Figure <ref type="figure">1</ref> summarizes our exploratory findings. The horizontal axis shows the proportion of L1 users of a given language in a given country, and the vertical axis is the estimated proportion of parties' Facebook posts written in the same language aggregated at the level of the country. Panels are separated by the rank of language groups based on their population shares. The dashed lines indicate 45-degree lines. Language groups located above (below) these lines are overrepresented (underrepresented) in party communication relative to their population shares. The solid lines show linear models fitted by regressing the post share on the L1 share.</p><p>[Figure <ref type="figure">1</ref> about here]</p><p>In general, we find a strong positive association between the population share of the language and the extent to which it appears in party communication, as indicated by positive fitted lines. This gives first-step validation to our post-level classification because parties should be more likely to use a language that is more common in the population.</p><p>But more interestingly, these fitted lines nearly overlap the 45-degree lines. This means that there is a clear correspondence between the proportion of the language spoken in the country and that of the posts written in the same language. 
Hence, party communication on social media in the aggregate seems to give proportional attention to different language groups. Additionally, several of the language-country pairs that most strongly diverge from the 45-degree lines validate our post-level classification because they all represent cases in which the proportion of L1 users is arguably a poor proxy for the size of the actual language community. This includes Zulu in South Africa in Panel (a), Indonesian and Javanese in Indonesia in Panels (a) and (b), Spanish in Paraguay in Panel (b), and Russian in Kyrgyzstan in Panel (c). All of these are cases where one language (English, Indonesian, Spanish, and Russian) serves as a lingua franca for political communication even where it is not the largest L1 language.</p></div>
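<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the post-level steps concrete, the following Python sketch illustrates the kind of preprocessing, length filter, and label comparison described in this section. It is a rough illustration rather than the pipeline used in the study: the function names are hypothetical, the regular expressions and the Unicode-category filter are our own simplifications, and we assume that the 125-character threshold is applied to whatever text is passed to the detectors.</p>

```python
import re
import unicodedata
from collections import Counter

MIN_CHARS = 125  # length threshold reported in the text


def preprocess(post):
    """Lower-case and strip URLs, hashtags, usernames, emoji, punctuation, numbers."""
    text = post.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"[#@]\w+", " ", text)                 # hashtags and usernames
    # Drop punctuation (P*), symbols such as most emoji (S*), and numbers (N*).
    text = "".join(
        " " if unicodedata.category(ch)[0] in "PSN" else ch for ch in text
    )
    return re.sub(r"\s+", " ", text).strip()


def long_enough(text, min_chars=MIN_CHARS):
    """Keep only texts with at least min_chars characters."""
    return len(text) >= min_chars


def summarize_labels(labels):
    """Return the most frequent label and the number of methods that chose it."""
    return Counter(labels).most_common(1)[0]


# Hypothetical labels returned by seven detectors for a single post.
labels = ["hr", "hr", "bs", "hr", "hr", "sr", "hr"]
print(preprocess("Hello, World! Visit https://example.com #vote @party 2022"))
# hello world visit
print(summarize_labels(labels))  # ('hr', 5)
```

<p>Counting how many methods share the modal label is one simple way to compute agreement figures such as "at least four methods give the same label".</p></div>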
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Party-level classification</head><p>Using the post-level language labels, the final step is to construct a party-level measure of multilingualism. The difficulty here is distinguishing between cases where parties truly include texts in multiple languages and cases where we are simply mis-measuring language use at the post level. This is particularly important for languages not included in some or all of the language classifiers.</p><p>To mitigate these concerns, we do not use the specific language the algorithms identify. Instead, we focus on a dichotomous measure of whether the party is primarily multilingual or monolingual. The advantage of this approach is that we do not need to identify the "correct" language for each post or the overall proportion of posts associated with each language. Our findings above suggest that existing language classifiers are simply too inaccurate to provide reliable estimates for these quantities at scale. By instead focusing on multilingualism, we only require that each method consistently assigns the same label to any one language (and different labels to different languages) rather than the correct label. For example, langdetect and tika do not include Armenian and give an Estonian label to posts by Armenian parties. However, since this "wrong" labeling is internally consistent, it does not impact the final multilingual classification of Armenian parties.</p><p>Specifically, we classify parties as monolingual or multilingual using the following steps. First, for each detection method, we labeled the party as primarily multilingual if the proportion of the most detected language is below 90% (meaning more than 10% of posts are not in the dominant language). Second, we created a party &#215; detection method matrix Y (of dimension P &#215; J), where Ypj = 1 if party p is multilingual according to detection method j, and Ypj = 0 otherwise. 
Third, we fit a latent class model <ref type="bibr">(Linzer and Lewis 2011)</ref> to synthesize these judgements. 23</p><p>In addition, we hired human coders to label the language of 300 randomly selected posts for 44 difficult cases where either our labels showed high levels of disagreement between methods or where we were not detecting languages widely spoken in a country (suggesting we may be missing a language). 24 In only five cases did the human coders significantly disagree with the latent class labels. Four were parties in Timor-Leste, which resulted from poor handling of Tetum. The fifth was an Iraqi party where the methods struggled with Central Kurdish. In these five cases, we used the human coder judgements instead.</p><p>23 Additional details of these steps are provided in the SI Section F (pp. 15-17).</p><p>24 Specifically, we examined eight parties in Botswana, Kyrgyzstan, Luxembourg, Nepal, New Zealand, North Macedonia, and Romania. We also re-examined all parties in Indonesia, Iraq, Lesotho, Malaysia, and Timor-Leste. Additional details for this exercise are provided in SI Sections C-F (pp. 6-17).</p><p>In the end, we classify 101 parties (12%) as primarily multilingual. <ref type="foot">25</ref> We find multilingual parties in 28 countries. Of these, 25 countries have both monolingual and multilingual parties, while in Lesotho, Kyrgyzstan, and Mauritius, all parties practice multilingualism. In the remaining 59 countries, parties are all classified as primarily monolingual. These cases include linguistically homogeneous countries and those where national politics operates in a single language despite the presence of various language communities. In most of these 59 countries, all parties use the same language. However, in Kosovo, Romania, and Ukraine, monolingual parties use different languages to appeal to distinct language groups. 
Put differently, these countries represent cases in which the party systems show a complete split along linguistic lines.</p><p>To validate our measure, we follow <ref type="bibr">Adcock and Collier (2001)</ref> and inspect convergent and divergent cases by analyzing a relationship between linguistic diversity and the proportion of primarily multilingual parties by country. The measure of linguistic diversity is based on ethno-linguistic fractionalization (ELF) by <ref type="bibr">Desmet et al. (2012)</ref> because it covers the most countries, with Kosovo as the only missing case. They provide 15 different fractionalization indices at different levels of group aggregation based on language trees from Ethnologue. We use their measure at the second lowest level of aggregation, ELF (14), as it is most highly correlated with other widely used measures of ELF <ref type="bibr">(Alesina, Devleeschauwer, Easterly, Kurlat and Wacziarg 2003;</ref><ref type="bibr">Desmet, Weber and Ortu&#241;o-Ort&#237;n 2009)</ref>.</p><p>Since the measure of ELF relies on census data, the conceptual concerns about L1 users that we discussed above also apply to this measure. That is, it may overestimate linguistic diversity in countries with a lingua franca and capture not so much people's actual communicative practices as the intensity of their identification with certain ethnic/language groups <ref type="bibr">(Kulyk 2011;</ref><ref type="bibr">Laitin 2000)</ref>. <ref type="foot">26</ref> Nevertheless, we think that this is a reasonable statistic in the current context because more groups with different languages mean that there are more languages in the country.</p><p>Figure <ref type="figure">2</ref> shows the relationship between linguistic diversity and the proportion of multilingual parties. The solid linear line indicates that there is a positive association between the two. 
This makes sense because seat-maximizing parties should have greater incentives to use multilingual appeals as linguistic diversity increases. Moreover, we find no multilingual party in countries like Indonesia, Namibia, and Senegal, where, despite high scores on linguistic diversity, a single language is used as the medium of government. This provides additional confidence to our classification results.</p><p>[Figure <ref type="figure">2</ref> about here]</p></div>
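<div xmlns="http://www.tei-c.org/ns/1.0"><p>The first two classification steps (the 90% rule and the party-by-method matrix) can be sketched as follows. The data layout and function names are hypothetical, and the final aggregation uses a simple majority vote purely as a stand-in: the study itself synthesizes the method-level judgements with a latent class model, which this sketch does not implement.</p>

```python
from collections import Counter

DOMINANT_SHARE = 0.90  # threshold stated in the text


def is_multilingual(post_labels, threshold=DOMINANT_SHARE):
    """For one detection method, return 1 if the most frequent label covers
    less than threshold of the party's posts, and 0 otherwise."""
    counts = Counter(post_labels)
    top_share = counts.most_common(1)[0][1] / len(post_labels)
    return int(threshold > top_share)


def build_matrix(labels_by_party):
    """labels_by_party: {party: {method: [label per post]}} (hypothetical layout).
    Returns {party: [0/1 per method]}, the rows of the party-by-method matrix."""
    return {
        party: [is_multilingual(by_method[m]) for m in sorted(by_method)]
        for party, by_method in labels_by_party.items()
    }


def majority_vote(row):
    """Stand-in for the latent class model: 1 if more than half of the
    methods flag the party as multilingual."""
    return int(sum(row) * 2 > len(row))


# Toy data. Method m2 consistently mislabels party_a's language as "et",
# which does not affect the dichotomous outcome.
toy = {
    "party_a": {"m1": ["hy"] * 95 + ["en"] * 5, "m2": ["et"] * 100},
    "party_b": {"m1": ["en"] * 60 + ["fr"] * 40, "m2": ["en"] * 70 + ["fr"] * 30},
}
Y = build_matrix(toy)
print(Y)  # {'party_a': [0, 0], 'party_b': [1, 1]}
print({party: majority_vote(row) for party, row in Y.items()})
# {'party_a': 0, 'party_b': 1}
```

<p>The toy example mirrors the point made in the text: a method that assigns a consistently "wrong" label still yields the same monolingual classification, because only the share of the dominant label matters.</p></div>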
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Analyses</head><p>Hypotheses 1 and 2 suggest that institutional and ideological constraints moderate the relationship between linguistic diversity and parties' adoption of multilingualism. The sources and operationalization of the key variables are as follows.</p><p>We use two country-level variables: linguistic diversity and electoral systems. As we explained above, the former is based on <ref type="bibr">Desmet et al. (2012)</ref>. For electoral systems, we rely on V-Dem <ref type="bibr">(Coppedge et al. 2021</ref>) and assign the value of 1 for countries using majoritarian systems and 0 otherwise. During the period under study, Mongolia switched from a non-majoritarian system to a majoritarian one. Since we do not analyze the temporal dynamics of parties' multilingualism in this study, in the case of Mongolia, we use the modal system between 2016 and 2022. Thirteen countries are coded as majoritarian systems.</p><p>To capture parties' ideological orientations, we rely on the Global Party Survey (GPS; Norris 2019). The advantage of the GPS is that it provides two separate measures for parties' social/cultural and economic ideologies. 27 We expect that social/cultural left ideology, rather than economic left ideology, drives parties' multilingualism. For both measures, 0 indicates that parties are extreme left, whereas 10 means that they are extreme right. The correlation between the two variables is only moderate (r = 0.46). 28</p><p>27 The GPS does not include Honduras, Kosovo, Senegal, and Sri Lanka. Also, it does not cover small or relatively new parties.</p><p>28 Descriptive statistics are in <ref type="bibr">SI Table H.1 (p. 21)</ref>.</p><p>We fit a linear probability model with linguistic diversity, a measure for our hypothesized moderator, and the interaction of the two. 29 Our unit of analysis is the party, and the outcome is a dummy indicator of primarily multilingual parties. 
We also weight observations by the logged number of posts. In Models 1-3 of Table <ref type="table">2</ref>, we consider each moderating variable separately, which has the advantage of maximizing the number of parties we can include to test each hypothesis. In Models 4 and 5, however, we combine them into a single model specification and ensure that our results remain the same.</p><p>[Table <ref type="table">2</ref> about here]</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>H1: Electoral system</head><p>We begin by analyzing the moderating role of electoral systems. Panel (a) of Figure <ref type="figure">3</ref> summarizes the marginal effect of linguistic diversity on parties' adoption of multilingualism under different electoral systems, based on Model 1 of Table <ref type="table">2</ref>. Vertical bars indicate 95% confidence intervals. We find that under proportional systems, the coefficient on linguistic diversity is 0.42 with a 95% confidence interval of <ref type="bibr">[0.18, 0.65]</ref>. By contrast, under majoritarian systems, the coefficient is 0.76 with a 95% confidence interval of [0.56, 0.95].</p><p>[Figure <ref type="figure">3</ref> about here]</p><p>The difference between the two estimates (the interaction term ELF &#215; majoritarian system) is 0.34 and statistically significant with a 95% confidence interval of [0.04, 0.65].</p><p>Further, this difference is substantively meaningful, as switching from non-majoritarian to majoritarian systems can increase the marginal effect of linguistic diversity by 81% (from 0.42 to 0.76). Therefore, our results are generally consistent with the notion that the relationship between linguistic diversity and parties' adoption of multilingualism is more pronounced under majoritarian rules.</p><p>29 We do not use logistic regression because there is virtually no multilingual party when linguistic diversity is low (especially under majoritarian systems), meaning that our data has a problem of quasi-complete separation. Using logistic regression would lead to convergence failures.</p></div>
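<div xmlns="http://www.tei-c.org/ns/1.0"><p>The arithmetic linking the reported coefficients can be checked with a short Python sketch. The variable and function names are ours; the two coefficients are the rounded values reported in the text for Model 1, so the results are only as precise as that rounding allows.</p>

```python
# Rounded values reported in the text (Model 1, Table 2).
beta_elf = 0.42          # effect of linguistic diversity, non-majoritarian systems
beta_interaction = 0.34  # coefficient on ELF x majoritarian system


def marginal_effect(majoritarian):
    """Marginal effect of linguistic diversity in a linear model with an
    interaction term, for majoritarian equal to 0 or 1."""
    return beta_elf + beta_interaction * majoritarian


me_non_majoritarian = marginal_effect(0)
me_majoritarian = marginal_effect(1)
relative_increase = (me_majoritarian - me_non_majoritarian) / me_non_majoritarian

print(round(me_non_majoritarian, 2))   # 0.42
print(round(me_majoritarian, 2))       # 0.76
print(round(100 * relative_increase))  # 81
```

<p>In other words, the 81% figure is the interaction coefficient expressed relative to the baseline marginal effect (0.34 / 0.42).</p></div>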
<div xmlns="http://www.tei-c.org/ns/1.0"><head>H2: Party ideology</head><p>Next, we analyze the moderating role of party ideology in Models 2 and 3 of Table <ref type="table">2</ref>. 30 We find that the interaction term ELF &#215; social ideology is negative and statistically different from 0 with a 95% confidence interval of <ref type="bibr">[-0.14, -0.01]</ref>. By contrast, the interaction term ELF &#215; economic ideology does not reach conventional levels of statistical significance, with a 95% confidence interval of <ref type="bibr">[-0.10, 0.02]</ref>. These results suggest that parties' left-right ideology in the social dimension can condition their multilingualism, but their ideology in the economic dimension may not.</p><p>Panel (b) of Figure <ref type="figure">3</ref> shows the marginal effect of linguistic diversity on parties' multilingualism conditional on social/cultural ideology, based on Model 2 of Table <ref type="table">2</ref>. It shows that socially/culturally left-leaning parties are more likely to translate linguistic diversity into multilingualism than their right-leaning counterparts. Indeed, for extreme right parties, the marginal effect of linguistic diversity becomes statistically insignificant.</p><p>These patterns are consistent with the argument that socially leftist parties become more willing to accommodate different language groups as linguistic diversity increases, while socially right-leaning parties are unresponsive to changes in linguistic diversity.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Extension: candidates' language use</head><p>In this section, we analyze candidate-level language practices by exploring whether parties' multilingualism is reflected in the language use of candidates they nominate.</p><p>30 We do not find that the moderating roles of ideology deviate from linearity <ref type="bibr">(Hainmueller, Mummolo and Xu 2019)</ref>.</p><p>varies by country, 32 and our analysis is restricted to those who used public Facebook pages.</p><p>Using the CrowdTangle API, we downloaded these candidates' posts within the 60-day window of the elections. We focus only on candidates running for the parties included in the party-level data. We also exclude candidates with fewer than 10 posts. This leaves 4,188 candidates for 111 parties. The total number of posts we analyze exceeds 37,000.</p><p>We relied on the same procedures as the ones for the party-level classification to measure candidates' language practices. The result is a dummy variable that equals one for candidates engaged in multilingual communication. We also created a dummy indicator of whether the most used language by the candidate is a minority language of the country. As SI Table <ref type="table">I</ref>.1 (p. 22) summarizes, 417 candidates (10.0%) are primarily multilingual, and 433 candidates (10.3%) mainly communicate in a language other than the most dominant one in their countries.</p><p>The unit of analysis is candidate i in party p in country c. The outcomes are whether the candidate is primarily multilingual and whether the candidate mainly uses a minority language. The key predictor is a dummy indicator of whether the candidate's party is primarily multilingual. We control for parties' social ideology as well as countries' electoral systems and linguistic diversity. The model specification is based on a multilevel linear probability model with nested random effects by party and country. 
We weight observations by the logged number of posts.</p><p>Table <ref type="table">3</ref> summarizes the results. In Models 1 and 2, the outcome is a dummy indicator of whether candidates are primarily multilingual. The estimates of primarily multilingual party are positive and statistically discernible from 0 (p &lt; 0.01), regardless of whether we control for the key party- and country-level covariates. This means that the candidates of primarily multilingual parties are more likely to embrace multilingualism than those of monolingual parties.</p><p>32 While more than 75% of the candidates in Canada set up public Facebook pages, this number goes down to around 20% in Israel, the Netherlands, and Serbia. Candidates under candidate-centric electoral systems are more likely to use a public Facebook page than those under party-centric systems.</p><p>[Table <ref type="table">3</ref> about here]</p><p>Next, Models 3 and 4 of Table <ref type="table">3</ref> analyze whether candidates mainly communicate in a minority language. The estimates of primarily multilingual party are positive and statistically discernible from 0, although only at the 0.10 level in Model 4. This means that party-level multilingualism is positively correlated not only with candidate-level multilingualism but also with how much candidates are willing to use a non-majority language.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Conclusion</head><p>This study analyzes parties' and candidates' Facebook posts to understand their day-to-day language practices. More specifically, by applying computational tools of language detection, we generate the first classification of primarily monolingual and multilingual parties in 87 democracies. The resulting dataset provides the most comprehensive descriptive account of which parties embrace multilingualism to date.</p><p>Substantively, this study offers two novel insights into the linguistic aspect of party competition. First, we show that the interplay between countries' linguistic composition and parties' electoral incentives is related to their adoption of multilingual appeals. Specifically, majoritarian systems tend to encourage parties to cut across language lines more than non-majoritarian systems when linguistic diversity is high. Further, socially/culturally left-leaning parties are more likely to adopt multilingualism than right-leaning parties as linguistic diversity increases. These findings deepen our understanding of the strategic behavior of seat-maximizing parties. Second, extending our analysis to the language use of candidates in a dozen multilingual democracies, we show that candidates nominated by multilingual parties tend to adopt similar multilingual communication strategies to those of their parties.</p><p>The implications of these findings are important. They indicate that the presence of multilingual parties matters not just because they can communicate with different language groups. Rather, their presence has far-reaching consequences for how language spaces are structured by individual candidates during elections. Because they can effectively dampen, rather than reinforce, language-based divisions, multilingual parties and candidates may transform representation from being group-based to being interest-based. 
This could in theory decrease the level of political conflict and instability.</p><p>The dataset we provide opens up various avenues for future research. First, this study is by its nature descriptive, providing a cross-sectional snapshot of party and candidate behavior. The correlations we report are consistent with our theory, but future work may seek to provide a stronger ground for establishing causal claims. For instance, as we accumulate more data over time from social media, scholars might investigate changing behavior in response to electoral reforms.</p><p>Second, it is critical to assess over-time changes in parties' multilingualism. It is especially interesting to examine how election timing influences parties' decisions to use multilingual appeals, as some parties may try to accommodate minority voters only when elections get closer. Third, it is important to ask what additional factors determine parties' adoption of multilingualism. Although we explore two key determinants, there is still room for further theorization and empirical evaluation. We also envision examining parties' decisions to use different languages at the level of the post, which requires analyzing the post content. Finally, we should examine how multilingual parties influence voter behavior. By combining our dataset and survey data, we can analyze how multilingual parties shape voters' political attitudes, such as institutional trust and perceptions of parties. All these questions will eventually help us understand the overall impact of multilingual parties on the quality of democratic deliberation.</p><p>As a final note, we caution downstream researchers against blindly using the language labels of the posts or parties, rather than the dichotomous multilingualism labels, in their applications. As we discussed above, existing language detection tools do not always give the "correct" language labels to the texts from some countries. 
What we propose in this study is an approach to classify parties into monolingual and multilingual ones that is plausibly resilient to this problem. If researchers want to use a specific language label as an explanatory or outcome variable, it is important to perform additional steps to validate detected languages or pick the right computational tool of language detection that is suited for a specific case. We highlight cases where this issue may be specifically important in the SI Section F (pp. 15-17).</p><p>Note (Figure 1): The figure shows the relationship between the proportion of first-language users and the proportion of party Facebook posts written in the corresponding language. Dashed lines are 45-degree lines, and solid lines are estimated on a bivariate linear regression. Shaded areas indicate 95% confidence intervals. The figure only reports languages detected by three or more detection methods.</p><p>Note (Figure 2): The figure shows the relationship between linguistic diversity and the proportion of multilingual parties by country. The solid line is estimated on a bivariate linear regression. Shaded area indicates a 95% confidence interval.</p><p>Note (Figure 3): Panel (a) summarizes the marginal effect of linguistic diversity on parties' multilingualism conditional on electoral systems based on Model 1, Table 2. Panel (b) shows the marginal effect of linguistic diversity on parties' multilingualism conditional on social ideology based on Model 2, Table 2. Vertical bars and shaded areas indicate 95% confidence intervals.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0"><p>For studies on politicians' language use, see Crisp, Demirkaya, Schwindt-Bayer and   </p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_1"><p>Media is equally important to shape public spheres <ref type="bibr">(Caluwaerts and Reuchamps 2014)</ref>.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="10" xml:id="foot_2"><p>We use the average Polity scores <ref type="bibr">(Marshall and Gurr 2020</ref>) between 2016 and 2018   </p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="25" xml:id="foot_3"><p>See.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="26" xml:id="foot_4"><p><ref type="bibr">Laitin (2000)</ref> also notes that Ethnologue's language trees are not equally sensitive to dialectal differences across regions.</p></note>
		</body>
		</text>
</TEI>
