Analyzing ideology and polarization is of critical importance in advancing our grasp of modern politics. Recent research has made great strides towards understanding the ideological bias (i.e., stance) of news media along the left-right spectrum. In this work, we instead take a novel and more nuanced approach to the study of ideology, based on left or right positions on the issue being discussed. Aligned with the theoretical accounts in political science, we treat ideology as a multi-dimensional construct, and introduce the first diachronic dataset of news articles whose ideological positions are annotated by trained political scientists and linguists at the paragraph level. We show that, by controlling for the author's stance, our method allows for the quantitative and temporal measurement and analysis of polarization as a multidimensional ideological distance. We further present baseline models for ideology prediction, outlining a challenging task distinct from stance detection.
                    
                            
                            POLITICS: Pretraining with Same-story Article Comparison for Ideology Prediction and Stance Detection
                        
                    
    
            Ideology is at the core of political science research. Yet general-purpose tools to characterize and predict ideology across different genres of text still do not exist. To this end, we study Pretrained Language Models using novel ideology-driven pretraining objectives that rely on the comparison of articles on the same story written by media of different ideologies. We further collect a large-scale dataset, consisting of more than 3.6M political news articles, for pretraining. Our model POLITICS outperforms strong baselines and the previous state-of-the-art models on ideology prediction and stance detection tasks. Further analyses show that POLITICS is especially good at understanding long or formally written texts, and is also robust in few-shot learning scenarios.
        
    
- Award ID(s): 2127747
- PAR ID: 10354124
- Date Published:
- Journal Name: Findings of the Association for Computational Linguistics: NAACL 2022
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Prior work on ideology prediction has largely focused on single modalities, i.e., text or images. In this work, we introduce the task of multimodal ideology prediction, where a model predicts binary or five-point scale ideological leanings, given a text-image pair with political content. We first collect five new large-scale datasets with English documents and images along with their ideological leanings, covering news articles from a wide range of mainstream media in the US and social media posts from Reddit and Twitter. We conduct in-depth analyses on news articles and reveal differences in image content and usage across the political spectrum. Furthermore, we perform extensive experiments and ablation studies, demonstrating the effectiveness of targeted pretraining objectives on different model components. Our best-performing model, a late-fusion architecture pretrained with a triplet objective over multimodal content, outperforms the state-of-the-art text-only model by almost 4% and a strong multimodal baseline with no pretraining by over 3%.
- Automated journalism technology is transforming news production and changing how audiences perceive the news. As automated text-generation models advance, it is important to understand how readers perceive human-written and machine-generated content. This study used OpenAI’s GPT-2 text-generation model (May 2019 release) and articles from news organizations across the political spectrum to study participants’ reactions to human- and machine-generated articles. As participants read the articles, we collected their facial expression and galvanic skin response (GSR) data together with self-reported perceptions of article source and content credibility. We also asked participants to identify their political affinity and assess the articles’ political tone to gain insight into the relationship between political leaning and article perception. Our results indicate that the May 2019 release of OpenAI’s GPT-2 model generated articles that were misidentified as written by a human close to half the time, while human-written articles were identified correctly as written by a human about 70 percent of the time.
- Multiple recent efforts have used large-scale data and computational models to automatically detect misinformation in online news articles. Given the potential impact of misinformation on democracy, many of these efforts have also used the political ideology of these articles to better model misinformation and study political bias in such algorithms. However, almost all such efforts have used source-level labels for credibility and political alignment, thereby assigning the same credibility and political alignment label to all articles from the same source (e.g., the New York Times or Breitbart). Here, we report on the impact of using journalistic best practices to label individual news articles for their credibility and political alignment. We found that while source-level labels are decent proxies for political alignment labeling, they are very poor proxies (almost the same as flipping a coin) for credibility ratings. Next, we study the implications of such source-level labeling for downstream processes such as the development of automated misinformation detection algorithms and political fairness audits therein. We find that the automated misinformation detection and fairness algorithms can be suitably revised to support their intended goals but might require different assumptions and methods than those that are appropriate when using source-level labeling. The results suggest caution in generalizing recent results on misinformation detection and political bias therein. On a positive note, this work shares a new dataset of journalistic-quality individually labeled articles and an approach for misinformation detection and fairness audits.
- One dimension of the emerging politics of connected and automated vehicles (CAVs) is the development of public concerns over their societal implications and associated policy issues. This study uses original survey data from the United States to contribute to the anticipation of future policy and political issues for CAVs. Several studies have surveyed the public regarding CAVs; however, there are few studies that highlight the multidimensional public concerns that CAVs will most likely bring. The study breaks down the concept of “public” by showing that the demographic variables of gender, age, race, ethnicity, income, location (rural, suburban, urban), and political ideology (conservative, moderate, liberal) are significantly associated with three of the most salient public concerns to date (safety, privacy, and data security). Furthermore, the effects of demographic variables also vary across the type of policy issue. For example, women tend to be more concerned about safety than their male counterparts, and Hispanics (Latinx) tend to be more concerned about privacy than non-Hispanics. The research shows how the social scientific analysis of the “politics” of CAVs will require attention to the variegated connections between different types of public concern and different demographic variables.
 An official website of the United States government