skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


This content will become publicly available on October 15, 2026

Title: Investigating Political and Demographic Associations in Large Language Models Through Moral Foundations Theory
Large Language Models (LLMs) have become increasingly incorporated into everyday life for many internet users, taking on significant roles as advice givers in the domains of medicine, personal relationships, and even legal matters. The importance of these roles raise questions about how and what responses LLMs make in difficult political and moral domains, especially questions about possible biases. To quantify the nature of potential biases in LLMs, various works have applied Moral Foundations Theory (MFT), a framework that categorizes human moral reasoning into five dimensions: Harm, Fairness, Ingroup Loyalty, Authority, and Purity. Previous research has used the MFT to measure differences in human participants along political, national, and cultural lines. While there has been some analysis of the responses of LLM with respect to political stance in role-playing scenarios, no work so far has directly assessed the moral leanings in the LLM responses, nor have they connected LLM outputs with robust human data. In this paper we analyze the distinctions between LLM MFT responses and existing human research directly, investigating whether commonly available LLM responses demonstrate ideological leanings — either through their inherent responses, straightforward representations of political ideologies, or when responding from the perspectives of constructed human personas. We assess whether LLMs inherently generate responses that align more closely with one political ideology over another, and additionally examine how accurately LLMs can represent ideological perspectives through both explicit prompting and demographic-based role-playing. By systematically analyzing LLM behavior across these conditions and experiments, our study provides insight into the extent of political and demographic dependency in AI-generated responses.  more » « less
Award ID(s):
2107505
PAR ID:
10649925
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
AAAI Press
Date Published:
Journal Name:
Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society
Volume:
8
Issue:
3
ISSN:
3065-8365
Page Range / eLocation ID:
2419 to 2430
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract BackgroundGenerative artificial intelligence (AI) large‐language models (LLMs) have significant potential as research tools. However, the broader implications of using these tools are still emerging. Few studies have explored using LLMs to generate data for qualitative engineering education research. Purpose/HypothesisWe explore the following questions: (i) What are the affordances and limitations of using LLMs to generate qualitative data in engineering education, and (ii) in what ways might these data reproduce and reinforce dominant cultural narratives in engineering education, including narratives of high stress? Design/MethodsWe analyzed similarities and differences between LLM‐generated conversational data (ChatGPT) and qualitative interviews with engineering faculty and undergraduate engineering students from multiple institutions. We identified patterns, affordances, limitations, and underlying biases in generated data. ResultsLLM‐generated content contained similar responses to interview content. Varying the prompt persona (e.g., demographic information) increased the response variety. When prompted for ways to decrease stress in engineering education, LLM responses more readily described opportunities for structural change, while participants' responses more often described personal changes. LLM data more frequently stereotyped a response than participants did, meaning that LLM responses lacked the nuance and variation that naturally occurs in interviews. ConclusionsLLMs may be a useful tool in brainstorming, for example, during protocol development and refinement. However, the bias present in the data indicates that care must be taken when engaging with LLMs to generate data. Specially trained LLMs that are based only on data from engineering education hold promise for future research. 
    more » « less
  2. Prior work on ideology prediction has largely focused on single modalities, i.e., text or images. In this work, we introduce the task of multimodal ideology prediction, where a model predicts binary or five-point scale ideological leanings, given a text-image pair with political content. We first collect five new large-scale datasets with English documents and images along with their ideological leanings, covering news articles from a wide range of mainstream media in US and social media posts from Reddit and Twitter. We conduct in-depth analyses on news articles and reveal differences in image content and usage across the political spectrum. Furthermore, we perform extensive experiments and ablation studies, demonstrating the effectiveness of targeted pretraining objectives on different model components. Our best performing model, a late-fusion architecture pretrained with a triplet objective over multimodal content, outperforms the state-of-the-art text-only model by almost 4% and a strong multimodal baseline with no pretraining by over 3%. 
    more » « less
  3. To support positive, ethical human-robot interactions, robots need to be able to respond to unexpected situations in which societal norms are violated, including rejecting unethical commands. Implementing robust communication for robots is inherently difficult due to the variability of context in real-world settings and the risks of unintended influence during robots’ communication. HRI researchers have begun exploring the potential use of LLMs as a solution for language-based communication, which will require an in-depth understanding and evaluation of LLM applications in different contexts. In this work, we explore how an existing LLM responds to and reasons about a set of norm-violating requests in HRI contexts. We ask human participants to assess the performance of a hypothetical GPT-4-based robot on moral reasoning and explanatory language selection as it compares to human intuitions. Our findings suggest that while GPT-4 performs well at identifying norm violation requests and suggesting non-compliant responses, its flaws in not matching the linguistic preferences and context sensitivity of humans prevent it from being a comprehensive solution for moral communication between humans and robots. Based on our results, we provide a four-point recommendation for the community in incorporating LLMs into HRI systems. 
    more » « less
  4. Effective response to pandemics requires coordinated adoption of mitigation measures, like masking and quarantines, to curb a virus's spread. However, as the COVID-19 pandemic demonstrated, political divisions can hinder consensus on the appropriate response. To better understand these divisions, our study examines a vast collection of COVID-19-related tweets. We focus on five contentious issues: coronavirus origins, lockdowns, masking, education, and vaccines. We describe a weakly supervised method to identify issue-relevant tweets and employ state-of-the-art computational methods to analyze moral language and infer political ideology. We explore how partisanship and moral language shape conversations about these issues. Our findings reveal ideological differences in issue salience and moral language used by different groups. We find that conservatives use more negatively-valenced moral language than liberals and that political elites use moral rhetoric to a greater extent than non-elites across most issues. Examining the evolution and moralization on divisive issues can provide valuable insights into the dynamics of COVID-19 discussions and assist policymakers in better understanding the emergence of ideological divisions. 
    more » « less
  5. Abstract The widespread adoption of conversational LLMs for software development has raised new security concerns regarding the safety of LLM-generated content. Our motivational study outlines ChatGPT’s potential in volunteering context-specific information to the developers, promoting safe coding practices. Motivated by this finding, we conduct a study to evaluate the degree of security awareness exhibited by three prominent LLMs: Claude 3, GPT-4, and Llama 3. We prompt these LLMs with Stack Overflow questions that contain vulnerable code to evaluate whether they merely provide answers to the questions or if they also warn users about the insecure code, thereby demonstrating a degree of security awareness. Further, we assess whether LLM responses provide information about the causes, exploits, and the potential fixes of the vulnerability, to help raise users’ awareness. Our findings show that all three models struggle to accurately detect and warn users about vulnerabilities, achieving a detection rate of only 12.6% to 40% across our datasets. We also observe that the LLMs tend to identify certain types of vulnerabilities related to sensitive information exposure and improper input neutralization much more frequently than other types, such as those involving external control of file names or paths. Furthermore, when LLMs do issue security warnings, they often provide more information on the causes, exploits, and fixes of vulnerabilities compared to Stack Overflow responses. Finally, we provide an in-depth discussion on the implications of our findings, and demonstrated a CLI-based prompting tool that can be used to produce more secure LLM responses. 
    more » « less