NSF PAR Search | NSF Public Access Repository

Perceptions of Linguistic Uncertainty by Language Models and Humans

Belem, Catarina G; Kelly, Markelle; Steyvers, Mark; Singh, Sameer; Smyth, Padhraic (November 2024, ACL Anthology)

Full Text Available
Learning with AI Assistance: A Path to Better Task Performance or Dependence?

https://doi.org/10.1145/3643562.3672610

Karny, Sheer; Mayer, Lukas William; Ayoub, Jackie; Song, Miao; Su, Haotian; Tian, Danyang; Moradi-Pari, Ehsan; Steyvers, Mark (June 2024, ACM)

With the proliferation of AI, there is a growing concern regarding individuals becoming overly reliant on AI, leading to a decrease in intrinsic skills and autonomy. Assistive AI frameworks, on the other hand, also have the potential to improve human learning and performance by providing personalized learning experiences and real-time feedback. To study these opposing viewpoints on the consequences of AI assistance, we conducted a behavioral experiment using a dynamic decision-making game to assess how AI assistance impacts user performance, skill transfer, and cognitive engagement in task execution. Participants were assigned to one of four conditions that featured AI assistance at different time-points during the task. Our results suggest that AI assistance can improve immediate task performance without inducing human skill degradation or carryover effects in human learning. This observation has important implications for AI assistive frameworks as it suggests that there are classes of tasks in which assistance can be provided without risking the autonomy of the user. We discuss the possible reasons for this set of effects and explore their implications for future research directives.
Full Text Available
Differentiating mental models of self and others: A hierarchical framework for knowledge assessment.

https://doi.org/10.1037/rev0000443

Kumar, Aakriti; Smyth, Padhraic; Steyvers, Mark (November 2023, Psychological Review)

Full Text Available
Perceptions of Linguistic Uncertainty by Language Models and Humans

https://doi.org/10.18653/v1/2024.emnlp-main.483

Belém, Catarina G; Kelly, Markelle; Steyvers, Mark; Singh, Sameer; Smyth, Padhraic (January 2024, Association for Computational Linguistics)

*Uncertainty expressions* such as ‘probably’ or ‘highly unlikely’ are pervasive in human language. While prior work has established that there is population-level agreement in terms of how humans quantitatively interpret these expressions, there has been little inquiry into the abilities of language models in the same context. In this paper, we investigate how language models map linguistic expressions of uncertainty to numerical responses. Our approach assesses whether language models can employ theory of mind in this setting: understanding the uncertainty of another agent about a particular statement, independently of the model’s own certainty about that statement. We find that 7 out of 10 models are able to map uncertainty expressions to probabilistic responses in a human-like manner. However, we observe systematically different behavior depending on whether a statement is actually true or false. This sensitivity indicates that language models are substantially more susceptible to bias based on their prior knowledge (as compared to humans). These findings raise important questions and have broad implications for human-AI and AI-AI communication.
Full Text Available
Capturing Humans’ Mental Models of AI: An Item Response Theory Approach

https://doi.org/10.1145/3593013.3594111

Kelly, Markelle; Kumar, Aakriti; Smyth, Padhraic; Steyvers, Mark (June 2023, FAccT '23: Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency)

Improving our understanding of how humans perceive AI teammates is an important foundation for our general understanding of human-AI teams. Extending relevant work from cognitive science, we propose a framework based on item response theory for modeling these perceptions. We apply this framework to real-world experiments, in which each participant works alongside another person or an AI agent in a question-answering setting, repeatedly assessing their teammate’s performance. Using this experimental data, we demonstrate the use of our framework for testing research questions about people’s perceptions of both AI agents and other people. We contrast mental models of AI teammates with those of human teammates as we characterize the dimensionality of these mental models, their development over time, and the influence of the participants’ own self-perception. Our results indicate that people expect AI agents’ performance to be significantly better on average than the performance of other humans, with less variation across different types of problems. We conclude with a discussion of the implications of these findings for human-AI interaction.
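For readers unfamiliar with item response theory, the following is a minimal sketch of the standard two-parameter logistic (2PL) response function. It is a generic textbook form, not necessarily the model used in the paper above; the function name `p_correct` and its parameter names are illustrative assumptions.

```python
import math


def p_correct(ability, difficulty, discrimination=1.0):
    """Standard 2PL item response function (illustrative, not the paper's model).

    Returns the probability that an agent with the given latent ability
    answers an item of the given difficulty correctly. The discrimination
    parameter controls how sharply that probability rises with ability.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


# Hypothetical usage: a participant's belief about a teammate could be
# summarized as an ability value and compared across item difficulties.
easy_item = p_correct(ability=1.0, difficulty=-1.0)
hard_item = p_correct(ability=1.0, difficulty=2.0)
```

In a setting like the one described, one might fit a separate ability parameter for each perceived teammate, but how the paper parameterizes these perceptions is not stated in the abstract.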
Full Text Available
An Expert Guide to Planning Experimental Tasks For Evidence-Accumulation Modeling

https://doi.org/10.1177/25152459251336127

Boag, Russell J; Innes, Reilly J; Stevenson, Niek; Bahg, Giwon; Busemeyer, Jerome R; Cox, Gregory E; Donkin, Chris; Frank, Michael J; Hawkins, Guy E; Heathcote, Andrew; et al (April 2025, Advances in Methods and Practices in Psychological Science)

Evidence-accumulation models (EAMs) are powerful tools for making sense of human and animal decision-making behavior. EAMs have generated significant theoretical advances in psychology, behavioral economics, and cognitive neuroscience and are increasingly used as a measurement tool in clinical research and other applied settings. Obtaining valid and reliable inferences from EAMs depends on knowing how to establish a close match between model assumptions and features of the task/data to which the model is applied. However, this knowledge is rarely articulated in the EAM literature, leaving beginners to rely on the private advice of mentors and colleagues and inefficient trial-and-error learning. In this article, we provide practical guidance for designing tasks appropriate for EAMs, relating experimental manipulations to EAM parameters, planning appropriate sample sizes, and preparing data and conducting an EAM analysis. Our advice is based on prior methodological studies and our substantial collective experience with EAMs. By encouraging good task-design practices and warning of potential pitfalls, we hope to improve the quality and trustworthiness of future EAM research and applications.
Free, publicly-accessible full text available April 1, 2026
Three Challenges for AI-Assisted Decision-Making

https://doi.org/10.1177/17456916231181102

Steyvers, Mark; Kumar, Aakriti (January 2023, Perspectives on Psychological Science)

Artificial intelligence (AI) has the potential to improve human decision-making by providing decision recommendations and problem-relevant information to assist human decision-makers. However, the full realization of the potential of human–AI collaboration continues to face several challenges. First, the conditions that support complementarity (i.e., situations in which the performance of a human with AI assistance exceeds the performance of an unassisted human or the AI in isolation) must be understood. This task requires humans to be able to recognize situations in which the AI should be leveraged and to develop new AI systems that can learn to complement the human decision-maker. Second, human mental models of the AI, which contain both expectations of the AI and reliance strategies, must be accurately assessed. Third, the effects of different design choices for human-AI interaction must be understood, including both the timing of AI assistance and the amount of model information that should be presented to the human decision-maker to avoid cognitive overload and ineffective reliance strategies. In response to each of these three challenges, we present an interdisciplinary perspective based on recent empirical and theoretical findings and discuss new research directions.
Full Text Available
AI-Assisted Decision-making: a Cognitive Modeling Approach to Infer Latent Reliance Strategies

https://doi.org/10.1007/s42113-022-00157-y

Tejeda, Heliodoro; Kumar, Aakriti; Smyth, Padhraic; Steyvers, Mark (October 2022, Computational Brain & Behavior)

AI assistance is readily available to humans in a variety of decision-making applications. In order to fully understand the efficacy of such joint decision-making, it is important to first understand the human’s reliance on AI. However, there is a disconnect between how joint decision-making is studied and how it is practiced in the real world. More often than not, researchers ask humans to provide independent decisions before they are shown AI assistance. This is done to make explicit the influence of AI assistance on the human’s decision. We develop a cognitive model that allows us to infer the latent reliance strategy of humans on AI assistance without asking the human to make an independent decision. We validate the model’s predictions through two behavioral experiments. The first experiment follows a concurrent paradigm where humans are shown AI assistance alongside the decision problem. The second experiment follows a sequential paradigm where humans provide an independent judgment on a decision problem before AI assistance is made available. The model’s predicted reliance strategies closely track the strategies employed by humans in the two experimental paradigms. Our model provides a principled way to infer reliance on AI-assistance and may be used to expand the scope of investigation on human-AI collaboration.
Bayesian modeling of human–AI complementarity

https://doi.org/10.1073/pnas.2111547119

Steyvers, Mark; Tejeda, Heliodoro; Kerrigan, Gavin; Smyth, Padhraic (March 2022, Proceedings of the National Academy of Sciences)

Artificial intelligence (AI) and machine learning models are being increasingly deployed in real-world applications. In many of these applications, there is strong motivation to develop hybrid systems in which humans and AI algorithms can work together, leveraging their complementary strengths and weaknesses. We develop a Bayesian framework for combining the predictions and different types of confidence scores from humans and machines. The framework allows us to investigate the factors that influence complementarity, where a hybrid combination of human and machine predictions leads to better performance than combinations of human or machine predictions alone. We apply this framework to a large-scale dataset where humans and a variety of convolutional neural networks perform the same challenging image classification task. We show empirically and theoretically that complementarity can be achieved even if the human and machine classifiers perform at different accuracy levels as long as these accuracy differences fall within a bound determined by the latent correlation between human and machine classifier confidence scores. In addition, we demonstrate that hybrid human–machine performance can be improved by differentiating between the errors that humans and machine classifiers make across different class labels. Finally, our results show that eliciting and including human confidence ratings improve hybrid performance in the Bayesian combination model. Our approach is applicable to a wide variety of classification problems involving human and machine algorithms.
Full Text Available
Combining human predictions with model probabilities via confusion matrices and calibration

Kerrigan, Gavin; Smyth, Padhraic; Steyvers, Mark (December 2021, Advances in Neural Information Processing Systems (NeurIPS 2021))

An increasingly common use case for machine learning models is augmenting the abilities of human decision makers. For classification tasks where neither the human nor the model is perfectly accurate, a key step in obtaining high performance is combining their individual predictions in a manner that leverages their relative strengths. In this work, we develop a set of algorithms that combine the probabilistic output of a model with the class-level output of a human. We show theoretically that the accuracy of our combination model is driven not only by the individual human and model accuracies, but also by the model's confidence. Empirical results on image classification with CIFAR-10 and a subset of ImageNet demonstrate that such human-model combinations consistently have higher accuracies than the model or human alone, and that the parameters of the combination method can be estimated effectively with as few as ten labeled datapoints.
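The general idea of combining a model's class probabilities with a human's class label through a confusion matrix can be sketched as follows. This is a minimal illustration that assumes the human and the model are conditionally independent given the true class. It omits the calibration step named in the title, and the function names `estimate_confusion` and `combine`, along with the smoothing parameter `alpha`, are hypothetical rather than taken from the paper.

```python
import numpy as np


def estimate_confusion(true_labels, human_labels, n_classes, alpha=1.0):
    """Estimate P(human label h | true class y) from labeled examples.

    alpha is an additive smoothing count (an assumption of this sketch)
    so that unseen (y, h) pairs do not receive zero probability.
    """
    counts = np.full((n_classes, n_classes), alpha, dtype=float)
    for y, h in zip(true_labels, human_labels):
        counts[y, h] += 1.0
    # Normalize each row so it is a distribution over human labels.
    return counts / counts.sum(axis=1, keepdims=True)


def combine(model_probs, human_label, confusion):
    """Posterior over classes given model probabilities and one human label.

    Applies Bayes' rule: p(y | model, h) is proportional to
    p_model(y) * P(h | y), under conditional independence.
    """
    model_probs = np.asarray(model_probs, dtype=float)
    likelihood = confusion[:, human_label]  # P(h | y) for every class y
    unnormalized = model_probs * likelihood
    return unnormalized / unnormalized.sum()


# Hypothetical toy data: six labeled examples over three classes.
confusion = estimate_confusion([0, 0, 1, 1, 2, 2], [0, 0, 1, 1, 2, 1], n_classes=3)
posterior = combine([0.5, 0.3, 0.2], human_label=1, confusion=confusion)
```

In this toy case the model alone favors class 0, but the human's label shifts the combined prediction to class 1, which shows how a human label can override a model that is not very confident.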
Full Text Available
