AI assistance in decision-making has become popular, yet people's inappropriate reliance on AI often leads to unsatisfactory human-AI collaboration performance. In this paper, through three pre-registered, randomized human subject experiments, we explore whether and how the provision of second opinions may affect decision-makers' behavior and performance in AI-assisted decision-making. We find that if both the AI model's decision recommendation and a second opinion are always presented together, decision-makers reduce their over-reliance on AI while increase their under-reliance on AI, regardless whether the second opinion is generated by a peer or another AI model. However, if decision-makers have the control to decide when to solicit a peer's second opinion, we find that their active solicitations of second opinions have the potential to mitigate over-reliance on AI without inducing increased under-reliance in some cases. We conclude by discussing the implications of our findings for promoting effective human-AI collaborations in decision-making. 
                        more » 
                        « less   
                    This content will become publicly available on June 23, 2026
                            
                            Can Domain Experts Rely on AI Appropriately? A Case Study on AI-Assisted Prostate Cancer MRI Diagnosis
                        
                    
    
            Despite the growing interest in human-AI decision making, experimental studies with domain experts remain rare, largely due to the complexity of working with domain experts and the challenges in setting up realistic experiments. In this work, we conduct an in-depth collaboration with radiologists in prostate cancer diagnosis based on MRI images. Building on existing tools for teaching prostate cancer diagnosis, we develop an interface and conduct two experiments to study how AI assistance and performance feedback shape the decision making of domain experts. In Study 1, clinicians were asked to provide an initial diagnosis (human), then view the AI's prediction, and subsequently finalize their decision (human-AI team). In Study 2 (after a memory wash-out period), the same participants first received aggregated performance statistics from Study 1, specifically their own performance, the AI's performance, and their human-AI team performance, and then directly viewed the AI's prediction before making their diagnosis (i.e., no independent initial diagnosis). These two workflows represent realistic ways that clinical AI tools might be used in practice, where the second study simulates a scenario where doctors can adjust their reliance and trust on AI based on prior performance feedback. Our findings show that, while human-AI teams consistently outperform humans alone, they still underperform the AI due to under-reliance, similar to prior studies with crowdworkers. Providing clinicians with performance feedback did not significantly improve the performance of human-AI teams, although showing AI decisions in advance nudges people to follow AI more. Meanwhile, we observe that the ensemble of human-AI teams can outperform AI alone, suggesting promising directions for human-AI collaboration. 
        more » 
        « less   
        
    
    
                            - PAR ID:
- 10614889
- Publisher / Repository:
- ACM
- Date Published:
- Format(s):
- Medium: X
- Location:
- Athens, Greece
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            
- 
            Mueller, Florian Floyd; Kyburz, Penny; Williamson, Julie R; Sas, Corina; Wilson, Max L; Dugas, Phoebe Toups; Shklovski, Irina (Ed.)Advances in artificial intelligence (AI) have enabled unprecedented capabilities, yet innovation teams struggle when envisioning AI concepts. Data science teams think of innovations users do not want, while domain experts think of innovations that cannot be built. A lack of effective ideation seems to be a breakdown point. How might multidisciplinary teams identify buildable and desirable use cases? This paper presents a first hand account of ideating AI concepts to improve critical care medicine. As a team of data scientists, clinicians, and HCI researchers, we conducted a series of design workshops to explore more effective approaches to AI concept ideation and problem formulation. We detail our process, the challenges we encountered, and practices and artifacts that proved effective. We discuss the research implications for improved collaboration and stakeholder engagement, and discuss the role HCI might play in reducing the high failure rate experienced in AI innovation.more » « less
- 
            Proper calibration of human reliance on AI is fundamental to achieving complementary performance in AI-assisted human decision-making. Most previous works focused on assessing user reliance, and more broadly trust, retrospectively, through user perceptions and task-based measures. In this work, we explore the relationship between eye gaze and reliance under varying task difficulties and AI performance levels in a spatial reasoning task. Our results show a strong positive correlation between percent gaze duration on the AI suggestion and user AI task agreement, as well as user perceived reliance. Moreover, user agency is preserved particularly when the task is easy and when AI performance is low or inconsistent. Our results also reveal nuanced differences between reliance and trust. We discuss the potential of using eye gaze to gauge human reliance on AI in real-time, enabling adaptive AI assistance for optimal human-AI team performance.more » « less
- 
            Artificial intelligence (AI) has the potential to improve human decision-making by providing decision recommendations and problem-relevant information to assist human decision-makers. However, the full realization of the potential of human–AI collaboration continues to face several challenges. First, the conditions that support complementarity (i.e., situations in which the performance of a human with AI assistance exceeds the performance of an unassisted human or the AI in isolation) must be understood. This task requires humans to be able to recognize situations in which the AI should be leveraged and to develop new AI systems that can learn to complement the human decision-maker. Second, human mental models of the AI, which contain both expectations of the AI and reliance strategies, must be accurately assessed. Third, the effects of different design choices for human-AI interaction must be understood, including both the timing of AI assistance and the amount of model information that should be presented to the human decision-maker to avoid cognitive overload and ineffective reliance strategies. In response to each of these three challenges, we present an interdisciplinary perspective based on recent empirical and theoretical findings and discuss new research directions.more » « less
- 
            With the rapid development of decision aids that are driven by AI models, the practice of AI-assisted decision making has become increasingly prevalent. To improve the human-AI team performance in decision making, earlier studies mostly focus on enhancing humans' capability in better utilizing a given AI-driven decision aid. In this paper, we tackle this challenge through a complementary approach—we aim to train behavior-aware AI by adjusting the AI model underlying the decision aid to account for humans' behavior in adopting AI advice. In particular, as humans are observed to accept AI advice more when their confidence in their own judgement is low, we propose to train AI models with a human-confidence-based instance weighting strategy, instead of solving the standard empirical risk minimization problem. Under an assumed, threshold-based model characterizing when humans will adopt the AI advice, we first derive the optimal instance weighting strategy for training AI models. We then validate the efficacy and robustness of our proposed method in improving the human-AI joint decision making performance through systematic experimentation on synthetic datasets. Finally, via randomized experiments with real human subjects along with their actual behavior in adopting the AI advice, we demonstrate that our method can significantly improve the decision making performance of the human-AI team in practice.more » « less
 An official website of the United States government
An official website of the United States government 
				
			 
					 
					
