
Title: Benchmarking an AI-Guided Reasoning-Based Operator Support System on the Three Mile Island Accident Scenario
Abstract: In the Nuclear Power Plant (NPP) control room, operators' performance in emergencies is impacted by the need to monitor many indicators on the control room boards, the limited time to react to dynamic events, and the incompleteness of the operators' knowledge. Recent research has been directed toward increasing the level of automation in NPP systems by employing modern AI techniques that support operators' decisions. In previous work, the authors employed a declarative, AI-guided approach, namely Answer Set Programming (ASP), to represent and reason with qualitative human knowledge. This knowledge is structured into a reasoning-based operator support system that assists the operator and compensates for knowledge incompleteness by reasoning to diagnose failures and recommend actions in real time. A general ASP code structure was proposed and tested against simple scenarios, e.g., diagnosing pump failures that result in loss-of-flow transients and generating the plans needed to resolve stuck valves in the secondary loop. In this work, we investigate the potential of the previously proposed ASP structure by applying it to a realistic case study: the Three Mile Island, Unit 2 (TMI-2) accident event sequence (in particular, the first 142 minutes). The TMI scenario presents many challenges for a reasoning system, including a large number of variables, the complexity of the scenario, and misleading readings. The capability of the ASP-based reasoning system is tested for diagnosis and action recommendation throughout the scenario. This paper is the first work to test and demonstrate the capability of an automated reasoning system on a realistic nuclear accident scenario such as the TMI-2 accident.
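As a rough illustration of the kind of declarative encoding the abstract refers to, the clingo-style ASP fragment below pairs a minimal fault-diagnosis rule with an action-recommendation rule. All predicate and component names (pump_1, pump_2, flow_rate, recommend/1) are hypothetical placeholders for illustration, not the authors' actual code structure.

% Hypothetical clingo-style sketch of diagnosis plus action recommendation.
% Components and their current roles.
component(pump_1). component(pump_2).
running(pump_1).   standby(pump_2).

% An observation reported by a control-room indicator.
observed(flow_rate, low).

% Each component may or may not be hypothesised as faulty (choice rule).
{ faulty(C) } :- component(C).

% A low flow reading must be explained by at least one faulty running pump.
explained :- faulty(C), running(C).
:- observed(flow_rate, low), not explained.

% Prefer diagnoses that assume as few faults as possible.
#minimize { 1,C : faulty(C) }.

% Recommend starting a standby pump when the running pump is diagnosed faulty.
recommend(start(P2)) :- faulty(P1), running(P1), standby(P2).

#show faulty/1.
#show recommend/1.

Running this sketch with clingo yields an optimal answer set containing faulty(pump_1) and recommend(start(pump_2)), i.e., a diagnosis together with a suggested corrective action.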
Award ID(s):
1914635
PAR ID:
10208816
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of the 28th Conference on Nuclear Engineering
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Abstract: The paper describes an ongoing effort in developing a declarative system for supporting operators in the Nuclear Power Plant (NPP) control room. The focus is on two modules: diagnosis and explanation of events that happened in NPPs. We describe an Answer Set Programming (ASP) representation of an NPP, which consists of declarations of state variables, components, their connections, and rules encoding the plant behavior. We then show how the ASP program can be used to explain the series of events that occurred in the Three Mile Island, Unit 2 (TMI-2) NPP accident, the most severe accident in U.S. nuclear power plant operating history (a schematic sketch of such a time-indexed encoding appears after this list). We also describe an explanation module aimed at answering questions such as "why did an event occur?" or "what should be done?" given the collected data.
  2. Actions play a vital role in how humans interact with the world. Thus, autonomous agents that would assist us in everyday tasks also require the capability to perform 'Reasoning about Actions & Change' (RAC). This has been an important research direction in Artificial Intelligence (AI) in general, but the study of RAC with visual and linguistic inputs is relatively recent. CLEVR_HYP is one such testbed for hypothetical vision-language reasoning with actions as the key focus. In this work, we propose a novel learning strategy that can improve reasoning about the effects of actions. We implement an encoder-decoder architecture to learn the representation of actions as vectors. We combine this encoder-decoder architecture with existing modality parsers and a scene-graph question answering model to evaluate our proposed system on the CLEVR_HYP dataset. We conduct thorough experiments to demonstrate the effectiveness of our proposed approach and discuss its advantages over previous baselines in terms of performance, data efficiency, and generalization capability.
  3. Task and motion planning represents a powerful set of hybrid planning methods that combine reasoning over discrete task domains with continuous motion generation. Traditional reasoning requires task domain models and enough information to ground actions to motion planning queries. Gaps in this knowledge often arise from sources such as occlusion or imprecise modeling. This work generates task and motion plans that include actions that cannot be fully grounded at planning time. During execution, such an action is handled by a provided human-designed or learned closed-loop behavior. Execution combines offline planned motions and online behaviors until the task goal is reached. Failures of behaviors are fed back as constraints to find new plans. Forty real-robot trials and motivating demonstrations are performed to evaluate the proposed framework and compare it against the state of the art. Results show faster execution time, fewer actions, and more success in problems where diverse gaps arise. The experiment data is shared for researchers to simulate these settings. The work shows promise in expanding the applicable class of realistic partially grounded problems that robots can address.
  4. Despite the growing interest in human-AI decision making, experimental studies with domain experts remain rare, largely due to the complexity of working with domain experts and the challenges in setting up realistic experiments. In this work, we conduct an in-depth collaboration with radiologists in prostate cancer diagnosis based on MRI images. Building on existing tools for teaching prostate cancer diagnosis, we develop an interface and conduct two experiments to study how AI assistance and performance feedback shape the decision making of domain experts. In Study 1, clinicians were asked to provide an initial diagnosis (human), then view the AI's prediction, and subsequently finalize their decision (human-AI team). In Study 2 (after a memory wash-out period), the same participants first received aggregated performance statistics from Study 1, specifically their own performance, the AI's performance, and their human-AI team performance, and then directly viewed the AI's prediction before making their diagnosis (i.e., no independent initial diagnosis). These two workflows represent realistic ways that clinical AI tools might be used in practice, where the second study simulates a scenario in which doctors can adjust their reliance on and trust in AI based on prior performance feedback. Our findings show that, while human-AI teams consistently outperform humans alone, they still underperform the AI due to under-reliance, similar to prior studies with crowdworkers. Providing clinicians with performance feedback did not significantly improve the performance of human-AI teams, although showing AI decisions in advance nudges people to follow the AI more. Meanwhile, we observe that the ensemble of human-AI teams can outperform the AI alone, suggesting promising directions for human-AI collaboration.
  5. Large Language Models (LLMs) have made significant strides in various intelligent tasks but still struggle with complex action reasoning tasks that require systematic search. To address this limitation, we introduce a method that bridges the natural language understanding capability of LLMs with the symbolic reasoning capability of action languages, formal languages for reasoning about actions. Our approach, termed LLM+AL, leverages the LLM's strengths in semantic parsing and commonsense knowledge generation alongside the action language's expertise in automated reasoning based on encoded knowledge. We compare LLM+AL against state-of-the-art LLMs, including ChatGPT-4, Claude 3 Opus, Gemini Ultra 1.0, and o1-preview, using benchmarks for complex reasoning about actions. Our findings indicate that while all methods exhibit various errors, LLM+AL, with relatively simple human corrections, consistently leads to correct answers, whereas using LLMs alone does not yield improvements even after human intervention. LLM+AL also contributes to the automated generation of action languages.
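Items 1, 2, and 5 above all involve reasoning about actions and change. As a schematic illustration of the time-indexed encoding mentioned in item 1, the clingo-style fragment below combines a direct-effect rule with the standard inertia axioms so that an observed event propagates through the timeline. The fluent and action names (valve_open, close_valve) are assumptions chosen for illustration and do not come from any of the cited papers.

% Hypothetical clingo-style sketch of a time-indexed action encoding.
time(0..3).
fluent(valve_open).

% Initial state and an observed event.
holds(valve_open, 0).
occurs(close_valve, 1).

% Direct effect: closing the valve makes it no longer open at the next step.
-holds(valve_open, T+1) :- occurs(close_valve, T), time(T).

% Inertia: a fluent keeps its truth value unless an effect changes it.
holds(F, T+1)  :- holds(F, T),  not -holds(F, T+1), fluent(F), time(T), time(T+1).
-holds(F, T+1) :- -holds(F, T), not holds(F, T+1),  fluent(F), time(T), time(T+1).

#show holds/2.

Tracing which rules derive an atom such as -holds(valve_open, 2) is the kind of justification an explanation module can present when answering "why did an event occur?".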