Determining whether a robot's grasp has a high chance of failing, before it actually fails, can save significant time and avoid failures by allowing the robot to plan a re-grasp or change its strategy for that particular case. Machine Learning (ML) offers one way to learn to predict grasp failure from historical data consisting of a robot's attempted grasps, each labeled as a success or a failure. Unfortunately, most powerful ML models are black-box models that do not explain the reasons behind their predictions. In this paper, we investigate how ML can be used to predict robot grasp failure and study the tradeoff between accuracy and interpretability by comparing interpretable (white-box) ML models, which are inherently explainable, with more accurate black-box ML models, which are inherently opaque. Our results show that accuracy need not be sacrificed for interpretability if an explanation generation method, such as SHapley Additive exPlanations (SHAP), is used to add explainability to the accurate predictions of black-box models. An explanation of a predicted fault can guide an efficient choice of corrective action in the robot's design to avoid future failures.
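The abstract names SHAP but does not show what its attributions look like. As a minimal sketch under stated assumptions, the code below computes exact Shapley values, the game-theoretic quantity SHAP builds on, for a toy grasp-failure scorer by enumerating every feature coalition. The feature names, the scoring function, and the all-zeros baseline used to stand in for "missing" features are hypothetical. The SHAP library relies on faster estimators rather than this brute-force enumeration, which is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, List, Set

# Hypothetical grasp features; not taken from the paper.
FEATURES: List[str] = ["gripper_force", "object_width", "slip_signal"]

Model = Callable[[Dict[str, float]], float]


def failure_score(x: Dict[str, float]) -> float:
    """Toy stand-in for a black-box model (hypothetical): higher means riskier."""
    return (0.5 * x["slip_signal"] + 0.3 * x["object_width"]
            - 0.2 * x["gripper_force"]
            + 0.1 * x["slip_signal"] * x["object_width"])


def coalition_value(model: Model, instance: Dict[str, float],
                    baseline: Dict[str, float], coalition: Set[str]) -> float:
    """Model output with coalition features taken from the instance and all
    other features held at the baseline (one simple way to 'remove' a feature)."""
    x = {f: instance[f] if f in coalition else baseline[f] for f in FEATURES}
    return model(x)


def shapley_values(model: Model, instance: Dict[str, float],
                   baseline: Dict[str, float]) -> Dict[str, float]:
    """Exact Shapley values by enumerating every coalition (exponential cost)."""
    n = len(FEATURES)
    phi: Dict[str, float] = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes 0..n-1 drawn from the other features
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                gain = (coalition_value(model, instance, baseline, s | {f})
                        - coalition_value(model, instance, baseline, s))
                total += weight * gain
        phi[f] = total
    return phi


instance = {"gripper_force": 1.0, "object_width": 2.0, "slip_signal": 3.0}
baseline = {"gripper_force": 0.0, "object_width": 0.0, "slip_signal": 0.0}
phi = shapley_values(failure_score, instance, baseline)
```

With these made-up numbers the attributions come out to about 1.8 for `slip_signal`, 0.9 for `object_width`, and -0.2 for `gripper_force`. They sum to the 2.5 gap between the instance's score and the baseline's score, and the interaction term is split evenly between the two features it involves.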
Failing to Grasp our Failure to Grasp Automation Failure
This paper discusses three points inspired by Skraaning and Jamieson’s perspective on automation failure: (a) the limitations of the automation failure concept as system boundaries expand; (b) parallels between the failure to grasp automation failure and the failure to grasp trust in automation; and (c) the benefits of taking a pluralistic approach to definitions in sociotechnical systems science. While a taxonomy of automation-involved failures may not directly improve our understanding of how to prevent those failures, it could be instrumental in identifying hazards during the test and evaluation of operational systems.
- Award ID(s): 2231874
- PAR ID: 10515583
- Publisher / Repository: Journal of Cognitive Engineering and Decision Making
- Date Published:
- Journal Name: Journal of Cognitive Engineering and Decision Making
- ISSN: 1555-3434
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like This
- Skraaning and Jamieson raise some interesting issues related to how humans respond to automation failures and offer a taxonomy of failure types that broadens the definition of automation failure. This commentary makes a further attempt to broaden the scope of automation failures by placing them within a sociotechnical system of multiple humans and multiple machine components, including automation. It suggests how one might understand the system’s response to automation failures and raises the inclusion of autonomy as another complication.
- When a failure occurs in production systems, the highest priority is to mitigate it quickly. Despite its importance, failure mitigation is done in a reactive and ad hoc way: fixed actions are taken only after a severe symptom is observed. For cloud systems, such a strategy is inadequate. In this paper, we propose Narya, a preventive and adaptive failure mitigation service integrated into a production cloud, Microsoft Azure's compute platform. Narya predicts imminent host failures from multi-layer system signals and then decides on smart mitigation actions, with the goal of averting VM failures. Narya's decision engine takes a novel online experimentation approach to continually explore the best mitigation action, and further enhances its adaptive decision capability through reinforcement learning. Narya has been running in production for 15 months and, on average, reduces VM interruptions by 26% compared to the previous static strategy.
- Large-scale distributed systems must be built to anticipate and mitigate a variety of hardware and software failures. To build confidence that fault-tolerant systems are correctly implemented, Netflix (and similar enterprises) regularly run failure drills in which faults are deliberately injected into their production systems. The combinatorial space of failure scenarios is too large to explore exhaustively. Existing failure testing approaches either explore the space of potential failures randomly or exploit the "hunches" of domain experts to guide the search. Random strategies waste resources testing "uninteresting" faults, while programmer-guided approaches are only as good as human intuition and scale only with human effort. In this paper, we describe how we adapted and implemented a research prototype called lineage-driven fault injection (LDFI) to automate failure testing at Netflix. Along the way, we describe the challenges that arose in adapting the LDFI model to the complex and dynamic realities of the Netflix architecture. We show how we implemented the adapted algorithm as a service atop the existing tracing and fault injection infrastructure, and present early results.
- Space mission-related projects are demanding and risky undertakings because of their complexity and cost. Many missions have failed over the years due to anomalies in either the launch vehicle or the spacecraft. In projects of such magnitude, flaws left undetected by ineffective process controls account for huge losses. Such failures continue to occur despite studies of systems engineering process deficiencies and the state-of-the-art systems engineering practices in place. To further explore the reasons behind the majority of these failures, we analyzed failure data from space missions over the last decade. Based on that information, we studied launch-related failure events from a design decision-making perspective using a failure event chain-based framework, and identified some dominant cognitive biases that might have affected overall system performance and led to unintended catastrophes. The results of the study are presented in this paper.
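The Narya abstract says its decision engine uses online experimentation to keep exploring the best mitigation action, but it does not give the algorithm. As a loose illustration only, here is an epsilon-greedy multi-armed bandit sketch. Epsilon-greedy is a swapped-in textbook technique, not necessarily what Narya uses, and the action names, interruption probabilities, and reward definition are all hypothetical.

```python
import random
from typing import Dict, List


class EpsilonGreedyMitigator:
    """Hypothetical sketch: pick a mitigation action for a host predicted to fail,
    trading off exploration of alternatives against the best action seen so far."""

    def __init__(self, actions: List[str], epsilon: float = 0.1, seed: int = 0) -> None:
        self.actions = actions
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts: Dict[str, int] = {a: 0 for a in actions}
        self.mean_reward: Dict[str, float] = {a: 0.0 for a in actions}

    def choose(self) -> str:
        # Try every action once, then explore at random with probability epsilon.
        for a in self.actions:
            if self.counts[a] == 0:
                return a
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        # Otherwise exploit the action with the best observed average reward.
        return max(self.actions, key=lambda a: self.mean_reward[a])

    def update(self, action: str, reward: float) -> None:
        # Incremental running mean of the observed rewards for this action.
        self.counts[action] += 1
        self.mean_reward[action] += (reward - self.mean_reward[action]) / self.counts[action]


# Hypothetical simulation: reward is -1 if the VM was interrupted, else 0.
interrupt_prob = {"live_migrate": 0.05, "reboot_host": 0.30, "no_action": 0.50}
agent = EpsilonGreedyMitigator(list(interrupt_prob), epsilon=0.1, seed=42)
env = random.Random(7)
for _ in range(2000):
    action = agent.choose()
    interrupted = env.random() < interrupt_prob[action]
    agent.update(action, -1.0 if interrupted else 0.0)
best = max(agent.mean_reward, key=agent.mean_reward.get)
```

Because the running means are updated online, the agent keeps re-evaluating actions as outcomes arrive. The abstract also mentions reinforcement learning on top of this, which the sketch does not attempt to model.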
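The LDFI abstract describes using lineage, rather than random choice or expert hunches, to pick which faults to inject. The central idea, as commonly described, is to trace the alternative ways ("supports") a successful outcome was produced and then search for sets of faults that would eliminate every one of them, since those are the injections worth testing. The sketch below shows that hitting-set idea by brute force, which is exponential in the number of components. The LDFI literature describes delegating this search to a SAT solver, and the service names here are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Set


def minimal_fault_sets(supports: List[Set[str]], max_faults: int) -> List[FrozenSet[str]]:
    """Return every minimal set of at most `max_faults` components that intersects
    each support, i.e. whose joint failure removes every observed way the outcome
    was produced. Brute-force stand-in for a solver-based search (sketch only)."""
    components = sorted(set().union(*supports))
    found: List[FrozenSet[str]] = []
    for k in range(1, max_faults + 1):
        for combo in combinations(components, k):
            candidate = frozenset(combo)
            # A superset of an earlier hitting set is not minimal, so skip it.
            if any(prev <= candidate for prev in found):
                continue
            # Keep the candidate only if it breaks every support.
            if all(candidate & support for support in supports):
                found.append(candidate)
    return found


# Hypothetical lineage: a request succeeded through the API plus any one of
# three redundant backends, so each support is one alternative derivation.
supports = [{"api", "cache"}, {"api", "db_primary"}, {"api", "db_replica"}]
faults = minimal_fault_sets(supports, max_faults=3)
```

Here the result contains two candidate experiments: fail the API alone, or fail all three backends together. If the system still succeeds under such an injection, its new lineage reveals another support to add before searching again.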