Systems software today is composed of numerous modules and exhibits complex failure modes. Existing failure detectors focus on catching simple, complete failures and treat programs uniformly at the process level. In this paper, we argue that modern software needs intrinsic failure detectors that are tailored to individual systems and can detect anomalies within a process at finer granularity. We particularly advocate a notion of intrinsic software watchdogs and propose an abstraction for it. Among the different styles of watchdogs, we believe watchdogs that imitate the main program can provide the best combination of completeness, accuracy and localization for detecting gray failures. But, manually constructing such mimic-type watchdogs is challenging and time-consuming. To close this gap, we present an early exploration for automatically generating mimic-type watchdogs
more »
« less
Software-Defined Watchdog Timers for Cyber-Physical Systems
This paper introduces software-defined watchdogs, a programming model for handling faults that manifest as delayed or missing signals. The programming model is implemented as an extension to the polyglot coordination language Lingua Franca, where it acts as an eager deadline for delayed inputs. The technique is compared against hardware-defined watchdogs and software watchdogs in other reactive languages.
more »
« less
- Award ID(s):
- 2233769
- PAR ID:
- 10554367
- Publisher / Repository:
- IEEE
- Date Published:
- Journal Name:
- IEEE Embedded Systems Letters
- ISSN:
- 1943-0663
- Page Range / eLocation ID:
- 1 to 1
- Subject(s) / Keyword(s):
- fault-handler distributed systems cyberphysical systems scheduling reactive programming
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Probabilistic programming languages aid developers performing Bayesian inference. These languages provide programming constructs and tools for probabilistic modeling and automated inference. Prior work introduced a probabilistic programming language, ProbZelus, to extend probabilistic programming functionality to unbounded streams of data. This work demonstrated that the delayed sampling inference algorithm could be extended to work in a streaming context. ProbZelus showed that while delayed sampling could be effectively deployed on some programs, depending on the probabilistic model under consideration, delayed sampling is not guaranteed to use a bounded amount of memory over the course of the execution of the program. In this paper, we the present conditions on a probabilistic program’s execution under which delayed sampling will execute in bounded memory. The two conditions are dataflow properties of the core operations of delayed sampling: the m -consumed property and the unseparated paths property . A program executes in bounded memory under delayed sampling if, and only if, it satisfies the m -consumed and unseparated paths properties. We propose a static analysis that abstracts over these properties to soundly ensure that any program that passes the analysis satisfies these properties, and thus executes in bounded memory under delayed sampling.more » « less
-
Ensuring software security is a critical task for a deliverable software system in today’s world, and its proper implementation guarantees the quality and security of the information ingested, stored, and processed by the system. It is imperative to introduce computer science and computer engineering students (CS/CE) with the secure software design practices early in their curriculum. This approach will help them understand fundamentals of secure programming, vulnerabilities in software systems, and secure software development before joining the industry workforce. In this paper, we propose an educational framework that integrates software security concepts in a software engineering design course. We envision that the framework will engage CS/CE students applying security principles and practices in different phases of the software development life cycle (SDLC) process. Our work focuses on review of common security requirements, policies, and mechanisms related to specific use cases as well as how those requirements are defined during the software design.more » « less
-
Partial failures occur frequently in cloud systems and can cause serious damage including inconsistency and data loss. Unfortunately, these failures are not well understood. Nor can they be effectively detected. In this paper, we first study 100 real-world partial failures from five mature systems to understand their characteristics. We find that these failures are caused by a variety of defects that require the unique conditions of the production environment to be triggered. Manually writing effective detectors to systematically detect such failures is both time-consuming and error-prone. We thus propose OmegaGen, a static analysis tool that automatically generates customized watchdogs for a given program by using a novel program reduction technique. We have successfully applied OmegaGen to six large distributed systems. In evaluating 22 real-world partial failure cases in these systems, the generated watchdogs can detect 20 cases with a median detection time of 4.2 seconds, and pinpoint the failure scope for 18 cases. The generated watchdogs also expose an unknown, confirmed partial failure bug in the latest version of ZooKeeper.more » « less
-
Abstract Background The Kyoto Encyclopedia of Genes and Genomes (KEGG) provides organized genomic, biomolecular, and metabolic information and knowledge that is reasonably current and highly useful for a wide range of analyses and modeling. KEGG follows the principles of data stewardship to be findable, accessible, interoperable, and reusable (FAIR) by providing RESTful access to their database entries via their web-accessible KEGG API. However, the overall FAIRness of KEGG is often limited by the library and software package support available in a given programming language. While R library support for KEGG is fairly strong, Python library support has been lacking. Moreover, there is no software that provides extensive command line level support for KEGG access and utilization. Results We present kegg_pull, a package implemented in the Python programming language that provides better KEGG access and utilization functionality than previous libraries and software packages. Not only does kegg_pull include an application programming interface (API) for Python programming, it also provides a command line interface (CLI) that enables utilization of KEGG for a wide range of shell scripting and data analysis pipeline use-cases. As kegg_pull’s name implies, both the API and CLI provide versatile options for pulling (downloading and saving) an arbitrary (user defined) number of database entries from the KEGG API. Moreover, this functionality is implemented to efficiently utilize multiple central processing unit cores as demonstrated in several performance tests. Many options are provided to optimize fault-tolerant performance across a single or multiple processes, with recommendations provided based on extensive testing and practical network considerations. Conclusions The new kegg_pull package enables new flexible KEGG retrieval use cases not available in previous software packages. The most notable new feature that kegg_pull provides is its ability to robustly pull an arbitrary number of KEGG entries with a single API method or CLI command, including pulling an entire KEGG database. We provide recommendations to users for the most effective use of kegg_pull according to their network and computational circumstances.more » « less
An official website of the United States government

