Scientific domains such as genetics, climate modeling, and astronomy face significant obstacles in managing, preprocessing, and training on complex data for deep learning. Although several large-scale solutions offer distributed execution environments, open-source alternatives that integrate scalable runtime tools, deep learning frameworks, and data frameworks on high-performance computing (HPC) platforms remain crucial for accessibility and flexibility. In this paper, we introduce Deep Radical-Cylon (RC), a heterogeneous runtime system that combines data engineering, deep learning frameworks, and workflow engines across several HPC environments, including cloud and supercomputing infrastructures. Deep RC supports heterogeneous systems with accelerators, allows the use of communication libraries such as MPI, GLOO, and NCCL across multi-node setups, and facilitates parallel and distributed deep learning pipelines by using Radical Pilot as a task execution framework. For an end-to-end pipeline comprising preprocessing, model training, and postprocessing, run with 11 neural forecasting models (PyTorch) and hydrology models (TensorFlow) under identical resource conditions, the system saves 3.28 and 75.9 seconds, respectively. The design of Deep RC ensures the smooth integration of scalable data frameworks, such as Cylon, with deep learning processes, and it performs strongly on both cloud platforms and scientific HPC systems. By offering a flexible, high-performance solution for resource-intensive applications, this approach closes the gap between data preprocessing, model training, and postprocessing.
Paired Training Framework for Time-Constrained Learning
This paper presents a design framework for machine learning applications that operate in systems such as cyber-physical systems where time is a scarce resource. We manage the tradeoff between processing time and solution quality by performing as much preprocessing of data as time will allow. This approach leads us to a design framework in which there are two separate learning networks: one for preprocessing and one for the core application functionality. We show how these networks can be trained together and how they can operate in an anytime fashion to optimize performance.
- PAR ID: 10230439
- Date Published:
- Journal Name: Proceedings of the 2021 Design, Automation, and Test in Europe Conference & Exhibition (DATE'21)
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Developers rarely build programming environments that help secondary teachers support student learning. We interviewed 11 K-12 teachers to discover how they support students learning to program and how tools might assist their teaching practice. Based on thematic analysis and on organizing teacher activities around student actions, we derived a new framework that can be used to design a programming learning system that supports teachers. Our results suggest that teachers structure their activities around their ideals about effective programming teaching and learning and around students' problem-solving and help-seeking processes. Our framework therefore relates the themes we discovered about teacher activities to those ideals and to student problem solving in a time-based structure that can inform the design of new programming learning systems.
- Autonomous drones (UAVs) have rapidly grown in popularity due to their form factor, agility, and ability to operate in harsh or hostile environments. Drone systems come in various form factors and configurations and operate under tight physical parameters. Developing optimal drone designs has been a significant challenge for architects and researchers, because open-source simulation frameworks either lack the capabilities needed to simulate a full drone flight stack or are extremely tedious to set up and receive little or no maintenance or support. In this paper, we develop and present UniUAVSim, our fully open-source co-simulation framework capable of running software-in-the-loop (SITL) and hardware-in-the-loop (HITL) simulations concurrently. The paper also provides insights into the abstraction of a drone flight stack and details how these abstractions aid in creating a simulation framework that can accurately provide an optimal drone design given physical parameters and constraints. The framework was validated with real-world hardware and is available to the research community to aid future architecture research for autonomous systems.
- A Raspberry Pi-Based Traumatic Brain Injury Detection System for Single-Channel Electroencephalogram
  Traumatic Brain Injury (TBI) is a common cause of death and disability. However, existing tools for TBI diagnosis are either subjective or require extensive clinical setup and expertise. The increasing affordability and reduction in the size of relatively high-performance computing systems combined with promising results from TBI related machine learning research make it possible to create compact and portable systems for early detection of TBI. This work describes a Raspberry Pi based portable, real-time data acquisition, and automated processing system that uses machine learning to efficiently identify TBI and automatically score sleep stages from a single-channel Electroencephalogram (EEG) signal. We discuss the design, implementation, and verification of the system that can digitize the EEG signal using an Analog to Digital Converter (ADC) and perform real-time signal classification to detect the presence of mild TBI (mTBI). We utilize Convolutional Neural Networks (CNN) and XGBoost based predictive models to evaluate the performance and demonstrate the versatility of the system to operate with multiple types of predictive models. We achieve a peak classification accuracy of more than 90% with a classification time of less than 1 s across 16–64 s epochs for TBI vs. control conditions. This work can enable the development of systems suitable for field use without requiring specialized medical equipment for early TBI detection applications and TBI research. Further, this work opens avenues to implement connected, real-time TBI related health and wellness monitoring systems.
- The objective of this paper is to propose a System-of-Systems (SoS) framework for disaster management systems and processes, in order to better analyze, design, and operate the heterogeneous, interconnected, and distributed systems involved in disasters. With the increasing frequency and severity of disasters, improving the efficiency and effectiveness of disaster management systems and processes is critical. However, current approaches to conceptualizing and analyzing disaster management processes do not provide a holistic perspective on multiple heterogeneous systems and processes that are interconnected and embedded in networks across various spatial and temporal scales. In this paper, a disaster management system-of-systems (DM-SoS) framework is proposed to identify the dimensions of analysis and the characteristics needed for a more integrative approach to disaster management. Three dimensions of analysis (definition, abstraction, and implementation) and their corresponding components for examining disaster management SoS are explored. The DM-SoS framework would enable the specification and characterization of system attributes and interdependencies, and would capture emergent properties and cross-scale interactions.

