In the past decade, GPUs have become an important resource for compute-intensive, general-purpose applications such as machine learning, big data analysis, and large-scale simulations. In the future, with the explosion of machine learning and big data, application demands will keep increasing, pushing more data and computation to GPUs. However, due to the slowing of Moore's Law and rising manufacturing costs, it is becoming increasingly challenging to add compute resources to a single GPU device to improve its throughput. As a result, spreading work across multiple GPUs is popular in data-centric and scientific applications; for example, Facebook uses 8 GPUs per server in its recent machine-learning platform. However, research infrastructure has not kept pace with this trend: most GPU hardware simulators, including gem5, support only a single GPU. Thus, it is hard to study interference between GPUs, communication between GPUs, or work scheduling across GPUs. Our research group has been working to address this shortcoming by adding multi-GPU support to gem5. Here, we discuss the changes that were needed, which included updating the emulated driver, the GPU components, and the coherence protocol.
NeuroKube: An Automated and Autoscaling Neuroimaging Reconstruction Framework using Cloud Native Computing and A.I.
The Neuroscience domain stands out from the field of sciences for its dependence on the study and characterization of complex, intertwining structures. Understanding the complexity of the brain has led to widespread advances in the structure of large-scale computing resources and the design of artificially intelligent analysis systems. However, the scale of problems and data generated continues to grow and outpace the standards and practices of neuroscience. In this paper, we present an automated neuroscience reconstruction framework, called NeuroKube, for large-scale processing and labeling of neuroimage volumes. Automated labels are generated through a machine-learning (ML) workflow, with data-intensive steps feeding through multiple GPU stages and distributed data locations, leveraging autoscalable cloud-native deployments on a multi-institution Kubernetes system. Leading-edge hardware and storage empower multiple stages of machine-learning, GPU-accelerated solutions. This demonstrates an abstract approach to allocating the resources and algorithms needed to elucidate the highly complex structures of the brain. We summarize an integrated gateway architecture, and a scalable workflow-driven segmentation and reconstruction environment that brings together image big data with state-of-the-art, extensible machine-learning methods.
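To make the autoscaling idea concrete, the sketch below shows one way a backlog-driven scaler could resize a GPU worker Deployment with the Kubernetes Python client. It is illustrative only and is not taken from NeuroKube: the namespace, Deployment name, per-worker throughput target, and replica cap are all assumptions.

```python
# Illustrative only: scale a (hypothetical) GPU segmentation-worker Deployment
# based on the number of image chunks still waiting to be labeled.
from kubernetes import client, config

NAMESPACE = "neurokube"             # hypothetical namespace
DEPLOYMENT = "segmentation-worker"  # hypothetical Deployment name
CHUNKS_PER_WORKER = 50              # assumed per-worker backlog target
MAX_REPLICAS = 16                   # assumed cluster GPU budget

def scale_gpu_workers(pending_chunks: int) -> int:
    """Set the worker replica count to match the current labeling backlog."""
    config.load_kube_config()       # or load_incluster_config() when running in a pod
    apps = client.AppsV1Api()
    desired = min(MAX_REPLICAS, max(1, -(-pending_chunks // CHUNKS_PER_WORKER)))  # ceil division
    body = {"spec": {"replicas": desired}}
    apps.patch_namespaced_deployment(name=DEPLOYMENT, namespace=NAMESPACE, body=body)
    return desired

if __name__ == "__main__":
    print("scaled to", scale_gpu_workers(pending_chunks=420))
```

In a real cluster the same logic would typically run inside a pod, or be replaced by a HorizontalPodAutoscaler driven by a custom backlog metric; the hand-rolled loop here is only meant to show the decision being made.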
- Award ID(s):
- 2014862
- PAR ID:
- 10251125
- Date Published:
- Journal Name:
- 2020 IEEE International Conference on Big Data
- Page Range / eLocation ID:
- 320 to 330
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
Understanding the brain requires studying its multiscale interactions from molecules to networks. The increasing availability of large-scale datasets detailing brain circuit composition, connectivity, and activity is transforming neuroscience. However, integrating and interpreting this data remains challenging. Concurrently, advances in supercomputing and sophisticated modeling tools now enable the development of highly detailed, large-scale biophysical circuit models. These mechanistic multiscale models offer a method to systematically integrate experimental data, facilitating investigations into brain structure, function, and disease. This review, based on a Society for Neuroscience 2024 MiniSymposium, aims to disseminate recent advances in large-scale mechanistic modeling to the broader community. It highlights (1) examples of current models for various brain regions developed through experimental data integration; (2) their predictive capabilities regarding cellular and circuit mechanisms underlying experimental recordings (e.g., membrane voltage, spikes, local-field potential, electroencephalography/magnetoencephalography) and brain function; and (3) their use in simulating biomarkers for brain diseases like epilepsy, depression, schizophrenia, and Parkinson's, aiding in understanding their biophysical underpinnings and developing novel treatments. The review showcases state-of-the-art models covering hippocampus, somatosensory, visual, motor, auditory cortical, and thalamic circuits across species. These models predict neural activity at multiple scales and provide insights into the biophysical mechanisms underlying sensation, motor behavior, brain signals, neural coding, disease, pharmacological interventions, and neural stimulation. Collaboration with experimental neuroscientists and clinicians is essential for the development and validation of these models, particularly as datasets grow. Hence, this review aims to foster interest in detailed brain circuit models, leading to cross-disciplinary collaborations that accelerate brain research.
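As a minimal, self-contained illustration of the kind of simulation such models build on, the sketch below integrates a single leaky integrate-and-fire neuron and records its membrane voltage and spike times. The parameter values are generic textbook choices, not drawn from any of the detailed models surveyed in the review, which replace this single equation with multi-compartment biophysics.

```python
# A toy leaky integrate-and-fire (LIF) neuron: the simplest end of the
# modeling spectrum. Integrate input current, emit a spike at threshold,
# reset, and record the membrane voltage trace.
import numpy as np

dt, t_max = 0.1, 200.0                                          # ms
tau_m, v_rest, v_thresh, v_reset = 10.0, -65.0, -50.0, -65.0    # ms, mV
r_m, i_ext = 10.0, 2.0                                          # MOhm, nA (constant drive; illustrative)

t = np.arange(0.0, t_max, dt)
v = np.full_like(t, v_rest)
spikes = []

for k in range(1, len(t)):
    dv = (-(v[k - 1] - v_rest) + r_m * i_ext) / tau_m
    v[k] = v[k - 1] + dt * dv
    if v[k] >= v_thresh:          # threshold crossing -> spike, then reset
        spikes.append(t[k])
        v[k] = v_reset

print(f"{len(spikes)} spikes in {t_max:.0f} ms")
```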
Obeid, Iyad; Selesnick, Ivan; Picone, Joseph (Ed.)
Scalp electroencephalograms (EEGs) are the primary means by which physicians diagnose brain-related illnesses such as epilepsy and seizures. Automated seizure detection using clinical EEGs is a very difficult machine learning problem due to the low fidelity of a scalp EEG signal. Nevertheless, despite the poor signal quality, clinicians can reliably diagnose illnesses from visual inspection of the signal waveform. Commercially available automated seizure detection systems, however, suffer from unacceptably high false alarm rates. Deep learning algorithms that require large amounts of training data have not previously been effective on this task due to the lack of big data resources necessary for building such models and the complexity of the signals involved. The evolution of big data science, most notably the release of the Temple University EEG (TUEG) Corpus, has motivated renewed interest in this problem. In this chapter, we discuss the application of a variety of deep learning architectures to automated seizure detection. Architectures explored include multilayer perceptrons, convolutional neural networks (CNNs), long short-term memory networks (LSTMs), gated recurrent units and residual neural networks. We use the TUEG Corpus, supplemented with data from Duke University, to evaluate the performance of these hybrid deep structures. Since TUEG contains a significant amount of unlabeled data, we also discuss unsupervised pre-training methods used prior to training these complex recurrent networks. Exploiting spatial and temporal context is critical for accurate disambiguation of seizures from artifacts. We explore how effectively several conventional architectures are able to model context and introduce a hybrid system that integrates CNNs and LSTMs. The primary error modalities observed by this state-of-the-art system were false alarms generated during brief delta range slowing patterns such as intermittent rhythmic delta activity. A variety of these types of events have been observed during inter-ictal and post-ictal stages. Training models on such events with diverse morphologies has the potential to significantly reduce the remaining false alarms. This is one reason we are continuing our efforts to annotate a larger portion of TUEG. Increasing the data set size significantly allows us to leverage more advanced machine learning methodologies.
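The sketch below outlines, in PyTorch, the general shape of a CNN+LSTM hybrid like the one described: per-window convolutional feature extraction followed by a recurrent layer for temporal context. It is a stand-in, not the authors' architecture; the channel count, window length, and layer sizes are illustrative assumptions.

```python
# Sketch of a CNN+LSTM hybrid seizure detector: 1-D convolutions extract local
# features from each EEG window, an LSTM models longer-range temporal context
# across windows, and a linear head emits per-window class scores.
import torch
import torch.nn as nn

class CnnLstmDetector(nn.Module):
    def __init__(self, n_channels=22, hidden=128, n_classes=2):  # 22-channel montage is an assumption
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),          # one 64-d feature vector per window
        )
        self.lstm = nn.LSTM(64, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                     # x: (batch, windows, channels, samples)
        b, w, c, s = x.shape
        feats = self.cnn(x.reshape(b * w, c, s)).squeeze(-1).reshape(b, w, -1)
        out, _ = self.lstm(feats)             # temporal context across windows
        return self.head(out)                 # per-window class scores

scores = CnnLstmDetector()(torch.randn(2, 10, 22, 250))  # 10 one-second windows at 250 Hz (assumed)
print(scores.shape)                                      # torch.Size([2, 10, 2])
```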
Abstract Automotive structures are primarily made of flexible sheet metal assemblies. Flexible assemblies are prone to manufacturing variations such as springback, which may be caused by non-isotropic material properties from cold rolling, springback in the forming process, and distortion from residual stresses when components are clamped and spot welded. This paper describes the curation of a large data set for machine learning. The domain is that of flexible assembly manufacturing in multiple stages: component stamping, configuring components into sub-assemblies, clamping, and joining. The dataset is generated by nonlinear FEA. Due to the size of the data set, the simulation workflow has been automated and designed to produce variety and balance of key parameters. Simulation results are available not just as raw FE deformed (sprung-back) geometries and residual stresses at different manufacturing stages, but also in the form of variation zones and fits. The NUMISHEET 1993 U-draw/bending was used as a reference for tooling geometry and verification of the forming process. Additional variation in the dataset is obtained by using multiple materials and geometrical dimensions. In summary, the proposed simulation method provides a means of generating a design space of flexible multi-part assemblies for applications such as dataset generation, design optimization, and machine learning.
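The sketch below illustrates the kind of automated parameter sweep such a workflow implies: enumerate material and geometric variations, run each case, and append the resulting springback measure to a dataset. The material names, dimensions, and the run_forming_simulation placeholder are hypothetical; a real driver would submit nonlinear FEA jobs instead of generating synthetic numbers.

```python
# Illustrative driver for an automated parameter sweep: sample material and
# geometric variations, run each (placeholder) simulation, and record the
# springback result as one row of the ML dataset.
import csv
import itertools
import random

MATERIALS = ["DP600", "AA6016", "DC04"]      # assumed material set
SHEET_THICKNESS_MM = [0.8, 1.0, 1.2]         # assumed thickness variations
CLAMP_FORCES_KN = [2.5, 5.0]                 # assumed clamping variations

def run_forming_simulation(material, thickness, clamp_force, seed):
    """Placeholder for a nonlinear FEA run; returns a synthetic springback angle."""
    random.seed(hash((material, thickness, clamp_force, seed)))
    return round(random.uniform(5.0, 20.0), 3)   # degrees (synthetic stand-in)

with open("springback_dataset.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["material", "thickness_mm", "clamp_force_kN", "replicate", "springback_deg"])
    for material, thickness, force in itertools.product(MATERIALS, SHEET_THICKNESS_MM, CLAMP_FORCES_KN):
        for rep in range(3):                  # replicates capture run-to-run variation
            angle = run_forming_simulation(material, thickness, force, rep)
            writer.writerow([material, thickness, force, rep, angle])
```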
Deep Neural Networks (DNNs) have been applied as an effective machine learning algorithm to tackle problems in different domains. However, the endeavor to train sophisticated DNN models can stretch from days into weeks, presenting substantial obstacles in the realm of research focused on large-scale DNN architectures. Distributed Deep Learning (DDL) contributes to accelerating DNN training by distributing training workloads across multiple computation accelerators, for example, graphics processing units (GPUs). Despite the considerable amount of research directed toward enhancing DDL training, the influence of data loading on GPU utilization and overall training efficacy remains relatively overlooked. It is non-trivial to optimize data loading in DDL applications that need intensive central processing unit (CPU) and input/output (I/O) resources to process enormous volumes of training data. When multiple DDL applications are deployed on a system (e.g., Cloud and High-Performance Computing (HPC) systems), the lack of a practical and efficient technique for data-loader allocation incurs GPU idleness and degrades the training throughput. Therefore, our work first focuses on investigating the impact of data loading on the global training throughput. We then propose a throughput prediction model to predict the maximum throughput for an individual DDL training application. By leveraging the predicted results, A-Dloader is designed to dynamically allocate CPU and I/O resources to concurrently running DDL applications and use the data-loader allocation as a knob to reduce GPU idle intervals and thus improve the overall training throughput. We implement and evaluate A-Dloader in a DDL framework for a series of DDL applications arriving and completing across the runtime. Our experimental results show that A-Dloader can achieve a 28.9% throughput improvement and a 10% makespan improvement compared with allocating resources evenly across applications.
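One way to picture data-loader allocation as a "knob" is the greedy sketch below: given a predicted throughput curve per job (throughput as a function of loader workers), CPUs go to whichever job gains the most from one more worker. The curves, job names, and the greedy policy itself are illustrative assumptions, not the A-Dloader algorithm.

```python
# Illustrative throughput-aware data-loader CPU allocation: hand out CPUs
# greedily to the job with the largest marginal throughput gain.
import heapq

def allocate_loader_cpus(jobs, total_cpus):
    """jobs: {name: predicted throughput at 0, 1, 2, ... loader workers}."""
    alloc = {name: 0 for name in jobs}

    def gain(name):
        curve, w = jobs[name], alloc[name]
        return curve[w + 1] - curve[w] if w + 1 < len(curve) else 0.0

    heap = [(-gain(name), name) for name in jobs]   # max-heap via negated gains
    heapq.heapify(heap)
    for _ in range(total_cpus):
        g, name = heapq.heappop(heap)
        if -g <= 0:
            break                                   # no job benefits from another worker
        alloc[name] += 1
        heapq.heappush(heap, (-gain(name), name))
    return alloc

# Hypothetical predicted-throughput curves (samples/s at 0, 1, 2, ... workers).
curves = {
    "resnet50": [0, 110, 190, 240, 260, 265],
    "bert":     [0, 60, 115, 160, 195, 220],
}
print(allocate_loader_cpus(curves, total_cpus=6))   # {'resnet50': 3, 'bert': 3}
```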