The abstraction of a shared memory space over separate CPU and GPU memory domains has eased the burden of portability for many HPC codebases. However, users pay for the ease of use that system-managed memory provides with a moderate-to-high performance overhead. NVIDIA Unified Virtual Memory (UVM) is currently the primary real-world implementation of such an abstraction and offers a functionally equivalent testbed for in-depth performance study for both UVM and future Linux Heterogeneous Memory Management (HMM) compatible systems. The continued advocacy for UVM and HMM motivates improvement of the underlying system. We focus on UVM-based systems and investigate the root causes of UVM overhead, a non-trivial task due to complex interactions of multiple hardware and software constituents and the desired cost granularity. In our prior work, we delved deeply into UVM system architecture and showed internal behaviors of page fault servicing in batches. We provided a quantitative evaluation of batch handling for various applications under different scenarios, including prefetching and oversubscription. We revealed that the driver workload depends on the interactions among application access patterns, GPU hardware constraints, and host OS components. Host OS components have significant overhead present across implementations, warranting close attention. This extension furthers our prior study in three aspects: fine-grain cost analysis and breakdown, extension to multiple GPUs, and investigation of platforms with different GPU-GPU interconnects. We take a top-down approach to quantitative batch analysis and uncover how constituent component costs accumulate and overlap, governed by synchronous and asynchronous operations. Our multi-GPU analysis shows reduced cost of GPU-GPU batch workloads compared to CPU-GPU workloads. We further demonstrate that while specialized interconnects such as NVLink can improve batch cost, their benefits are limited by host OS software overhead and GPU oversubscription. This study serves as a proxy for future shared memory systems, such as those that interface with HMM, and the development of interconnects.
BeaCloud: A Generic Architecture for Sustainable Smart City using Bluetooth Beacons
In recent years, Bluetooth beacons have been widely used in numerous application domains, including smart cities, assistive technologies, and intelligent transportation management. Researchers and developers in these domains frequently require diverse systems to implement or test their prototype-related innovations. They need to deploy beacons for the specific environment every time, and such customized systems typically cannot be reused. Hence, the cost of implementation increases, and multiple systems generate redundant data. In this paper, we propose BeaCloud, an architecture that provides a common platform for multiple beacon-based systems. BeaCloud enables inter-system communication and allows easy and secure access to data for system administrators. The proposed architecture presents a cost-effective model that reduces cost and hardware requirements. BeaCloud also reduces data redundancy by up to 40%. To demonstrate the feasibility of BeaCloud, we implemented a testbed of three testing sites and evaluated the system's performance.
- Award ID(s): 1952090
- PAR ID: 10282332
- Date Published:
- Journal Name: IEEE International Conference on High Performance Computing and Communications (HPCC)
- Page Range / eLocation ID: 1150 to 1157
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Data-intensive applications are becoming commonplace in all science disciplines. They comprise a rich set of sub-domains such as data engineering, deep learning, and machine learning. These applications are built around efficient data abstractions and operators that suit the applications of different domains. Often, the lack of a clear definition of data structures and operators in the field has led to implementations that do not work well together. The HPTMT architecture that we proposed recently identifies a set of data structures, operators, and an execution model for creating rich data applications that link all aspects of data engineering and data science together efficiently. This paper elaborates and illustrates this architecture using an end-to-end application with deep learning and data engineering parts working together. Our analysis shows that the proposed system architecture is better suited for high performance computing environments compared to the current big data processing systems. Furthermore, our proposed system emphasizes the importance of efficient compact data structures such as the Apache Arrow tabular data representation defined for high performance. Thus, the system integration we propose scales a sequential computation to a distributed computation while retaining optimum performance along with a highly usable application programming interface.
-
Research in the areas of internet-of-things, cyber-physical systems, and smart health often employs sensor systems at residences for continuous monitoring. Such research-oriented residential monitoring systems (RRMSs) usually face two major challenges: long-term reliable operation management and validation of system functionality with minimal human effort. Targeting these two challenges, this paper describes a monitor of monitoring systems with ground-truth validation capabilities, M2G. It consists of two subsystems, the Monitor2 system and the Ground-truth validation system. The Monitor2 system encapsulates a flexible set of general-purpose components to monitor the operation and connectivity of heterogeneous sensor devices (e.g. smart watches, smart phones, microphones, beacons, etc.), a local base-station, as well as a cloud server. It provides a user-friendly interface and supports different types of RRMSs in various contexts. The system also features a ground truth validation system to support obtaining ground truth in the field. Additionally, customized alerts can be sent to remote administrators and other personnel to report any dysfunction or inaccuracy of the system in real time. M2G is applied to three very different case studies: the M2FED system which monitors family eating dynamics, an in-home wireless sensing system for monitoring nighttime agitation, and the BESI system which monitors behavioral and environmental parameters to predict health events and to provide interventions. The results indicate that M2G is a comprehensive system that (i) requires small cost in time and effort to adapt to an existing RRMS, (ii) provides reliable data collection and reduction in data loss by detecting faults in real-time, and (iii) provides a convenient and timely ground truth validation facility.
-
Friedberg, Iddo (Ed.) The Immunoglobulin fold (Ig-fold) is found in proteins from all domains of life and represents the most populous fold in the human genome, with current estimates ranging from 2 to 3% of protein coding regions. That proportion is much higher in the surfaceome where Ig and Ig-like domains orchestrate cell-cell recognition, adhesion and signaling. The ability of Ig-domains to reliably fold and self-assemble through highly specific interfaces represents a remarkable property of these domains, making them key elements of molecular interaction systems: the immune system, the nervous system, the vascular system and the muscular system. We define a universal residue numbering scheme, common to all domains sharing the Ig-fold in order to study the wide spectrum of Ig-domain variants constituting the Ig-proteome and Ig-Ig interactomes at the heart of these systems. The “IgStrand numbering scheme” enables the identification of Ig structural proteomes and interactomes in and between any species, and comparative structural, functional, and evolutionary analyses. We review how Ig-domains are classified today as topological and structural variants and highlight the “Ig-fold irreducible structural signature” shared by all of them. The IgStrand numbering scheme lays the foundation for the systematic annotation of structural proteomes by detecting and accurately labeling Ig-, Ig-like and Ig-extended domains in proteins, which are poorly annotated in current databases and opens the door to accurate machine learning. Importantly, it sheds light on the robust Ig protein folding algorithm used by nature to form beta sandwich supersecondary structures. The numbering scheme powers an algorithm implemented in the interactive structural analysis software iCn3D to systematically recognize Ig-domains, annotate them and perform detailed analyses comparing any domain sharing the Ig-fold in sequence, topology and structure, regardless of their diverse topologies or origin. The scheme provides a robust fold detection and labeling mechanism that reveals unsuspected structural homologies among protein structures beyond currently identified Ig- and Ig-like domain variants. Indeed, multiple folds classified independently contain a common structural signature, in particular jelly-rolls. Examples of folds that harbor an “Ig-extended” architecture are given. Applications in protein engineering around the Ig-architecture are straightforward based on the universal numbering.
-
Indoor localization systems typically determine a position using ranging measurements, inertial sensors, environment-specific signatures, or some combination of these methods. Given a floor plan, inertial and signature-based systems can converge on accurate locations by slowly pruning away inconsistent states as a user walks through the space. In contrast, range-based systems are capable of instantly acquiring locations, but they rely on densely deployed beacons and suffer from inaccurate range measurements given non-line-of-sight (NLOS) signals. In order to get the best of both worlds, we present an approach that systematically exploits the geometry information derived from building floor plans to directly improve location acquisition in range-based systems. Our solving approach can disambiguate multiple feasible locations taking into account a mix of LOS and NLOS hypotheses to accurately localize with significantly fewer beacons. We demonstrate our geometry-aware solving approach using a new ultrasonic beacon platform that is able to perform direct time-of-flight ranges on commodity smartphones. The platform uses Bluetooth Low Energy (BLE) for time synchronization and ultrasound for measuring propagation distance. We evaluate our system's accuracy with multiple deployments in a university campus and show that our approach shifts the 80% accuracy point from 4 to 8 m down to 1 m as compared to solvers that do not use the floor plan information. We are able to detect and remove NLOS signals with 91.5% accuracy.

