Access libraries such as ROOT[1] and HDF5[2] allow users to interact with datasets using high level abstractions, like coordinate systems and associated slicing operations. Unfortunately, the implementations of access libraries are based on outdated assumptions about storage systems interfaces and are generally unable to fully benefit from modern fast storage devices. For example, access libraries often implement buffering and data layout that assume that large, single-threaded sequential access patterns are causing less overall latency than small parallel random access: while this is true for spinning media, it is not true for flash media. The situation is getting worse with rapidly evolving storage devices such as non-volatile memory and ever larger datasets. This project explores distributed dataset mapping infrastructures that can integrate and scale out existing access libraries using Ceph’s extensible object model, avoiding re-implementation or even modifications of these access libraries as much as possible. These programmable storage extensions coupled with our distributed dataset mapping techniques enable: 1) access library operations to be offloaded to storage system servers, 2) the independent evolution of access libraries and storage systems and 3) fully leveraging of the existing load balancing, elasticity, and failure management of distributed storage systems like Ceph. They also create more opportunities to conduct storage server-local optimizations specific to storage servers. For example, storage servers might include local key/value stores combined with chunk stores that require different optimizations than a local file system. As storage servers evolve to support new storage devices like non-volatile memory, these server-local optimizations can be implemented while minimizing disruptions to applications. We will report progress on the means by which distributed dataset mapping can be abstracted over particular access libraries, including access libraries for ROOT data, and how we address some of the challenges revolving around data partitioning and composability of access operations.
more »
« less
Scale-out Edge Storage Systems with Embedded Storage Nodes to Get Better Availability and Cost-Efficiency At the Same Time
In the resource-rich environment of data centers most failures can quickly failover to redundant resources. In contrast, failure in edge infrastructures with limited resources might require maintenance personnel to drive to the location in order to fix the problem. The operational cost of these "truck rolls" to locations at the edge infrastructure competes with the operational cost incurred by extra space and power needed for redundant resources at the edge. Computational storage devices with network interfaces can act as network-attached storage servers and offer a new design point for storage systems at the edge. In this paper we hypothesize that a system consisting of a larger number of such small "embedded" storage nodes provides higher availability due to a larger number of failure domains while also saving operational cost in terms of space and power. As evidence for our hypothesis, we compared the possibility of data loss between two different types of storage systems: one is constructed with general-purpose servers, and the other one is constructed with embedded storage nodes. Our results show that the storage system constructed with general-purpose servers has 7 to 20 times higher risk of losing data over the storage system constructed with embedded storage devices. We also compare the two alternatives in terms of power and space using the Media-Based Work Unit (MBWU) that we developed in an earlier paper as a reference point.
more »
« less
- PAR ID:
- 10275745
- Date Published:
- Journal Name:
- HotEdge'20
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
This work seeks to quantify the benefits of using energy storage toward the reduction of the energy generation cost of a power system. A two-fold optimization framework is provided where the first optimization problem seeks to find the optimal storage schedule that minimizes operational costs. Since the operational cost depends on the storage capacity, a second optimization problem is then formulated with the aim of finding the optimal storage capacity to be deployed. Although, in general, these problems are difficult to solve, we provide a lower bound on the cost savings for a parametrized family of demand profiles. The optimization framework is numerically illustrated using real-world demand data from ISO New England. Numerical results show that energy storage can reduce energy generation costs by at least 2.5 percent.more » « less
-
null (Ed.)Fast networks and the desire for high resource utilization in data centers and the cloud have driven disaggregation. Application compute is separated from storage, but this leads to high overheads when data must move over the network for simple operations on it. Alternatively, systems could allow applications to run application logic within storage via user-defined functions. Unfortunately, this ties provisioning and utilization of storage and compute resources together again. We present a new approach to executing storage-level functions in an in-memory key-value store that avoids this problem by dynamically deciding where to execute functions over data. Users write storage functions that are logically decoupled from storage, but storage servers choose where to run invocations of these functions physically. By using a server-internal cost model and observing function execution, servers choose to directly run inexpensive functions, while preferring to execute functions with high CPU-cost at client machines. We show that with this approach storage servers can reduce network request processing costs, avoid server compute bottlenecks, and improve aggregate storage system throughput. We realize our approach on an in-memory key-value store that executes 3.2 million strict serializable user-defined storage functions per second with 100 us response times. When running a mix of logic from different applications, it provides throughput better than running that logic purely at storage servers (85% more) or purely at clients (10% more). For our workloads, it also reduces latency (up to 2x) and transactional aborts (up to 33%) over pure client-side execution.more » « less
-
Larger penetration of Distributed Generations (DG) in the power system brings new flexibility and opportunity as well as new challenges due to the generally intermittent nature of DG. When these DG are installed in the medium voltage distribution systems as components of the smart grid, further support is required to ensure a smooth and controllable operation. To complement the uncontrollable output power of these resources, energy storage devices need to be incorporated to absorb excessive power and provide power shortage in time of need. They also can provide reactive power to dynamically help the voltage profile. Energy Storage Systems (ESS) can be expensive and limited number of them can practically be installed in distribution systems. In addition to frequency regulation and energy time shifting, ESS can support voltage and angle stability in the power network. This paper applies a Jacobian matrix-based sensitivity analysis to determine the most appropriate node in a grid to collectively improve the voltage magnitude and angle of all the nodes by active/reactive power injection. IEEE 14, 24, and 123-bus distribution system are selected to demonstrate the performance of the proposed method. As opposed to most previous studies, this method does not require an iteration loop with a convergence problem nor a network-related complicated objective function.more » « less
-
Edge application’s distributed nature presents significant challenges for developers in orchestrating and managing the multitenant applications. In this paper, we propose a practical edge cloud software framework for deploying multitenant distributed smart applications. Here we exploit commodity, a low cost embedded board to form distributed edge clusters. The cluster of geo-distributed and wireless edge nodes not only power multitenant IoT applications that are closer to the data source and the user, but also enable developers to remotely deploy and orchestrate application containers over the cloud. Specifically, we propose building a software platform to manage the distributed edge nodes along with support services to deploy and launch isolated and multitenant user applications through a lightweight container. In particular, we propose an architectural solution to improve the resilience of edge cloud services through peer collaborated service migration when the failures happen or when resources are overburdened. We focus on giving the developers a single point control of the infrastructure over the intermittent and lossy wide area networks (WANs) and enabling the remote deployment of multitenant applications.more » « less
An official website of the United States government

