Title: Multiple Resource Network Voronoi Diagram
Given a spatial network and a set of service centers of k different resource types, a Multiple Resource Network Voronoi Diagram (MRNVD) partitions the spatial network into a set of Service Areas that minimize the total cycle-distance from each graph node to its allotted k service centers, one of each resource type. The MRNVD problem is important for critical societal applications such as assigning essential survival supplies (e.g., food, water, gas, and medical assistance) to residents impacted by man-made or natural disasters. The MRNVD problem is NP-hard and computationally challenging due to the large size of transportation networks. Previous work proposed the Distance-bounded Pruning (DP) approach to produce an optimal solution for MRNVD. However, we found that DP can be generalized to further reduce the computational cost of finding minimum cycle-distances. In this paper, we extend our prior work and propose a novel approach that reduces this computational cost. Experiments on real-world datasets from five different regions demonstrate that the proposed approach creates the MRNVD while significantly reducing the computational cost.
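For intuition, here is a minimal brute-force sketch of the minimum cycle-distance objective (not the paper's DP-based pruning, which the abstract does not specify): for each node it enumerates one center per resource type and every visiting order, summing shortest-path legs. The toy graph, node names, and helper functions are illustrative assumptions, and the enumeration is exponential in k, so this is only feasible for tiny instances.

```python
import heapq
from itertools import permutations, product

def dijkstra(graph, src):
    """Single-source shortest-path distances in an undirected weighted
    graph encoded as {node: {neighbor: edge_weight}}."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def min_cycle_distance(node, centers_by_type, dist):
    """Shortest cycle that starts/ends at `node` and visits one service
    center of each resource type (brute force over center choices and
    visiting orders -- exponential in the number of types k)."""
    best = float("inf")
    for centers in product(*centers_by_type):   # one center per type
        for order in permutations(centers):     # order of visits
            tour = (node, *order, node)
            best = min(best, sum(dist[a][b] for a, b in zip(tour, tour[1:])))
    return best

# Toy network with two resource types (e.g., food at "b", water at "d").
graph = {
    "a": {"b": 1.0, "c": 4.0},
    "b": {"a": 1.0, "c": 2.0, "d": 5.0},
    "c": {"a": 4.0, "b": 2.0, "d": 1.0},
    "d": {"b": 5.0, "c": 1.0},
}
dist = {u: dijkstra(graph, u) for u in graph}
print(min_cycle_distance("a", [["b"], ["d"]], dist))  # 8.0
```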
Award ID(s):
1844565
NSF-PAR ID:
10310805
Author(s) / Creator(s):
Date Published:
Journal Name:
IEEE Transactions on Knowledge and Data Engineering
ISSN:
1041-4347
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In the resource-rich environment of data centers, most failures can quickly fail over to redundant resources. In contrast, a failure in edge infrastructures with limited resources might require maintenance personnel to drive to the location to fix the problem. The operational cost of these "truck rolls" to edge locations competes with the operational cost incurred by the extra space and power needed for redundant resources at the edge. Computational storage devices with network interfaces can act as network-attached storage servers and offer a new design point for storage systems at the edge. In this paper, we hypothesize that a system consisting of a larger number of such small "embedded" storage nodes provides higher availability due to a larger number of failure domains, while also saving operational cost in terms of space and power. As evidence for our hypothesis, we compared the probability of data loss between two types of storage systems: one constructed with general-purpose servers and the other with embedded storage nodes. Our results show that the storage system constructed with general-purpose servers has a 7 to 20 times higher risk of losing data than the one constructed with embedded storage devices. We also compare the two alternatives in terms of power and space using the Media-Based Work Unit (MBWU) that we developed in an earlier paper as a reference point.
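For intuition only, the sketch below contrasts the two designs on two simple quantities: the steady-state downtime of a single failure domain (slow "truck roll" repairs versus quick node swaps) and the fraction of capacity a single failure takes out. All parameters are made up; the paper's quantitative 7-20x comparison relies on its own MBWU-based methodology, which is not reproduced here.

```python
# Toy illustration of the failure-domain trade-off (made-up parameters).
def unavailability(fail_prob_per_day, mttr_days):
    """Steady-state fraction of time one failure domain is down, assuming
    independent failures and a fixed repair time."""
    u = fail_prob_per_day * mttr_days
    return u / (1.0 + u)

# A few large servers: one failure takes out a large slice of capacity,
# and repair means an on-site "truck roll" (long MTTR).
servers = {"domains": 4, "fail_per_day": 0.01, "mttr_days": 3.0}
# Many embedded nodes: tiny blast radius per failure, and repair can be
# deferred or batched (short effective MTTR per domain).
embedded = {"domains": 60, "fail_per_day": 0.01, "mttr_days": 0.5}

for name, cfg in [("servers", servers), ("embedded", embedded)]:
    u = unavailability(cfg["fail_per_day"], cfg["mttr_days"])
    blast = 1.0 / cfg["domains"]
    print(f"{name:8s} per-domain downtime {u:.2%}, "
          f"single failure hits {blast:.1%} of capacity")
```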
  2.
    Fast networks and the desire for high resource utilization in data centers and the cloud have driven disaggregation. Application compute is separated from storage, but this leads to high overheads when data must move over the network for simple operations on it. Alternatively, systems could allow applications to run application logic within storage via user-defined functions. Unfortunately, this ties the provisioning and utilization of storage and compute resources together again. We present a new approach to executing storage-level functions in an in-memory key-value store that avoids this problem by dynamically deciding where to execute functions over data. Users write storage functions that are logically decoupled from storage, but storage servers choose where to run invocations of these functions physically. By using a server-internal cost model and observing function execution, servers choose to directly run inexpensive functions, while preferring to execute functions with high CPU cost at client machines. We show that with this approach storage servers can reduce network request processing costs, avoid server compute bottlenecks, and improve aggregate storage system throughput. We realize our approach on an in-memory key-value store that executes 3.2 million strictly serializable user-defined storage functions per second with 100 µs response times. When running a mix of logic from different applications, it provides throughput better than running that logic purely at storage servers (85% more) or purely at clients (10% more). For our workloads, it also reduces latency (up to 2x) and transactional aborts (up to 33%) over pure client-side execution.
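A minimal sketch of the dynamic placement idea described above: the server profiles each storage function's observed CPU cost and ships expensive invocations to clients. The cutoff, warm-up count, and all names here are illustrative assumptions, not the paper's API.

```python
import random

CPU_COST_CUTOFF_US = 50.0   # made-up boundary between "cheap" and "expensive"
WARMUP_CALLS = 10           # profile this many invocations at the server first

class FunctionStats:
    """Running mean of a storage function's observed server-side CPU cost."""
    def __init__(self):
        self.calls = 0
        self.total_us = 0.0

    def record(self, elapsed_us):
        self.calls += 1
        self.total_us += elapsed_us

    def mean_us(self):
        return self.total_us / self.calls if self.calls else 0.0

def choose_site(stats):
    """Run cheap functions at the server (the data stays put); ship
    expensive ones to the client (keeps server CPUs from bottlenecking)."""
    if stats.calls < WARMUP_CALLS:
        return "server"
    return "server" if stats.mean_us() < CPU_COST_CUTOFF_US else "client"

# Example: an expensive function migrates to clients after the warm-up.
stats = FunctionStats()
for _ in range(WARMUP_CALLS):
    stats.record(random.uniform(80, 120))  # pretend-measured CPU cost in µs
print(choose_site(stats))                  # -> "client"
```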
  3. The p-dispersion problem is a spatial optimization problem that aims to maximize the minimum separation distance among all assigned nodes. This problem is characterized by an innate spatial structure based on distance attributes. This research proposes a novel approach, named the distance-based spatially informed property (D-SIP) method, to reduce the problem size of p-dispersion instances, facilitating a more efficient solution while maintaining optimality in nearly all cases. The D-SIP is derived from investigating the underlying spatial characteristics of the behaviors of the p-dispersion problem in determining the optimal location of nodes. To define the D-SIP, this research applies Ripley's K-function to different types of point patterns, given that the optimal solutions of the p-dispersion problem are strongly associated with the spatial proximity among points discovered by Ripley's K-function. The results demonstrate that the D-SIP identifies collective dominances of optimal solutions, leading to building the spatially informed p-dispersion model. The simulation-based experiments show that the proposed method significantly diminishes the size of problems, improves computational performance, and secures optimal solutions for 99.9% of instances (999 out of 1,000 instances) under diverse conditions.
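For reference, here is a naive sketch of the Ripley's K statistic the method relies on (O(n^2) pair counting, no edge correction; the point data and test distance are made up). Under complete spatial randomness K(t) is approximately pi*t^2; clustered patterns fall above that baseline and dispersed patterns, like p-dispersion optima, fall below it.

```python
import math
import random

def ripley_k(points, t, area):
    """Naive Ripley's K estimate at distance t for points in a region of
    the given area: K(t) = (area / n^2) * #{ordered pairs within t}."""
    n = len(points)
    pairs = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and math.dist(points[i], points[j]) <= t
    )
    return area / (n * n) * pairs

# 200 uniformly random points in the unit square (a CSR-like pattern).
rng = random.Random(0)
pts = [(rng.random(), rng.random()) for _ in range(200)]
t = 0.1
print(ripley_k(pts, t, area=1.0), "vs CSR baseline", math.pi * t * t)
```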
  4. Obeid, Iyad ; Selesnick, Ivan ; Picone, Joseph (Ed.)
    The goal of this work was to design a low-cost computing facility that can support the development of an open source digital pathology corpus containing 1M images [1]. A single image from a clinical-grade digital pathology scanner can range in size from hundreds of megabytes to five gigabytes. A 1M image database requires over a petabyte (PB) of disk space. Doing meaningful work in this problem space requires a significant allocation of computing resources. The improvements and expansions to our HPC (high-performance computing) cluster, known as Neuronix [2], required to support work with digital pathology fall into two broad categories: computation and storage. To handle the increased computational burden and increase job throughput, we are using Slurm [3] as our scheduler and resource manager. For storage, we have designed and implemented a multi-layer filesystem architecture that distributes a filesystem across multiple machines. These enhancements, which are entirely based on open source software, have extended the capabilities of our cluster and increased its cost-effectiveness. Slurm has numerous features that allow it to generalize to a number of different scenarios. Among the most notable is its support for GPU (graphics processing unit) scheduling. GPUs can offer a tremendous performance increase in machine learning applications [4], and Slurm's built-in mechanisms for handling them were a key factor in making this choice. Slurm has a general resource (GRES) mechanism that can be used to configure and enable support for resources beyond the ones provided by a traditional HPC scheduler (e.g., memory and wall-clock time), and GPUs are among the GRES types that Slurm supports [5]. In addition to tracking resources, Slurm strictly enforces resource allocation. This becomes very important as the computational demands of jobs increase, ensuring that jobs have all the resources they need and do not take resources from one another. It is a common practice among GPU-enabled frameworks to query the CUDA runtime library/drivers and iterate over the list of GPUs, attempting to establish a context on all of them. Slurm is able to affect the hardware discovery process of these jobs, which enables a number of them to run alongside each other even if the GPUs are in exclusive-process mode. To store large quantities of digital pathology slides, we developed a robust, extensible distributed storage solution. We utilized a number of open source tools to create a single filesystem which can be mounted by any machine on the network. At the lowest layer of abstraction are the hard drives, which are split across four 60-disk chassis populated with 8 TB drives. To support these disks, we have two server units, each equipped with Intel Xeon CPUs and 128 GB of RAM. At the filesystem level, we have implemented a multi-layer solution that: (1) connects the disks together into a single filesystem/mountpoint using ZFS (the Zettabyte File System) [6], and (2) connects the filesystems on multiple machines together to form a single mountpoint using Gluster [7]. ZFS, initially developed by Sun Microsystems, provides disk-level awareness and a filesystem which takes advantage of that awareness to provide fault tolerance. At the filesystem level, ZFS protects against data corruption and the infamous RAID write-hole bug by implementing a journaling scheme (the ZFS intent log, or ZIL) and copy-on-write functionality.
Each machine (one controller plus two disk chassis) has its own separate ZFS filesystem. Gluster, essentially a meta-filesystem, takes each of these filesystems and provides the means to connect them together over the network in distributed (similar to RAID 0, but without striping individual files) and mirrored (similar to RAID 1) configurations [8]. By implementing these improvements, it has been possible to expand the storage and computational power of the Neuronix cluster arbitrarily by scaling horizontally, supporting our most computationally intensive endeavors. We have greatly improved the scalability of the cluster while maintaining its excellent price/performance ratio [1].
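As a back-of-the-envelope check on the numbers above, the raw (pre-redundancy) capacity of the described disk layout works out to roughly 1.92 PB, consistent with the stated petabyte-scale requirement; a minimal sketch:

```python
# Raw capacity implied by the hardware described above, before any
# redundancy overhead at the ZFS or Gluster layers.
chassis = 4             # four chassis of 60 disks each
disks_per_chassis = 60
tb_per_disk = 8
raw_tb = chassis * disks_per_chassis * tb_per_disk
print(f"raw: {raw_tb} TB ~= {raw_tb / 1000:.2f} PB")  # 1920 TB ~= 1.92 PB
# A fully mirrored Gluster configuration would halve the usable space.
```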
  5. Cache-aided wireless device-to-device (D2D) networks allow a significant throughput increase, depending on the concentration of the file popularity distribution. Many studies assume that all users have the same preference distribution; however, this may not be true in practice. This work investigates whether and how information about individual preferences can benefit cache-aided D2D networks. We examine a clustered network and derive a network utility that incorporates both the user distribution and channel fading effects into the analysis. We also formulate a utility maximization problem for designing caching policies. This maximization problem can be applied to optimize several important quantities, including throughput, energy efficiency (EE), cost, and hit rate, and to solve different tradeoff problems. We provide a general approach that can solve the proposed problem under the assumption that users coordinate, and then prove that the proposed approach can obtain a stationary point under a mild assumption. Using simulations of practical setups, we show that performance can improve significantly with proper exploitation of individual preferences. We also show that different types of tradeoffs exist between different performance metrics and that they can be managed through caching policy and cooperation distance designs.
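As a toy contrast to the joint utility optimization described above, the sketch below shows a simple individual-preference baseline: each user greedily caches its own most-requested files, and a cluster-wide hit rate is computed assuming every neighbor's cache is reachable. All names and numbers are illustrative; the paper's policies optimize a richer utility with channel fading and cooperation-distance effects.

```python
def greedy_cache(prefs, cache_size):
    """Toy per-user policy: each user caches the files it individually
    requests most often. prefs: {user: {file_id: request_probability}}."""
    return {
        user: set(sorted(p, key=p.get, reverse=True)[:cache_size])
        for user, p in prefs.items()
    }

def d2d_hit_rate(prefs, placement):
    """Probability a request is served from the requester's own cache or
    a neighbor's (one fully connected cluster, equal request rates)."""
    cluster = set().union(*placement.values())
    return sum(
        sum(prob for f, prob in p.items() if f in cluster)
        for p in prefs.values()
    ) / len(prefs)

prefs = {
    "u1": {"a": 0.6, "b": 0.3, "c": 0.1},
    "u2": {"c": 0.5, "d": 0.4, "a": 0.1},
}
placement = greedy_cache(prefs, cache_size=1)
print(placement, d2d_hit_rate(prefs, placement))  # {'u1': {'a'}, 'u2': {'c'}} 0.65
```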