Large-scale distributed storage systems, such as object stores, usually apply hashing-based placement and lookup methods to achieve scalability and resource efficiency. However, when object locations are determined by hash values, placement becomes inflexible and cannot optimize for or satisfy application requirements such as load balance, failure tolerance, parallelism, and network/system performance. This work presents a novel solution that achieves the best of both worlds: flexibility together with cost-effectiveness and scalability. The proposed method, Smash, is an object placement and lookup method that achieves full placement flexibility, balanced load, low resource cost, and short latency. Smash utilizes a recent space-efficient data structure and applies it to object-location lookups. We implement Smash as a prototype system and evaluate it in a public cloud. The analysis and experimental results show that Smash achieves full placement flexibility, fast storage operations, fast recovery from node dynamics, and lower DRAM cost (<60%) compared to existing hash-based solutions such as Ceph and MapX.
The Senators Problem: A Design Space of Node Placement Methods for Geospatial Network Visualization (Short Paper)
Geographic network visualizations often require assigning nodes to geographic coordinates, but this can be challenging when precise node locations are undefined. We explore this problem using U.S. senators as a case study. Each state has two senators, and thus it is difficult to assign clear individual locations. We devise eight different node placement strategies ranging from geometric approaches such as state centroids and longest axis midpoints to data-driven methods using population centers and home office locations. Through expert evaluation, we found that specific coordinates such as senators’ office locations and state centroids are preferred strategies, while random placements and the longest axis method are least favored. The findings also highlight the importance of aligning node placement with research goals and avoiding potentially misleading encodings. This paper contributes to future advancements in geospatial network visualization software development and aims to facilitate more effective exploratory spatial data analysis.
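As a rough illustration of two of the geometric strategies named in the abstract, the following Python sketch computes a state centroid and a longest-axis midpoint for a polygon outline. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names and the toy triangle are hypothetical, coordinates are treated as planar (map projection is ignored), and "longest axis midpoint" is assumed here to mean the midpoint of the two boundary vertices that lie farthest apart.

```python
from itertools import combinations
from math import dist


def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon")
    return (cx / (3 * area2), cy / (3 * area2))


def longest_axis_midpoint(vertices):
    """Midpoint of the farthest-apart vertex pair (assumed definition)."""
    a, b = max(combinations(vertices, 2), key=lambda p: dist(p[0], p[1]))
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)


# Hypothetical toy outline standing in for a state boundary.
state = [(0.0, 0.0), (6.0, 0.0), (0.0, 3.0)]

print(polygon_centroid(state))       # (2.0, 1.0)
print(longest_axis_midpoint(state))  # (3.0, 1.5)
```

Either function yields a single point per state, so both senators of a state would land on the same coordinate unless a further offset is applied.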
- Award ID(s):
- 2045271
- PAR ID:
- 10616849
- Editor(s):
- Adams, Benjamin; Griffin, Amy L; Scheider, Simon; McKenzie, Grant
- Publisher / Repository:
- Schloss Dagstuhl – Leibniz-Zentrum für Informatik
- Date Published:
- Volume:
- 315
- ISSN:
- 1868-8969
- ISBN:
- 978-3-95977-330-0
- Page Range / eLocation ID:
- 19:1-19:9
- Subject(s) / Keyword(s):
- Spatial networks; Political networks; Social networks; Geovisualization; Node placement; Human-centered computing → Geographic visualization; Human-centered computing → Graph drawings
- Format(s):
- Medium: X Size: 9 pages; 7125721 bytes Other: application/pdf
- Size(s):
- 9 pages; 7125721 bytes
- Right(s):
- Creative Commons Attribution 4.0 International license; info:eu-repo/semantics/openAccess
- Sponsoring Org:
- National Science Foundation
More Like this
-
Network cache allocation and management are important aspects of an Information-Centric Network (ICN) design, such as one based on Named Data Networking (NDN). We address the problem of optimal cache size allocation and content placement in an ICN in order to maximize the caching gain resulting from routing cost savings. While prior art assumes a given cache size at each network node and focuses on content placement, we study the problem when a global, network-wide cache storage budget is given and we solve for the optimal per-node cache allocation. This problem arises in cloud-based network settings where each network node is virtualized and housed within a cloud data center node with associated dynamic storage resources acquired from the cloud node as needed. As the offline centralized version of the optimal cache allocation problem is NP-hard, we develop a distributed adaptive algorithm that provides an approximate solution within a constant factor of the optimal. Performance evaluation of the algorithm is carried out through extensive simulations over multiple network topologies, demonstrating that our proposal significantly outperforms existing cache allocation algorithms.
-
Jihe Wang, Yi He (Ed.) Graph neural networks (GNN) are a powerful tool for combining imaging and non-imaging medical information for node classification tasks. Cross-network node classification extends GNN techniques to account for domain drift, allowing for node classification on an unlabeled target network. In this paper we present OTGCN, a powerful, novel approach to cross-network node classification. This approach leans on concepts from graph convolutional networks to harness insights from graph data structures while simultaneously applying strategies rooted in optimal transport to correct for the domain drift that can occur between samples from different data collection sites. This blended approach provides a practical solution for scenarios with many distinct forms of data collected across different locations and equipment. We demonstrate the effectiveness of this approach at classifying Autism Spectrum Disorder subjects using a blend of imaging and non-imaging data.
-
Geomasking traditionally refers to a set of techniques employed by a data steward to protect the privacy of data subjects by altering geographic coordinates. Data subjects themselves may make efforts to obfuscate their location data and protect their geoprivacy. Among these individual-level strategies are providing incorrect address data, limiting the precision of address data, or map-based location masking. This study examines the prevalence of these three location-masking behaviors in an online survey of California residents, finding that such behavior takes place across social groups. There are no significant differences across income level, education, ethnicity, sex, and urban locations. Instead, the primary differences are linked to intervening variables of knowledge and attitudes about location privacy.
-
Distributed cloud environments running data-intensive applications often slow down because of network congestion, uneven bandwidth, and data shuffling between nodes. Traditional host metrics such as CPU or memory do not capture these factors. Scheduling without considering network conditions causes poor placement, longer data transfers, and weaker job performance. This work presents a network-aware job scheduler that uses supervised learning to predict job completion time. The system collects real-time telemetry from all nodes, uses a trained model to estimate how long a job would take on each node, and ranks nodes to choose the best placement. The scheduler is evaluated on a geo-distributed Kubernetes cluster on the FABRIC testbed using network-intensive Spark workloads. Compared to the default Kubernetes scheduler, which uses only current resource availability, the supervised scheduler shows 34–54% higher accuracy in selecting the optimal node. The contribution is the demonstration of supervised learning for real-time, network-aware job scheduling on a multi-site cluster.