skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Query Processing in Highly Distributed Data Environments
This paper will demonstrate a novel method for consolidating data in an engineered hypercube network for the purpose of optimizing query processing. Query processing typically calls for merging data collected from a small subset of server nodes in a network. This poses the problem of managing efficiently the exchange of data between processing nodes to complete some relational data operation. The method developed here is designed to minimize data transfer, measured as the product of data quantity and network distance, by delegating the processing to a node that is relatively central to the subset. A hypercube not only supports simple computation of network distance between nodes, but also allows for identifying a node to serve as the center for any data consolidation operations.We will show how the consolidation process can be performed by selecting a subgraph of a complex network to simplify the selection of a central node and thus facilitate the computations required. We will also show a prototype implementation of a hypercube using Software-Defined Networking to support query optimization in a distributed heterogeneous database system, making use of network distance information and data quantity.  more » « less
Award ID(s):
1818884
PAR ID:
10289927
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
the 13th International Workshop on Information Network Design (WIND2021), Taiwan
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Donta, Praveen Kumar (Ed.)
    Federated Learning (FL), as a new computing framework, has received significant attentions recently due to its advantageous in preserving data privacy in training models with superb performance. During FL learning, distributed sites first learn respective parameters. A central site will consolidate learned parameters, using average or other approaches, and disseminate new weights across all sites to carryout next round of learning. The distributed parameter learning and consolidation repeat in an iterative fashion until the algorithm converges or terminates. Many FL methods exist to aggregate weights from distributed sites, but most approaches use a static node alignment approach, where nodes of distributed networks are statically assigned, in advance, to match nodes and aggregate their weights. In reality, neural networks, especially dense networks, have nontransparent roles with respect to individual nodes. Combined with random nature of the networks, static node matching often does not result in best matching between nodes across sites. In this paper, we propose, FedDNA, adynamic node alignmentfederated learning algorithm. Our theme is to find best matching nodes between different sites, and then aggregate weights of matching nodes for federated learning. For each node in a neural network, we represent its weight values as a vector, and use a distance function to find most similar nodes,i.e., nodes with the smallest distance from other sides. Because finding best matching across all sites are computationally expensive, we further design a minimum spanning tree based approach to ensure that a node from each site will have matched peers from other sites, such that the total pairwise distances across all sites are minimized. Experiments and comparisons demonstrate that FedDNA outperforms commonly used baseline, such as FedAvg, for federated learning. 
    more » « less
  2. null (Ed.)
    Network alignment finds node correspondences across multiple networks, where the alignment accuracy is of crucial importance because of its profound impact on downstream applications. The vast majority of existing works focus on how to best utilize the topology and attribute information of the input networks as well as the anchor links when available. Nonetheless, it has not been well studied on how to boost the alignment performance through actively obtaining high-quality and informative anchor links, with a few exceptions. The sparse literature on active network alignment introduces the human in the loop to label some seed node correspondence (i.e., anchor links), which are informative from the perspective of querying the most uncertain node given few potential matchings. However, the direct influence of the intrinsic network attribute information on the alignment results has largely remained unknown. In this paper, we tackle this challenge and propose an active network alignment method (Attent) to identify the best nodes to query. The key idea of the proposed method is to leverage effective and efficient influence functions defined over the alignment solution to evaluate the goodness of the candidate nodes for query. Our proposed query strategy bears three distinct advantages, including (1) effectiveness, being able to accurately quantify the influence of the candidate nodes on the alignment results; (2) efficiency, scaling linearly with 15 − 17× speed-up over the straightforward implementation without any quality loss; (3) generality, consistently improving alignment performance of a variety of network alignment algorithms. 
    more » « less
  3. The monitoring of data streams with a network structure have drawn increasing attention due to its wide applications in modern process control. In these applications, high-dimensional sensor nodes are interconnected with an underlying network topology. In such a case, abnormalities occurring to any node may propagate dynamically across the network and cause changes of other nodes over time. Furthermore, high dimensionality of such data significantly increased the cost of resources for data transmission and computation, such that only partial observations can be transmitted or processed in practice. Overall, how to quickly detect abnormalities in such large networks with resource constraints remains a challenge, especially due to the sampling uncertainty under the dynamic anomaly occurrences and network-based patterns. In this paper, we incorporate network structure information into the monitoring and adaptive sampling methodologies for quick anomaly detection in large networks where only partial observations are available. We develop a general monitoring and adaptive sampling method and further extend it to the case with memory constraints, both of which exploit network distance and centrality information for better process monitoring and identification of abnormalities. Theoretical investigations of the proposed methods demonstrate their sampling efficiency on balancing between exploration and exploitation, as well as the detection performance guarantee. Numerical simulations and a case study on power network have demonstrated the superiority of the proposed methods in detecting various types of shifts. Note to Practitioners —Continuous monitoring of networks for anomalous events is critical for a large number of applications involving power networks, computer networks, epidemiological surveillance, social networks, etc. This paper aims at addressing the challenges in monitoring large networks in cases where monitoring resources are limited such that only a subset of nodes in the network is observable. Specifically, we integrate network structure information of nodes for constructing sequential detection methods via effective data augmentation, and for designing adaptive sampling algorithms to observe suspicious nodes that are likely to be abnormal. Then, the method is further generalized to the case that the memory of the computation is also constrained due to the network size. The developed method is greatly beneficial and effective for various anomaly patterns, especially when the initial anomaly randomly occurs to nodes in the network. The proposed methods are demonstrated to be capable of quickly detecting changes in the network and dynamically changes the sampling priority based on online observations in various cases, as shown in the theoretical investigation, simulations and case studies. 
    more » « less
  4. In recent studies, researchers have developed various computation offloading frameworks for bringing cloud services closer to the user via edge networks. Specifically, an edge device needs to offload computationally intensive tasks because of energy and processing constraints. These constraints present the challenge of identifying which edge nodes should receive tasks to reduce overall resource consumption. We propose a unique solution to this problem which incorporates elements from Knowledge-Defined Networking (KDN) to make intelligent predictions about offloading costs based on historical data. Each server instance can be represented in a multidimensional feature space where each dimension corresponds to a predicted metric. We compute features for a "hyperprofile" and position nodes based on the predicted costs of offloading a particular task. We then perform a k-Nearest Neighbor (kNN) query within the hyperprofile to select nodes for offloading computation. This paper formalizes our hyperprofile-based solution and explores the viability of using machine learning (ML) techniques to predict metrics useful for computation offloading. We also investigate the effects of using different distance metrics for the queries. Our results show various network metrics can be modeled accurately with regression, and there are circumstances where kNN queries using Euclidean distance as opposed to rectilinear distance is more favorable. 
    more » « less
  5. The effective resistance between a pair of nodes in a weighted undirected graph is defined as the potential difference induced between them when a unit current is injected at the first node and extracted at the second node, treating edge weights as the conductance values of edges. The effective resistance is a key quantity of interest in many applications and fields including solving linear systems, Markov Chains and continuous-time averaging networks. We develop an efficient linearly convergent distributed algorithm for computing effective resistances and demonstrate its performance through numerical studies. We also apply our algorithm to the consensus problem where the aim is to compute the average of node values in a distributed manner. We show that the distributed algorithm we developed for effective resistances can be used to accelerate the convergence of the classical consensus iterations considerably by a factor depending on the network structure. 
    more » « less