skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Billion-scale Detection of Isomorphic Nodes
This paper presents an algorithm for detecting attributed high-degree node isomorphism. High-degree isomorphic nodes seldom happen by chance and often represent duplicated entities or data processing errors. By definition, isomorphic nodes are topologically indistinguishable and can be problematic in graph ML tasks. The algorithm employs a parallel, “degree-bounded” approach that fingerprints each node’s local properties through a hash, which constrains the search to nodes within hash-defined buckets, thus minimising the number of comparisons. This method scales on graphs with billions of nodes and edges. Finally, we provide isomorphic node oddities identified in real-world data.  more » « less
Award ID(s):
2109988
PAR ID:
10477184
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
IEEE
Date Published:
ISBN:
979-8-3503-1199-0
Page Range / eLocation ID:
230 to 233
Format(s):
Medium: X
Location:
St. Petersburg, FL, USA
Sponsoring Org:
National Science Foundation
More Like this
  1. We consider the ad-hoc networks consisting of n wireless nodes that are located on the plane. Any two given nodes are called neighbors if they are located within a certain distance (communication range) from one another. A given node can be directly connected to any one of its neighbors, and picks its connections according to a unique topology control algorithm that is available at every node. Given that each node knows only the indices (unique identification numbers) of its one and two-hop neighbors, we identify an algorithm that preserves connectivity and can operate without the need of any synchronization among nodes. Moreover, the algorithm results in a sparse graph with at most 5n edges and a maximum node degree of 10. Existing algorithms with the same promises further require neighbor distance and/or direction information at each node. We also evaluate the performance of our algorithm for random networks. In this case, our algorithm provides an asymptotically connected network with n(1+o(1)) edges with a degree less than or equal to 6 for 1-o(1) fraction of the nodes. We also introduce another asynchronous connectivity-preserving algorithm that can provide an upper bound as well as a lower bound on node degrees. 
    more » « less
  2. This article introduces a novel mutual informationbased measure to assess the glass ceiling effect in preferential attachment networks, which advances the analysis of inequalities in attributed networks. Using Shannon entropy and generalizing to Rényi entropy, our measure evaluates the conditional probability distributions of node attributes given the node degrees of adjacent nodes, which offers a more nuanced understanding of inequality compared to traditional methods that emphasize node degree distributions and degree assortativity alone. To evaluate the efficacy of the proposed measure, we evaluate it using an analytical structural inequality model as well as historical publication data. Results show that our mutual information measure aligns well with both the theoretical model and empirical data, underscoring its reliability as a robust approach for capturing inequalities in attributed networks. Moreover, we introduce a novel stochastic optimization algorithm that utilizes a parameterized conditional logit model for edge addition. Our algorithm is shown to outperform the baseline uniform distribution based approach in mitigating the glass ceiling effect. By strategically recommending links based on this algorithm, we can effectively hinder the glass ceiling effect within networks. 
    more » « less
  3. Disaggregated memory architecture decouples computing and memory resources into separate pools connected via high-speed interconnect technologies, offering substantial advantages in scalability and resource utilization. However, this architecture also poses unique challenges in designing effective index structures and concurrency protocols due to increased remote memory access overhead and its shared-everything nature. In this paper, we present DART, a lock-free two-layer hashed Adaptive Radix Tree (ART) designed to minimize remote memory access while ensuring high concurrency and crash consistency in the disaggregated memory architecture. DART incorporates a hash-based Express Skip Table at its upper layer, which reduces the round trips of remote memory access during index traversal. In the base layer, DART employs an Adaptive Hashed Layout within ART nodes, confining remote memory accesses during in-node searches to small hash buckets. By further leveraging Decoupled Metadata Organization, DART achieves lock-free atomic updates, enabling high scalability and ensuring crash consistency. Our evaluation demonstrates that DART outperforms state-of-the-art counterparts by up to 5.8X in YCSB workloads. 
    more » « less
  4. Summary Network latent space models assume that each node is associated with an unobserved latent position in a Euclidean space, and such latent variables determine the probability of two nodes connecting with each other. In many applications, nodes in the network are often observed along with high-dimensional node variables, and these node variables provide important information for understanding the network structure. However, classical network latent space models have several limitations in incorporating node variables. In this paper, we propose a joint latent space model where we assume that the latent variables not only explain the network structure, but are also informative for the multivariate node variables. We develop a projected gradient descent algorithm that estimates the latent positions using a criterion incorporating both network structure and node variables. We establish theoretical properties of the estimators and provide insights into how incorporating high-dimensional node variables could improve the estimation accuracy of the latent positions. We demonstrate the improvement in latent variable estimation and the improvements in associated downstream tasks, such as missing value imputation for node variables, by simulation studies and an application to a Facebook data example. 
    more » « less
  5. Graph Convolutional Network (GCN) plays pivotal roles in many real-world applications. Despite the successes of GCN deployment, GCN often exhibits performance disparity with respect to node de- grees, resulting in worse predictive accuracy for low-degree nodes. We formulate the problem of mitigating the degree-related per- formance disparity in GCN from the perspective of the Rawlsian difference principle, which is originated from the theory of distribu- tive justice. Mathematically, we aim to balance the utility between low-degree nodes and high-degree nodes while minimizing the task- specific loss. Specifically, we reveal the root cause of this degree- related unfairness by analyzing the gradients of weight matrices in GCN. Guided by the gradients of weight matrices, we further propose a pre-processing method RawlsGCN-Graph and an in- processing method RawlsGCN-Grad that achieves fair predictive accuracy in low-degree nodes without modification on the GCN architecture or introduction of additional parameters. Extensive experiments on real-world graphs demonstrate the effectiveness of our proposed RawlsGCN methods in significantly reducing degree- related bias while retaining comparable overall performance. 
    more » « less