skip to main content

Title: Simplicial closure and higher-order link prediction
Networks provide a powerful formalism for modeling complex sys- tems, by representing the underlying set of pairwise interactions. But much of the structure within these systems involves interac- tions that take place among more than two nodes at once — for example, communication within a group rather than person-to- person, collaboration among a team rather than a pair of co-authors, or biological interaction between a set of molecules rather than just two. We refer to these type of simultaneous interactions on sets of more than two nodes as higher-order interactions; they are ubiquitous, but the empirical study of them has lacked a general framework for evaluating higher-order models. Here we introduce such a framework, based on link prediction, a fundamental prob- lem in network analysis. The traditional link prediction problem seeks to predict the appearance of new links in a network, and here we adapt it to predict which (larger) sets of elements will have fu- ture interactions. We study the temporal evolution of 19 datasets from a variety of domains, and use our higher-order formulation of link prediction to assess the types of structural features that are most predictive of new multi-way interactions. Among our results, we find that more » different domains vary considerably in their distri- bution of higher-order structural parameters, and that the higher- order link prediction problem exhibits some fundamental differ- ences from traditional pairwise link prediction, with a greater role for local rather than long-range information in predicting the ap- pearance of new interactions. « less
Authors:
; ; ; ;
Award ID(s):
1740822
Publication Date:
NSF-PAR ID:
10064655
Journal Name:
arXiv.org
ISSN:
2331-8422
Sponsoring Org:
National Science Foundation
More Like this
  1. Learning representations of sets of nodes in a graph is crucial for applications ranging from node-role discovery to link prediction and molecule classification. Graph Neural Networks (GNNs) have achieved great success in graph representation learning. However, expressive power of GNNs is limited by the 1-Weisfeiler-Lehman (WL) test and thus GNNs generate identical representations for graph substructures that may in fact be very different. More powerful GNNs, proposed recently by mimicking higher-order-WL tests, only focus on representing entire graphs and they are computationally inefficient as they cannot utilize sparsity of the underlying graph. Here we propose and mathematically analyze a general class of structure related features, termed Distance Encoding (DE). DE assists GNNs in representing any set of nodes, while providing strictly more expressive power than the 1-WL test. DE captures the distance between the node set whose representation is to be learned and each node in the graph. To capture the distance DE can apply various graph-distance measures such as shortest path distance or generalized PageRank scores. We propose two ways for GNNs to use DEs (1) as extra node features, and (2) as controllers of message aggregation in GNNs. Both approaches can utilize the sparse structure of the underlyingmore »graph, which leads to computational efficiency and scalability. We also prove that DE can distinguish node sets embedded in almost all regular graphs where traditional GNNs always fail. We evaluate DE on three tasks over six real networks: structural role prediction, link prediction, and triangle prediction. Results show that our models outperform GNNs without DE by up-to 15% in accuracy and AUROC. Furthermore, our models also significantly outperform other state-of-the-art methods especially designed for the above tasks.« less
  2. Simplicial neural networks (SNNs) have recently emerged as a new direction in graph learning which expands the idea of convolutional architectures from node space to simplicial complexes on graphs. Instead of predominantly assessing pairwise relations among nodes as in the current practice, simplicial complexes allow us to describe higher-order interactions and multi-node graph structures. By building upon connection between the convolution operation and the new block Hodge-Laplacian, we propose the first SNN for link prediction. Our new Block Simplicial Complex Neural Networks (BScNets) model generalizes existing graph convolutional network (GCN) frameworks by systematically incorporating salient interactions among multiple higher-order graph structures of different dimensions. We discuss theoretical foundations behind BScNets and illustrate its utility for link prediction on eight real-world and synthetic datasets. Our experiments indicate that BScNets outperforms the state-of-the-art models by a significant margin while maintaining low computation costs. Finally, we show utility of BScNets as a new promising alternative for tracking spread of infectious diseases such as COVID-19 and measuring the effectiveness of the healthcare risk mitigation strategies.
  3. Abstract

    Noncoding RNAs (ncRNAs) have recently attracted considerable attention due to their key roles in biology. The ncRNA–proteins interaction (NPI) is often explored to reveal some biological activities that ncRNA may affect, such as biological traits, diseases, etc. Traditional experimental methods can accomplish this work but are often labor-intensive and expensive. Machine learning and deep learning methods have achieved great success by exploiting sufficient sequence or structure information. Graph Neural Network (GNN)-based methods consider the topology in ncRNA–protein graphs and perform well on tasks like NPI prediction. Based on GNN, some pairwise constraint methods have been developed to apply on homogeneous networks, but not used for NPI prediction on heterogeneous networks. In this paper, we construct a pairwise constrained NPI predictor based on dual Graph Convolutional Network (GCN) called NPI-DGCN. To our knowledge, our method is the first to train a heterogeneous graph-based model using a pairwise learning strategy. Instead of binary classification, we use a rank layer to calculate the score of an ncRNA–protein pair. Moreover, our model is the first to predict NPIs on the ncRNA–protein bipartite graph rather than the homogeneous graph. We transform the original ncRNA–protein bipartite graph into two homogenous graphs on which to exploremore »second-order implicit relationships. At the same time, we model direct interactions between two homogenous graphs to explore explicit relationships. Experimental results on the four standard datasets indicate that our method achieves competitive performance with other state-of-the-art methods. And the model is available at https://github.com/zhuoninnin1992/NPIPredict

    « less
  4. Networks are a natural representation of complex systems across the sciences, and higher-order dependencies are central to the understanding and modeling of these systems. However, in many practical applications such as online social networks, networks are massive, dynamic, and naturally streaming, where pairwise interactions among vertices become available one at a time in some arbitrary order. The massive size and streaming nature of these networks allow only partial observation, since it is infeasible to analyze the entire network. Under such scenarios, it is challenging to study the higher-order structural and connectivity patterns of streaming networks. In this work, we consider the fundamental problem of estimating the higher-order dependencies using adaptive sampling. We propose a novel adaptive, single-pass sampling framework and unbiased estimators for higher-order network analysis of large streaming networks. Our algorithms exploit adaptive techniques to identify edges that are highly informative for efficiently estimating the higher-order structure of streaming networks from small sample data. We also introduce a novel James-Stein shrinkage estimator to reduce the estimation error. Our approach is fully analytic, computationally efficient, and can be incrementally updated in a streaming setting. Numerical experiments on large networks show that our approach is superior to baseline methods.
  5. Network alignment (NA) is a fundamental problem in many application domains – from social networks, through biology and communications, to neuroscience. The main objective is to identify common nodes and most similar connections across multiple networks (resp. graphs). Many of the existing efforts focus on efficient anchor node linkage by leveraging various features and optimizing network mapping functions with the pairwise similarity between anchor nodes. Despite the recent advances, there still exist two kinds of challenges: (1) entangled node embeddings, arising from the contradictory goals of NA: embedding proximal nodes in a closed form for representation in a single network vs. discriminating among them when mapping the nodes across networks; and (2) lack of interpretability about the node matching and alignment, essential for understanding prediction tasks. We propose dNAME (disentangled Network Alignment with Matching Explainability) – a novel solution for NA in heterogeneous networks settings, based on a matching technique that embeds nodes in a disentangled and faithful manner. The NA task is cast as an adversarial optimization problem which learns a proximity-preserving model locally around the anchor nodes, while still being discriminative. We also introduce a method to explain our semi-supervised model with the theory of robust statistics, bymore »tracing the importance of each anchor node and its explanations on the NA performance. This is extensible to many other NA methods, as it provides model interpretability. Experiments conducted on several public datasets show that dNAME outperforms the state-of-the-art methods in terms of both network alignment precision and node matching ranking.« less