skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Adversarial Robustness of Similarity-Based Link Prediction
Link prediction is one of the fundamental problems in social network analysis. A common set of techniques for link prediction rely on similarity metrics which use the topology of the observed subnetwork to quantify the likelihood of unobserved links. Recently, similarity metrics for link prediction have been shown to be vulnerable to attacks whereby observations about the network are adversarially modified to hide target links. We propose a novel approach for increasing robustness of similarity-based link prediction by endowing the analyst with a restricted set of reliable queries which accurately measure the existence of queried links. The analyst aims to robustly predict a collection of possible links by optimally allocating the reliable queries. We formalize the analyst problem as a Bayesian Stackelberg game in which they first choose the reliable queries, followed by an adversary who deletes a subset of links among the remaining (unreliable) queries by the analyst. The analyst in our model is uncertain about the particular target link the adversary attempts to hide, whereas the adversary has full information about the analyst and the network. Focusing on similarity metrics using only local information, we show that the problem is NP-Hard for both players, and devise two principled and efficient approaches for solving it approximately. Extensive experiments with real and synthetic networks demonstrate the effectiveness of our approach.  more » « less
Award ID(s):
1905558
PAR ID:
10130995
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
IEEE International Conference on Data Mining
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Link prediction is one of the fundamental problems in computational social science. A particularly common means to predict existence of unobserved links is via structural similarity metrics, such as the number of common neighbors; node pairs with higher similarity are thus deemed more likely to be linked. However, a number of applications of link prediction, such as predicting links in gang or terrorist networks, are adversarial, with another party incentivized to minimize its effectiveness by manipulating observed information about the network. We offer a comprehensive algorithmic investigation of the problem of attacking similarity-based link prediction through link deletion, focusing on two broad classes of such approaches, one which uses only local information about target links, and another which uses global network information. While we show several variations of the general problem to be NP-Hard for both local and global metrics, we exhibit a number of well-motivated special cases which are tractable. Additionally, we provide principled and empirically effective algorithms for the intractable cases, in some cases proving worst-case approximation guarantees. 
    more » « less
  2. Social networks arise as a result of complex interactions among people, and homophily plays an important role in this process. If we view homophily as a dominant force in network formation and associate each node with a collection of features, this process gives rise to spatial networks, with likelihood of an edge an increasing function of feature similarity among its incident nodes. A link prediction problem in such spatial networks then amounts to deter- mining whether the pair of nodes are sufficiently close according to this edge likelihood function. We undertake the first algorithmic study of the adversarial side of this problem in which the adversary manipulates features of a subset of nodes on the network to pre- vent predicting target edges. We show that this problem is NP-hard, even if the edge likelihood function is convex. On the other hand, if this function is convex, we show that the problem can be solved with convex programming when the set of nodes that the adversary needs to manipulate is fixed. Furthermore, if the edge likelihood function is linear, we present approximation algorithms for the case when the features are binary, and we wish to hide only a single edge, and for the case when the features are real-valued but we need to hide an arbitrary collection of edges. 
    more » « less
  3. Existing causal models for link prediction assume an underlying set of inherent node factors—an innate characteristic defined at the node’s birth—that governs the causal evolution of links in the graph. In some causal tasks, however, link formation ispath-dependent: the outcome of link interventions depends on existing links. Unfortunately, these existing causal methods are not designed for path-dependent link formation, as the cascading functional dependencies between links (arising frompath dependence) are either unidentifiable or require an impractical number of control variables. To overcome this, we develop the first causal model capable of dealing with path dependencies in link prediction. In this work, we introduce the concept of causal lifting, an invariance in causal models of independent interest that, on graphs, allows the identification of causal link prediction queries using limited interventional data. Further, we show how structural pairwise embeddings exhibit lower bias and correctly represent the task’s causal structure, as opposed to existing node embeddings, e.g. graph neural network node embeddings and matrix factorization. Finally, we validate our theoretical findings on three scenarios for causal link prediction tasks: knowledge base completion, covariance matrix estimation and consumer-product recommendations. 
    more » « less
  4. Database access logs are the starting point for many forms of database administration, from database performance tuning, to security auditing, to benchmark design, and many more. Unfortunately, query logs are also large and unwieldy, and it can be difficult for an analyst to extract broad patterns from the set of queries found therein. Clustering is a natural first step towards understanding the massive query logs. However, many clustering methods rely on the notion of pairwise similarity, which is challenging to compute for SQL queries, especially when the underlying data and database schema is unavailable. We investigate the problem of computing similarity between queries, relying only on the query structure. We conduct a rigorous evaluation of three query similarity heuristics proposed in the literature applied to query clustering on multiple query log datasets, representing different types of query workloads. To improve the accuracy of the three heuristics, we propose a generic feature engineering strategy, using classical query rewrites to standardize query structure. The proposed strategy results in a significant improvement in the performance of all three similarity heuristics. 
    more » « less
  5. Networks provide a powerful formalism for modeling complex sys- tems, by representing the underlying set of pairwise interactions. But much of the structure within these systems involves interac- tions that take place among more than two nodes at once — for example, communication within a group rather than person-to- person, collaboration among a team rather than a pair of co-authors, or biological interaction between a set of molecules rather than just two. We refer to these type of simultaneous interactions on sets of more than two nodes as higher-order interactions; they are ubiquitous, but the empirical study of them has lacked a general framework for evaluating higher-order models. Here we introduce such a framework, based on link prediction, a fundamental prob- lem in network analysis. The traditional link prediction problem seeks to predict the appearance of new links in a network, and here we adapt it to predict which (larger) sets of elements will have fu- ture interactions. We study the temporal evolution of 19 datasets from a variety of domains, and use our higher-order formulation of link prediction to assess the types of structural features that are most predictive of new multi-way interactions. Among our results, we find that different domains vary considerably in their distri- bution of higher-order structural parameters, and that the higher- order link prediction problem exhibits some fundamental differ- ences from traditional pairwise link prediction, with a greater role for local rather than long-range information in predicting the ap- pearance of new interactions. 
    more » « less