skip to main content


Title: Adversarial Robustness of Similarity-Based Link Prediction
Link prediction is one of the fundamental problems in social network analysis. A common set of techniques for link prediction rely on similarity metrics which use the topology of the observed subnetwork to quantify the likelihood of unobserved links. Recently, similarity metrics for link prediction have been shown to be vulnerable to attacks whereby observations about the network are adversarially modified to hide target links. We propose a novel approach for increasing robustness of similarity-based link prediction by endowing the analyst with a restricted set of reliable queries which accurately measure the existence of queried links. The analyst aims to robustly predict a collection of possible links by optimally allocating the reliable queries. We formalize the analyst problem as a Bayesian Stackelberg game in which they first choose the reliable queries, followed by an adversary who deletes a subset of links among the remaining (unreliable) queries by the analyst. The analyst in our model is uncertain about the particular target link the adversary attempts to hide, whereas the adversary has full information about the analyst and the network. Focusing on similarity metrics using only local information, we show that the problem is NP-Hard for both players, and devise two principled and efficient approaches for solving it approximately. Extensive experiments with real and synthetic networks demonstrate the effectiveness of our approach.  more » « less
Award ID(s):
1905558
NSF-PAR ID:
10130995
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
IEEE International Conference on Data Mining
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Link prediction is one of the fundamental problems in computational social science. A particularly common means to predict existence of unobserved links is via structural similarity metrics, such as the number of common neighbors; node pairs with higher similarity are thus deemed more likely to be linked. However, a number of applications of link prediction, such as predicting links in gang or terrorist networks, are adversarial, with another party incentivized to minimize its effectiveness by manipulating observed information about the network. We offer a comprehensive algorithmic investigation of the problem of attacking similarity-based link prediction through link deletion, focusing on two broad classes of such approaches, one which uses only local information about target links, and another which uses global network information. While we show several variations of the general problem to be NP-Hard for both local and global metrics, we exhibit a number of well-motivated special cases which are tractable. Additionally, we provide principled and empirically effective algorithms for the intractable cases, in some cases proving worst-case approximation guarantees. 
    more » « less
  2. Social networks arise as a result of complex interactions among people, and homophily plays an important role in this process. If we view homophily as a dominant force in network formation and associate each node with a collection of features, this process gives rise to spatial networks, with likelihood of an edge an increasing function of feature similarity among its incident nodes. A link prediction problem in such spatial networks then amounts to deter- mining whether the pair of nodes are sufficiently close according to this edge likelihood function. We undertake the first algorithmic study of the adversarial side of this problem in which the adversary manipulates features of a subset of nodes on the network to pre- vent predicting target edges. We show that this problem is NP-hard, even if the edge likelihood function is convex. On the other hand, if this function is convex, we show that the problem can be solved with convex programming when the set of nodes that the adversary needs to manipulate is fixed. Furthermore, if the edge likelihood function is linear, we present approximation algorithms for the case when the features are binary, and we wish to hide only a single edge, and for the case when the features are real-valued but we need to hide an arbitrary collection of edges. 
    more » « less
  3. Software traceability establishes a network of connections between diverse artifacts such as requirements, design, and code. However, given the cost and effort of creating and maintaining trace links manually, researchers have proposed automated approaches using information retrieval techniques. Current approaches focus almost entirely upon generating links between pairs of artifacts and have not leveraged the broader network of interconnected artifacts. In this paper we investigate the use of intermediate artifacts to enhance the accuracy of the generated trace links - focusing on paths consisting of source, target, and intermediate artifacts. We propose and evaluate combinations of techniques for computing semantic similarity, scaling scores across multiple paths, and aggregating results from multiple paths. We report results from five projects, including one large industrial project. We find that leveraging intermediate artifacts improves the accuracy of end-to-end trace retrieval across all datasets and accuracy metrics. After further analysis, we discover that leveraging intermediate artifacts is only helpful when a project's artifacts share a common vocabulary, which tends to occur in refinement and decomposition hierarchies of artifacts. Given our hybrid approach that integrates both direct and transitive links, we observed little to no loss of accuracy when intermediate artifacts lacked a shared vocabulary with source or target artifacts. 
    more » « less
  4. Networks provide a powerful formalism for modeling complex sys- tems, by representing the underlying set of pairwise interactions. But much of the structure within these systems involves interac- tions that take place among more than two nodes at once — for example, communication within a group rather than person-to- person, collaboration among a team rather than a pair of co-authors, or biological interaction between a set of molecules rather than just two. We refer to these type of simultaneous interactions on sets of more than two nodes as higher-order interactions; they are ubiquitous, but the empirical study of them has lacked a general framework for evaluating higher-order models. Here we introduce such a framework, based on link prediction, a fundamental prob- lem in network analysis. The traditional link prediction problem seeks to predict the appearance of new links in a network, and here we adapt it to predict which (larger) sets of elements will have fu- ture interactions. We study the temporal evolution of 19 datasets from a variety of domains, and use our higher-order formulation of link prediction to assess the types of structural features that are most predictive of new multi-way interactions. Among our results, we find that different domains vary considerably in their distri- bution of higher-order structural parameters, and that the higher- order link prediction problem exhibits some fundamental differ- ences from traditional pairwise link prediction, with a greater role for local rather than long-range information in predicting the ap- pearance of new interactions. 
    more » « less
  5. One of the challenges in social media research is that, often times, researchers or third parties could not obtain the massive of data collected by a limited number of “big brothers” (e.g., Facebook and Google). In this paper, we shed light on leveraging social network topological properties and local information to effectively conduct search in Online Social Networks (OSN). The problem we focus on is to discover the reachability of a group of target people in an OSN, particularly from the perspective of a third-party analyst who does not have full access to the OSN. We developed effective and efficient detection techniques which demand only a small number of queries to discover people's connections (e.g. friendship) in the OSN. After conducting experiments on real-world data sets, we found that our proposed techniques perform as well as the centralized detection algorithm, which assumes the availability of the global information in the OSN. 
    more » « less