skip to main content


Title: Attacking Similarity-Based Link Prediction in Social Networks
Link prediction is one of the fundamental problems in computational social science. A particularly common means to predict existence of unobserved links is via structural similarity metrics, such as the number of common neighbors; node pairs with higher similarity are thus deemed more likely to be linked. However, a number of applications of link prediction, such as predicting links in gang or terrorist networks, are adversarial, with another party incentivized to minimize its effectiveness by manipulating observed information about the network. We offer a comprehensive algorithmic investigation of the problem of attacking similarity-based link prediction through link deletion, focusing on two broad classes of such approaches, one which uses only local information about target links, and another which uses global network information. While we show several variations of the general problem to be NP-Hard for both local and global metrics, we exhibit a number of well-motivated special cases which are tractable. Additionally, we provide principled and empirically effective algorithms for the intractable cases, in some cases proving worst-case approximation guarantees.  more » « less
Award ID(s):
1905558
PAR ID:
10131040
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
International Conference on Autonomous Agents and Multiagent Systems
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Link prediction is one of the fundamental problems in social network analysis. A common set of techniques for link prediction rely on similarity metrics which use the topology of the observed subnetwork to quantify the likelihood of unobserved links. Recently, similarity metrics for link prediction have been shown to be vulnerable to attacks whereby observations about the network are adversarially modified to hide target links. We propose a novel approach for increasing robustness of similarity-based link prediction by endowing the analyst with a restricted set of reliable queries which accurately measure the existence of queried links. The analyst aims to robustly predict a collection of possible links by optimally allocating the reliable queries. We formalize the analyst problem as a Bayesian Stackelberg game in which they first choose the reliable queries, followed by an adversary who deletes a subset of links among the remaining (unreliable) queries by the analyst. The analyst in our model is uncertain about the particular target link the adversary attempts to hide, whereas the adversary has full information about the analyst and the network. Focusing on similarity metrics using only local information, we show that the problem is NP-Hard for both players, and devise two principled and efficient approaches for solving it approximately. Extensive experiments with real and synthetic networks demonstrate the effectiveness of our approach. 
    more » « less
  2. Abstract Motivation One of the core problems in the analysis of biological networks is the link prediction problem. In particular, existing interactions networks are noisy and incomplete snapshots of the true network, with many true links missing because those interactions have not yet been experimentally observed. Methods to predict missing links have been more extensively studied for social than for biological networks; it was recently argued that there is some special structure in protein–protein interaction (PPI) network data that might mean that alternate methods may outperform the best methods for social networks. Based on a generalization of the diffusion state distance, we design a new embedding-based link prediction method called global and local integrated diffusion embedding (GLIDE). GLIDE is designed to effectively capture global network structure, combined with alternative network type-specific customized measures that capture local network structure. We test GLIDE on a collection of three recently curated human biological networks derived from the 2016 DREAM disease module identification challenge as well as a classical version of the yeast PPI network in rigorous cross validation experiments. Results We indeed find that different local network structure is dominant in different types of biological networks. We find that the simple local network measures are dominant in the highly connected network core between hub genes, but that GLIDE’s global embedding measure adds value in the rest of the network. For example, we make GLIDE-based link predictions from genes known to be involved in Crohn’s disease, to genes that are not known to have an association, and make some new predictions, finding support in other network data and the literature. Availability and implementation GLIDE can be downloaded at https://bitbucket.org/kap_devkota/glide. Supplementary information Supplementary data are available at Bioinformatics online. 
    more » « less
  3. With the prevalence of interaction on social media, data compiled from these networks are perfect for analyzing social trends. One such trend that this paper aims to address is political homophily. Evidence of political homophily is well researched and indicates that people have a strong tendency to interact with others with similar political ideologies. Additionally, as links naturally form in a social network, either through recommendations or indirect interaction, new links are very likely to reinforce communities. This serves to make social media more insulated and ultimately more polarizing. We aim to address this problem by providing link recommendations that will reduce network homophily. We propose several variants of common neighbor-based link prediction algorithms that aim to recommend links to users who are similar but also would decrease homophily. We demonstrate that acceptance of these recommendations can indeed reduce the homophily of the network, whereas acceptance of link recommendations from a standard common neighbors algorithm does not. 
    more » « less
  4. This paper introduces embComp, a novel approach for comparing two embeddings that capture the similarity between objects, such as word and document embeddings. We survey scenarios where comparing these embedding spaces is useful. From those scenarios, we derive common tasks, introduce visual analysis methods that support these tasks, and combine them into a comprehensive system. One of embComp's central features are overview visualizations that are based on metrics for measuring differences in the local structure around objects. Summarizing these local metrics over the embeddings provides global overviews of similarities and differences. Detail views allow comparison of the local structure around selected objects and relating this local information to the global views. Integrating and connecting all of these components, embComp supports a range of analysis workflows that help understand similarities and differences between embedding spaces. We assess our approach by applying it in several use cases, including understanding corpora differences via word vector embeddings, and understanding algorithmic differences in generating embeddings. 
    more » « less
  5. This paper studies the problem of robustifying an interdependent network by rewiring a small number of links in realtime during a cascading attack. Interdependent networks have been widely used to model interconnected complex systems such as a critical infrastructure network including both the power grid and the Internet. Realtime robustification of interdependent networks, therefore, has significant practical importance. This paper formulates the problem using the Markov decision process (MDP) framework. We first show the problem is NP-hard and then develop an effective and efficient greedy algorithm, named R EAL W IRE , to robustify the network in realtime. R EAL W IRE scores each link (and each node) based on the expected number of links failures resulted from the failure of the link (or the node), and rewires the links greedily according to the scores. Extensive experimental results show that R EAL W IRE outperforms other algorithms on multiple trobustness metrics. 
    more » « less