skip to main content


Title: Personalized Degrees: Effects on Link Formation in Dynamic Networks from an Egocentric Perspective
Understanding mechanisms driving link formation in dynamic social networks is a long-standing problem that has implications to understanding social structure as well as link prediction and recommendation. Social networks exhibit a high degree of transitivity, which explains the successes of common neighbor-based methods for link prediction. In this paper, we examine mechanisms behind link formation from the perspective of an ego node. We introduce the notion of personalized degree for each neighbor node of the ego, which is the number of other neighbors a particular neighbor is connected to. From empirical analyses on four on-line social network datasets, we find that neighbors with higher personalized degree are more likely to lead to new link formations when they serve as common neighbors with other nodes, both in undirected and directed settings. This is complementary to the finding of Adamic and Adar that neighbor nodes with higher (global) degree are less likely to lead to new link formations. Furthermore, on directed networks, we find that personalized out-degree has a stronger effect on link formation than personalized in-degree, whereas global in-degree has a stronger effect than global out-degree. We validate our empirical findings through several link recommendation experiments and observe that incorporating both personalized and global degree into link recommendation greatly improves accuracy.  more » « less
Award ID(s):
1755824
NSF-PAR ID:
10097302
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Companion Proceedings of the World Wide Web Conference
Page Range / eLocation ID:
1039 - 1046
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Hill, Alison L. (Ed.)
    The structure of contact networks affects the likelihood of disease spread at the population scale and the risk of infection at any given node. Though this has been well characterized for both theoretical and empirical networks for the spread of epidemics on completely susceptible networks, the long-term impact of network structure on risk of infection with an endemic pathogen, where nodes can be infected more than once, has been less well characterized. Here, we analyze detailed records of the transportation of cattle among farms in Turkey to characterize the global and local attributes of the directed—weighted shipments network between 2007-2012. We then study the correlations between network properties and the likelihood of infection with, or exposure to, foot-and-mouth disease (FMD) over the same time period using recorded outbreaks. The shipments network shows a complex combination of features (local and global) that have not been previously reported in other networks of shipments; i.e. small-worldness, scale-freeness, modular structure, among others. We find that nodes that were either infected or at high risk of infection with FMD (within one link from an infected farm) had disproportionately higher degree, were more central (eigenvector centrality and coreness), and were more likely to be net recipients of shipments compared to those that were always more than 2 links away from an infected farm. High in-degree (i.e. many shipments received) was the best univariate predictor of infection. Low in-coreness (i.e. peripheral nodes) was the best univariate predictor of nodes always more than 2 links away from an infected farm. These results are robust across the three different serotypes of FMD observed in Turkey and during periods of low-endemic prevalence and high-prevalence outbreaks. 
    more » « less
  2. Most state-of-the-art Graph Neural Networks (GNNs) can be defined as a form of graph convolution which can be realized by message passing between direct neighbors or beyond. To scale such GNNs to large graphs, various neighbor-, layer-, or subgraph-sampling techniques are proposed to alleviate the "neighbor explosion" problem by considering only a small subset of messages passed to the nodes in a mini-batch. However, sampling-based methods are difficult to apply to GNNs that utilize many-hops-away or global context each layer, show unstable performance for different tasks and datasets, and do not speed up model inference. We propose a principled and fundamentally different approach, VQ-GNN, a universal framework to scale up any convolution-based GNNs using Vector Quantization (VQ) without compromising the performance. In contrast to sampling-based techniques, our approach can effectively preserve all the messages passed to a mini-batch of nodes by learning and updating a small number of quantized reference vectors of global node representations, using VQ within each GNN layer. Our framework avoids the "neighbor explosion" problem of GNNs using quantized representations combined with a low-rank version of the graph convolution matrix. We show that such a compact low-rank version of the gigantic convolution matrix is sufficient both theoretically and experimentally. In company with VQ, we design a novel approximated message passing algorithm and a nontrivial back-propagation rule for our framework. Experiments on various types of GNN backbones demonstrate the scalability and competitive performance of our framework on large-graph node classification and link prediction benchmarks. 
    more » « less
  3. Link prediction is one of the fundamental problems in computational social science. A particularly common means to predict existence of unobserved links is via structural similarity metrics, such as the number of common neighbors; node pairs with higher similarity are thus deemed more likely to be linked. However, a number of applications of link prediction, such as predicting links in gang or terrorist networks, are adversarial, with another party incentivized to minimize its effectiveness by manipulating observed information about the network. We offer a comprehensive algorithmic investigation of the problem of attacking similarity-based link prediction through link deletion, focusing on two broad classes of such approaches, one which uses only local information about target links, and another which uses global network information. While we show several variations of the general problem to be NP-Hard for both local and global metrics, we exhibit a number of well-motivated special cases which are tractable. Additionally, we provide principled and empirically effective algorithms for the intractable cases, in some cases proving worst-case approximation guarantees. 
    more » « less
  4. In social networks, a node’s position is, in and of itself, a form of social capital. Better-positioned members not only benefit from (faster) access to diverse information, but innately have more potential influence on information spread. Structural biases often arise from network formation, and can lead to significant disparities in information access based on position. Further, processes such as link recommendation can exacerbate this inequality by relying on network structure to augment connectivity. In this paper, we argue that one can understand and quantify this social capital through the lens of information flow in the network. In contrast to prior work, we consider the setting where all nodes may be sources of distinct information, and a node’s (dis)advantage takes into account its ability to access all information available on the network, not just that from a single source. We introduce three new measures of advantage (broadcast, influence, and control), which are quantified in terms of position in the network using access signatures – vectors that represent a node’s ability to share information with each other node in the network. We then consider the problem of improving equity by making interventions to increase the access of the least-advantaged nodes. Since all nodes are already sources of information in our model, we argue that edge augmentation is most appropriate for mitigating bias in the network structure, and frame a budgeted intervention problem for maximizing broadcast (minimum pairwise access) over the network. Finally, we propose heuristic strategies for selecting edge augmentations and empirically evaluate their performance on a corpus of real-world social networks. We demonstrate that a small number of interventions can not only significantly increase the broadcast measure of access for the least-advantaged nodes (over 5 times more than random), but also simultaneously improve the minimum influence. Additional analysis shows that edge augmentations targeted at improving minimum pairwise access can also dramatically shrink the gap in advantage between nodes (over ) and reduce disparities between their access signatures. 
    more » « less
  5. The 2020 United States (US) presidential election was — and has continued to be — the focus of pervasive and persistent mis- and disinformation spreading through our media ecosystems, including social media. This event has driven the collection and analysis of large, directed social network datasets, but such datasets can resist intuitive understanding. In such large datasets, the overwhelming number of nodes and edges present in typical representations create visual artifacts, such as densely overlapping edges and tightly-packed formations of low-degree nodes, which obscure many features of more practical interest. We apply a method, coengagement transformations, to convert such networks of social data into tractable images. Intuitively, this approach allows for parameterized network visualizations that make shared audiences of engaged viewers salient to viewers. Using the interpretative capabilities of this method, we perform an extensive case study of the 2020 United States presidential election on Twitter, contributing an empirical analysis of coengagement. By creating and contrasting different networks at different parameter sets, we define and characterize several structures in this discourse network, including bridging accounts, satellite audiences, and followback communities. We discuss the importance and implications of these empirical network features in this context. In addition, we release open-source code for creating coengagement networks from Twitter and other structured interaction data. 
    more » « less