skip to main content

Title: "What Do Your Friends Think?": Efficient Polling Methods for Networks Using Friendship Paradox
This paper deals with randomized polling of a social network. In the case of forecasting the outcome of an election between two candidates A and B, classical intent polling asks randomly sampled individuals: who will you vote for? Expectation polling asks: who do you think will win? In this paper, we propose a novel neighborhood expectation polling (NEP) strategy that asks randomly sampled individuals: what is your estimate of the fraction of votes for A? Therefore, in NEP, sampled individuals will naturally look at their neighbors (defined by the underlying social network graph) when answering this question. Hence, the mean squared error (MSE) of NEP methods rely on selecting the optimal set of samples from the network. To this end, we propose three NEP algorithms for the following cases: (i) the social network graph is not known but, random walks (sequential exploration) can be performed on the graph (ii) the social network graph is unknown. For case (i) and (ii), two algorithms based on a graph theoretic consequence called friendship paradox are proposed. Theoretical results on the dependence of the MSE of the algorithms on the properties of the network are established. Numerical results on real and synthetic data sets are provided to illustrate the performance of the algorithms.  more » « less
Award ID(s):
Author(s) / Creator(s):
Date Published:
Journal Name:
IEEE Transactions on Knowledge and Data Engineering
Page Range / eLocation ID:
1 to 1
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Networks (or graphs) are used to model the dyadic relations between entities in complex systems. Analyzing the properties of the networks reveal important characteristics of the underlying system. However, in many disciplines, including social sciences, bioinformatics, and technological systems, multiple relations exist between entities. In such cases, a simple graph is not sufficient to model these multiple relations, and a multilayer network is a more appropriate model. In this paper, we explore community detection in multilayer networks. Specifically, we propose a novel network decoupling strategy for efficiently combining the communities in the different layers using the Boolean primitives AND, OR, and NOT. Our proposed method, network decoupling, is based on analyzing the communities in each network layer individually and then aggregating the analysis results. We (i) describe our network decoupling algorithms for finding communities, (ii) present how network decoupling can be used to express different types of communities in multilayer networks, and (iii) demonstrate the effectiveness of using network decoupling for detecting communities in real-world and synthetic data sets. Compared to other algorithms for detecting communities in multilayer networks, our proposed network decoupling method requires significantly lower computation time while producing results of high accuracy. Based on these results, we anticipate that our proposed network decoupling technique will enable a more detailed analysis of multilayer networks in an efficient manner. 
    more » « less
  2. Abstract

    Statistical relational learning (SRL) and graph neural networks (GNNs) are two powerful approaches for learning and inference over graphs. Typically, they are evaluated in terms of simple metrics such as accuracy over individual node labels. Complexaggregate graph queries(AGQ) involving multiple nodes, edges, and labels are common in the graph mining community and are used to estimate important network properties such as social cohesion and influence. While graph mining algorithms support AGQs, they typically do not take into account uncertainty, or when they do, make simplifying assumptions and do not build full probabilistic models. In this paper, we examine the performance of SRL and GNNs on AGQs over graphs with partially observed node labels. We show that, not surprisingly, inferring the unobserved node labels as a first step and then evaluating the queries on the fully observed graph can lead to sub-optimal estimates, and that a better approach is to compute these queries as an expectation under the joint distribution. We propose a sampling framework to tractably compute the expected values of AGQs. Motivated by the analysis of subgroup cohesion in social networks, we propose a suite of AGQs that estimate the community structure in graphs. In our empirical evaluation, we show that by estimating these queries as an expectation, SRL-based approaches yield up to a 50-fold reduction in average error when compared to existing GNN-based approaches.

    more » « less
  3. null (Ed.)
    We study allocation of COVID-19 vaccines to individuals based on the structural properties of their underlying social contact network. Even optimistic estimates suggest that most countries will likely take 6 to 24 months to vaccinate their citizens. These time estimates and the emergence of new viral strains urge us to find quick and effective ways to allocate the vaccines and contain the pandemic. While current approaches use combinations of age-based and occupation-based prioritizations, our strategy marks a departure from such largely aggregate vaccine allocation strategies. We propose a novel agent-based modeling approach motivated by recent advances in (i) science of real-world networks that point to efficacy of certain vaccination strategies and (ii) digital technologies that improve our ability to estimate some of these structural properties. Using a realistic representation of a social contact network for the Commonwealth of Virginia, combined with accurate surveillance data on spatio-temporal cases and currently accepted models of within- and between-host disease dynamics, we study how a limited number of vaccine doses can be strategically distributed to individuals to reduce the overall burden of the pandemic. We show that allocation of vaccines based on individuals' degree (number of social contacts) and total social proximity time is signi ficantly more effective than the currently used age-based allocation strategy in terms of number of infections, hospitalizations and deaths. Our results suggest that in just two months, by March 31, 2021, compared to age-based allocation, the proposed degree-based strategy can result in reducing an additional 56{110k infections, 3.2{5.4k hospitalizations, and 700{900 deaths just in the Commonwealth of Virginia. Extrapolating these results for the entire US, this strategy can lead to 3{6 million fewer infections, 181{306k fewer hospitalizations, and 51{62k fewer deaths compared to age-based allocation. The overall strategy is robust even: (i) if the social contacts are not estimated correctly; (ii) if the vaccine efficacy is lower than expected or only a single dose is given; (iii) if there is a delay in vaccine production and deployment; and (iv) whether or not non-pharmaceutical interventions continue as vaccines are deployed. For reasons of implementability, we have used degree, which is a simple structural measure and can be easily estimated using several methods, including the digital technology available today. These results are signi ficant, especially for resource-poor countries, where vaccines are less available, have lower efficacy, and are more slowly distributed. 
    more » « less
  4. We consider SIS contagion processes over networks where, a classical assumption is that individuals' decisions to adopt a contagion are based on their immediate neighbors. However, recent literature shows that some attributes are more correlated between two-hop neighbors, a concept referred to as monophily. This motivates us to explore monophilic contagion, the case where a contagion (e.g. a product, disease) is adopted by considering two-hop neighbors instead of immediate neighbors (e.g. you ask your friend about the new iPhone and she recommends you the opinion of one of her friends). We show that the phenomenon called friendship paradox makes it easier for the monophilic contagion to spread widely. We also consider the case where the underlying network stochastically evolves in response to the state of the contagion (e.g. depending on the severity of a flu virus, people restrict their interactions with others to avoid getting infected) and show that the dynamics of such a process can be approximated by a differential equation whose trajectory satisfies an algebraic constraint restricting it to a manifold. Our results shed light on how graph theoretic consequences affect contagions and, provide simple deterministic models to approximate the collective dynamics of contagions over stochastic graph processes. 
    more » « less
  5. Healthcare acquired infections (HAIs) (e.g., Methicillin-resistant Staphylococcus aureus infection) have complex transmission pathways, spreading not just via direct person-to-person contacts, but also via contaminated surfaces. Prior work in mathematical epidemiology has led to a class of models – which we call load sharing models – that provide a discrete-time, stochastic formalization of HAI-spread on temporal contact networks. The focus of this paper is the source detection problem for the load sharing model. The source detection problem has been studied extensively in SEIR type models, but this prior work does not apply to load sharing models.We show that a natural formulation of the source detection problem for the load sharing model is computationally hard, even to approximate. We then present two alternate formulations that are much more tractable. The tractability of our problems depends crucially on the submodularity of the expected number of infections as a function of the source set. Prior techniques for showing submodularity, such as the "live graph" technique are not applicable for the load sharing model and our key technical contribution is to use a more sophisticated "coupling" technique to show the submodularity result. We propose algorithms for our two problem formulations by extending existing algorithmic results from submodular optimization and combining these with an expectation propagation heuristic for the load sharing model that leads to orders-of-magnitude speedup. We present experimental results on temporal contact networks based on fine-grained EMR data from three different hospitals. Our results on synthetic outbreaks on these networks show that our algorithms outperform baselines by up to 5.97 times. Furthermore, case studies based on hospital outbreaks of Clostridioides difficile infection show that our algorithms identify clinically meaningful sources. 
    more » « less