Title: Localizing the Information Source in a Network
Information and content can spread through social networks much as diseases spread between organisms. Identifying the source of an outbreak is challenging when the infection times are unknown. We consider the problem of detecting the source of a rumor that spread randomly through a network according to a simple diffusion model, the susceptible-infected (SI) exponential-time model. The infection times are unknown; only the set of nodes that propagated the rumor before a certain time is observed. Since evaluating the likelihood of a spread exactly is computationally prohibitive, we propose a simple and efficient procedure that approximates the likelihood and selects a candidate rumor source. We empirically demonstrate that our method outperforms the Jordan center procedure on various random graphs and on a real-world network.
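As an illustration of the setup (not the paper's exact algorithm), the sketch below simulates SI spread with exponentially distributed transmission times on a random graph, scores each candidate source with a crude Monte Carlo proxy for the likelihood of the observed infected set, and compares the result with the Jordan center of the infected subgraph. The graph, infection rate, observation time, and the Jaccard-based score are illustrative assumptions.

```python
# Minimal sketch (not the paper's procedure): Monte Carlo scoring of candidate
# sources under an SI model with exponential transmission delays, compared
# with the Jordan center of the infected subgraph. Requires networkx.
import heapq
import random
import networkx as nx

def si_spread(G, source, t_obs, rate=1.0, rng=random):
    """Simulate SI spread with i.i.d. Exp(rate) edge delays; return the set of
    nodes infected by time t_obs (a weighted shortest-path ball around source)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v in G.neighbors(u):
            nd = d + rng.expovariate(rate)
            if nd <= t_obs and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return set(dist)

def mc_source_score(G, candidate, infected, t_obs, n_sim=100):
    """Crude likelihood proxy: average Jaccard similarity between simulated and
    observed infected sets (the paper uses a more refined approximation)."""
    total = 0.0
    for _ in range(n_sim):
        sim = si_spread(G, candidate, t_obs)
        total += len(sim & infected) / len(sim | infected)
    return total / n_sim

def jordan_center(G, infected):
    """Node of the infected subgraph minimizing eccentricity over infected nodes."""
    ecc = nx.eccentricity(G.subgraph(infected))
    return min(ecc, key=ecc.get)

if __name__ == "__main__":
    G = nx.erdos_renyi_graph(200, 0.04, seed=1)
    true_source, t_obs = 0, 2.0
    infected = si_spread(G, true_source, t_obs)
    scores = {v: mc_source_score(G, v, infected, t_obs) for v in infected}
    print("MC estimate:", max(scores, key=scores.get),
          "Jordan center:", jordan_center(G, infected))
```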
Award ID(s):
1742847
NSF-PAR ID:
10125022
Author(s) / Creator(s):
Date Published:
Journal Name:
TrueFact 2019 : KDD 2019 Workshop on Truth Discovery and Fact Checking: Theory and Practice
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Cosmological parameters encoding our understanding of the expansion history of the universe can be constrained by the accurate estimation of time delays arising in gravitationally lensed systems. We propose TD-CARMA, a Bayesian method to estimate cosmological time delays by modeling observed and irregularly sampled light curves as realizations of a continuous auto-regressive moving average (CARMA) process. Our model accounts for heteroskedastic measurement errors and microlensing, an additional source of independent extrinsic long-term variability in the source brightness. The semiseparable structure of the CARMA covariance matrix allows for fast and scalable likelihood computation using Gaussian process modeling. We obtain a sample from the joint posterior distribution of the model parameters using a nested sampling approach. This allows for “painless” Bayesian computation, dealing with the expected multimodality of the posterior distribution in a straightforward manner and not requiring the specification of starting values or an initial guess for the time delay, unlike existing methods. In addition, the proposed sampling procedure automatically evaluates the Bayesian evidence, allowing us to perform principled Bayesian model selection. TD-CARMA is parsimonious, and typically includes no more than a dozen unknown parameters. We apply TD-CARMA to six doubly lensed quasars HS2209+1914, SDSS J1001+5027, SDSS J1206+4332, SDSS J1515+1511, SDSS J1455+1447, and SDSS J1349+1227, estimating their time delays as −21.96 ± 1.448, 120.93 ± 1.015, 111.51 ± 1.452, 210.80 ± 2.18, 45.36 ± 1.93, and 432.05 ± 1.950, respectively. These estimates are consistent with those derived in the relevant literature, but are typically two to four times more precise.
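For intuition only, the toy below estimates a time delay between two irregularly sampled light curves by modeling the merged curve as a CAR(1) (damped random walk) process, the simplest member of the CARMA family, and maximizing its exact Kalman-filter likelihood over a grid of trial delays. It omits everything that makes TD-CARMA distinctive (higher-order CARMA terms, microlensing components, nested sampling, evidence computation); the data, parameter values, and delay grid are made up for illustration.

```python
# Toy sketch only (not TD-CARMA): grid-search a time delay by scoring the
# merged, shifted light curves under a CAR(1) Gaussian-process likelihood.
import numpy as np

def car1_loglik(t, y, yerr, tau, sigma, mean):
    """Exact log-likelihood of a CAR(1) process observed with Gaussian errors,
    computed with a scalar Kalman filter over the (sorted) observation times."""
    order = np.argsort(t)
    t, y, yerr = t[order], y[order] - mean, yerr[order]
    var0 = sigma**2                              # stationary variance
    m, P, ll, t_prev = 0.0, var0, 0.0, t[0]
    for ti, yi, ei in zip(t, y, yerr):
        a = np.exp(-(ti - t_prev) / tau)         # decay over the time gap
        m, P = a * m, a**2 * P + var0 * (1 - a**2)
        S = P + ei**2                            # innovation variance
        resid = yi - m
        ll += -0.5 * (np.log(2 * np.pi * S) + resid**2 / S)
        K = P / S                                # Kalman gain
        m, P, t_prev = m + K * resid, (1 - K) * P, ti
    return ll

def delay_profile(tA, yA, eA, tB, yB, eB, delays, tau=200.0, sigma=0.1):
    """Log-likelihood of the merged curve as a function of the trial delay
    applied to image B (B is shifted back by d and mean-matched to A)."""
    lls = []
    for d in delays:
        t = np.concatenate([tA, tB - d])
        y = np.concatenate([yA, yB - (yB.mean() - yA.mean())])
        e = np.concatenate([eA, eB])
        lls.append(car1_loglik(t, y, e, tau, sigma, mean=yA.mean()))
    return np.array(lls)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Fake data: a smooth signal seen in image A, and in image B 120 days later.
    tA = np.sort(rng.uniform(0, 1000, 150))
    tB = np.sort(rng.uniform(0, 1000, 150))
    signal = lambda t: 0.3 * np.sin(t / 90.0)
    yA = signal(tA) + rng.normal(0, 0.02, tA.size)
    yB = signal(tB - 120.0) + rng.normal(0, 0.02, tB.size)
    eA, eB = np.full(tA.size, 0.02), np.full(tB.size, 0.02)
    delays = np.arange(80, 161, 2.0)
    lls = delay_profile(tA, yA, eA, tB, yB, eB, delays)
    print("best trial delay:", delays[np.argmax(lls)])
```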

     
  2. Hill, Alison L. (Ed.)
    The structure of contact networks affects the likelihood of disease spread at the population scale and the risk of infection at any given node. Though this has been well characterized for both theoretical and empirical networks for the spread of epidemics on completely susceptible networks, the long-term impact of network structure on the risk of infection with an endemic pathogen, where nodes can be infected more than once, has been less well characterized. Here, we analyze detailed records of the transportation of cattle among farms in Turkey to characterize the global and local attributes of the directed, weighted shipment network between 2007 and 2012. We then study the correlations between network properties and the likelihood of infection with, or exposure to, foot-and-mouth disease (FMD) over the same time period using recorded outbreaks. The shipment network shows a complex combination of local and global features that have not previously been reported for other shipment networks, including small-worldness, scale-freeness, and modular structure. We find that nodes that were either infected or at high risk of infection with FMD (within one link of an infected farm) had disproportionately higher degree, were more central (eigenvector centrality and coreness), and were more likely to be net recipients of shipments compared to those that were always more than two links away from an infected farm. High in-degree (i.e., many shipments received) was the best univariate predictor of infection, and low in-coreness (i.e., peripheral nodes) was the best univariate predictor of nodes always more than two links away from an infected farm. These results are robust across the three different serotypes of FMD observed in Turkey and during periods of both low-endemic prevalence and high-prevalence outbreaks.
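The node-level measures the study relates to FMD risk can be computed with standard tools; the sketch below does so on a toy directed, weighted shipment network using networkx. The edge list is invented, and the in-coreness routine is a simple in-degree peeling used as a stand-in for the coreness measure in the paper.

```python
# Illustrative sketch: in-degree, eigenvector centrality, an in-coreness via
# iterative pruning, and net shipment flow on a made-up directed network.
import networkx as nx

def in_coreness(G):
    """k-core decomposition using in-degree only (simple peeling)."""
    H, core, k = G.copy(), {}, 0
    while len(H) > 0:
        while True:
            to_drop = [v for v, d in H.in_degree() if d <= k]
            if not to_drop:
                break
            for v in to_drop:
                core[v] = k
                H.remove_node(v)
        k += 1
    return core

if __name__ == "__main__":
    # Toy shipment records: (origin farm, destination farm, number of animals).
    shipments = [("A", "B", 10), ("B", "C", 4), ("C", "A", 2), ("D", "B", 7),
                 ("E", "B", 3), ("B", "E", 1), ("C", "D", 5), ("E", "C", 6)]
    G = nx.DiGraph()
    G.add_weighted_edges_from(shipments)

    in_deg = dict(G.in_degree())                       # shipments received
    eig = nx.eigenvector_centrality(G, max_iter=1000)  # in-edge based for DiGraphs
    net_in = {v: G.in_degree(v, weight="weight") - G.out_degree(v, weight="weight")
              for v in G}                              # net recipient of animals
    cores = in_coreness(G)
    for v in sorted(G):
        print(v, in_deg[v], round(eig[v], 3), net_in[v], cores[v])
```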
  3.
    Online social networks provide a convenient platform for the spread of rumors, which can lead to serious consequences such as economic losses and public panic. The classical rumor blocking problem aims to launch a set of nodes as a positive cascade that competes with the misinformation in order to limit the spread of the rumor. However, most related research has been based on a one-dimensional diffusion model. In reality, more than one feature is associated with an object, and a user's impression of the object is determined not by a single feature but by her overall evaluation of all features associated with it. Thus, the influence spread of the object can be decomposed into the spread of multiple features. Based on this observation, we design a multi-feature diffusion model (MF-model) in this paper and formulate a multi-feature rumor blocking (MFRB) problem on a multi-layer network structure according to this model. To solve the MFRB problem, we design a novel sampling method called Multi-Sampling, which can be applied to this multi-layer network structure. We then propose a Revised-IMM algorithm and obtain a satisfactory approximate solution to MFRB. Finally, we evaluate the proposed algorithm on real datasets; the experiments show the effectiveness of Revised-IMM and its advantage over the baseline algorithms.
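The sketch below is a heavily simplified stand-in for the MF-model and Revised-IMM, not the paper's method: each feature spreads as its own competitive independent cascade, a user's final impression is the majority vote over features, and protector seeds are chosen by a plain Monte Carlo greedy search. The graph, propagation probability, feature count, and budget are illustrative assumptions.

```python
# Simplified rumor-blocking sketch with multiple feature layers (toy model).
import random
import networkx as nx

def cascade(G, rumor_seeds, truth_seeds, p=0.1, rng=random):
    """Competitive independent cascade on one feature layer; truth wins ties.
    Returns the set of rumor-adopting nodes."""
    state = {v: "T" for v in truth_seeds}
    state.update({v: "R" for v in rumor_seeds if v not in state})
    frontier = list(state)
    while frontier:
        nxt = []
        # Process truth-labelled nodes first so the truth cascade wins ties.
        for u in sorted(frontier, key=lambda x: state[x] != "T"):
            for v in G.successors(u):
                if v not in state and rng.random() < p:
                    state[v] = state[u]
                    nxt.append(v)
        frontier = nxt
    return {v for v, s in state.items() if s == "R"}

def expected_rumored(G, rumor_seeds, truth_seeds, n_features=3, n_sim=20):
    """Monte Carlo estimate of the expected number of users whose majority
    impression over the feature layers is the rumor."""
    total = 0
    for _ in range(n_sim):
        votes = {}
        for _f in range(n_features):
            for v in cascade(G, rumor_seeds, truth_seeds):
                votes[v] = votes.get(v, 0) + 1
        total += sum(1 for c in votes.values() if c > n_features / 2)
    return total / n_sim

def greedy_blockers(G, rumor_seeds, budget=3):
    """Greedily pick truth seeds that most reduce the expected rumor spread."""
    chosen = []
    for _ in range(budget):
        best = min((v for v in G if v not in rumor_seeds and v not in chosen),
                   key=lambda v: expected_rumored(G, rumor_seeds, chosen + [v]))
        chosen.append(best)
    return chosen

if __name__ == "__main__":
    random.seed(1)
    G = nx.gnp_random_graph(100, 0.05, seed=1, directed=True)
    print("protector seeds:", greedy_blockers(G, rumor_seeds=[0, 1]))
```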
    Healthcare-acquired infections (HAIs) (e.g., Methicillin-resistant Staphylococcus aureus infection) have complex transmission pathways, spreading not just via direct person-to-person contacts but also via contaminated surfaces. Prior work in mathematical epidemiology has led to a class of models, which we call load sharing models, that provide a discrete-time, stochastic formalization of HAI spread on temporal contact networks. The focus of this paper is the source detection problem for the load sharing model. The source detection problem has been studied extensively in SEIR-type models, but this prior work does not apply to load sharing models. We show that a natural formulation of the source detection problem for the load sharing model is computationally hard, even to approximate. We then present two alternate formulations that are much more tractable. The tractability of our problems depends crucially on the submodularity of the expected number of infections as a function of the source set. Prior techniques for showing submodularity, such as the "live graph" technique, are not applicable to the load sharing model, and our key technical contribution is a more sophisticated "coupling" technique used to establish the submodularity result. We propose algorithms for our two problem formulations by extending existing algorithmic results from submodular optimization and combining them with an expectation propagation heuristic for the load sharing model that yields orders-of-magnitude speedups. We present experimental results on temporal contact networks based on fine-grained EMR data from three different hospitals. Our results on synthetic outbreaks on these networks show that our algorithms outperform baselines by a factor of up to 5.97. Furthermore, case studies based on hospital outbreaks of Clostridioides difficile infection show that our algorithms identify clinically meaningful sources.
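Because the tractable formulations rest on submodularity of the expected infection count, the natural algorithmic workhorse is greedy maximization of a monotone submodular set function under a cardinality constraint. The sketch below shows that generic greedy routine with a toy coverage objective; it is not the paper's load sharing model or its expectation propagation heuristic, and the `reachable` sets and observed infections are hypothetical.

```python
# Generic sketch: standard greedy for monotone submodular maximization under a
# cardinality constraint (the (1 - 1/e)-approximation workhorse).
from typing import Callable, Hashable, Iterable, Set

def greedy_max(f: Callable[[Set[Hashable]], float],
               ground: Iterable[Hashable], k: int) -> Set[Hashable]:
    """Greedily add the element with the largest marginal gain until |S| = k."""
    S: Set[Hashable] = set()
    candidates = set(ground)
    for _ in range(k):
        gains = {v: f(S | {v}) - f(S) for v in candidates - S}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        S.add(best)
    return S

if __name__ == "__main__":
    # Toy objective: how many observed infections a candidate source set "covers".
    reachable = {   # hypothetical: nodes each candidate source could infect
        "a": {1, 2, 3}, "b": {3, 4}, "c": {5, 6, 7, 8}, "d": {1, 8},
    }
    observed = {1, 2, 3, 4, 5, 8}
    f = lambda S: len(observed & set().union(*(reachable[s] for s in S))) if S else 0
    print(greedy_max(f, reachable, k=2))   # e.g. {'a', 'c'} covers {1, 2, 3, 5, 8}
```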
  5. Interval‐censored failure time data commonly arise in epidemiological and biomedical studies where the occurrence of an event or a disease is determined via periodic examinations. Subject to interval‐censoring, available information on the failure time can be quite limited. Cost‐effective sampling designs are desirable to enhance the study power, especially when the disease rate is low and the covariates are expensive to obtain. In this work, we formulate the case‐cohort design with multiple interval‐censored disease outcomes and also generalize it to nonrare diseases where only a portion of diseased subjects are sampled. We develop a marginal sieve weighted likelihood approach, which assumes that the failure times marginally follow the proportional hazards model. We consider two types of weights to account for the sampling bias, and adopt a sieve method with Bernstein polynomials to handle the unknown baseline functions. We employ a weighted bootstrap procedure to obtain a variance estimate that is robust to the dependence structure between failure times. The proposed method is examined via simulation studies and illustrated with a dataset on incident diabetes and hypertension from the Atherosclerosis Risk in Communities study.
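As a schematic of the kind of objective involved (not the paper's exact estimator), a weighted sieve log-likelihood for interval-censored outcomes under a marginal proportional hazards model can be written as below; the symbols, weights, and constraints are generic placeholders.

```latex
% Schematic weighted sieve log-likelihood (generic form; details differ in the paper)
\[
  \ell_w(\beta, \Lambda_0)
  \;=\; \sum_{i=1}^{n} w_i \sum_{k=1}^{K}
  \log\!\left\{
      \exp\!\left[-\Lambda_{0k}(L_{ik})\, e^{\beta^\top Z_{ik}}\right]
    - \exp\!\left[-\Lambda_{0k}(R_{ik})\, e^{\beta^\top Z_{ik}}\right]
  \right\},
\]
\[
  \Lambda_{0k}(t) \;\approx\; \sum_{j=0}^{m} \phi_{kj}\, B_j(t; m, a, b),
  \qquad 0 \le \phi_{k0} \le \phi_{k1} \le \cdots \le \phi_{km}.
\]
```

Here $(L_{ik}, R_{ik}]$ is the censoring interval for outcome $k$ of subject $i$ (with $R_{ik}=\infty$ for right-censored outcomes), $w_i$ is the sampling weight from the case-cohort design, and $B_j(\cdot; m, a, b)$ are Bernstein basis polynomials on $[a, b]$, with nondecreasing coefficients keeping each baseline cumulative hazard monotone.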

     