skip to main content


Title: Online Sampling of Temporal Networks
Temporal networks representing a stream of timestamped edges are seemingly ubiquitous in the real world. However, the massive size and continuous nature of these networks make them fundamentally challenging to analyze and leverage for descriptive and predictive modeling tasks. In this work, we propose a general framework for temporal network sampling with unbiased estimation. We develop online, single-pass sampling algorithms, and unbiased estimators for temporal network sampling. The proposed algorithms enable fast, accurate, and memory-efficient statistical estimation of temporal network patterns and properties. In addition, we propose a temporally decaying sampling algorithm with unbiased estimators for studying networks that evolve in continuous time, where the strength of links is a function of time, and the motif patterns are temporally weighted. In contrast to the prior notion of a △ t -temporal motif, the proposed formulation and algorithms for counting temporally weighted motifs are useful for forecasting tasks in networks such as predicting future links, or a future time-series variable of nodes and links. Finally, extensive experiments on a variety of temporal networks from different domains demonstrate the effectiveness of the proposed algorithms. A detailed ablation study is provided to understand the impact of the various components of the proposed framework.  more » « less
Award ID(s):
1848596
NSF-PAR ID:
10388298
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
ACM Transactions on Knowledge Discovery from Data
Volume:
15
Issue:
4
ISSN:
1556-4681
Page Range / eLocation ID:
1 to 27
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Networks are a natural representation of complex systems across the sciences, and higher-order dependencies are central to the understanding and modeling of these systems. However, in many practical applications such as online social networks, networks are massive, dynamic, and naturally streaming, where pairwise interactions among vertices become available one at a time in some arbitrary order. The massive size and streaming nature of these networks allow only partial observation, since it is infeasible to analyze the entire network. Under such scenarios, it is challenging to study the higher-order structural and connectivity patterns of streaming networks. In this work, we consider the fundamental problem of estimating the higher-order dependencies using adaptive sampling. We propose a novel adaptive, single-pass sampling framework and unbiased estimators for higher-order network analysis of large streaming networks. Our algorithms exploit adaptive techniques to identify edges that are highly informative for efficiently estimating the higher-order structure of streaming networks from small sample data. We also introduce a novel James-Stein shrinkage estimator to reduce the estimation error. Our approach is fully analytic, computationally efficient, and can be incrementally updated in a streaming setting. Numerical experiments on large networks show that our approach is superior to baseline methods. 
    more » « less
  2. Pattern counting in graphs is fundamental to several network sci- ence tasks, and there is an abundance of scalable methods for estimating counts of small patterns, often called motifs, in large graphs. However, modern graph datasets now contain richer structure, and incorporating temporal information in particular has become a key part of network analysis. Consequently, temporal motifs, which are generalizations of small subgraph patterns that incorporate temporal ordering on edges, are an emerging part of the network analysis toolbox. However, there are no algorithms for fast estimation of temporal motifs counts; moreover, we show that even counting simple temporal star motifs is NP-complete. Thus, there is a need for fast and approximate algorithms. Here, we present the first frequency estimation algorithms for counting temporal motifs. More specifically, we develop a sampling framework that sits as a layer on top of existing exact counting algorithms and enables fast and accurate memory-efficient estimates of temporal motif counts. Our results show that we can achieve one to two orders of magnitude speedups over existing algorithms with minimal and controllable loss in accuracy on a number of datasets. 
    more » « less
  3. In many real-world applications such as social network analysis and online advertising/marketing, one of the most important and popular problems is called influence maximization (IM), which finds a set of k seed users that maximize the expected number of influenced user nodes. In practice, however, maximizing the number of influenced nodes may be far from satisfactory for real applications such as opinion promotion and collective buying. In this paper, we explore the importance of stability and triangles in social networks, and formulate a novel problem in the influence spread scenario, named triangular stability maximization , over social networks, and generalize it to a general triangle influence maximization problem, which is proved to be NP-hard. We develop an efficient reverse influence sampling (RIS) based framework for the triangle IM with theoretical guarantees. To enable unbiased estimators, it demands probabilistic sampling of triangles, that is, sampling triangles according to their probabilities. We propose an edge-based triple sampling approach, which is exactly equivalent to probabilistic sampling and avoids costly triangle enumeration and materialization. We also design several pruning and reduction techniques, as well as a cost-model-guided heuristic algorithm. Extensive experiments and a case study over real-world graphs confirm the effectiveness of our proposed algorithms and the superiority of triangular stability maximization and triangle influence maximization. 
    more » « less
  4. Offline policy optimization could have a large impact on many real-world decision-making problems, as online learning may be infeasible in many applications. Importance sampling and its variants are a commonly used type of estimator in offline policy evaluation, and such estimators typically do not require assumptions on the properties and representational capabilities of value function or decision process model function classes. In this paper, we identify an important overfitting phenomenon in optimizing the importance weighted return, in which it may be possible for the learned policy to essentially avoid making aligned decisions for part of the initial state space. We propose an algorithm to avoid this overfitting through a new per-state-neighborhood normalization constraint, and provide a theoretical justification of the proposed algorithm. We also show the limitations of previous attempts to this approach. We test our algorithm in a healthcare-inspired simulator, a logged dataset collected from real hospitals and continuous control tasks. These experiments show the proposed method yields less overfitting and better test performance compared to state-of-the-art batch reinforcement learning algorithms. 
    more » « less
  5. Applications in environmental monitoring, surveillance and patrolling typically require a network of mobile agents to collectively gain information regarding the state of a static or dynamical process evolving over a region. However, these networks of mobile agents also introduce various challenges, including intermittent observations of the dynamical process, loss of communication links due to mobility and packet drops, and the potential for malicious or faulty behavior by some of the agents. The main contribution of this paper is the development of resilient, fully-distributed, and provably correct state estimation algorithms that simultaneously account for each of the above considerations, and in turn, offer a general framework for reasoning about state estimation problems in dynamic, failure-prone and adversarial environments. Specifically, we develop a simple switched linear observer for dealing with the issue of time-varying measurement models, and resilient filtering techniques for dealing with worst-case adversarial behavior subject to time-varying communication patterns among the agents. Our approach considers both communication patterns that recur in a deterministic manner, and patterns that are induced by random packet drops. For each scenario, we identify conditions on the dynamical system, the patrols, the nominal communication network topology, and the failure models that guarantee applicability of our proposed techniques. Finally, we complement our theoretical results with detailed simulations that illustrate the efficacy of our algorithms in the presence of the technical challenges described above. 
    more » « less