Title: Wasserstein Learning of Deep Generative Point Process Models
Point processes are becoming very popular for modeling asynchronous sequential data because of their sound mathematical foundation and their strength in modeling a variety of real-world phenomena. Currently, they are often characterized via an intensity function, which limits the model's expressiveness because of the unrealistic assumptions on its parametric form used in practice. Furthermore, they are learned via a maximum likelihood approach, which is prone to failure on multi-modal distributions of sequences. In this paper, we propose an intensity-free approach to point process modeling that transforms nuisance processes into a target one. Furthermore, we train the model using a likelihood-free approach that leverages the Wasserstein distance between point processes. Experiments on various synthetic and real-world data substantiate the superiority of the proposed point process model over conventional ones.
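To make "distance between point processes" concrete, here is a minimal sketch of one plausible ground distance between two event sequences observed on a window [0, T]. A Wasserstein distance between point processes needs such a ground cost between individual realizations. The function name `sequence_distance`, the `horizon` parameter, and the padding rule (unmatched events are paired with events placed at T) are assumptions made for illustration. The abstract does not state the definition used in the paper.

```python
def sequence_distance(xs, ys, horizon):
    """Illustrative distance between two event-time sequences on [0, horizon].

    Assumption (not taken from the abstract): events are matched in time
    order, and the shorter sequence is padded with events at `horizon`.
    """
    xs, ys = sorted(xs), sorted(ys)
    if any(t < 0 or t > horizon for t in xs + ys):
        raise ValueError("event times must lie in [0, horizon]")
    # Pad the shorter sequence so both have the same number of events.
    m = max(len(xs), len(ys))
    xs = xs + [horizon] * (m - len(xs))
    ys = ys + [horizon] * (m - len(ys))
    # Sum of absolute differences between time-ordered event pairs.
    return sum(abs(x - y) for x, y in zip(xs, ys))


if __name__ == "__main__":
    print(sequence_distance([1.0, 2.0], [1.5, 2.5], horizon=10.0))
    print(sequence_distance([1.0], [1.0, 4.0], horizon=10.0))
```

Unlike a likelihood, this quantity is computed directly from samples and changes smoothly as events shift in time, which is the property a likelihood-free training objective relies on.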
Award ID(s):
1745382 1620342
NSF-PAR ID:
10190741
Journal Name:
Advances in neural information processing systems
ISSN:
1049-5258
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Continuous-time event data are common in applications such as individual behavior data, financial transactions, and medical health records. Modeling such data can be very challenging, in particular for applications with many different types of events, since it requires a model to predict the event types as well as the time of occurrence. Recurrent neural networks that parameterize time-varying intensity functions are the current state-of-the-art for predictive modeling with such data. These models typically assume that all event sequences come from the same data distribution. However, in many applications event sequences are generated by different sources, or users, and their characteristics can be very different. In this paper, we extend the broad class of neural marked point process models to mixtures of latent embeddings, where each mixture component models the characteristic traits of a given user. Our approach relies on augmenting these models with a latent variable that encodes user characteristics, represented by a mixture model over user behavior that is trained via amortized variational inference. We evaluate our methods on four large real-world datasets and demonstrate systematic improvements from our approach over existing work for a variety of predictive metrics such as log-likelihood, next event ranking, and source-of-sequence identification. 
  2. We develop a prior probability model for temporal Poisson process intensities through structured mixtures of Erlang densities with a common scale parameter, mixing on the integer shape parameters. The mixture weights are constructed through increments of a cumulative intensity function, which is modeled nonparametrically with a gamma process prior. This model specification provides a novel extension of Erlang mixtures for density estimation to the intensity estimation setting. The prior model structure supports general shapes for the point process intensity function, and it also enables effective handling of the Poisson process likelihood normalizing term, resulting in efficient posterior simulation. The Erlang mixture modeling approach is further elaborated to develop an inference method for spatial Poisson processes. The methodology is examined relative to existing Bayesian nonparametric modeling approaches, including empirical comparison with Gaussian process prior based models, and is illustrated with synthetic and real data examples.
  3. Estimating the future event sequence conditioned on current observations is a long-standing and challenging task in temporal analysis. On the one hand, for many real-world problems the underlying dynamics can be very complex and are often unknown. As a result, traditional parametric point process models often fail to fit the data because of their limited capacity. On the other hand, long-term prediction suffers from the problem of bias exposure, where the error accumulates and propagates to future predictions. Our new model builds upon the sequence-to-sequence (seq2seq) prediction network. Compared with parametric point process models, it has higher modeling capacity and greater flexibility for fitting real-world data. The main novelty of the paper is to mitigate the second challenge by introducing a likelihood-free loss based on the Wasserstein distance between point processes, in addition to the maximum likelihood loss used in the traditional seq2seq model. The Wasserstein distance, unlike the KL divergence (i.e., the MLE loss), is sensitive to the underlying geometry between samples and can robustly enforce a close geometric structure between them. This technique is shown to improve the vanilla seq2seq model by a notable margin on various tasks.
  4. Morrison, Abigail (Ed.)
    Assessing directional influences between neurons is instrumental to understanding how brain circuits process information. To this end, Granger causality, a technique originally developed for time-continuous signals, has been extended to discrete spike trains. A fundamental assumption of this technique is that the temporal evolution of neuronal responses must be due only to endogenous interactions between recorded units, including self-interactions. This assumption is, however, rarely met in neurophysiological studies, where the response of each neuron is modulated by other exogenous causes such as other unobserved units or slow adaptation processes. Here, we propose a novel point-process Granger causality technique that is robust with respect to the two most common exogenous modulations observed in real neuronal responses: within-trial temporal variations in spiking rate and between-trial variability in their magnitudes. This novel method works by explicitly including both types of modulations in the generalized linear model of the neuronal conditional intensity function (CIF). We then assess the causal influence of neuron i on neuron j by measuring the relative reduction of neuron j's point process likelihood obtained by considering or removing neuron i. The CIF's hyper-parameters are set on a per-neuron basis by minimizing Akaike's information criterion. In synthetic data sets, generated by means of random processes or networks of integrate-and-fire units, the proposed method recovered the underlying ground-truth connectivity pattern with high accuracy, sensitivity, and robustness. Application of presently available point-process Granger causality techniques instead produced a significant number of false positive connections. In real spiking responses recorded from neurons in the monkey pre-motor cortex (area F5), our method revealed many causal relationships between neurons as well as the temporal structure of their interactions. Given its robustness, our method can be effectively applied to real neuronal data. Furthermore, its explicit estimate of the effects of unobserved causes on the recorded neuronal firing patterns can help decompose their temporal variations into endogenous and exogenous components.
  5. Recent investigations have revealed that the dynamics of complex networks and systems depend crucially on their temporal structures. Accurately detecting the time instant at which a system changes its internal structure has become an important task, beneficial both to fully understanding the underlying mechanisms of evolving systems and to adequately modeling and predicting their dynamics. In real-world applications, because prior knowledge of the explicit equations of evolving systems is lacking, an open challenge is how to develop a practical, model-free method that accomplishes this task based merely on time-series data recorded from real-world systems. Here, we develop such a model-free approach, named temporal change-point detection (TCD), which integrates both dynamical and statistical methods to address this challenge in a novel way. The proposed TCD approach, based on exploiting the spatial information of observed high-dimensional time series, is able not only to detect the separate change points of the systems concerned without knowing, a priori, any information about the equations of the systems, but also to harvest all the change points that emerge at relatively high frequency, which cannot be directly achieved using existing methods and techniques. Practical effectiveness is comprehensively demonstrated using data from representative complex dynamics and from real-world systems ranging from biology to geology and even social science.