Point processes are becoming very popular in modeling asynchronous sequential data due to their sound mathematical foundation and strength in modeling a variety of real-world phenomena. Currently, they are often characterized via intensity function which limits model’s expressiveness due to unrealistic assumptions on its parametric form used in practice. Furthermore, they are learned via maximum likelihood approach which is prone to failure in multi-modal distributions of sequences. In this paper, we propose an intensity-free approach for point processes modeling that transforms nuisance processes to a target one. Furthermore, we train the model using a likelihood-free leveraging Wasserstein distance between point processes. Experiments on various synthetic and real-world data substantiate the superiority of the proposed point process model over conventional ones.
more »
« less
Learning Conditional Generative Models for Temporal Point Processes
Estimating the future event sequence conditioned on current observations is a long-standing and challenging task in temporal analysis. On one hand for many real-world problems the underlying dynamics can be very complex and often unknown. This renders the traditional parametric point process models often fail to fit the data for their limited capacity. On the other hand, long-term prediction suffers from the problem of bias exposure where the error accumulates and propagates to future prediction. Our new model builds upon the sequence to sequence (seq2seq) prediction network. Compared with parametric point process models, its modeling capacity is higher and has better flexibility for fitting real-world data. The main novelty of the paper is to mitigate the second challenge by introducing the likelihood-free loss based on Wasserstein distance between point processes, besides negative maximum likelihood loss used in the traditional seq2seq model. Wasserstein distance, unlike KL divergence i.e. MLE loss, is sensitive to the underlying geometry between samples and can robustly enforce close geometry structure between them. This technique is proven able to improve the vanilla seq2seq model by a notable margin on various tasks.
more »
« less
- Award ID(s):
- 1745382
- PAR ID:
- 10190672
- Date Published:
- Journal Name:
- Proceedings of the AAAI Conference on Artificial Intelligence
- ISSN:
- 2159-5399
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
A parametric point process model is developed, with modeling based on the assumption that sequential observations often share latent phenomena, while also possessing idiosyncratic effects. An alternating optimization method is proposed to learn a “registered” point process that accounts for shared structure, as well as “warping” functions that characterize idiosyncratic aspects of each observed sequence. Under reasonable constraints, in each iteration we update the sample-specific warping functions by solving a set of constrained nonlinear programming problems in parallel, and update the model by maximum likelihood estimation. The justifiability, complexity and robustness of the proposed method are investigated in detail, and the influence of sequence stitching on the learning results is discussed empirically. Experiments on both synthetic and real-world data demonstrate that the method yields explainable point process models, achieving encouraging results compared to state-of-the-art methods.more » « less
-
In recent times, sequence-to-sequence (seq2seq) models have gained a lot of popularity and provide stateof-the-art performance in a wide variety of tasks, such as machine translation, headline generation, text summarization, speech-to-text conversion, and image caption generation. The underlying framework for all these models is usually a deep neural network comprising an encoder and a decoder. Although simple encoder–decoder models produce competitive results, many researchers have proposed additional improvements over these seq2seq models, e.g., using an attention-based model over the input, pointer-generation models, and self-attention models. However, such seq2seq models suffer from two common problems: 1) exposure bias and 2) inconsistency between train/test measurement. Recently, a completely novel point of view has emerged in addressing these two problems in seq2seq models, leveraging methods from reinforcement learning (RL). In this survey, we consider seq2seq problems from the RL point of view and provide a formulation combining the power of RL methods in decision-making with seq2seq models that enable remembering long-term memories. We present some of the most recent frameworks that combine the concepts from RL and deep neural networks. Our work aims to provide insights into some of the problems that inherently arise with current approaches and how we can address them with better RL models. We also provide the source code for implementing most of the RL models discussed in this paper to support the complex task of abstractive text summarization and provide some targeted experiments for these RL models, both in terms of performance and training time.more » « less
-
Forecasting citations of scientific patents and publications is a crucial task for understanding the evolution and development of technological domains and for foresight into emerging technologies. By construing citations as a time series, the task can be cast into the domain of temporal point processes. Most existing work on forecasting with temporal point processes, both conventional and neural network-based, only performs single-step forecasting. In citation forecasting, however, the more salient goal isn-step forecasting: predicting the arrival of the nextncitations. In this article, we propose Dynamic Multi-Context Attention Networks (DMA-Nets), a novel deep learning sequence-to-sequence (Seq2Seq) model with a novel hierarchical dynamic attention mechanism for long-term citation forecasting. Extensive experiments on two real-world datasets demonstrate that the proposed model learns better representations of conditional dependencies over historical sequences compared to state-of-the-art counterparts and thus achieves significant performance for citation predictions.more » « less
-
Abstract Modeling temporal event sequences on the vertices of a network is an important problem with widespread applications; examples include modeling influences in social networks, preventing crimes by modeling their space–time occurrences, and forecasting earthquakes. Existing solutions for this problem use a parametric approach, whose applicability is limited to event sequences following some well‐known distributions, which is not true for many real life event datasets. To overcome this limitation, in this work, we propose a composite recurrent neural network model for learning events occurring in the vertices of a network over time. Our proposed model combines two long short‐term memory units to capture base intensity and conditional intensity of an event sequence. We also introduce a second‐order statistic loss that penalizes higher divergence between the generated and the target sequence's distribution of hop count distance of consecutive events. Given a sequence of vertices of a network in which an event has occurred, the proposed model predicts the vertex where the next event would most likely occur. Experimental results on synthetic and real‐world datasets validate the superiority of our proposed model in comparison to various baseline methods.more » « less
An official website of the United States government

