Title: Transformer Hawkes Process
Modern data acquisition routinely produces massive amounts of event sequence data in various domains, such as social media, healthcare, and financial markets. These data often exhibit complicated short-term and long-term temporal dependencies. However, most existing recurrent neural network-based point process models fail to capture such dependencies and yield unreliable prediction performance. To address this issue, we propose a Transformer Hawkes Process (THP) model, which leverages the self-attention mechanism to capture long-term dependencies while enjoying computational efficiency. Numerical experiments on various datasets show that THP outperforms existing models in terms of both likelihood and event prediction accuracy by a notable margin. Moreover, THP is quite general and can incorporate additional structural knowledge. We provide a concrete example in which THP achieves improved prediction performance for learning multiple point processes when incorporating their relational information.
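The abstract's core mechanism can be sketched in a few lines: causal self-attention produces a hidden state per event, and a softplus link maps it to a positive conditional intensity. This is a minimal pure-Python illustration under simplifying assumptions (identity projections instead of learned query/key/value matrices, a single scalar time coefficient `alpha`), not the paper's implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(embeddings):
    """Causal scaled dot-product self-attention (single head, toy weights):
    event i attends only to itself and earlier events, so the hidden state
    can summarize arbitrarily long history in one step."""
    d = len(embeddings[0])
    out = []
    for i in range(len(embeddings)):
        scores = [sum(a * b for a, b in zip(embeddings[i], embeddings[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        w = softmax(scores)
        out.append([sum(w[j] * embeddings[j][k] for j in range(i + 1)) for k in range(d)])
    return out

def softplus(x):
    return math.log1p(math.exp(x))

def intensity(h, t, t_last, w=None, alpha=0.1, b=0.0):
    """Toy conditional intensity lambda(t) = softplus(alpha*(t - t_last) + w.h + b),
    continuous between events; softplus keeps it positive."""
    w = w or [1.0] * len(h)
    return softplus(alpha * (t - t_last) + sum(wi * hi for wi, hi in zip(w, h)) + b)
```

Because the attention is causal, the first event's hidden state is just its own embedding, and each later state is a history-weighted mixture; the intensity then drives both likelihood evaluation and next-event prediction.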
Award ID(s):
1717916
PAR ID:
10162616
Author(s) / Creator(s):
Date Published:
Journal Name:
International Conference on Machine Learning
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Large quantities of asynchronous event sequence data, such as crime records, emergency call logs, and financial transactions, are becoming increasingly available from various fields. These event sequences often exhibit both long-term and short-term temporal dependencies. Variations of neural network based temporal point processes have been widely used for modeling such asynchronous event sequences. However, many current architectures, including attention based point processes, struggle with long event sequences due to computational inefficiency. To tackle the challenge, we propose an efficient sparse transformer Hawkes process (STHP), which has two components. For the first component, a transformer with a novel temporal sparse self-attention mechanism is applied to event sequences with arbitrary intervals, mainly focusing on short-term dependencies. For the second component, a transformer is applied to the time series of aggregated event counts, primarily targeting the extraction of long-term periodic dependencies. Both components complement each other and are fused together to model the conditional intensity function of a point process for future event forecasting. Experiments on real-world datasets show that the proposed STHP outperforms baselines and achieves significant improvement in computational efficiency without sacrificing prediction performance for long sequences.
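The two components can be illustrated with toy helpers: a windowed causal attention whose cost is O(n·window) rather than O(n²), and a count-binning step that produces the coarse series the second transformer would consume. This is a schematic sketch of the idea, not the STHP architecture; the window size and bin width are illustrative parameters.

```python
import math

def _softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def sparse_causal_attention(embeddings, window=2):
    """Temporal sparse self-attention sketch: event i attends only to events
    in [i - window, i], so total cost is O(n * window) instead of O(n^2)."""
    d = len(embeddings[0])
    out = []
    for i in range(len(embeddings)):
        lo = max(0, i - window)
        scores = [sum(a * b for a, b in zip(embeddings[i], embeddings[j])) / math.sqrt(d)
                  for j in range(lo, i + 1)]
        w = _softmax(scores)
        out.append([sum(w[j - lo] * embeddings[j][k] for j in range(lo, i + 1))
                    for k in range(d)])
    return out

def aggregate_counts(event_times, bin_width):
    """Input for the second component: aggregate raw event times into
    fixed-width count bins, a coarse series exposing long-term periodicity."""
    if not event_times:
        return []
    n_bins = int(max(event_times) // bin_width) + 1
    counts = [0] * n_bins
    for t in event_times:
        counts[int(t // bin_width)] += 1
    return counts
```

With `window=0` each event attends only to itself, recovering the input; widening the window trades cost for more short-term context, while the count series stays short regardless of how many events arrive.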
  2. Estimating the future event sequence conditioned on current observations is a long-standing and challenging task in temporal analysis. On one hand, for many real-world problems the underlying dynamics can be very complex and are often unknown, so traditional parametric point process models, with their limited capacity, often fail to fit the data. On the other hand, long-term prediction suffers from exposure bias, where errors accumulate and propagate into future predictions. Our new model builds upon the sequence-to-sequence (seq2seq) prediction network. Compared with parametric point process models, it has higher modeling capacity and greater flexibility for fitting real-world data. The main novelty of the paper is mitigating the second challenge by introducing a likelihood-free loss based on the Wasserstein distance between point processes, in addition to the negative log-likelihood loss used in the traditional seq2seq model. The Wasserstein distance, unlike the KL divergence (i.e., the MLE loss), is sensitive to the underlying geometry between samples and can robustly enforce close geometric structure between them. This technique is shown to improve the vanilla seq2seq model by a notable margin on various tasks.
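For point processes on a line, the 1-Wasserstein distance between two event sequences (viewed as counting measures) reduces to matching sorted event times. The sketch below pads the shorter sequence with the horizon T, one common convention for handling unequal event counts; the paper may use a different formulation, so treat this as an assumed illustration of the likelihood-free loss idea.

```python
def wasserstein_1d(seq_a, seq_b, horizon):
    """W1 distance between two event sequences on [0, horizon]: sort both,
    pad the shorter with the horizon (penalizing count mismatch), and sum
    absolute differences of matched event times."""
    a, b = sorted(seq_a), sorted(seq_b)
    n = max(len(a), len(b))
    a = a + [horizon] * (n - len(a))
    b = b + [horizon] * (n - len(b))
    return sum(abs(x - y) for x, y in zip(a, b))
```

Unlike a log-likelihood term, this loss grows smoothly as predicted events drift away from the ground truth in time, which is the geometric sensitivity the abstract refers to.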
  3. Time series forecasting with additional spatial information has attracted a tremendous amount of attention in recent research, due to its importance in various real-world applications in social studies, such as conflict prediction and pandemic forecasting. Conventional machine learning methods either consider temporal dependencies only, or treat spatial and temporal relations as two separate autoregressive models, namely, space-time autoregressive models. Such methods suffer in long-term forecasting or in predictions for large-scale areas, due to the high nonlinearity and complexity of spatio-temporal data. In this paper, we propose to address these challenges using spatio-temporal graph neural networks. Empirical results on the Violence Early Warning System (ViEWS) dataset and the U.S. Covid-19 dataset indicate that our method significantly improves performance over the baseline approaches.
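The contrast with separate autoregressive models can be made concrete with a minimal spatio-temporal recurrence: one graph message-passing step mixes information across neighboring regions, then a temporal update folds it into each node's state. This is a toy sketch of the general pattern, not the paper's network; the averaging aggregation and the smoothing `gate` are assumptions for brevity.

```python
def graph_step(features, adj):
    """Spatial half: one message-passing step that averages each node's
    neighbours' features (self-loop included), given a 0/1 adjacency matrix."""
    n = len(features)
    d = len(features[0])
    out = []
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j]] + [i]  # add self-loop
        out.append([sum(features[j][k] for j in nbrs) / len(nbrs) for k in range(d)])
    return out

def st_forecast(series, adj, gate=0.5):
    """Temporal half: roll over time steps, blending the previous hidden state
    with the spatially mixed observation (exponential-smoothing-style update),
    so spatial and temporal dependencies are learned jointly, not separately."""
    h = [[0.0] * len(series[0][0]) for _ in series[0]]
    for x_t in series:
        m = graph_step(x_t, adj)
        h = [[(1 - gate) * h[i][k] + gate * m[i][k] for k in range(len(h[0]))]
             for i in range(len(h))]
    return h
```

A real spatio-temporal GNN replaces the averaging with learned graph convolutions and the smoothing with a recurrent or attention cell, but the interleaving of the two updates is the key structural difference from space-time autoregressive models.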
  4. Abstract Monitoring the health condition as well as predicting the performance of Lithium-ion batteries are crucial to the reliability and safety of electrical systems such as electric vehicles. However, estimating the discharge capacity and end-of-discharge (EOD) of a battery in real-time remains a challenge. Few works have been reported on the relationship between the capacity degradation of a battery and EOD. We introduce a new data-driven method that combines convolutional neural network (CNN) and bidirectional long short-term memory (BiLSTM) models to predict the discharge capacity and the EOD using online condition monitoring data. The CNN model extracts long-term correlations among voltage, current, and temperature measurements and then estimates the discharge capacity. The BiLSTM model extracts short-term dependencies in condition monitoring data and predicts the EOD for each discharge cycle while utilizing the capacity predicted by CNN as an additional input. By considering the discharge capacity, the BiLSTM model is able to use the long-term health condition of a battery to improve the prediction accuracy of its short-term performance. We demonstrated that the proposed method can achieve online discharge capacity estimation and EOD prediction efficiently and accurately. 
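The two-stage data flow, where a convolutional extractor distills monitoring signals and its capacity estimate is fed to the recurrent model as an extra input channel, can be illustrated with two toy helpers. This is a schematic sketch, not the authors' CNN or BiLSTM: `conv1d` stands in for one convolutional layer, and `with_capacity_feature` shows how a predicted capacity is appended to every time step of a condition-monitoring window.

```python
def conv1d(signal, kernel):
    """Valid-mode 1-D convolution (cross-correlation, as in CNN layers):
    slides the kernel over voltage/current/temperature-style signals to
    extract local features."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def with_capacity_feature(window, capacity):
    """Augment each time step of a monitoring window with the (CNN-estimated)
    discharge capacity, mirroring how the recurrent model receives long-term
    health information as an additional input channel."""
    return [row + [capacity] for row in window]
```

The point of the second helper is the coupling: the short-term EOD predictor sees the long-term degradation signal at every step instead of having to infer it from a single discharge cycle.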
  5. Karst aquifers are important groundwater resources that supply drinking water for approximately 25 % of the world’s population. Their complex hydrogeological structures, dual-flow regimes, and highly heterogeneous flow pose significant challenges for accurate hydrodynamic modeling and sustainable management. Traditional modeling approaches often struggle to capture the intricate spatial dependencies and multi-scale temporal patterns inherent in karst systems, particularly the interactions between rapid conduit flow and slower matrix flow. This study proposes a novel multi-scale dynamic graph attention network integrated with long short-term memory model (GAT-LSTM) to innovatively learn and integrate spatial and temporal dependencies in karst systems for forecasting spring discharge. The model introduces several innovative components: (1) graph-based neural networks with dynamic edge-weighting mechanism are proposed to learn and update spatial dependencies based on both geographic distances and learned hydrological relationships, (2) a multi-head attention mechanism is adopted to capture different aspects of spatial relationships simultaneously, and (3) a hierarchical temporal architecture is incorporated to process hydrological temporal patterns at both monthly and seasonal scales with an adaptive fusion mechanism for final results. These features enable the proposed model to effectively account for the dual-flow dynamics in karst systems, where rapid conduit flow and slower matrix flow coexist. The newly proposed model is applied to the Barton Springs of the Edwards Aquifer in Texas. The results demonstrate that it can obtain more accurate and robust prediction performance across various time steps compared to traditional temporal and spatial deep learning approaches. 
Based on the multi-scale GAT-LSTM model, a comprehensive ablation analysis and a permutation feature importance study are conducted to analyze the relative contribution of each input variable to the final prediction. These findings highlight the intricate nature of karst systems and demonstrate that effective spring discharge prediction requires comprehensive monitoring networks encompassing both primary recharge contributors and supplementary hydrological features that may serve as valuable indicators of system-wide conditions.
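The dynamic edge-weighting idea, combining geographic distance with a learned relationship score before attention normalization, can be sketched as a single graph-attention aggregation. This is an illustrative single-head simplification, not the multi-head GAT-LSTM itself; the distance-decay parameter `beta` and the `learned` score matrix are assumed stand-ins for quantities the model would learn.

```python
import math

def _softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def _dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def gat_layer(features, coords, learned, beta=1.0):
    """One graph-attention aggregation with dynamic edge weights: the score
    for edge (i, j) combines a geographic-distance penalty with a learned
    relationship term, then weights are softmax-normalised per node."""
    n = len(features)
    d = len(features[0])
    out = []
    for i in range(n):
        scores = [learned[i][j] - beta * _dist(coords[i], coords[j]) for j in range(n)]
        w = _softmax(scores)
        out.append([sum(w[j] * features[j][k] for j in range(n)) for k in range(d)])
    return out
```

Running one such layer per head and concatenating the heads gives the multi-head variant the abstract describes; feeding the aggregated node states into an LSTM over monthly and seasonal windows supplies the hierarchical temporal half.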