Large quantities of asynchronous event sequence data, such as crime records, emergency call logs, and financial transactions, are becoming increasingly available from various fields. These event sequences often exhibit both long-term and short-term temporal dependencies. Variants of neural-network-based temporal point processes have been widely used for modeling such asynchronous event sequences. However, many current architectures, including attention-based point processes, struggle with long event sequences due to computational inefficiency. To tackle this challenge, we propose an efficient sparse transformer Hawkes process (STHP), which has two components. In the first component, a transformer with a novel temporal sparse self-attention mechanism is applied to event sequences with arbitrary intervals, mainly focusing on short-term dependencies. In the second component, a transformer is applied to the time series of aggregated event counts, primarily targeting the extraction of long-term periodic dependencies. The two components complement each other and are fused together to model the conditional intensity function of a point process for future event forecasting. Experiments on real-world datasets show that the proposed STHP outperforms baselines and achieves a significant improvement in computational efficiency without sacrificing prediction performance on long sequences.
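The two-component design described above suggests a simple fusion pattern: one encoder attends over recent events for short-term structure, another summarizes a series of aggregated event counts for long-term periodic structure, and the two summaries are combined into a non-negative conditional intensity. The PyTorch sketch below is a minimal, hypothetical illustration of that pattern only; it uses plain (non-sparse) attention, and all module names, feature layouts, and dimensions are assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoComponentIntensity(nn.Module):
    """Hypothetical sketch: fuse a short-term event encoder with a
    long-term count-series encoder into one conditional intensity."""
    def __init__(self, d_model=32, n_heads=4):
        super().__init__()
        self.event_embed = nn.Linear(2, d_model)    # per-event features: (inter-event gap, mark)
        self.short_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.count_embed = nn.Linear(1, d_model)    # aggregated event counts per time bin
        self.long_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.fuse = nn.Linear(2 * d_model, 1)

    def forward(self, event_feats, count_series):
        # event_feats: (B, L_events, 2); count_series: (B, L_bins, 1)
        e = self.event_embed(event_feats)
        e, _ = self.short_attn(e, e, e)             # short-term dependencies over events
        c = self.count_embed(count_series)
        c, _ = self.long_attn(c, c, c)              # long-term periodic dependencies over counts
        h = torch.cat([e[:, -1], c[:, -1]], dim=-1) # last-step summaries from both components
        return F.softplus(self.fuse(h))             # non-negative conditional intensity

intensity = TwoComponentIntensity()
lam = intensity(torch.randn(8, 50, 2), torch.randn(8, 24, 1))
print(lam.shape)  # torch.Size([8, 1])
```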
                    
                            
TimeMachine: A Time Series is Worth 4 Mambas for Long-Term Forecasting
                        
                    
    
Long-term time-series forecasting remains challenging due to the difficulty of capturing long-term dependencies while achieving linear scalability and maintaining computational efficiency. We introduce TimeMachine, an innovative model that leverages Mamba, a state-space model, to capture long-term dependencies in multivariate time series data while maintaining linear scalability and a small memory footprint. TimeMachine exploits the unique properties of time series data to produce salient contextual cues at multiple scales and leverages an innovative integrated quadruple-Mamba architecture to unify the handling of channel-mixing and channel-independence situations, thus enabling effective selection of content for prediction against global and local contexts at different scales. Experimentally, TimeMachine achieves superior performance in prediction accuracy, scalability, and memory efficiency, as extensively validated on benchmark datasets. Code availability: https://github.com/Atik-Ahamed/TimeMachine
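As a rough illustration of the quadruple-branch idea in the abstract above, the sketch below processes a fine-resolution view and a downsampled view of the input with two sequence blocks each, then fuses the four summaries into the forecast. It is a hypothetical sketch only: the real model uses Mamba SSM blocks and distinguishes channel-mixed from channel-independent processing, whereas here a GRU serves purely as a self-contained stand-in, and all scales, names, and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class SeqBlock(nn.Module):
    """Stand-in for a Mamba block (a GRU here, only to keep the sketch self-contained)."""
    def __init__(self, d_model):
        super().__init__()
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)
    def forward(self, x):                     # x: (B, L, d_model)
        out, _ = self.rnn(x)
        return out[:, -1]                     # (B, d_model) sequence summary

class FourBranchForecaster(nn.Module):
    """Hypothetical sketch: four sequence branches over two temporal scales."""
    def __init__(self, n_channels, horizon, d_model=64):
        super().__init__()
        self.proj = nn.Linear(n_channels, d_model)
        self.branches = nn.ModuleList([SeqBlock(d_model) for _ in range(4)])
        self.head = nn.Linear(4 * d_model, horizon * n_channels)
        self.n_channels, self.horizon = n_channels, horizon

    def forward(self, x):                     # x: (B, L, n_channels)
        fine = self.proj(x)                   # fine-resolution context
        coarse = self.proj(x[:, ::4])         # crude downsampling gives a second scale
        feats = [self.branches[0](fine), self.branches[1](fine),
                 self.branches[2](coarse), self.branches[3](coarse)]
        out = self.head(torch.cat(feats, dim=-1))
        return out.view(-1, self.horizon, self.n_channels)

model = FourBranchForecaster(n_channels=7, horizon=96)
print(model(torch.randn(16, 512, 7)).shape)   # torch.Size([16, 96, 7])
```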
        
    
    
- PAR ID: 10632492
- Publisher / Repository: IOS Press
- Date Published:
- ISBN: 978-1-64368-548-9
- Subject(s) / Keyword(s): multivariate time series, long term time series forecasting, deep learning, Mamba model
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Token-free language models learn directly from raw bytes and remove the inductive bias of subword tokenization. Operating on bytes, however, results in significantly longer sequences. In this setting, standard autoregressive Transformers scale poorly, as the effective memory required grows with sequence length. The recent development of the Mamba state space model (SSM) offers an appealing alternative with a fixed-size memory state and efficient decoding. We propose MambaByte, a token-free adaptation of the Mamba SSM trained autoregressively on byte sequences. In terms of modeling, we show MambaByte to be competitive with, and even to outperform, state-of-the-art subword Transformers on language modeling tasks while maintaining the benefits of token-free language models, such as robustness to noise. In terms of efficiency, we develop an adaptation of speculative decoding with tokenized drafting and byte-level verification (a minimal sketch of this draft-and-verify pattern appears after this list). This yields a 2.6× inference speedup over the standard MambaByte implementation, with decoding efficiency similar to that of the subword Mamba. These findings establish the viability of SSMs for token-free language modeling.
- Karst aquifers are important groundwater resources that supply drinking water for approximately 25% of the world's population. Their complex hydrogeological structures, dual-flow regimes, and highly heterogeneous flow pose significant challenges for accurate hydrodynamic modeling and sustainable management. Traditional modeling approaches often struggle to capture the intricate spatial dependencies and multi-scale temporal patterns inherent in karst systems, particularly the interactions between rapid conduit flow and slower matrix flow. This study proposes a novel multi-scale dynamic graph attention network integrated with a long short-term memory model (GAT-LSTM) to learn and integrate spatial and temporal dependencies in karst systems for forecasting spring discharge. The model introduces several innovative components: (1) graph-based neural networks with a dynamic edge-weighting mechanism learn and update spatial dependencies based on both geographic distances and learned hydrological relationships, (2) a multi-head attention mechanism captures different aspects of spatial relationships simultaneously, and (3) a hierarchical temporal architecture processes hydrological patterns at both monthly and seasonal scales, with an adaptive fusion mechanism producing the final results (a simplified sketch of this spatial-then-temporal pattern appears after this list). These features enable the proposed model to effectively account for the dual-flow dynamics in karst systems, where rapid conduit flow and slower matrix flow coexist. The model is applied to Barton Springs of the Edwards Aquifer in Texas. The results demonstrate that it achieves more accurate and robust predictions across various time steps than traditional temporal and spatial deep learning approaches. Based on the multi-scale GAT-LSTM model, a comprehensive ablation analysis and a permutation feature importance analysis are conducted to assess the relative contribution of the input variables to the final prediction. These findings highlight the intricate nature of karst systems and demonstrate that effective spring discharge prediction requires comprehensive monitoring networks encompassing both primary recharge contributors and supplementary hydrological features that may serve as valuable indicators of system-wide conditions.
- Monitoring the health condition and predicting the performance of lithium-ion batteries are crucial to the reliability and safety of electrical systems such as electric vehicles. However, estimating the discharge capacity and end-of-discharge (EOD) of a battery in real time remains a challenge. Few works have examined the relationship between the capacity degradation of a battery and its EOD. We introduce a new data-driven method that combines convolutional neural network (CNN) and bidirectional long short-term memory (BiLSTM) models to predict the discharge capacity and the EOD using online condition-monitoring data. The CNN model extracts long-term correlations among voltage, current, and temperature measurements and then estimates the discharge capacity. The BiLSTM model extracts short-term dependencies in the condition-monitoring data and predicts the EOD for each discharge cycle while using the capacity predicted by the CNN as an additional input (a simplified sketch of this two-stage coupling appears after this list). By considering the discharge capacity, the BiLSTM model is able to use the long-term health condition of a battery to improve the prediction accuracy of its short-term performance. We demonstrate that the proposed method can achieve online discharge capacity estimation and EOD prediction efficiently and accurately.
- Modern data acquisition routinely produces massive amounts of event sequence data in various domains, such as social media, healthcare, and financial markets. These data often exhibit complicated short-term and long-term temporal dependencies. However, most existing recurrent neural network-based point process models fail to capture such dependencies and yield unreliable prediction performance. To address this issue, we propose a Transformer Hawkes Process (THP) model, which leverages the self-attention mechanism to capture long-term dependencies while enjoying computational efficiency (a simplified sketch of the general attention-based intensity pattern appears after this list). Numerical experiments on various datasets show that THP outperforms existing models in terms of both likelihood and event prediction accuracy by a notable margin. Moreover, THP is quite general and can incorporate additional structural knowledge. We provide a concrete example in which THP achieves improved prediction performance when learning multiple point processes by incorporating their relational information.
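For the MambaByte entry above, the speculative-decoding idea (draft bytes with a cheaper model, verify them with the byte-level target model) can be illustrated with a generic greedy draft-and-verify loop. This is a hypothetical, model-agnostic sketch: `draft_bytes` and `target_next_byte` are placeholder callables, acceptance is simple greedy agreement rather than a full speculative-sampling rule, and the target is called once per byte here for clarity, whereas in practice it would score the whole drafted run in a single pass.

```python
from typing import Callable, List

def speculative_decode(prefix: List[int],
                       draft_bytes: Callable[[List[int], int], List[int]],
                       target_next_byte: Callable[[List[int]], int],
                       n_draft: int = 8,
                       max_new: int = 64) -> List[int]:
    """Greedy draft-and-verify: accept drafted bytes while the target model agrees."""
    out = list(prefix)
    produced = 0
    while produced < max_new:
        draft = draft_bytes(out, n_draft)        # cheap drafter proposes a run of bytes
        accepted = 0
        for b in draft:
            if target_next_byte(out) == b:       # expensive byte-level model verifies
                out.append(b)
                accepted += 1
                produced += 1
                if produced >= max_new:
                    return out
            else:
                break
        if accepted < len(draft):                # on mismatch, keep the target's own byte
            out.append(target_next_byte(out))
            produced += 1
    return out

# Toy usage with trivial stand-in models (both simply emit the byte 0x61, i.e. 'a').
result = speculative_decode([104, 105],
                            draft_bytes=lambda ctx, k: [97] * k,
                            target_next_byte=lambda ctx: 97)
print(bytes(result))  # b'hi' followed by drafted 'a' bytes
```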
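For the karst GAT-LSTM entry, the sketch below gives a compact, hypothetical view of the overall pattern: a small graph-attention layer mixes features across monitoring sites at each time step (a fixed distance bias stands in for the paper's learned dynamic edge weights), and an LSTM then models the temporal sequence of pooled graph summaries. The single attention head, the distance bias, and all shapes are illustrative simplifications, not the published model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttnLSTM(nn.Module):
    """Hypothetical sketch: per-step graph attention over sites, LSTM over time."""
    def __init__(self, n_feats, d_model=32):
        super().__init__()
        self.q = nn.Linear(n_feats, d_model)
        self.k = nn.Linear(n_feats, d_model)
        self.v = nn.Linear(n_feats, d_model)
        self.lstm = nn.LSTM(d_model, d_model, batch_first=True)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x, dist):
        # x: (B, T, N_sites, n_feats); dist: (N_sites, N_sites) pairwise distances
        q = self.q(x)
        scores = q @ self.k(x).transpose(-1, -2) / q.shape[-1] ** 0.5
        attn = F.softmax(scores - dist, dim=-1)   # nearer sites receive more weight
        mixed = attn @ self.v(x)                  # (B, T, N_sites, d_model)
        summary = mixed.mean(dim=2)               # pool sites -> (B, T, d_model)
        out, _ = self.lstm(summary)
        return self.head(out[:, -1])              # next-step spring discharge estimate

model = GraphAttnLSTM(n_feats=5)
x = torch.randn(4, 36, 10, 5)                     # 4 samples, 36 months, 10 sites, 5 features
dist = torch.rand(10, 10)
print(model(x, dist).shape)                       # torch.Size([4, 1])
```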
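For the battery entry, the sketch below illustrates the two-stage coupling it describes: a 1-D CNN maps a window of voltage/current/temperature measurements to a capacity estimate, and that estimate is appended to each time step of the same window before a BiLSTM predicts end-of-discharge. Layer sizes, window length, and the concatenation scheme are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

class CapacityEODModel(nn.Module):
    """Hypothetical sketch: CNN capacity estimate feeds a BiLSTM EOD predictor."""
    def __init__(self, n_signals=3, d_model=32):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(n_signals, d_model, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1))
        self.capacity_head = nn.Linear(d_model, 1)
        self.bilstm = nn.LSTM(n_signals + 1, d_model, batch_first=True, bidirectional=True)
        self.eod_head = nn.Linear(2 * d_model, 1)

    def forward(self, x):
        # x: (B, T, n_signals) with voltage, current, and temperature measurements
        feats = self.cnn(x.transpose(1, 2)).squeeze(-1)   # (B, d_model) long-term features
        capacity = self.capacity_head(feats)              # (B, 1) discharge capacity estimate
        cap_feat = capacity.unsqueeze(1).expand(-1, x.size(1), -1)
        out, _ = self.bilstm(torch.cat([x, cap_feat], dim=-1))
        eod = self.eod_head(out[:, -1])                   # (B, 1) end-of-discharge prediction
        return capacity, eod

model = CapacityEODModel()
capacity, eod = model(torch.randn(8, 200, 3))
print(capacity.shape, eod.shape)                          # torch.Size([8, 1]) torch.Size([8, 1])
```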
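For the Transformer Hawkes Process entry, the sketch below illustrates the general pattern it describes: events are embedded together with a sinusoidal encoding of their irregular timestamps, a standard Transformer encoder builds history representations, and a softplus head turns the last hidden state into a non-negative per-mark intensity. The encoding scheme, layer sizes, and intensity head are illustrative assumptions rather than the published THP parameterization, and the causal mask a real model would need is omitted for brevity.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyAttentionPointProcess(nn.Module):
    """Hypothetical sketch: attention over an event history -> conditional intensity."""
    def __init__(self, n_marks, d_model=32, n_heads=4):
        super().__init__()
        self.mark_embed = nn.Embedding(n_marks, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=64,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.intensity_head = nn.Linear(d_model, n_marks)
        freqs = torch.exp(torch.arange(0, d_model, 2) * (-math.log(1e4) / d_model))
        self.register_buffer("freqs", freqs)

    def time_encoding(self, t):                   # t: (B, L) irregular event timestamps
        ang = t.unsqueeze(-1) * self.freqs        # (B, L, d_model / 2)
        return torch.cat([torch.sin(ang), torch.cos(ang)], dim=-1)

    def forward(self, times, marks):
        h = self.encoder(self.mark_embed(marks) + self.time_encoding(times))
        return F.softplus(self.intensity_head(h[:, -1]))  # (B, n_marks) intensities

model = TinyAttentionPointProcess(n_marks=5)
times = torch.cumsum(torch.rand(8, 30), dim=1)            # strictly increasing timestamps
marks = torch.randint(0, 5, (8, 30))
print(model(times, marks).shape)                          # torch.Size([8, 5])
```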