Large quantities of asynchronous event sequence data, such as crime records, emergency call logs, and financial transactions, are becoming increasingly available from various fields. These event sequences often exhibit both long-term and short-term temporal dependencies. Variants of neural-network-based temporal point processes have been widely used to model such asynchronous event sequences. However, many current architectures, including attention-based point processes, struggle with long event sequences due to computational inefficiency. To tackle this challenge, we propose an efficient sparse transformer Hawkes process (STHP) with two components. In the first component, a transformer with a novel temporal sparse self-attention mechanism is applied to event sequences with arbitrary intervals, focusing mainly on short-term dependencies. In the second component, a transformer is applied to the time series of aggregated event counts, primarily targeting the extraction of long-term periodic dependencies. The two components complement each other and are fused to model the conditional intensity function of a point process for future event forecasting. Experiments on real-world datasets show that the proposed STHP outperforms baselines and achieves significant improvements in computational efficiency without sacrificing prediction performance for long sequences.
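The short-term component hinges on restricting each event's attention to a local window of recent events. As a rough illustration, the PyTorch sketch below applies a banded (windowed) causal mask on top of a standard multi-head attention layer; the layer names, the additive inter-event-time encoding, and the window size are hypothetical simplifications, not the paper's exact sparsity pattern.

```python
# Minimal sketch of windowed (sparse) self-attention over an event sequence.
# All names are illustrative; STHP's actual sparsity pattern and time
# encoding may differ from this simplification.
import torch
import torch.nn as nn

def local_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask: position i may attend only to events j with i-window <= j <= i."""
    idx = torch.arange(seq_len)
    rel = idx.unsqueeze(0) - idx.unsqueeze(1)   # rel[i, j] = j - i
    # True entries are *blocked* (PyTorch attn_mask convention):
    # block the future (rel > 0) and events older than the window.
    return (rel > 0) | (rel < -window)

class SparseEventAttention(nn.Module):
    def __init__(self, d_model=64, n_heads=4, window=16):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.time_proj = nn.Linear(1, d_model)  # crude encoding of inter-event intervals

    def forward(self, event_emb, inter_times):
        # event_emb: (B, L, d_model); inter_times: (B, L, 1), arbitrary intervals
        h = event_emb + self.time_proj(inter_times)
        mask = local_window_mask(h.size(1), self.window).to(h.device)
        out, _ = self.attn(h, h, h, attn_mask=mask)
        return out  # short-term context, later fused into the intensity function

# Usage
x, dt = torch.randn(2, 100, 64), torch.rand(2, 100, 1)
ctx = SparseEventAttention()(x, dt)   # (2, 100, 64)
```

Because each position attends to at most `window + 1` events, the attention cost grows linearly rather than quadratically with sequence length, which is where the efficiency gain for long sequences comes from.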
TimeMachine: A Time Series is Worth 4 Mambas for Long-Term Forecasting
Long-term time-series forecasting remains challenging due to the difficulty in capturing long-term dependencies, achieving linear scalability, and maintaining computational efficiency. We introduce TimeMachine, an innovative model that leverages Mamba, a state-space model, to capture long-term dependencies in multivariate time series data while maintaining linear scalability and small memory footprints. TimeMachine exploits the unique properties of time series data to produce salient contextual cues at multiple scales and leverages an innovative integrated quadruple-Mamba architecture to unify the handling of channel-mixing and channel-independence situations, thus enabling effective selection of content for prediction against global and local contexts at different scales. Experimentally, TimeMachine achieves superior performance in prediction accuracy, scalability, and memory efficiency, as extensively validated using benchmark datasets. Code availability: https://github.com/Atik-Ahamed/TimeMachine
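To make the quadruple-Mamba idea concrete, here is a heavily simplified PyTorch sketch with two embedding scales and a pair of Mamba blocks per scale. It assumes the `mamba-ssm` package (whose kernels require a GPU); the token construction, the forward/reverse pairing used here in place of the paper's channel-mixing vs. channel-independent Mambas, and all layer sizes are illustrative assumptions, so consult the linked repository for the actual architecture.

```python
# Two-scale, four-Mamba layout in the spirit of TimeMachine (a sketch,
# not the authors' implementation).
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # assumed dependency; CUDA-only kernels

class QuadMambaSketch(nn.Module):
    def __init__(self, n_channels=7, lookback=96, horizon=24, d1=64, d2=32):
        super().__init__()
        self.embed1 = nn.Linear(lookback, d1)   # coarse tokens: one per channel
        self.embed2 = nn.Linear(d1, d2)         # finer-scale tokens
        # Two Mambas per scale; a reversed scan stands in for the paper's
        # second Mamba at each level.
        self.m1a, self.m1b = Mamba(d_model=d1), Mamba(d_model=d1)
        self.m2a, self.m2b = Mamba(d_model=d2), Mamba(d_model=d2)
        self.head = nn.Linear(d1 + d2, horizon)

    def forward(self, x):                        # x: (B, lookback, n_channels)
        t1 = self.embed1(x.transpose(1, 2))      # (B, C, d1) channel tokens
        h1 = self.m1a(t1) + self.m1b(t1.flip(1)).flip(1)
        t2 = self.embed2(h1)                     # (B, C, d2) second scale
        h2 = self.m2a(t2) + self.m2b(t2.flip(1)).flip(1)
        # Fuse both scales, predict the horizon per channel.
        return self.head(torch.cat([h1, h2], -1)).transpose(1, 2)  # (B, horizon, C)
```

The key property this layout preserves is that every component (linear embeddings and Mamba scans) costs linear time in the look-back length, consistent with the scalability claim.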
- PAR ID:
- 10632492
- Publisher / Repository:
- IOS Press
- Date Published:
- ISBN:
- 978-1-64368-548-9
- Subject(s) / Keyword(s):
- multivariate time series, long term time series forecasting, deep learning, Mamba model
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
In this work, we propose a new, efficient Mamba-based model named BMACE (Bidirectional Mamba-based network for Automatic Chord Estimation), which uses selective structured state-space models in a bidirectional Mamba layer to model temporal dependencies effectively. Our model achieves prediction performance comparable to state-of-the-art models while requiring fewer parameters and lower computational resources.
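A bidirectional Mamba layer can be sketched as two selective SSM scans, one over the sequence and one over its reversal, concatenated per frame. The minimal PyTorch sketch below (again assuming the `mamba-ssm` package) shows that layout producing frame-wise chord logits; the feature dimensionality, chord vocabulary size, and front-end are hypothetical, not BMACE's actual configuration.

```python
# Bidirectional Mamba layer for frame-wise chord tagging (illustrative sketch).
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # assumed dependency

class BiMambaChordTagger(nn.Module):
    def __init__(self, n_features=144, d_model=128, n_chords=25):
        super().__init__()
        self.inp = nn.Linear(n_features, d_model)
        self.fwd = Mamba(d_model=d_model)       # left-to-right scan
        self.bwd = Mamba(d_model=d_model)       # right-to-left scan
        self.out = nn.Linear(2 * d_model, n_chords)

    def forward(self, x):                       # x: (B, T, n_features), e.g. CQT frames
        h = self.inp(x)
        f = self.fwd(h)                         # context from past frames
        b = self.bwd(h.flip(1)).flip(1)         # context from future frames
        return self.out(torch.cat([f, b], dim=-1))  # (B, T, n_chords) logits
```

Unlike self-attention, each scan keeps a fixed-size state, which is what keeps the parameter count and memory footprint small relative to transformer baselines.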
Token-free language models learn directly from raw bytes and remove the inductive bias of subword tokenization. Operating on bytes, however, results in significantly longer sequences. In this setting, standard autoregressive Transformers scale poorly, as the effective memory required grows with sequence length. The recent development of the Mamba state space model (SSM) offers an appealing alternative approach with a fixed-size memory state and efficient decoding. We propose MambaByte, a token-free adaptation of the Mamba SSM trained autoregressively on byte sequences. In terms of modeling, we show MambaByte to be competitive with, and even to outperform, state-of-the-art subword Transformers on language modeling tasks while maintaining the benefits of token-free language models, such as robustness to noise. In terms of efficiency, we develop an adaptation of speculative decoding with tokenized drafting and byte-level verification. This results in a 2.6× inference speedup over the standard MambaByte implementation, with decoding efficiency similar to that of the subword Mamba. These findings establish the viability of SSMs in enabling token-free language modeling.
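The draft-and-verify idea can be illustrated with a greedy, simplified variant of speculative decoding: a draft proposal (in MambaByte, produced by a faster subword drafter and converted to bytes) is checked by the byte-level target model in a single parallel pass, and the longest agreeing prefix is kept. The sketch below assumes a `target_model` callable returning per-position byte logits; it omits the probabilistic acceptance rule of full speculative decoding and the drafting step itself.

```python
# Greedy draft-and-verify step (simplified sketch of speculative decoding).
import torch

@torch.no_grad()
def greedy_speculative_step(target_model, draft_bytes, prefix):
    # prefix: (1, T) byte ids already generated; draft_bytes: (1, k) proposal.
    seq = torch.cat([prefix, draft_bytes], dim=1)
    logits = target_model(seq)                  # (1, T+k, 256), one parallel pass
    # The target's prediction for position i comes from logits at position i-1.
    preds = logits[:, prefix.size(1) - 1 : -1].argmax(-1)   # (1, k)
    agree = (preds == draft_bytes).cumprod(dim=1)           # 1s until first mismatch
    n_accept = int(agree.sum())
    accepted = draft_bytes[:, :n_accept]
    # On mismatch (or after accepting everything), take one byte from the target.
    next_byte = logits[:, prefix.size(1) - 1 + n_accept].argmax(-1, keepdim=True)
    return torch.cat([prefix, accepted, next_byte], dim=1)
```

The speedup comes from verifying `k` draft bytes with one forward pass of the (slow but accurate) byte model instead of `k` sequential decoding steps.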
This paper presents an innovative approach to wind energy forecasting through the implementation of an extended long short-term memory (xLSTM) model. This research addresses fundamental limitations in time-sequence forecasting for wind energy by introducing architectural enhancements to traditional LSTM networks. The xLSTM model incorporates two key innovations: exponential gating with memory mixing and a novel matrix memory structure. These improvements are realized through two variants, i.e., scalar LSTM and matrix LSTM, which are integrated into residual blocks to form comprehensive architectures. The xLSTM model was validated using SCADA data from wind turbines, with rigorous preprocessing to remove anomalous measurements. Performance evaluation across different wind speed regimes demonstrated robust predictive capabilities, with the xLSTM model achieving an overall coefficient of determination value of 0.923 and a mean absolute percentage error of 8.47%. Seasonal analysis revealed consistent prediction accuracy across varied meteorological patterns. The xLSTM model maintains linear computational complexity with respect to sequence length while offering enhanced capabilities in memory retention, state tracking, and long-range dependency modeling. These results demonstrate the potential of xLSTM for improving wind power forecasting accuracy, which is crucial for optimizing turbine operations and grid integration of renewable energy resources.
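The scalar-LSTM variant's exponential gating can be written down compactly: input and forget gates are exponentials of their pre-activations, a normalizer state tracks the accumulated gate mass, and a log-space stabilizer keeps the exponentials bounded. The PyTorch cell below follows the published sLSTM equations in that spirit; the projection layout is a simplification, and the matrix-memory (mLSTM) variant, residual blocks, and SCADA preprocessing pipeline are omitted, so this is an illustration of the gating mechanism rather than the wind-forecasting model itself.

```python
# Minimal scalar sLSTM cell with exponential gating and a log-space stabilizer.
import torch
import torch.nn as nn

class SLSTMCell(nn.Module):
    def __init__(self, d_in, d_hidden):
        super().__init__()
        self.W = nn.Linear(d_in + d_hidden, 4 * d_hidden)  # z, i, f, o pre-activations

    def forward(self, x, state):
        h, c, n, m = state                       # hidden, cell, normalizer, stabilizer
        z, i, f, o = self.W(torch.cat([x, h], -1)).chunk(4, -1)
        log_i, log_f = i, f                      # exponential gates: exp(i), exp(f)
        m_new = torch.maximum(log_f + m, log_i)  # running log-scale stabilizer
        i_s = torch.exp(log_i - m_new)           # stabilized input gate
        f_s = torch.exp(log_f + m - m_new)       # stabilized forget gate
        c_new = f_s * c + i_s * torch.tanh(z)    # memory mixing
        n_new = f_s * n + i_s                    # normalizer accumulates gate mass
        h_new = torch.sigmoid(o) * (c_new / (n_new + 1e-8))
        return h_new, (h_new, c_new, n_new, m_new)

# Usage: zero-initialized state, one step
cell = SLSTMCell(8, 32)
state = tuple(torch.zeros(4, 32) for _ in range(4))
y, state = cell(torch.randn(4, 8), state)
```

Dividing the cell state by the normalizer is what lets the unbounded exponential gates behave like a proper convex mixing of old and new content, while the stabilizer `m` prevents overflow.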
Milling is a critical manufacturing process used to produce high-value components in the aerospace, tooling, and automotive industries. However, milling is prone to chatter, a severe vibration that damages surface quality, cutting tools, and machines. Traditional experimental and mechanistic methods of chatter prediction have significant limitations. This study presents a data-driven machine learning (ML) model to predict and quantify milling chatter directly from time-series vibration data. Three ML models, namely a hybrid long short-term memory-fully convolutional network (LSTM-FCN) model, a gated recurrent unit (GRU)-FCN model, and a temporal convolutional network (TCN) model, have been developed and verified by incorporating milling parameters to enhance prediction accuracy and stability. Among the proposed models, the best-performing one (GRU-FCN) demonstrates strong performance in chatter prediction and severity quantification, providing actionable insights with improved computational efficiency. The integration of milling parameters into the ML model notably enhances prediction accuracy and stability, proving particularly effective in real-time monitoring scenarios.
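A GRU-FCN hybrid of the kind described typically runs a recurrent branch and a convolutional branch over the same vibration signal in parallel and fuses their summaries; the sketch below also concatenates the milling parameters before the classification head, mirroring the paper's integration of those parameters. Layer sizes, the number of milling parameters, and the chatter label set are illustrative assumptions, not the study's exact configuration.

```python
# GRU-FCN hybrid for chatter classification from vibration signals (a sketch).
import torch
import torch.nn as nn

class GRUFCNChatter(nn.Module):
    def __init__(self, n_params=3, n_classes=3, hidden=64):
        super().__init__()
        self.gru = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.fcn = nn.Sequential(                       # convolutional branch
            nn.Conv1d(1, 64, 8, padding="same"), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 5, padding="same"), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 64, 3, padding="same"), nn.BatchNorm1d(64), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),                    # global average pooling
        )
        # Head fuses both branches with the milling parameters (e.g. spindle
        # speed, feed rate, depth of cut -- hypothetical choices here).
        self.head = nn.Linear(hidden + 64 + n_params, n_classes)

    def forward(self, vib, params):                     # vib: (B, T); params: (B, n_params)
        _, h = self.gru(vib.unsqueeze(-1))              # (1, B, hidden)
        g = h[-1]                                       # recurrent summary
        f = self.fcn(vib.unsqueeze(1)).squeeze(-1)      # (B, 64) conv summary
        return self.head(torch.cat([g, f, params], -1)) # chatter-class logits

# Usage: logits = GRUFCNChatter()(torch.randn(4, 2048), torch.randn(4, 3))
```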