Few-shot machine learning attempts to predict outputs given only a very small number of training examples. The key idea behind most few-shot learning approaches is to pre-train the model on a large number of instances from different but related classes of data for which abundant training examples are available. Few-shot learning has been demonstrated most successfully for classification problems using Siamese deep neural networks, but it has been applied far less extensively to time-series forecasting. Few-shot forecasting is the task of predicting future values of a time series when only a small set of historic time series is available; it has applications in domains where a long history of data does not exist. This work describes deep neural network architectures for few-shot forecasting. All the architectures use a Siamese twin network approach to learn a difference function between pairs of time series, rather than forecasting directly from historical data as traditional forecasting models do. The networks are built from long short-term memory (LSTM) units. During forecasting, a model can forecast time-series types that were never seen in the training data by using the few available instances of the new type as reference inputs. The proposed architectures are evaluated on vehicular traffic data collected in California from the Caltrans Performance Measurement System (PeMS). The models were trained on traffic flow data collected at specific locations and evaluated by predicting traffic at different locations over horizons of 0 to 12 hours. Mean Absolute Error (MAE) served as both the evaluation metric and the training loss function. The proposed architectures show lower prediction error than a baseline nearest-neighbor forecast model, with error increasing at longer horizons.
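The pairwise setup described above can be made concrete with a short sketch. The following is a minimal illustration of the Siamese-twin idea, assuming shared LSTM encoders, a concatenated pair embedding, and a head that corrects the reference series' future; the class name, dimensions, and head design are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the Siamese-twin idea: one shared LSTM embeds both a
# reference series (a known instance of the new type) and the query's history;
# a head predicts how the query's future differs from the reference's future.
import torch
import torch.nn as nn

class SiameseLSTMForecaster(nn.Module):
    def __init__(self, hidden=64, horizon=12):
        super().__init__()
        # Both branches share this single LSTM (weight tying = Siamese).
        self.encoder = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(2 * hidden, horizon)  # pair embedding -> correction

    def embed(self, x):
        _, (h, _) = self.encoder(x)       # x: (batch, time, 1)
        return h[-1]                       # final hidden state, (batch, hidden)

    def forward(self, query_hist, ref_hist, ref_future):
        pair = torch.cat([self.embed(query_hist), self.embed(ref_hist)], dim=-1)
        return ref_future + self.head(pair)  # difference applied to the reference

model = SiameseLSTMForecaster()
loss_fn = nn.L1Loss()  # MAE, used in the paper for both training and evaluation
q, r, rf = torch.randn(8, 48, 1), torch.randn(8, 48, 1), torch.randn(8, 12)
pred = model(q, r, rf)                     # forecast for the query series
loss = loss_fn(pred, torch.randn(8, 12))   # target: the query's true future
```

Tying the encoder weights is what makes the network Siamese: both series are embedded in the same space, so the head only has to model how the query departs from the reference rather than the full forecast.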
This content will become publicly available on August 6, 2026
zSHIFT: A Siamese Hierarchical Transformer Network for Zero Shot Time Series Forecasting
Zero-shot time series forecasting is the challenge of forecasting future values of a time-dependent sequence without access to any historical data from the target series during model training. This setting differs from traditional time series forecasting, where models are typically trained on large volumes of historical data from the same distribution. Zero-shot forecasting models are designed to generalize to unseen time series by leveraging knowledge learned from other, similar series during training. This work proposes two architectures for zero-shot time series forecasting: zSiFT and zSHiFT. Both use transformer models arranged in a Siamese network configuration; zSHiFT differs from zSiFT in adding a hierarchical transformer component to the Siamese network. The architectures are evaluated on vehicular traffic data from the Caltrans Performance Measurement System (PeMS) in California. The models were trained on traffic flow data collected in one region of California and evaluated by forecasting traffic in other regions, with forecast accuracy measured at horizons of 4 to 48 hours. The zSiFT model achieves a Mean Absolute Error (MAE) 8.3% lower than a baseline LSTM-with-attention model, and 6.6% lower than zSHiFT's MAE.
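As a rough sketch of the Siamese transformer configuration described above (zSiFT-style, without the hierarchical component that distinguishes zSHiFT), the following assumes a single shared TransformerEncoder branch with mean pooling over time; every dimension and design choice here is an assumption rather than the paper's exact layout.

```python
# Minimal sketch in the spirit of zSiFT: two inputs pass through one shared
# TransformerEncoder; a linear head forecasts from the pooled pair embedding.
import torch
import torch.nn as nn

class SiameseTransformer(nn.Module):
    def __init__(self, d_model=64, horizon=48):
        super().__init__()
        self.proj = nn.Linear(1, d_model)  # lift scalar readings to d_model
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # shared branch
        self.head = nn.Linear(2 * d_model, horizon)

    def embed(self, x):                                 # x: (batch, time, 1)
        return self.encoder(self.proj(x)).mean(dim=1)   # mean-pool over time

    def forward(self, query_hist, ref_hist):
        pair = torch.cat([self.embed(query_hist), self.embed(ref_hist)], dim=-1)
        return self.head(pair)

model = SiameseTransformer()
out = model(torch.randn(4, 96, 1), torch.randn(4, 96, 1))  # (4, 48) forecast
```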
- Award ID(s): 2125654
- PAR ID: 10647108
- Publisher / Repository: IEEE
- Date Published:
- Page Range / eLocation ID: 331 to 336
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Considering the difficulty of financial time series forecasting in financial aid, much current research focuses on leveraging big data analytics in financial services. One modern approach is to apply "predictive analysis," i.e., forecasting financial trends. However, many time series in Financial Aid (FA) pose unique challenges due to limited historical datasets and high-dimensional financial information, which hinder the development of predictive models that balance accuracy with efficient runtime and memory usage. Pre-trained foundation models are employed to address these challenging tasks. We use state-of-the-art time series models, including pre-trained LLMs (GPT-2 as the backbone), transformers, and linear models, to demonstrate their ability to outperform traditional approaches even with minimal ("few-shot") or no fine-tuning ("zero-shot"). Our benchmark study, which includes financial aid alongside seven other time series tasks, shows the potential of using LLMs for scarce financial datasets (a hedged sketch of the frozen-backbone pattern appears after this list).
- Advancing the capabilities of earthquake nowcasting, the real-time forecasting of seismic activity, remains crucial for reducing casualties. This multifaceted challenge has recently gained attention within the deep learning domain, facilitated by the availability of extensive earthquake datasets. Despite significant advancements, the existing literature on earthquake nowcasting lacks comprehensive evaluations of pre-trained foundation models and modern deep learning architectures, each of which focuses on a different aspect of the data, such as spatial relationships, temporal patterns, or multi-scale dependencies. This paper addresses that gap by analyzing different architectures and introducing two innovative approaches called Multi Foundation Quake and GNNCoder. We formulate earthquake nowcasting as a time series forecasting problem for the next 14 days within 0.1-degree spatial bins in Southern California. Earthquake time series are generated using the logarithm of the energy released by quakes, spanning 1986 to 2024 (a sketch of this preprocessing appears after this list). Our comprehensive evaluations demonstrate that the introduced models outperform other custom architectures by effectively capturing the temporal-spatial relationships inherent in seismic data. The performance of existing foundation models varies significantly with their pre-training datasets, emphasizing the need for careful dataset selection. However, we introduce a novel method, Multi Foundation Quake, that achieves the best overall performance by combining a bespoke pattern with foundation model results handled as auxiliary streams.
- Recent advancements in large language models have spurred significant developments in Time Series Foundation Models (TSFMs). These models claim great promise in performing zero-shot forecasting without the need for task-specific training, leveraging the extensive "corpus" of time series data they have been trained on. Forecasting is crucial in predictive building analytics, presenting substantial untapped potential for TSFMs in this domain. However, time series data are often domain-specific and governed by diverse factors such as deployment environments, sensor characteristics, sampling rate, and data resolution, which complicates the generalizability of these models across contexts. Thus, while language models benefit from the relative uniformity of text data, TSFMs face challenges in learning from heterogeneous and contextually varied time series data to ensure accurate and reliable performance across applications. This paper seeks to understand how recently developed TSFMs perform in the building domain, particularly with respect to their generalizability. We benchmark these models on three large datasets related to indoor air temperature and electricity usage (a sketch of such a benchmark appears after this list). Our results indicate that TSFMs exhibit only marginally better performance than statistical models on unseen sensing modalities and/or patterns. Based on the benchmark results, we also provide insights for improving future TSFMs for building analytics.
- This work presents TAM-RL (Task-Aware Modulation using Representation Learning), a novel multimodal meta-learning framework for few-shot learning in heterogeneous systems, designed for science and engineering problems where entities share a common underlying forward model but exhibit heterogeneity due to entity-specific characteristics. TAM-RL leverages an amortized training process with a modulation network and a base network to learn task-specific modulation parameters, enabling efficient adaptation to new tasks with limited data (a sketch of this pattern appears after this list). We evaluate TAM-RL on two real-world environmental datasets, Gross Primary Productivity (GPP) prediction and streamflow forecasting, demonstrating significant improvements over existing meta-learning methods. On the FLUXNET dataset, TAM-RL improves RMSE by 18.9% over MMAML with just one month of few-shot data, while for streamflow prediction it achieves an 8.21% improvement with one year of data. Synthetic-data experiments further validate TAM-RL's superior performance on heterogeneous task distributions, outperforming the baselines in the most heterogeneous setting. Notably, TAM-RL offers substantial computational efficiency, with at least 3x faster training than gradient-based meta-learning approaches, while being much simpler to train due to reduced complexity. Ablation studies highlight the importance of the pretraining and adaptation mechanisms in TAM-RL's performance. Keywords: representation learning, meta-learning, few-shot learning, environmental applications, time series. DOI: 10.1137/1.9781611978520.2
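For the financial-aid item above, the "pre-trained LLM (GPT-2 as the backbone)" approach is commonly realized by patching the series into pseudo-tokens and keeping the transformer blocks frozen. The sketch below follows that general pattern only; the patch length, projections, and freezing policy are assumptions, not that paper's code.

```python
# Hedged sketch of the frozen-LLM-backbone pattern for time series: values are
# patched, linearly embedded, passed through frozen GPT-2 blocks, and projected
# to a forecast. All sizes and the head design are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import GPT2Model

class GPT2Forecaster(nn.Module):
    def __init__(self, patch_len=16, horizon=24):
        super().__init__()
        self.backbone = GPT2Model.from_pretrained("gpt2")
        for p in self.backbone.parameters():
            p.requires_grad = False        # few-/zero-shot: keep the LLM frozen
        d = self.backbone.config.n_embd    # 768 for gpt2
        self.patch_len = patch_len
        self.embed = nn.Linear(patch_len, d)  # one "token" per patch of readings
        self.head = nn.Linear(d, horizon)

    def forward(self, x):                  # x: (batch, time), time % patch_len == 0
        patches = x.unfold(1, self.patch_len, self.patch_len)  # (batch, n_patches, patch_len)
        h = self.backbone(inputs_embeds=self.embed(patches)).last_hidden_state
        return self.head(h[:, -1])         # forecast from the last patch state

model = GPT2Forecaster()
forecast = model(torch.randn(2, 128))      # (2, 24)
```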
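For the earthquake-nowcasting item, the described series generation (log energy per 0.1-degree bin per 14-day step) can be sketched as below, assuming the standard Gutenberg-Richter magnitude-to-energy relation log10(E) = 1.5M + 4.8 (E in joules); the paper's exact pipeline may differ.

```python
# Illustrative preprocessing: aggregate quakes into 0.1-degree spatial bins and
# 14-day steps, summing radiated energy derived from magnitude, then take logs.
import numpy as np

def quake_series(lats, lons, days, mags, lat0, lon0, n_days, bin_deg=0.1, step=14):
    """Return log-energy per (time step, lat bin, lon bin) for a 1x1-degree region."""
    n_steps = n_days // step
    ny = nx = int(round(1.0 / bin_deg))            # example region size: 1 degree
    energy = np.zeros((n_steps, ny, nx))
    for lat, lon, day, mag in zip(lats, lons, days, mags):
        t = day // step
        i = int(np.floor((lat - lat0) / bin_deg))
        j = int(np.floor((lon - lon0) / bin_deg))
        if 0 <= t < n_steps and 0 <= i < ny and 0 <= j < nx:
            energy[t, i, j] += 10 ** (1.5 * mag + 4.8)  # joules released
    return np.log10(energy + 1.0)                  # log scale; +1 keeps empty bins at 0
```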
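For the building-analytics item, a zero-shot comparison of a TSFM against a simple statistical baseline might look like the following; the callable interface and seasonal-naive baseline are illustrative assumptions rather than that paper's benchmark harness.

```python
# Hedged sketch of a zero-shot benchmark: score any forecaster (a TSFM wrapped
# as a callable) against a seasonal-naive baseline by MAE on held-out series.
import numpy as np

def mae(pred, true):
    return float(np.mean(np.abs(pred - true)))

def seasonal_naive(history, horizon, season=24):
    # Repeat the last observed cycle (e.g., hourly data with a daily season).
    reps = -(-horizon // season)  # ceiling division
    return np.tile(history[-season:], reps)[:horizon]

def benchmark(series_list, forecaster, horizon=24):
    """forecaster: callable(history, horizon) -> np.ndarray of length horizon."""
    scores = {"model": [], "baseline": []}
    for y in series_list:
        hist, future = y[:-horizon], y[-horizon:]
        scores["model"].append(mae(forecaster(hist, horizon), future))
        scores["baseline"].append(mae(seasonal_naive(hist, horizon), future))
    return {k: float(np.mean(v)) for k, v in scores.items()}
```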
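For the TAM-RL item, the modulation-network-plus-base-network pattern can be sketched with FiLM-style scale-and-shift conditioning; the conditioning form, sizes, and names below are assumptions, since the abstract does not specify the parameterization.

```python
# Rough sketch of amortized task modulation: a modulation network maps a task's
# few-shot examples to scale/shift parameters that condition a shared base
# network, so adaptation needs one forward pass rather than gradient steps.
import torch
import torch.nn as nn

class ModulatedBase(nn.Module):
    def __init__(self, in_dim=8, hidden=32):
        super().__init__()
        self.fc1, self.fc2 = nn.Linear(in_dim, hidden), nn.Linear(hidden, 1)
        # Modulation net: embeds the support set, emits scale and shift for fc1.
        self.mod = nn.Sequential(nn.Linear(in_dim + 1, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * hidden))

    def forward(self, x, support_x, support_y):
        # Amortized adaptation: average task embedding over the support set.
        task = self.mod(torch.cat([support_x, support_y], dim=-1)).mean(dim=0)
        scale, shift = task.chunk(2)
        h = torch.relu(self.fc1(x) * scale + shift)  # task-modulated hidden layer
        return self.fc2(h)

net = ModulatedBase()
sx, sy = torch.randn(5, 8), torch.randn(5, 1)        # five few-shot examples
pred = net(torch.randn(16, 8), sx, sy)               # (16, 1)
```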