Abstract Recent years have seen a surge of interest in building deep learning-based, fully data-driven models for weather prediction. Such deep learning models, if trained on observations, can mitigate certain biases in current state-of-the-art weather models, some of which stem from inaccurate representation of subgrid-scale processes. However, these data-driven models, being over-parameterized, require large amounts of training data, which may not be available from reanalysis (observational) products. Moreover, an accurate, noise-free initial condition with which to start forecasting with a data-driven weather model is not available in realistic scenarios. Finally, deterministic data-driven forecasting models suffer from issues with long-term stability and unphysical climate drift, which makes them unsuitable for computing climate statistics. Given these challenges, previous studies have tried to pre-train deep learning-based weather forecasting models on a large amount of imperfect long-term climate model simulations and then re-train them on available observational data. In this article, we propose a convolutional variational autoencoder (VAE)-based stochastic data-driven model that is pre-trained on an imperfect climate model simulation from a two-layer quasi-geostrophic flow and re-trained, using transfer learning, on a small number of noisy observations from a perfect simulation. This re-trained model then performs stochastic forecasting with a noisy initial condition sampled from the perfect simulation. We show that our ensemble-based stochastic data-driven model outperforms a baseline deterministic encoder-decoder-based convolutional model in short-term forecast skill, while remaining stable in long-term climate simulations and yielding an accurate climatology.
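A minimal sketch (an assumed architecture, not the authors' code) of the workflow this abstract describes: a small convolutional VAE maps the current two-layer flow state to the next one, is pre-trained on climate-model snapshot pairs, and is then fine-tuned on a handful of noisy observation pairs. Field resolution, channel counts, latent dimension, and the KL weight are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConvVAE(nn.Module):
    def __init__(self, latent_dim=64):
        super().__init__()
        # assumes 2-channel (two QG layers), 64x64 input fields
        self.enc = nn.Sequential(
            nn.Conv2d(2, 32, 4, stride=2, padding=1), nn.ReLU(),   # -> 32 x 32 x 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # -> 64 x 16 x 16
            nn.Flatten(),
        )
        self.to_mu = nn.Linear(64 * 16 * 16, latent_dim)
        self.to_logvar = nn.Linear(64 * 16 * 16, latent_dim)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, 64 * 16 * 16), nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 2, 4, stride=2, padding=1),     # next-step field
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)    # reparameterization
        return self.dec(z), mu, logvar

def train_step(model, x_t, x_tp1, opt, beta=1e-3):
    """One step: reconstruct the next state from the current one (MSE + KL)."""
    recon, mu, logvar = model(x_t)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    loss = nn.functional.mse_loss(recon, x_tp1) + beta * kl
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

model = ConvVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
# 1) pre-train on many (x_t, x_{t+1}) pairs from the imperfect climate-model run
# 2) transfer learning: continue training on the few noisy observation pairs,
#    typically with a reduced learning rate
```

At forecast time, an ensemble would be obtained by feeding the model its own output autoregressively and drawing a fresh latent sample at every step.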
Pole balancing on the fingertip: model-motivated machine learning forecasting of falls
Introduction: There is increasing interest in developing mathematical and computational models to forecast adverse events in physiological systems. Examples include falls, the onset of fatal cardiac arrhythmias, and adverse surgical outcomes. However, the dynamics of physiological systems are known to be exceedingly complex and perhaps even chaotic. Since no model can be perfect, it becomes important to understand how forecasting can be improved, especially when training data is limited. An adverse event that can be readily studied in the laboratory is the occurrence of stick falls when humans attempt to balance a stick on their fingertips. Over the last 20 years, this task has been extensively investigated experimentally, and detailed mathematical models are now available. Methods: Here we use a long short-term memory (LSTM) deep learning network to forecast stick falls. We train this model to forecast stick falls in three ways: 1) using only data generated by the mathematical model (synthetic data), 2) using only recordings of human stick balancing and falls measured with high-speed motion capture (human data), and 3) using transfer learning, which combines a model trained on synthetic data with a small amount of human balancing data. Results: We observe that the LSTM model is much more successful at forecasting falls when trained on synthetic data than when trained on the limited available human data. However, with transfer learning, i.e., the LSTM model pre-trained with synthetic data and re-trained with a small amount of real human balancing data, the ability to forecast impending falls in human data is vastly improved. Indeed, it becomes possible to correctly forecast 60%–70% of real human stick falls up to 2.35 s in advance. Conclusion: These observations support the use of model-generated data and transfer learning techniques to improve the ability of computational models to forecast adverse physiological events.
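The transfer-learning recipe in the Methods section can be illustrated with a short sketch (not the authors' code; the window length, feature count, hidden size, and learning rates are assumptions): an LSTM classifier is first trained on abundant synthetic trajectories from the mathematical model and then re-trained on the small human motion-capture dataset.

```python
import torch
import torch.nn as nn

class FallForecaster(nn.Module):
    def __init__(self, n_features=6, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 1)           # logit: fall / no fall

    def forward(self, x):                          # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])               # classify from the last time step

def train(model, loader, epochs, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for x, y in loader:                        # y = 1 if a fall follows the window
            opt.zero_grad()
            loss_fn(model(x).squeeze(-1), y.float()).backward()
            opt.step()

model = FallForecaster()
# 1) pre-train on abundant synthetic trajectories from the mathematical model:
# train(model, synthetic_loader, epochs=50, lr=1e-3)
# 2) transfer learning: re-train on the small human motion-capture dataset,
#    typically with a lower learning rate (optionally freezing the LSTM layers):
# train(model, human_loader, epochs=20, lr=1e-4)
```

The label for each input window would mark whether a fall occurs within the chosen forecast horizon (e.g., the 2.35 s lead time reported above).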
- Award ID(s): 2123749
- PAR ID: 10532858
- Publisher / Repository: Frontiers
- Date Published:
- Journal Name: Frontiers in Physiology
- Volume: 15
- ISSN: 1664-042X
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Falls in the elderly are associated with significant morbidity and mortality. While numerous fall detection devices incorporating AI and machine learning algorithms have been developed, no known smartwatch-based system has been used successfully in real time to detect falls in elderly persons. We have developed and deployed a SmartFall system on a commodity smartwatch, which has been trialled by nine elderly participants. The system, while usable and welcomed by the participants in our trials, has two serious limitations. The first is the inability to collect a large amount of personalized training data: when the fall detection model, trained with insufficient data, is used in the real world, it generates a large number of false positives. The second is the model drift problem: an accurate model trained on data collected with a specific device performs sub-par when used on another device. Building one model for each type of device or watch is therefore not a scalable approach to smartwatch-based fall detection. To tackle these issues, we first collected three accelerometer datasets for the fall detection problem from different devices: the Microsoft watch (MSBAND), the Huawei watch, and the meta-sensor device. We then applied transfer learning, first to overcome the small-dataset training problem for fall detection, and also to generalize the model across heterogeneous devices. Our preliminary experiments demonstrate the effectiveness of transfer learning for improving fall detection, achieving an F1 score higher by over 10% on average, an AUC higher by over 0.15 on average, and a lower false positive rate than the non-transfer-learning approach across datasets collected using devices with different hardware specifications. (An illustrative sketch of the cross-device fine-tuning step appears after this list.)
- Few-shot machine learning attempts to predict outputs given only a very small number of training examples. The key idea behind most few-shot learning approaches is to pre-train the model on a large number of instances from a different but related class of data, one for which many training instances are available. Few-shot learning has been most successfully demonstrated for classification problems using Siamese deep learning neural networks, and has been less extensively applied to time-series forecasting. Few-shot forecasting is the task of predicting future values of a time-series even when only a small set of historic time-series is available, with applications in domains where a long history of data does not exist. This work describes deep neural network architectures for few-shot forecasting. All the architectures use a Siamese twin network approach to learn a difference function between pairs of time-series, rather than forecasting directly from historical data as traditional forecasting models do. The networks are built using long short-term memory (LSTM) units. During forecasting, the model can forecast time-series types never seen in the training data by using the few available instances of the new type as reference inputs. The proposed architectures are evaluated on vehicular traffic data collected in California from the Caltrans Performance Measurement System (PeMS). The models were trained on traffic flow data collected at specific locations and then evaluated by predicting traffic at different locations and at different time horizons (0 to 12 hours). The mean absolute error (MAE) was used as both the evaluation metric and the training loss. The proposed architectures show lower prediction error than a baseline nearest-neighbor forecast model, with error increasing at longer time horizons. (An illustrative Siamese-forecaster sketch appears after this list.)
- Abstract: The fast simulation of dynamical systems is a key challenge in many scientific and engineering applications, such as weather forecasting, disease control, and drug discovery. With the recent success of deep learning, there is increasing interest in using neural networks to solve differential equations in a data-driven manner. However, existing methods are either limited to specific types of differential equations or require large amounts of data for training. This restricts their practicality in many real-world applications, where data is often scarce or expensive to obtain. To address this, a novel multi-modal foundation model named FMint (Foundation Model based on Initialization) is proposed to bridge the gap between human-designed and data-driven models for the fast simulation of dynamical systems. Built on a decoder-only transformer architecture with in-context learning, FMint uses both numerical and textual data to learn a universal error-correction scheme for dynamical systems, taking as prompts sequences of coarse solutions from traditional solvers. The model is pre-trained on a corpus of 400K ordinary differential equations (ODEs), and extensive experiments are performed on challenging ODEs that exhibit chaotic behavior or high dimensionality. The results demonstrate the effectiveness of the proposed model in terms of both accuracy and efficiency compared to classical numerical solvers, highlighting FMint's potential as a general-purpose solver for dynamical systems. The approach achieves an accuracy improvement of 1 to 2 orders of magnitude over state-of-the-art dynamical system simulators and delivers a 5X speedup compared to traditional numerical algorithms. The code for FMint is available at https://github.com/margotyjx/FMint. (An illustrative sketch of the coarse-solution correction idea appears after this list.)
- Waldemar Karwowski (Ed.) Given the importance of online retailers in the market, forecasting sales has become an essential strategic consideration. Modern machine learning tools help many online retailers forecast sales, but these models need refinement and automation to increase efficiency and productivity. If an automated function can capture historical data and execute the forecasting models on its own, it reduces the time and human resources the company needs to manage the forecasting system, and an automated data processing and forecasting pipeline gives the marketing department more flexible sales forecasting. Proposed here is an automated weekly sales forecasting system that integrates an Extract-Transform-Load (ETL) data processing step with a machine learning forecasting model and sends the outcomes as messages. For this study, data is obtained for an online women's shoe retailer from three sources (AWS Redshift, AWS S3, and Google Sheets). The system collects 120 weeks of sales data, passes it through the ETL process, and runs the forecasting model to predict sales of the retailer's products for the next week. The model is built using a random forest regressor. The top 25 products according to the forecast are selected and sent to the owner's email for further market evaluation. The system is built as a Directed Acyclic Graph (DAG) using a Python script on Apache Airflow; to simplify management, the authors run Apache Airflow in a Docker container. The whole process requires no human monitoring, and if an error occurs during execution on Airflow, the system notifies the project owner to inspect its cause. (An illustrative Airflow DAG sketch appears after this list.)
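For the SmartFall item above, a minimal sketch (an assumed model and dimensions, not the authors' code) of the cross-device transfer step: a 1-D convolutional detector trained on one watch's accelerometer data is copied, its feature extractor frozen, and only its classification head fine-tuned on the small dataset from a new device.

```python
import torch
import torch.nn as nn

class FallDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(               # 3-axis accelerometer window in
            nn.Conv1d(3, 32, 5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, 5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, 1)                 # fall / no-fall logit

    def forward(self, x):                            # x: (batch, 3, time)
        return self.head(self.features(x))

source_model = FallDetector()
# ... train source_model on the large source-device dataset (e.g. MSBAND) ...

# Transfer to a new device: copy the weights, freeze the feature extractor,
# and fine-tune only the head on the small target-device dataset.
target_model = FallDetector()
target_model.load_state_dict(source_model.state_dict())
for p in target_model.features.parameters():
    p.requires_grad = False
opt = torch.optim.Adam(target_model.head.parameters(), lr=1e-4)
```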
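For the few-shot forecasting item above, a minimal sketch (assumed dimensions and head, not the paper's exact architecture) of the Siamese twin idea: one shared LSTM encodes both a reference series of the new time-series type and the query history, and a small head maps the concatenated embeddings to a multi-step forecast.

```python
import torch
import torch.nn as nn

class SiameseForecaster(nn.Module):
    def __init__(self, hidden=64, horizon=12):
        super().__init__()
        self.encoder = nn.LSTM(1, hidden, batch_first=True)   # shared twin encoder
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, horizon),                        # multi-step forecast
        )

    def encode(self, x):                        # x: (batch, time, 1)
        _, (h, _) = self.encoder(x)
        return h[-1]                            # last-layer hidden state

    def forward(self, reference, query):
        # both branches share the same weights (the "Siamese" part)
        z_ref, z_qry = self.encode(reference), self.encode(query)
        return self.head(torch.cat([z_ref, z_qry], dim=-1))

model = SiameseForecaster()
ref = torch.randn(8, 48, 1)                     # few-shot reference series of the new type
qry = torch.randn(8, 48, 1)                     # query history to extend
pred = model(ref, qry)                          # (8, 12) forecast, e.g. 0-12 hours ahead
```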
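For the FMint item above, a conceptual sketch of the coarse-solution correction workflow (the correction model is only a placeholder here; FMint itself is a decoder-only transformer prompted with in-context demonstration pairs): a traditional solver produces a cheap, large-step trajectory, which the learned model then corrects toward the fine solution.

```python
import numpy as np

def coarse_euler(f, y0, t0, t1, n_steps):
    """Cheap large-step forward-Euler solve, used as the 'coarse solution'."""
    y = np.array(y0, dtype=float)
    ys, h = [y.copy()], (t1 - t0) / n_steps
    for k in range(n_steps):
        y = y + h * f(t0 + k * h, y)
        ys.append(y.copy())
    return np.stack(ys)                         # shape: (n_steps + 1, state_dim)

# Example ODE: damped oscillator y'' + 0.1 y' + y = 0, written as a first-order system.
f = lambda t, y: np.array([y[1], -0.1 * y[1] - y[0]])
coarse = coarse_euler(f, y0=[1.0, 0.0], t0=0.0, t1=10.0, n_steps=50)

# FMint-style usage (conceptual): the prompt pairs coarse trajectories with their
# known corrections for a few demonstration ODEs, then asks the model to emit
# corrections for the new coarse trajectory.
# corrected = correction_model(prompt_examples, coarse)   # hypothetical call
```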
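For the sales-forecasting item above, a minimal sketch (not the authors' pipeline) of a weekly Airflow DAG that chains an extract-transform task and a random-forest forecast task; the data sources, connection details, and e-mail step are stubbed out, and the code assumes the Airflow 2.x PythonOperator API.

```python
from datetime import datetime
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_transform(**_):
    # Placeholder for pulling the 120-week sales history from the sources named
    # in the abstract (Redshift, S3, Google Sheets) and writing a tidy file.
    df = pd.DataFrame({"week": range(120), "units": range(120)})  # stand-in data
    df.to_csv("/tmp/sales_history.csv", index=False)

def forecast(**_):
    df = pd.read_csv("/tmp/sales_history.csv")
    X, y = df[["week"]].iloc[:-1], df["units"].iloc[1:]           # next-week target
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
    next_week = model.predict([[df["week"].max() + 1]])
    print("forecast:", next_week)       # the real system e-mails the top 25 products

with DAG(
    dag_id="weekly_sales_forecast",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@weekly",
    catchup=False,
) as dag:
    etl = PythonOperator(task_id="extract_transform", python_callable=extract_transform)
    fit = PythonOperator(task_id="forecast", python_callable=forecast)
    etl >> fit
```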