This content will become publicly available on December 4, 2024
- Award ID(s):
- 2313174
- NSF-PAR ID:
- 10511798
- Publisher / Repository:
- IEEE
- Date Published:
- Journal Name:
- IEEE ICDM 2024, 23rd IEEE International Conference on Data Mining
- Subject(s) / Keyword(s):
- Recurrent neural networks; Training data; Machine learning; Predictive models; Data models; Robustness; Meteorology
- Format(s):
- Medium: X
- Location:
- Shanghai, China
- Sponsoring Org:
- National Science Foundation
More Like this
-
Accurate long-term predictions are the foundation for many machine learning applications and decision-making processes. However, building accurate long-term prediction models remains challenging due to the limitations of existing temporal models like recurrent neural networks (RNNs), as they capture only the statistical connections in the training data and may fail to learn the underlying dynamics of the target system. To tackle this challenge, we propose a novel machine learning model based on Koopman operator theory, which we call Koopman Invertible Autoencoders (KIA), that captures the inherent characteristics of the system by modeling both forward and backward dynamics in the infinite-dimensional Hilbert space. This enables us to efficiently learn low-dimensional representations, resulting in more accurate predictions of long-term system behavior. Moreover, our method’s invertibility design enforces reversibility and consistency in both forward and inverse operations. We illustrate the utility of KIA on pendulum and climate datasets, demonstrating a 300% improvement in long-term prediction capability on the pendulum dataset while maintaining robustness against noise. Additionally, our method demonstrates the ability to better comprehend the intricate dynamics of the climate system when compared to existing Koopman-based methods.
-
Recent work has shown that machine learning (ML) models can be trained to accurately forecast the dynamics of unknown chaotic dynamical systems. Short-term predictions of the state evolution and long-term predictions of the statistical patterns of the dynamics ("climate") can be produced by employing a feedback loop, whereby the model is trained to predict forward one time step, then the model output is used as input for multiple time steps. In the absence of mitigating techniques, however, this approach can result in artificially rapid error growth. In this article, we systematically examine the technique of adding noise to the ML model input during training to promote stability and improve prediction accuracy. Furthermore, we introduce Linearized Multi-Noise Training (LMNT), a regularization technique that deterministically approximates the effect of many small, independent noise realizations added to the model input during training. Our case study uses reservoir computing, a machine-learning method using recurrent neural networks, to predict the spatiotemporal chaotic Kuramoto-Sivashinsky equation. We find that reservoir computers trained with noise or with LMNT produce climate predictions that appear to be indefinitely stable and have a climate very similar to the true system, while reservoir computers trained without regularization are unstable. Compared with other regularization techniques that yield stability in some cases, we find that both short-term and climate predictions from reservoir computers trained with noise or with LMNT are substantially more accurate. Finally, we show that the deterministic aspect of our LMNT regularization facilitates fast hyperparameter tuning when compared to training with noise.
-
Nucleic acid-binding proteins (NABPs), including DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs), play important roles in essential biological processes. To facilitate functional annotation and accurate prediction of different types of NABPs, many machine learning-based computational approaches have been developed. However, the datasets used for training and testing as well as the prediction scopes in these studies have limited their applications. In this paper, we developed new strategies to overcome these limitations by generating more accurate and robust datasets and developing deep learning-based methods including both hierarchical and multi-class approaches to predict the types of NABPs for any given protein. The deep learning models employ two convolutional neural network layers and one long short-term memory layer. Our approaches outperform existing DBP and RBP predictors with a balanced prediction between DBPs and RBPs, and are more practically useful in identifying novel NABPs. The multi-class approach greatly improves the prediction accuracy of DBPs and RBPs, especially for the DBPs, with an improvement of ~12%. Moreover, we explored the prediction accuracy of single-stranded DNA binding proteins and their effect on the overall prediction accuracy of NABP predictions.
-
Accurate predictions of water temperature are the foundation for many decisions and regulations, with direct impacts on water quality, fishery yields, and power production. Building accurate broad-scale models for lake temperature prediction remains challenging in practice due to the variability in the data distribution across different lake systems monitored by static and time-series data. In this paper, to tackle the above challenges, we propose a novel machine learning based approach for integrating static and time-series data in deep recurrent models, which we call Invertibility-Aware Long Short-Term Memory (IA-LSTM), and demonstrate its effectiveness in predicting lake temperature. Our proposed method integrates components of the Invertible Network and LSTM to better predict temperature profiles (i.e., forward modeling) and infer the static features (i.e., inverse modeling) that can eventually enhance the prediction when static variables are missing. We evaluate our method on predicting the temperature profile of 450 lakes in the Midwestern U.S. and report a relative improvement of 4% in capturing data heterogeneity, while simultaneously outperforming baseline predictions by 12% when static features are unavailable.
-
Base metal electrode (BME) multilayer ceramic capacitors (MLCCs) are widely used in aerospace, medical, military, and communication applications, emphasizing the need for high reliability. The ongoing advancements in BaTiO3-based MLCC technology have facilitated further miniaturization and improved capacitive volumetric density for both low and high voltage devices. However, concerns persist regarding infant mortality failures and long-term reliability under higher fields and temperatures. To address these concerns, a comprehensive understanding of the mechanisms underlying insulation resistance degradation is crucial. Furthermore, there is a need to develop effective screening procedures during MLCC production and improve the accuracy of mean time to failure (MTTF) predictions. This article reviews our findings on the effect of the burn-in test, a common quality control process, on the dynamics of oxygen vacancies within BME MLCCs. These findings reveal that the burn-in test has a negative impact on the lifetime and reliability of BME MLCCs. Moreover, the limitations of existing lifetime prediction models for BME MLCCs are discussed, emphasizing the need for improved MTTF predictions by employing a physics-based machine learning model to overcome the existing models’ limitations. The article also discusses the new physics-based machine learning model that has been developed. While data limitations remain a challenge, the physics-based machine learning approach offers promising results for MTTF prediction in MLCCs, contributing to improved lifetime predictions. Furthermore, the article acknowledges the limitations of relying solely on MTTF to predict MLCCs’ lifetime and emphasizes the importance of developing comprehensive prediction models that predict the entire distribution of failures.