Abstract Predictions of hydrologic variables across the entire water cycle have significant value for water resources management as well as downstream applications such as ecosystem and water quality modeling. Recently, purely data‐driven deep learning models like long short‐term memory (LSTM) showed seemingly insurmountable performance in modeling rainfall runoff and other geoscientific variables, yet they cannot predict untrained physical variables and remain challenging to interpret. Here, we show that differentiable, learnable, process‐based models (calledδmodels here) can approach the performance level of LSTM for the intensively observed variable (streamflow) with regionalized parameterization. We use a simple hydrologic model HBV as the backbone and use embedded neural networks, which can only be trained in a differentiable programming framework, to parameterize, enhance, or replace the process‐based model's modules. Without using an ensemble or post‐processor,δmodels can obtain a median Nash‐Sutcliffe efficiency of 0.732 for 671 basins across the USA for the Daymet forcing data set, compared to 0.748 from a state‐of‐the‐art LSTM model with the same setup. For another forcing data set, the difference is even smaller: 0.715 versus 0.722. Meanwhile, the resulting learnable process‐based models can output a full set of untrained variables, for example, soil and groundwater storage, snowpack, evapotranspiration, and baseflow, and can later be constrained by their observations. Both simulated evapotranspiration and fraction of discharge from baseflow agreed decently with alternative estimates. The general framework can work with models with various process complexity and opens up the path for learning physics from big data.
more »
« less
Differentiable Model Selection for Ensemble Learning
Model selection is a strategy aimed at creating accurate and robust models by identifying the optimal model for classifying any particular input sample. This paper proposes a novel framework for differentiable selection of groups of models by integrating machine learning and combinatorial optimization.The framework is tailored for ensemble learning with a strategy that learns to combine the predictions of appropriately selected pre-trained ensemble models. It does so by modeling the ensemble learning task as a differentiable selection program trained end-to-end over a pretrained ensemble to optimize task performance. The proposed framework demonstrates its versatility and effectiveness, outperforming conventional and advanced consensus rules across a variety of classification tasks.
more »
« less
- PAR ID:
- 10451190
- Date Published:
- Journal Name:
- International Joint Conference on Artificial Intelligence
- Page Range / eLocation ID:
- 1954 to 1962
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Abstract Solar energetic particle (SEP) events, in particular high-energy-range SEP events, pose significant risks to space missions, astronauts, and technological infrastructure. Accurate prediction of these high-impact events is crucial for mitigating potential hazards. In this study, we present an end-to-end ensemble machine learning (ML) framework for the prediction of high-impact ∼100 MeV SEP events. Our approach leverages diverse data modalities sourced from the Solar and Heliospheric Observatory and the Geostationary Operational Environmental Satellite integrating extracted active region polygons from solar extreme ultraviolet (EUV) imagery, time-series proton flux measurements, sunspot activity data, and detailed active region characteristics. To quantify the predictive contribution of each data modality (e.g., EUV or time series), we independently evaluate them using a range of ML models to assess their performance in forecasting SEP events. Finally, to enhance the SEP predictive performance, we train an ensemble learning model that combines all the models trained on individual data modalities, leveraging the strengths of each data modality. Our proposed ensemble approach shows promising performance, achieving a recall of 0.80 and 0.75 in balanced and imbalanced settings, respectively, underscoring the effectiveness of multimodal data integration for robust SEP event prediction and enhanced forecasting capabilities.more » « less
-
Low-rank compression is an important model compression strategy for obtaining compact neural network models. In general, because the rank values directly determine the model complexity and model accuracy, proper selection of layer-wise rank is very critical and desired. To date, though many low-rank compression approaches, either selecting the ranks in a manual or automatic way, have been proposed, they suffer from costly manual trials or unsatisfied compression performance. In addition, all of the existing works are not designed in a hardware-aware way, limiting the practical performance of the compressed models on real-world hardware platforms. To address these challenges, in this paper we propose HALOC, a hardware-aware automatic low-rank compression framework. By interpreting automatic rank selection from an architecture search perspective, we develop an end-to-end solution to determine the suitable layer-wise ranks in a differentiable and hardware-aware way. We further propose design principles and mitigation strategy to efficiently explore the rank space and reduce the potential interference problem.Experimental results on different datasets and hardware platforms demonstrate the effectiveness of our proposed approach. On CIFAR-10 dataset, HALOC enables 0.07% and 0.38% accuracy increase over the uncompressed ResNet-20 and VGG-16 models with 72.20% and 86.44% fewer FLOPs, respectively. On ImageNet dataset, HALOC achieves 0.9% higher top-1 accuracy than the original ResNet-18 model with 66.16% fewer FLOPs. HALOC also shows 0.66% higher top-1 accuracy increase than the state-of-the-art automatic low-rank compression solution with fewer computational and memory costs. In addition, HALOC demonstrates the practical speedups on different hardware platforms, verified by the measurement results on desktop GPU, embedded GPU and ASIC accelerator.more » « less
-
Abstract Text augmentation is an effective technique in alleviating overfitting in NLP tasks. In existing methods, text augmentation and downstream tasks are mostly performed separately. As a result, the augmented texts may not be optimal to train the downstream model. To address this problem, we propose a three-level optimization framework to perform text augmentation and the downstream task end-to- end. The augmentation model is trained in a way tailored to the downstream task. Our framework consists of three learning stages. A text summarization model is trained to perform data augmentation at the first stage. Each summarization example is associated with a weight to account for its domain difference with the text classification data. At the second stage, we use the model trained at the first stage to perform text augmentation and train a text classification model on the augmented texts. At the third stage, we evaluate the text classification model trained at the second stage and update weights of summarization examples by minimizing the validation loss. These three stages are performed end-to-end. We evaluate our method on several text classification datasets where the results demonstrate the effectiveness of our method. Code is available at https://github.com/Sai-Ashish/End-to-End-Text-Augmentation.more » « less
-
Contrastive learning learns input representation by pushing similar data together and pulling dissimilar data away, along with data augmentation and pretext task construction. It enhances the large model learning due to its ability to use a large amount of unlabeled data. It has been suc- cessfully applied to large language models, pre-trained image models, and multimodal models. In addition, contrastive learning learns a representation from modeling the explainable structure of the latent space, which has a broad application in scientific discovery and interpretable Artificial Intelligence (AI). The primary focus of this thesis is to explore contrastive learning from a data construction perspective in real-world problems to fill the gap between the principle of contrastive learning and its application. The challenges, such as sampling bias and data quality, will largely affect the representations learned by contrastive learning. This thesis analyzes the data construction chanlledges and limitations in 1) the negative sampling of knowledge graph embedding (KGE), 2) high-quliaty preference data labeling of Large Language Models (LLMs) alignment, 3) data augmentation in Non-linear dynamic system modeling, and 4) data properties in functions of mesange RNA (mRNA) sequence. To solve the challenges 1), a hardness and structure-based objective function was proposed by considering sampling bias in hard negative sampling. For challenge 2), the similarity of response embedding is used to evaluate the quality of preference pairs to mitigate the labeling error of humans when they face an ambiguous response pair. Chal- lenge 3) is solved by systematically considering the physical system and contrastive learning. A data augmentation strategy by partitioning the full sequence is used for learning the transition matrix in the latent linear space. Challenge 4) is common to see in the biological domain due to the high cost of lab experiments. Pre-trained model will advantage the limited dataset su- pervised learning by learning general features from domain knowledge. A contrastive learning based teacher-student framework is proposed for mRNA sequence learning by contrasting the unmasked sequence and the hard-masked sequence. By providing careful data construction or data sampling, contrastive learning will be boosted to solve tasks in reality. For the KGE, the novel contrastive loss function learns the boundary between negative samples and positive samples to improve the link prediction task in the knowl- edge graph; For the LLM alignment, in the same labeling cost, the selected dissimilar responses will improve the vanilla direct preference optimization (DPO) alignment; The data augmentation with contrastive loss play crucial role to learn more accuracy dynamic system, which explained by the learned the continiues eigenfunction; By considering the tearch-student framework with hard-masked strategy, the pre-trained model achieve the state-of-the-art result by fine-tuning on limited downstrame task data. Overall, this thesis provides a broad data-driven contrastive learning methodology to enhance representation learning in different domains. The methodology consists of a imprived objective function in the face of data bias, a better data selection reducing labeling error, and proper data augmentation for a particular application domain. This methodology improve the learning result compare to traditional method.more » « less
An official website of the United States government

