NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Unraveling Block Maxima Forecasting Models with Counterfactual Explanation

https://doi.org/10.1145/3637528.3671923

Deng, Yue; Galib, Asadullah Hill; Tan, Pang-Ning; Luo, Lifeng (August 2024, ACM)

Full Text Available
SimEXT: Self-supervised Representation Learning for Extreme Values in Time Series

https://doi.org/10.1109/ICDM58522.2023.00119

Galib, Asadullah Hill; Tan, Pang-Ning; Luo, Lifeng (December 2023, IEEE)

Full Text Available
Long-term water temperature changes in Seneca Lake and their nexus to climate change and human activities

https://doi.org/10.1088/2515-7620/acfd20

Lan, Xin; Luo, Lifeng; Xu, Zhicheng; Qiu, Yuean; Yu, Xiang (November 2023, Environmental Research Communications)

Abstract While many freshwater lakes have witnessed a rapid increase in surface water temperatures, the trends in subsurface water temperatures are not well-understood. This study explored the long-term subsurface water temperature change and its connection to climate change and human activities in Seneca Lake. Utilizing linear regression and the Theil-Sen estimator, the study identified a significant monotonic temperature trend in the subsurface water. Principal component and contribution analyses revealed that climate changes, particularly air warming, were more critical in explaining water temperature patterns, and human activities such as land cover change could exacerbate the impact of climate change. Using remotely sensed surface water temperature data, the study found a significant positive correlation between thermal pollution and water temperatures in the northern region of the lake, and after incorporating control variables, the regression analysis suggested that the adverse effects of thermal pollution are primarily confined to the area adjacent to the power plant. This research can offer fresh insights into lake ecology improvement and management strategies.
more » « less
Full Text Available
Self-Recover: Forecasting Block Maxima in Time Series from Predictors with Disparate Temporal Coverage Using Self-Supervised Learning

https://doi.org/10.24963/ijcai.2023/414

Galib, Asadullah Hill; McDonald, Andrew; Tan, Pang-Ning; Luo, Lifeng (August 2023, Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI 2023))

Forecasting the block maxima of a future time window is a challenging task due to the difficulty in inferring the tail distribution of a target variable. As the historical observations alone may not be sufficient to train robust models to predict the block maxima, domain-driven process models are often available in many scientific domains to supplement the observation data and improve the forecast accuracy. Unfortunately, coupling the historical observations with process model outputs is a challenge due to their disparate temporal coverage. This paper presents Self-Recover, a deep learning framework to predict the block maxima of a time window by employing self-supervised learning to address the varying temporal data coverage problem. Specifically Self-Recover uses a combination of contrastive and generative self-supervised learning schemes along with a denoising autoencoder to impute the missing values. The framework also combines representations of the historical observations with process model outputs via a residual learning approach and learns the generalized extreme value (GEV) distribution characterizing the block maxima values. This enables the framework to reliably estimate the block maxima of each time window along with its confidence interval. Extensive experiments on real-world datasets demonstrate the superiority of Self-Recover compared to other state-of-the-art forecasting methods.
more » « less
Full Text Available
COMET Flows: Towards Generative Modeling of Multivariate Extremes and Tail Dependence

https://doi.org/10.24963/ijcai.2022/462

McDonald, Andrew; Tan, Pang-Ning; Luo, Lifeng (July 2022, Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence)

Normalizing flows—a popular class of deep generative models—often fail to represent extreme phenomena observed in real-world processes. In particular, existing normalizing flow architectures struggle to model multivariate extremes, characterized by heavy-tailed marginal distributions and asymmetric tail dependence among variables. In light of this shortcoming, we propose COMET (COpula Multivariate ExTreme) Flows, which decompose the process of modeling a joint distribution into two parts: (i) modeling its marginal distributions, and (ii) modeling its copula distribution. COMET Flows capture heavy-tailed marginal distributions by combining a parametric tail belief at extreme quantiles of the marginals with an empirical kernel density function at mid-quantiles. In addition, COMET Flows capture asymmetric tail dependence among multivariate extremes by viewing such dependence as inducing a low-dimensional manifold structure in feature space. Experimental results on both synthetic and real-world datasets demonstrate the effectiveness of COMET flows in capturing both heavy-tailed marginals and asymmetric tail dependence compared to other state-of-the-art baseline architectures. All code is available at https://github.com/andrewmcdonald27/COMETFlows.
more » « less
Full Text Available
DeepGPD: A Deep Learning Approach for Modeling Geospatio-Temporal Extreme Events

https://doi.org/10.1609/aaai.v36i4.20344

Wilson, Tyler; Tan, Pang-Ning; Luo, Lifeng (June 2022, Proceedings of the AAAI Conference on Artificial Intelligence)

Geospatio-temporal data are pervasive across numerous application domains.These rich datasets can be harnessed to predict extreme events such as disease outbreaks, flooding, crime spikes, etc.However, since the extreme events are rare, predicting them is a hard problem. Statistical methods based on extreme value theory provide a systematic way for modeling the distribution of extreme values. In particular, the generalized Pareto distribution (GPD) is useful for modeling the distribution of excess values above a certain threshold. However, applying such methods to large-scale geospatio-temporal data is a challenge due to the difficulty in capturing the complex spatial relationships between extreme events at multiple locations. This paper presents a deep learning framework for long-term prediction of the distribution of extreme values at different locations. We highlight its computational challenges and present a novel framework that combines convolutional neural networks with deep set and GPD. We demonstrate the effectiveness of our approach on a real-world dataset for modeling extreme climate events.
more » « less
Full Text Available
Beyond Point Prediction: Capturing Zero-Inflated & Heavy-Tailed Spatiotemporal Data with Deep Extreme Mixture Models

https://doi.org/10.1145/3534678.3539464

Wilson, Tyler; McDonald, Andrew; Galib, Asadullah Hill; Tan, Pang-Ning; Luo, Lifeng (August 2022, KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining)
Zhang, Aidong; Rangwala, Huzefa (Ed.)
Zero-inflated, heavy-tailed spatiotemporal data is common across science and engineering, from climate science to meteorology and seismology. A central modeling objective in such settings is to forecast the intensity, frequency, and timing of extreme and non-extreme events; yet in the context of deep learning, this objective presents several key challenges. First, a deep learning framework applied to such data must unify a mixture of distributions characterizing the zero events, moderate events, and extreme events. Second, the framework must be capable of enforcing parameter constraints across each component of the mixture distribution. Finally, the framework must be flexible enough to accommodate for any changes in the threshold used to define an extreme event after training. To address these challenges, we propose Deep Extreme Mixture Model (DEMM), fusing a deep learning-based hurdle model with extreme value theory to enable point and distribution prediction of zero-inflated, heavy-tailed spatiotemporal variables. The framework enables users to dynamically set a threshold for defining extreme events at inference-time without the need for retraining. We present an extensive experimental analysis applying DEMM to precipitation forecasting, and observe significant improvements in point and distribution prediction. All code is available at https://github.com/andrewmcdonald27/DeepExtremeMixtureModel.
more » « less
Full Text Available
DeepExtrema: A Deep Learning Approach for Forecasting Block Maxima in Time Series Data

https://doi.org/10.24963/ijcai.2022/413

Galib, Asadullah Hill; McDonald, Andrew; Wilson, Tyler; Luo, Lifeng; Tan, Pang-Ning (July 2022, Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence)

Accurate forecasting of extreme values in time series is critical due to the significant impact of extreme events on human and natural systems. This paper presents DeepExtrema, a novel framework that combines a deep neural network (DNN) with generalized extreme value (GEV) distribution to forecast the block maximum value of a time series. Implementing such a network is a challenge as the framework must preserve the inter-dependent constraints among the GEV model parameters even when the DNN is initialized. We describe our approach to address this challenge and present an architecture that enables both conditional mean and quantile prediction of the block maxima. The extensive experiments performed on both real-world and synthetic data demonstrated the superiority of DeepExtrema compared to other baseline methods.
more » « less
Full Text Available
Automated Analysis of the US Drought Monitor Maps With Machine Learning and Multiple Drought Indicators

https://doi.org/10.3389/fdata.2021.750536

Hatami Bahman Beiglou, Pouyan; Luo, Lifeng; Tan, Pang-Ning; Pei, Lisi (October 2021, Frontiers in Big Data)

The US Drought Monitor (USDM) is a hallmark in real time drought monitoring and assessment as it was developed by multiple agencies to provide an accurate and timely assessment of drought conditions in the US on a weekly basis. The map is built based on multiple physical indicators as well as reported observations from local contributors before human analysts combine the information and produce the drought map using their best judgement. Since human subjectivity is included in the production of the USDM maps, it is not an entirely clear quantitative procedure for other entities to reproduce the maps. In this study, we developed a framework to automatically generate the maps through a machine learning approach by predicting the drought categories across the domain of study. A persistence model served as the baseline model for comparison in the framework. Three machine learning algorithms, logistic regression, random forests, and support vector machines, with four different groups of input data, which formed an overall of 12 different configurations, were used for the prediction of drought categories. Finally, all the configurations were evaluated against the baseline model to select the best performing option. The results showed that our proposed framework could reproduce the drought maps to a near-perfect level with the support vector machines algorithm and the group 4 data. The rest of the findings of this study can be highlighted as: 1) employing the past week drought data as a predictor in the models played an important role in achieving high prediction scores, 2) the nonlinear models, random forest, and support vector machines had a better overall performance compared to the logistic regression models, and 3) with borrowing the neighboring grid cells information, we could compensate the lack of training data in the grid cells with insufficient historical USDM data particularly for extreme and exceptional drought conditions.
more » « less
Full Text Available
Evaluation of random forests for short-term daily streamflow forecasting in rainfall- and snowmelt-driven watersheds

https://doi.org/10.5194/hess-25-2997-2021

Pham, Leo Triet; Luo, Lifeng; Finley, Andrew (January 2021, Hydrology and Earth System Sciences)
null (Ed.)
Abstract. In the past decades, data-driven machine-learning (ML) models have emerged as promising tools for short-term streamflow forecasting. Among other qualities, the popularity of ML models for such applications is due to their relative ease in implementation, less strict distributional assumption, and competitive computational and predictive performance. Despite the encouraging results, most applications of ML for streamflow forecasting have been limited to watersheds in which rainfall is the major source of runoff. In this study, we evaluate the potential of random forests (RFs), a popular ML method, to make streamflow forecasts at 1 d of lead time at 86 watersheds in the Pacific Northwest. These watersheds cover diverse climatic conditions and physiographic settings and exhibit varied contributions of rainfall and snowmelt to their streamflow. Watersheds are classified into three hydrologic regimes based on the timing of center-of-annual flow volume: rainfall-dominated, transient, and snowmelt-dominated. RF performance is benchmarked against naïve and multiple linear regression (MLR) models and evaluated using four criteria: coefficient of determination, root mean squared error, mean absolute error, and Kling–Gupta efficiency (KGE). Model evaluation scores suggest that the RF performs better in snowmelt-driven watersheds compared to rainfall-driven watersheds. The largest improvements in forecasts compared to benchmark models are found among rainfall-driven watersheds. RF performance deteriorates with increases in catchment slope and soil sandiness. We note disagreement between two popular measures of RF variable importance and recommend jointly considering these measures with the physical processes under study. These and other results presented provide new insights for effective application of RF-based streamflow forecasting.
more » « less
Full Text Available

« Prev Next »

Search for: All records