Title: COMET Flows: Towards Generative Modeling of Multivariate Extremes and Tail Dependence
Normalizing flows, a popular class of deep generative models, often fail to represent extreme phenomena observed in real-world processes. In particular, existing normalizing flow architectures struggle to model multivariate extremes, which are characterized by heavy-tailed marginal distributions and asymmetric tail dependence among variables. To address this shortcoming, we propose COMET (COpula Multivariate ExTreme) Flows, which decompose the modeling of a joint distribution into two parts: (i) modeling its marginal distributions and (ii) modeling its copula. COMET Flows capture heavy-tailed marginals by combining a parametric tail belief at the extreme quantiles of each marginal with an empirical kernel density estimate at mid-quantiles. In addition, COMET Flows capture asymmetric tail dependence among multivariate extremes by treating such dependence as inducing a low-dimensional manifold structure in feature space. Experimental results on synthetic and real-world datasets show that COMET Flows capture both heavy-tailed marginals and asymmetric tail dependence more effectively than state-of-the-art baseline architectures. All code is available at https://github.com/andrewmcdonald27/COMETFlows.
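The marginal model described in the abstract splices a parametric tail onto a kernel density body. The following is a minimal sketch of that idea, not the COMET Flows implementation (the linked repository has that). It assumes details the abstract does not state: a Gaussian kernel with Silverman's rule-of-thumb bandwidth, generalized Pareto (GPD) tails whose shape and scale are supplied by the user (standing in for the "tail belief"), thresholds at the 5% and 95% empirical quantiles, and a KDE body rescaled so the three pieces join continuously. The names `spliced_cdf`, `shape`, and `scale` are hypothetical.

```python
import math
import random
import statistics


def _phi(z):
    """Standard normal CDF, computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def _quantile(sorted_x, q):
    """Empirical quantile by linear interpolation between order statistics."""
    pos = q * (len(sorted_x) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (pos - lo) * (sorted_x[hi] - sorted_x[lo])


def _gpd_sf(y, shape, scale):
    """Survival function of a generalized Pareto distribution at excess y >= 0."""
    if shape == 0.0:
        return math.exp(-y / scale)
    base = 1.0 + shape * y / scale
    return 0.0 if base <= 0.0 else base ** (-1.0 / shape)


def spliced_cdf(data, q_lo=0.05, q_hi=0.95, shape=0.3, scale=1.0):
    """Hypothetical marginal CDF: GPD tails outside [q_lo, q_hi], Gaussian-KDE body inside.

    shape/scale are an assumed (not fitted) tail belief; for brevity the same
    GPD is used for both the lower and the upper tail.
    """
    xs = sorted(data)
    n = len(xs)
    h = 1.06 * statistics.stdev(xs) * n ** (-0.2)  # Silverman's rule-of-thumb bandwidth
    u_lo, u_hi = _quantile(xs, q_lo), _quantile(xs, q_hi)

    def kde_cdf(x):
        # Average of Gaussian kernel CDFs centred on each observation.
        return sum(_phi((x - xj) / h) for xj in xs) / n

    k_lo, k_hi = kde_cdf(u_lo), kde_cdf(u_hi)

    def cdf(x):
        if x < u_lo:  # lower tail carries mass q_lo; GPD on the shortfall u_lo - x
            return q_lo * _gpd_sf(u_lo - x, shape, scale)
        if x > u_hi:  # upper tail carries mass 1 - q_hi; GPD on the excess x - u_hi
            return 1.0 - (1.0 - q_hi) * _gpd_sf(x - u_hi, shape, scale)
        # Body: KDE CDF rescaled so the pieces meet exactly at q_lo and q_hi.
        return q_lo + (q_hi - q_lo) * (kde_cdf(x) - k_lo) / (k_hi - k_lo)

    return cdf


if __name__ == "__main__":
    random.seed(0)
    # Toy symmetric heavy-tailed sample: Pareto magnitudes with random signs.
    sample = [random.choice([-1, 1]) * random.paretovariate(2.5) for _ in range(500)]
    F = spliced_cdf(sample, shape=0.4)
    u = [F(x) for x in sample]  # pseudo-observations on (0, 1)
    print(min(u), max(u))
```

Applying such a CDF to each variable separately turns the data into approximately uniform pseudo-observations, which is the kind of input a copula model is then fitted on. Rescaling the KDE body (rather than using the raw KDE between thresholds) is one simple way to keep the spliced CDF continuous and monotone.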
Award ID(s):
2006633
PAR ID:
10358676
Journal Name:
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
Page Range / eLocation ID:
3328 to 3334
Sponsoring Org:
National Science Foundation
More Like this
  1. A novel statistical method is proposed and investigated for estimating a heavy-tailed density under mild smoothness assumptions. Statistical analyses of heavy-tailed distributions are susceptible to the problem of sparse information in the tail of the distribution getting washed away by unrelated features of a hefty bulk. The proposed Bayesian method avoids this problem by incorporating smoothness and tail regularization through a carefully specified semiparametric prior distribution, and is able to consistently estimate both the density function and its tail index at near minimax optimal rates of contraction. A joint, likelihood-driven estimation of the bulk and the tail is shown to help improve uncertainty assessment in estimating the tail index parameter and to offer more accurate and reliable estimates of the high tail quantiles compared to thresholding methods. Supplementary materials for this article are available online.
  2. Copulas are a popular tool for modeling the dependence among marginal distributions in multivariate censored data. Because many copula models are available, it is essential to check whether the chosen copula model fits the data well. Existing approaches to testing the fit of copula models are mainly for complete or right-censored data; no formal goodness-of-fit (GOF) test exists for interval-censored or recurrent-events data. To address this gap, we develop a general GOF test for copula-based survival models using the information ratio (IR). It can be applied to any copula family with a parametric form, such as the frequently used Archimedean, Gaussian, and D-vine families. The test statistic is easy to calculate, and the test procedure is straightforward to implement. We establish the asymptotic properties of the test statistic. Simulation results show that the proposed test controls the type-I error well and achieves adequate power when the dependence strength is moderate to high. Finally, we apply our method to test various copula models on multiple real datasets; for every dataset, it consistently distinguishes the candidate copula models in terms of fit.
  3. The joint analysis of spatial and temporal processes poses computational challenges due to the high dimensionality of the data. Furthermore, such data are commonly non-Gaussian. In this paper, we introduce a copula-based spatiotemporal model for analyzing spatiotemporal data and propose a semiparametric estimator. The proposed algorithm is computationally simple, since it models the marginal distribution and the spatiotemporal dependence separately. Instead of assuming a parametric distribution, the proposed method models the marginal distributions nonparametrically and thus offers more flexibility. The method also provides a convenient way to construct both point and interval predictions at new times and locations, based on the estimated conditional quantiles. Through a simulation study and an analysis of wind speeds observed along the border between Oregon and Washington, we show that our method produces more accurate point and interval predictions for skewed data than those based on normality assumptions.
  4. Spatial extremes are common in climate data, since observations are usually referenced by geographic location and are dependent when nearby. An important goal of extremes modeling is to estimate the T-year return level. Among the methods suitable for modeling spatial extremes, perhaps the simplest and fastest are the spatial generalized extreme value (GEV) distribution and the spatial generalized Pareto distribution (GPD), which assume marginal independence and account for dependence only through the parameters. Despite their simplicity, simulations have shown that return level estimation using the spatial GEV and spatial GPD still provides satisfactory results compared to max-stable processes, which are asymptotically justified models capable of representing spatial dependence among extremes. However, the linear functions used to model the spatially varying coefficients are restrictive, and the linearity assumption may be violated. We propose a flexible and fast approach based on the spatial GEV and spatial GPD that introduces fused lasso and fused ridge penalties for parameter regularization. This enables improved return level estimation for large spatial extremes compared to existing methods. Supplemental files for this article are available online.
  5. Predicting the proportion of the water year during which a given stream will remain at or above various flow thresholds is critically important for sound water management decisions. Flow duration curves (FDCs) succinctly capture this information using all data available over some historical period, while annual flow duration curves (AFDCs) instead use data from each individual water year. Analyzing the population of AFDCs, and in particular the tails of this distribution, can allow water managers to better prepare for years with extreme streamflow conditions. However, long observational time series are needed to capture interannual streamflow variations, and such records are difficult to obtain in rapidly changing and poorly gauged catchments. By incorporating a process‐based model that constructs AFDCs from daily rainfall statistics and flow recession characteristics, the proposed approach is a first step toward addressing this challenge. Results indicate that prediction performance varies substantially across flow quantiles and that the current model fails to properly capture the interannual variability of low flows. Numerical analyses attributed these errors to nonlinearity in the storage‐discharge relation, rather than to cross‐scale streamflow correlations or non‐Poissonian rainfall; this nonlinearity also explains the commonly observed heavy‐tailed behavior in low‐flow quantiles. We present a case study on hydroelectric power generation, showing that faithfully capturing both interannual streamflow variability and recession nonlinearity has important implications for installation profitability.
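Item 4 centres on estimating the T-year return level from a GEV model. For a single site whose annual maxima follow a GEV distribution with location mu, scale sigma, and shape xi, the T-year return level is the (1 - 1/T) quantile: z_T = mu + (sigma / xi) * ((-log(1 - 1/T))^(-xi) - 1) for xi != 0, and z_T = mu - sigma * log(-log(1 - 1/T)) in the Gumbel case xi = 0. The sketch below assumes the site's parameters are already estimated; it does not model spatially varying coefficients or the fused lasso/ridge penalties the cited work proposes. The function names and parameter values are illustrative only.

```python
import math


def gev_cdf(z, mu, sigma, xi):
    """CDF of the generalized extreme value distribution GEV(mu, sigma, xi)."""
    if xi == 0.0:  # Gumbel case
        return math.exp(-math.exp(-(z - mu) / sigma))
    t = 1.0 + xi * (z - mu) / sigma
    if t <= 0.0:  # outside the support: below it if xi > 0, above it if xi < 0
        return 0.0 if xi > 0 else 1.0
    return math.exp(-(t ** (-1.0 / xi)))


def gev_return_level(T, mu, sigma, xi):
    """T-block return level: the (1 - 1/T) quantile of GEV(mu, sigma, xi)."""
    if T <= 1:
        raise ValueError("return period T must be greater than 1")
    y = -math.log(1.0 - 1.0 / T)  # -log of the non-exceedance probability
    if xi == 0.0:
        return mu - sigma * math.log(y)
    return mu + sigma / xi * (y ** (-xi) - 1.0)


if __name__ == "__main__":
    # Hypothetical site-level parameters, e.g. fitted to annual maximum rainfall.
    z100 = gev_return_level(100, mu=30.0, sigma=8.0, xi=0.1)
    print(f"100-year return level: {z100:.2f}")
```

In a spatial GEV model, mu, sigma, and xi would be the fitted values at each location, so a map of return levels is just this function evaluated site by site.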