

Title: COMET Flows: Towards Generative Modeling of Multivariate Extremes and Tail Dependence

Normalizing flows, a popular class of deep generative models, often fail to represent extreme phenomena observed in real-world processes. In particular, existing normalizing flow architectures struggle to model multivariate extremes, which are characterized by heavy-tailed marginal distributions and asymmetric tail dependence among variables. To address this shortcoming, we propose COMET (COpula Multivariate ExTreme) Flows, which decompose the process of modeling a joint distribution into two parts: (i) modeling its marginal distributions, and (ii) modeling its copula distribution. COMET Flows capture heavy-tailed marginal distributions by combining a parametric tail belief at extreme quantiles of the marginals with an empirical kernel density function at mid-quantiles. In addition, COMET Flows capture asymmetric tail dependence among multivariate extremes by viewing such dependence as inducing a low-dimensional manifold structure in feature space. Experimental results on both synthetic and real-world datasets demonstrate the effectiveness of COMET Flows in capturing both heavy-tailed marginals and asymmetric tail dependence compared with other state-of-the-art baseline architectures. All code is available at https://github.com/andrewmcdonald27/COMETFlows.
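The marginal-modeling step described in the abstract (a parametric tail at extreme quantiles spliced with a kernel density in the bulk) can be illustrated with a small sketch. This is a minimal example under stated assumptions, not the COMET Flows code from the linked repository: the generalized Pareto (GPD) tails, the 5%/95% splice thresholds (`tail_q`), Silverman's bandwidth rule, and the class name `HybridMarginal` are all hypothetical choices. The resulting CDF maps one marginal to approximately uniform values, which is the kind of input a copula model expects.

```python
import numpy as np
from scipy import stats


class HybridMarginal:
    """Hypothetical heavy-tailed marginal CDF: Gaussian-KDE bulk + GPD tails.

    Illustration only, not the COMET Flows implementation. Assumes generalized
    Pareto tails beyond the lower/upper `tail_q` empirical quantiles and
    Silverman's rule-of-thumb bandwidth for the bulk kernel density.
    """

    def __init__(self, x, tail_q=0.05):
        x = np.asarray(x, dtype=float)
        self.data = x
        self.tail_q = tail_q
        # Splice points between the bulk and the two parametric tails.
        self.lo, self.hi = np.quantile(x, [tail_q, 1.0 - tail_q])
        # Silverman's rule-of-thumb bandwidth for the bulk KDE.
        self.bw = 1.06 * x.std(ddof=1) * len(x) ** (-0.2)
        # Fit GPDs (location fixed at 0) to exceedances beyond each threshold.
        self.upper = stats.genpareto.fit(x[x > self.hi] - self.hi, floc=0.0)
        self.lower = stats.genpareto.fit(self.lo - x[x < self.lo], floc=0.0)
        # KDE CDF at the thresholds, used to rescale the bulk piece.
        self.k_lo, self.k_hi = self._kde_cdf(np.array([self.lo, self.hi]))

    def _kde_cdf(self, t):
        # CDF of a Gaussian KDE: average of normal CDFs centred on the data.
        z = (t[:, None] - self.data[None, :]) / self.bw
        return stats.norm.cdf(z).mean(axis=1)

    def cdf(self, t):
        t = np.atleast_1d(np.asarray(t, dtype=float))
        q = self.tail_q
        # Bulk: KDE CDF rescaled so that [lo, hi] maps onto [q, 1 - q].
        bulk = q + (1 - 2 * q) * (self._kde_cdf(t) - self.k_lo) / (self.k_hi - self.k_lo)
        # Tails: each GPD carries probability mass q beyond its threshold.
        up = 1.0 - q * stats.genpareto.sf(t - self.hi, *self.upper)
        down = q * stats.genpareto.sf(self.lo - t, *self.lower)
        return np.where(t > self.hi, up, np.where(t < self.lo, down, bulk))
```

For example, on a heavy-tailed sample `x` (say, Student-t draws), `HybridMarginal(x).cdf(x)` returns values in [0, 1] that could be stacked column-wise and passed to a copula model. Because each tail piece starts at exactly the probability where the rescaled bulk ends, the spliced CDF stays continuous and monotone at the thresholds.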

 
Award ID(s):
2006633
NSF-PAR ID:
10358676
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
Page Range / eLocation ID:
3328 to 3334
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. A novel statistical method is proposed and investigated for estimating a heavy-tailed density under mild smoothness assumptions. Statistical analyses of heavy-tailed distributions are susceptible to the problem of sparse information in the tail of the distribution getting washed away by unrelated features of a hefty bulk. The proposed Bayesian method avoids this problem by incorporating smoothness and tail regularization through a carefully specified semiparametric prior distribution, and is able to consistently estimate both the density function and its tail index at near minimax optimal rates of contraction. A joint, likelihood-driven estimation of the bulk and the tail is shown to help improve uncertainty assessment in estimating the tail index parameter and offer more accurate and reliable estimates of the high tail quantiles compared to thresholding methods. Supplementary materials for this article are available online.
  2. Abstract

    Copulas are a popular method for modeling the dependence among marginal distributions in multivariate censored data. Because many copula models are available, it is essential to check whether the chosen copula model fits the data well. Existing approaches to testing the goodness of fit of copula models mainly address complete or right-censored data; no formal goodness-of-fit (GOF) test exists for interval-censored or recurrent event data. To address this research gap, we develop a general GOF test for copula-based survival models using the information ratio (IR). It can be applied to any copula family with a parametric form, such as the frequently used Archimedean, Gaussian, and D-vine families. The test statistic is easy to calculate, and the test procedure is straightforward to implement. We establish the asymptotic properties of the test statistic. The simulation results show that the proposed test controls the type-I error well and achieves adequate power when the dependence strength is moderate to high. Finally, we apply our method to test various copula models in analyzing multiple real datasets. For all these datasets, our method consistently distinguishes among the copula models in terms of fit.

     
  3. Abstract

    In this study, we examine extremes of atmospheric water balance components through analyses of annual maxima of precipitable water and water vapor transport. Our analyses are grounded in Extreme Value Theory, using the Generalized Extreme Value (GEV) distribution as a platform for assessing water balance extremes. Annual maxima of atmospheric water balance terms are computed from North American Regional Reanalysis (NARR) fields for the 40‐year period extending from 1979 to 2018 on a grid of approximately 0.3‐degree resolution. We assess nonstationarities in the annual maximum time series through tests for monotonic trends. Estimates of the location, scale, and shape parameters of the GEV distribution are used to examine the spatial variability of water balance extremes. We focus on estimates of the GEV shape parameter, which dictates the “thickness” of the upper tail of the distribution. Estimates of the GEV shape parameter for precipitable water generally point to bounded distributions, but clusters of unbounded, thick‐tailed distributions are linked to exceptionally large record values of precipitable water associated with tropical cyclones in the Gulf of Mexico and Atlantic. Larger regions of “thick‐tailed” distributions are found for integrated water vapor transport (IVT). Non‐stationary GEV models are used to examine the impacts of trends on extremes of the atmospheric water balance. Mixtures of rare events associated with tropical cyclones and extratropical cyclones play a central role in analyses of water balance extremes.

     
  4. Abstract

    The joint analysis of spatial and temporal processes poses computational challenges due to the data's high dimensionality. Furthermore, such data are commonly non-Gaussian. In this paper, we introduce a copula-based spatiotemporal model for analyzing spatiotemporal data and propose a semiparametric estimator. The proposed algorithm is computationally simple, since it models the marginal distributions and the spatiotemporal dependence separately. Instead of assuming a parametric distribution, the proposed method models the marginal distributions nonparametrically and thus offers more flexibility. The method also provides a convenient way to construct both point and interval predictions at new times and locations, based on the estimated conditional quantiles. Through a simulation study and an analysis of wind speeds observed along the border between Oregon and Washington, we show that our method produces more accurate point and interval predictions for skewed data than those based on normality assumptions.

     
  5. Spatial extremes are common in climate data, since the observations are usually referenced by geographic locations and are dependent when the locations are nearby. An important goal of extremes modeling is to estimate the T-year return level. Among the methods suitable for modeling spatial extremes, perhaps the simplest and fastest approaches are the spatial generalized extreme value (GEV) distribution and the spatial generalized Pareto distribution (GPD), which assume marginal independence and account for dependence only through the parameters. Despite this simplicity, simulations have shown that return level estimation using the spatial GEV and spatial GPD still provides satisfactory results compared to max-stable processes, which are asymptotically justified models capable of representing spatial dependence among extremes. However, the linear functions used to model the spatially varying coefficients are restrictive, and the linearity assumption may be violated. We propose a flexible and fast approach based on the spatial GEV and spatial GPD that introduces fused lasso and fused ridge penalties for parameter regularization. This enables improved return level estimation for large spatial extremes compared to existing methods. Supplemental files for this article are available online.
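Items 3 and 5 both turn on the GEV shape parameter and the T-year return level. The sketch below is an illustration only, not code from either study: it fits a stationary GEV to a series of annual maxima by maximum likelihood with SciPy's `scipy.stats.genextreme` and evaluates the T-year return level, z_T = mu + (sigma/xi)[(-log(1 - 1/T))^(-xi) - 1] for xi != 0, i.e. the level exceeded with probability 1/T in any given year. The function name `gev_return_level` is hypothetical, and the non-stationary and spatial extensions described in those items are not covered.

```python
import numpy as np
from scipy import stats


def gev_return_level(annual_maxima, T):
    """Hypothetical helper: fit a stationary GEV and return the T-year level.

    SciPy's genextreme uses shape c = -xi, where xi is the shape parameter
    common in extreme-value literature (xi > 0: unbounded, thick upper tail;
    xi < 0: bounded upper tail).
    """
    c, loc, scale = stats.genextreme.fit(np.asarray(annual_maxima, dtype=float))
    xi = -c  # convert SciPy's sign convention to the usual xi
    # The T-year return level is the (1 - 1/T) quantile of the annual maximum.
    z_T = stats.genextreme.ppf(1.0 - 1.0 / T, c, loc=loc, scale=scale)
    return xi, loc, scale, z_T
```

Because SciPy parameterizes the shape as c = -xi, a fitted c < 0 corresponds to the unbounded, "thick-tailed" case discussed in item 3, while c > 0 indicates a bounded upper tail; mixing up the two conventions flips the interpretation of tail thickness.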