

Title: COMET Flows: Towards Generative Modeling of Multivariate Extremes and Tail Dependence

Normalizing flows, a popular class of deep generative models, often fail to represent extreme phenomena observed in real-world processes. In particular, existing normalizing flow architectures struggle to model multivariate extremes, which are characterized by heavy-tailed marginal distributions and asymmetric tail dependence among variables. To address this shortcoming, we propose COMET (COpula Multivariate ExTreme) Flows, which decompose the process of modeling a joint distribution into two parts: (i) modeling its marginal distributions, and (ii) modeling its copula distribution. COMET Flows capture heavy-tailed marginal distributions by combining a parametric tail belief at extreme quantiles of the marginals with an empirical kernel density function at mid-quantiles. In addition, COMET Flows capture asymmetric tail dependence among multivariate extremes by viewing such dependence as inducing a low-dimensional manifold structure in feature space. Experimental results on both synthetic and real-world datasets demonstrate the effectiveness of COMET Flows in capturing both heavy-tailed marginals and asymmetric tail dependence compared to other state-of-the-art baseline architectures. All code is available at https://github.com/andrewmcdonald27/COMETFlows.
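The marginal construction described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that). It assumes a Gaussian-kernel CDF estimate between two empirical quantile thresholds and generalized Pareto tails beyond them. The function names, the 5%/95% thresholds, the bandwidth rule, and the fixed tail `shape` and `scale` are all hypothetical choices made for illustration.

```python
import math
import random
from statistics import NormalDist, stdev


def kde_cdf(x, sample, bandwidth):
    """Gaussian-kernel estimate of the CDF at x."""
    std_normal = NormalDist()
    return sum(std_normal.cdf((x - s) / bandwidth) for s in sample) / len(sample)


def gpd_cdf(excess, shape, scale):
    """Generalized Pareto CDF for excess >= 0 and shape > 0 (heavy tail)."""
    return 1.0 - (1.0 + shape * excess / scale) ** (-1.0 / shape)


def make_marginal_cdf(sample, lower_q=0.05, upper_q=0.95, shape=0.3, scale=1.0):
    """Build a CDF: kernel estimate at mid-quantiles, parametric tails outside.

    Assumption: shape and scale are fixed "tail beliefs" supplied by the user;
    nothing here is fitted or learned.
    """
    data = sorted(sample)
    n = len(data)
    # Thresholds taken as empirical quantiles of the sample.
    u_lo = data[int(lower_q * (n - 1))]
    u_hi = data[int(upper_q * (n - 1))]
    # Normal-reference rule-of-thumb bandwidth (an illustrative choice).
    bandwidth = 1.06 * stdev(data) * n ** (-0.2)
    # Probability mass the kernel estimate places below each threshold;
    # the tails are rescaled to these values so the CDF stays continuous.
    p_lo = kde_cdf(u_lo, data, bandwidth)
    p_hi = kde_cdf(u_hi, data, bandwidth)

    def cdf(x):
        if x < u_lo:
            # Lower tail: mirrored generalized Pareto below u_lo.
            return p_lo * (1.0 - gpd_cdf(u_lo - x, shape, scale))
        if x > u_hi:
            # Upper tail: generalized Pareto above u_hi.
            return p_hi + (1.0 - p_hi) * gpd_cdf(x - u_hi, shape, scale)
        return kde_cdf(x, data, bandwidth)

    return cdf


# Usage: map one variable to (0, 1) via its estimated marginal CDF.
rng = random.Random(0)
observations = [rng.gauss(0.0, 1.0) for _ in range(500)]
marginal_cdf = make_marginal_cdf(observations)
uniforms = [marginal_cdf(x) for x in observations]
```

Applying such a CDF to each variable separately yields values in (0, 1), which is the scale on which a copula describes dependence. In this sketch the tails are fixed rather than estimated, so it only shows how the mid-quantile and tail pieces join into one continuous marginal.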

 
Award ID(s):
2006633
NSF-PAR ID:
10358676
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
Page Range / eLocation ID:
3328 to 3334
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Copula is a popular method for modeling the dependence among marginal distributions in multivariate censored data. As many copula models are available, it is essential to check if the chosen copula model fits the data well for analysis. Existing approaches to testing the fitness of copula models are mainly for complete or right-censored data. No formal goodness-of-fit (GOF) test exists for interval-censored or recurrent events data. We develop a general GOF test for copula-based survival models using the information ratio (IR) to address this research gap. It can be applied to any copula family with a parametric form, such as the frequently used Archimedean, Gaussian, and D-vine families. The test statistic is easy to calculate, and the test procedure is straightforward to implement. We establish the asymptotic properties of the test statistic. The simulation results show that the proposed test controls the type-I error well and achieves adequate power when the dependence strength is moderate to high. Finally, we apply our method to test various copula models in analyzing multiple real datasets. Our method consistently separates different copula models for all these datasets in terms of model fitness.

     
  2. Abstract

    Predicting the proportion of the water year a given stream will remain at or above various flow thresholds is critically important for making sound water management decisions. Flow duration curves (FDCs) succinctly capture this information using all data available over some historical period, while annual flow duration curves (AFDCs) instead use data from each individual water year. Analyzing the population of AFDCs, and in particular the tails of this distribution, can allow water managers to better prepare for years with extreme streamflow conditions. However, long time series of observations are necessary to capture interannual streamflow variations and are problematic to obtain in rapidly changing and poorly gauged catchments. By incorporating a process‐based model to construct AFDCs based on daily rainfall statistics and flow recession characteristics, the proposed approach is a first step toward addressing this challenge. Results indicate that prediction performance varies substantially across flow quantiles and that the current model fails to properly capture the interannual variability of low flows. Numerical analyses attributed these errors to nonlinearity in storage‐discharge relation, rather than cross‐scale streamflow correlations and non‐Poissonian rainfall, explaining the origin of commonly observed heavy‐tailed behavior in low flow quantiles. We present a case study on hydroelectric power generation, showing that faithfully capturing both interannual streamflow variability and recession nonlinearity has important implications for installation profitability.

     
  3. A novel statistical method is proposed and investigated for estimating a heavy-tailed density under mild smoothness assumptions. Statistical analyses of heavy-tailed distributions are susceptible to the problem of sparse information in the tail of the distribution getting washed away by unrelated features of a hefty bulk. The proposed Bayesian method avoids this problem by incorporating smoothness and tail regularization through a carefully specified semiparametric prior distribution, and is able to consistently estimate both the density function and its tail index at near minimax optimal rates of contraction. A joint, likelihood-driven estimation of the bulk and the tail is shown to help improve uncertainty assessment in estimating the tail index parameter and offer more accurate and reliable estimates of the high tail quantiles compared to thresholding methods. Supplementary materials for this article are available online.
  4. Abstract

    The joint analysis of spatial and temporal processes poses computational challenges due to the data's high dimensionality. Furthermore, such data are commonly non-Gaussian. In this paper, we introduce a copula-based spatiotemporal model for analyzing spatiotemporal data and propose a semiparametric estimator. The proposed algorithm is computationally simple, since it models the marginal distribution and the spatiotemporal dependence separately. Instead of assuming a parametric distribution, the proposed method models the marginal distributions nonparametrically and thus offers more flexibility. The method also provides a convenient way to construct both point and interval predictions at new times and locations, based on the estimated conditional quantiles. Through a simulation study and an analysis of wind speeds observed along the border between Oregon and Washington, we show that our method produces more accurate point and interval predictions for skewed data than those based on normality assumptions.

     
  5. Abstract

    Coherent regions exhibiting non‐Gaussian 2‐m temperature distribution tails are present across the globe, indicating changes in extreme temperatures under future warming may manifest in more complex ways than were the underlying distributions symmetric about the mean. To further the understanding of physical processes that govern temperature distribution tail shape, this work utilizes a back‐trajectory model to diagnose mechanisms for extreme daily mean temperature development at select extratropical locations exhibiting non‐Gaussian tails. Although characteristics such as direction, distance, and temperature evolution vary among back‐trajectories associated with extreme temperature days, results reveal principal pathways for air parcel propagation associated with preferred patterns in large‐scale circulation. A relatively persistent synoptic setup leads to thermal advection, which interacts with local geographic features to produce a shorter‐ or longer‐than‐Gaussian tail. Significant relationships with recurrent modes of atmospheric and sea surface temperature variability further suggest the influence of teleconnection wave patterns and ocean temperatures on extreme daily temperature occurrence over land, though local, smaller‐scale processes are also important. Air parcels transporting extreme temperatures at short‐tailed locations often originate in marine environments, constraining the magnitude of the temperature extreme, while locations exhibiting long cold tails require rare meteorological conditions to transport the coldest air from poleward source regions often partially blocked by topography or downstream of the prevailing wind. Processes governing longer‐than‐Gaussian warm tails at locations examined are more subtle and not as obviously dominated by horizontal advection. Results provide added insight into our understanding of temperature extremes and how they may change in the future at regional scales.

     