Title: Heavy-Tailed Density Estimation
A novel statistical method is proposed and investigated for estimating a heavy-tailed density under mild smoothness assumptions. Statistical analyses of heavy-tailed distributions are susceptible to the problem of sparse information in the tail of the distribution getting washed away by unrelated features of a hefty bulk. The proposed Bayesian method avoids this problem by incorporating smoothness and tail regularization through a carefully specified semiparametric prior distribution, and is able to consistently estimate both the density function and its tail index at near minimax optimal rates of contraction. A joint, likelihood-driven estimation of the bulk and the tail is shown to help improve uncertainty assessment in estimating the tail index parameter and offer more accurate and reliable estimates of the high tail quantiles compared to thresholding methods. Supplementary materials for this article are available online.
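The thresholding methods that the abstract compares against typically estimate the tail index from only the largest order statistics. A standard example is the Hill estimator; the sketch below (not the paper's Bayesian method) illustrates it on synthetic Pareto data, where the true extreme-value index is 1/alpha:

```python
import numpy as np

def hill_estimator(x, k):
    """Hill estimator of the extreme-value index from the k largest order statistics.

    gamma_hat = (1/k) * sum_i log(X_{(n-i+1)} / X_{(n-k)}), which estimates
    1/alpha for a Pareto(alpha)-type tail.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    top = x[n - k:]               # the k largest observations
    threshold = x[n - k - 1]      # order statistic used as the tail threshold
    return float(np.mean(np.log(top / threshold)))

rng = np.random.default_rng(0)
alpha = 2.0                                                   # true tail index
sample = (1.0 / rng.uniform(size=100_000)) ** (1.0 / alpha)   # Pareto(alpha) draws
gamma_hat = hill_estimator(sample, k=2_000)
print(f"Hill estimate of 1/alpha: {gamma_hat:.3f} (true value {1 / alpha:.3f})")
```

The choice of k (how far into the tail to look) is exactly the bulk-versus-tail trade-off the abstract describes: too small and the estimate is noisy, too large and the bulk contaminates it.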
Award ID(s):
2014861
PAR ID:
10484531
Author(s) / Creator(s):
; ;
Publisher / Repository:
Taylor & Francis
Date Published:
Journal Name:
Journal of the American Statistical Association
ISSN:
0162-1459
Page Range / eLocation ID:
1 to 13
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Normalizing flows—a popular class of deep generative models—often fail to represent extreme phenomena observed in real-world processes. In particular, existing normalizing flow architectures struggle to model multivariate extremes, characterized by heavy-tailed marginal distributions and asymmetric tail dependence among variables. In light of this shortcoming, we propose COMET (COpula Multivariate ExTreme) Flows, which decompose the process of modeling a joint distribution into two parts: (i) modeling its marginal distributions, and (ii) modeling its copula distribution. COMET Flows capture heavy-tailed marginal distributions by combining a parametric tail belief at extreme quantiles of the marginals with an empirical kernel density function at mid-quantiles. In addition, COMET Flows capture asymmetric tail dependence among multivariate extremes by viewing such dependence as inducing a low-dimensional manifold structure in feature space. Experimental results on both synthetic and real-world datasets demonstrate the effectiveness of COMET Flows in capturing both heavy-tailed marginals and asymmetric tail dependence compared to other state-of-the-art baseline architectures. All code is available at https://github.com/andrewmcdonald27/COMETFlows.
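The semiparametric marginal idea described above (empirical body, parametric tail) can be sketched generically; this hypothetical example, not taken from the COMET Flows code, splices an empirical CDF below a quantile threshold onto a generalized Pareto (GPD) fit of the exceedances above it:

```python
import numpy as np
from scipy import stats

def fit_semiparametric_marginal(x, tail_quantile=0.95):
    """Empirical CDF below a quantile threshold, GPD tail above it.

    Illustrative sketch of a body/tail marginal model: exceedances over
    the threshold get a parametric heavy-tail fit.
    """
    x = np.sort(np.asarray(x, dtype=float))
    u = np.quantile(x, tail_quantile)        # tail threshold
    excess = x[x > u] - u                    # exceedances over the threshold
    shape, _, scale = stats.genpareto.fit(excess, floc=0.0)

    def cdf(t):
        t = np.asarray(t, dtype=float)
        body = np.searchsorted(x, t, side="right") / len(x)
        tail = tail_quantile + (1 - tail_quantile) * stats.genpareto.cdf(
            t - u, shape, loc=0.0, scale=scale)
        return np.where(t <= u, body, tail)

    return cdf, shape

rng = np.random.default_rng(1)
data = stats.t.rvs(df=3, size=20_000, random_state=rng)   # heavy-tailed sample
cdf, xi = fit_semiparametric_marginal(data)
print(f"fitted GPD shape parameter xi: {xi:.3f}")  # a t(3) tail has xi = 1/3
```

A flow can then operate on the (approximately uniform) transformed marginals while the GPD component carries the tail weight.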
  2. We study high-dimensional signal recovery from non-linear measurements with design vectors having elliptically symmetric distribution. Special attention is devoted to the situation when the unknown signal belongs to a set of low statistical complexity, while both the measurements and the design vectors are heavy-tailed. We propose and analyze a new estimator that adapts to the structure of the problem, while being robust both to the possible model misspecification characterized by arbitrary non-linearity of the measurements as well as to data corruption modeled by the heavy-tailed distributions. Moreover, this estimator has low computational complexity. Our results are expressed in the form of exponential concentration inequalities for the error of the proposed estimator. On the technical side, our proofs rely on the generic chaining methods, and illustrate the power of this approach for statistical applications. Theory is supported by numerical experiments demonstrating that our estimator outperforms existing alternatives when data is heavy-tailed. 
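To make the measurement model above concrete: with 1-bit (sign) observations of a linear functional, even a simple averaged estimator recovers the signal direction. This sketch is not the paper's robust estimator; it uses a Gaussian design (a special case of the elliptical family) purely to illustrate recovery up to scale under an arbitrary non-linearity:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 50, 20_000
x_true = np.zeros(d)
x_true[:5] = 1.0                          # sparse signal: low statistical complexity
x_true /= np.linalg.norm(x_true)

A = rng.standard_normal((n, d))           # design vectors
y = np.sign(A @ x_true)                   # non-linear (1-bit) measurements

x_hat = (A.T @ y) / n                     # simple averaging estimator
x_hat /= np.linalg.norm(x_hat)

cosine = float(x_hat @ x_true)
print(f"cosine similarity with the true direction: {cosine:.3f}")
```

The point of the paper is precisely that this naive average degrades when the design or the noise is heavy-tailed, motivating a robust alternative with exponential concentration guarantees.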
  3. ABSTRACT In modern statistical applications, identifying critical features in high‐dimensional data is essential for scientific discoveries. Traditional best subset selection methods face computational challenges, while regularization approaches such as Lasso, SCAD and their variants often exhibit poor performance with ultrahigh‐dimensional data. Sure screening methods, widely used for dimensionality reduction, have been developed as popular alternatives, but few target heavy‐tailed characteristics in modern big data. This paper introduces a new sure screening method, based on robust distance correlation (‘RDC’), designed for heavy‐tailed data. The proposed method inherits the benefits of the original model‐free distance correlation‐based screening while robustly estimating distance correlation in the presence of heavy‐tailed data. We further develop an FDR control procedure by incorporating the Reflection via Data Splitting (REDS) method. Extensive simulations demonstrate the method's advantage over existing screening procedures under different scenarios of heavy‐tailedness. Its application to high‐dimensional heavy‐tailed RNA‐seq data from The Cancer Genome Atlas (TCGA) pancreatic cancer cohort showcases superior performance in identifying biologically meaningful genes predictive of MAPK1 protein expression critical to pancreatic cancer. 
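The screening statistic underlying the method above is distance correlation, which is zero (in the population) if and only if the two variables are independent, making it usable as a model-free feature screener. A minimal sketch of the plain (non-robustified) sample version, following the standard double-centering formula:

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation between two 1-D samples."""
    x = np.asarray(x, dtype=float)[:, None]
    y = np.asarray(y, dtype=float)[:, None]
    a = np.abs(x - x.T)                                   # pairwise distance matrices
    b = np.abs(y - y.T)
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()     # double centering
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    dcov2 = (A * B).mean()                                # squared distance covariance
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean()
    return float(np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y)))

rng = np.random.default_rng(3)
z = rng.standard_normal(500)
print(distance_correlation(z, z ** 2))                    # non-linear dependence: clearly positive
print(distance_correlation(z, rng.standard_normal(500)))  # independent: near zero
```

Note that z and z² have zero Pearson correlation, yet distance correlation detects the dependence; the paper's contribution is a robust estimate of this quantity under heavy tails.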
  4. The bursty nature of network traffic makes it difficult to characterize accurately, and may give rise to heavy-tailed queue distributions within the network. Building on prior work in stochastic network calculus, we propose traffic burstiness bounds based on the class of phase-type distributions and develop an approach to estimate the parameter of such bounds using the expectation-maximization (EM) algorithm. By limiting the tail of the burstiness bound, our approach achieves a better fit of the phase-type distribution to the empirical data from heavy-tailed traffic. The proposed tail-limited phase-type burstiness bounds fall within the framework for stochastic network calculus based on generalized stochastically bounded burstiness. We demonstrate the effectiveness of the proposed methodology with a numerical example involving a heavy-tailed M/G/1 queue. 
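A phase-type distribution, as used for the burstiness bounds above, is the absorption time of a Markov chain with sub-generator T and initial phase probabilities alpha, so its CDF is F(t) = 1 - alpha·exp(Tt)·1. This sketch (illustrative only, not the paper's EM fitting procedure) evaluates that CDF for a two-phase hyperexponential, a simple phase-type family often used to approximate heavy-tailed service times:

```python
import numpy as np
from scipy.linalg import expm

def phase_type_cdf(t, alpha, T):
    """CDF of a phase-type distribution: F(t) = 1 - alpha @ expm(T t) @ 1."""
    ones = np.ones(T.shape[0])
    return float(1.0 - alpha @ expm(T * t) @ ones)

# Two-phase hyperexponential: equal mixture of Exp(rate=1) and Exp(rate=0.1)
alpha = np.array([0.5, 0.5])              # initial phase probabilities
T = np.array([[-1.0, 0.0],
              [0.0, -0.1]])               # sub-generator (diagonal: exit rates)

for t in (1.0, 10.0):
    print(f"F({t}) = {phase_type_cdf(t, alpha, T):.4f}")
```

For this hyperexponential the matrix exponential reduces to F(t) = 1 - 0.5·e^(-t) - 0.5·e^(-0.1t), which is a convenient closed-form check; limiting the tail of such a fit is what keeps the resulting burstiness bound within the generalized stochastically bounded burstiness framework.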
  5. Optimal resource allocation in wireless systems still stands as a rather challenging task due to the inherent statistical characteristics of channel fading. On the one hand, minimax/outage-optimal policies are often overconservative and analytically intractable, despite advertising maximally reliable system performance. On the other hand, ergodic-optimal resource allocation policies are often susceptible to the statistical dispersion of heavy-tailed fading channels, leading to relatively frequent drastic performance drops. We investigate a new risk-aware formulation of the classical stochastic resource allocation problem for point-to-point power-constrained communication networks over fading channels with no cross-interference, by leveraging the Conditional Value-at-Risk (CV@R) as a coherent measure of risk. We rigorously derive closed-form expressions for the CV@R-optimal risk-aware resource allocation policy, as well as the optimal associated quantiles of the corresponding user rate functions by capitalizing on the underlying fading distribution, parameterized by dual variables. We then develop a purely dual tail waterfilling scheme, achieving significantly more rapid and assured convergence of dual variables, as compared with the primal-dual tail waterfilling algorithm, recently proposed in the literature. The effectiveness of the proposed scheme is also readily confirmed via detailed numerical simulations. 
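The risk measure driving the formulation above, CV@R at level beta, is the expected loss conditional on exceeding the beta-quantile (the Value-at-Risk). A minimal empirical sketch, using a log-normal sample as a stand-in heavy-tailed loss (illustrative only, unrelated to the paper's fading model or waterfilling policy):

```python
import numpy as np

def cvar(losses, beta=0.95):
    """Empirical Conditional Value-at-Risk: mean loss at or above the beta-VaR."""
    losses = np.asarray(losses, dtype=float)
    var = np.quantile(losses, beta)          # Value-at-Risk at level beta
    return float(losses[losses >= var].mean())

rng = np.random.default_rng(4)
losses = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)   # heavy-tailed losses
print(f"VaR(0.95)  = {np.quantile(losses, 0.95):.3f}")
print(f"CV@R(0.95) = {cvar(losses, 0.95):.3f}")
```

Because CV@R averages over the whole tail beyond the quantile rather than reading off a single quantile, it is coherent and penalizes exactly the drastic performance drops that make ergodic-optimal policies fragile under heavy-tailed fading.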