We propose and analyze a new estimator of the covariance matrix that admits strong theoretical guarantees under weak assumptions on the underlying distribution, such as existence of moments of only low order. While estimation of covariance matrices corresponding to sub-Gaussian distributions is well understood, much less is known in the case of heavy-tailed data. As K. Balasubramanian and M. Yuan write, "data from real-world experiments oftentimes tend to be corrupted with outliers and/or exhibit heavy tails. In such cases, it is not clear that those covariance matrix estimators .. remain optimal" and "what are the other possible strategies to deal with heavy tailed distributions warrant further studies." We make a step towards answering this question and prove tight deviation inequalities for the proposed estimator that depend only on the parameters controlling the "intrinsic dimension" associated with the covariance matrix (as opposed to the dimension of the ambient space); in particular, our results are applicable in the case of high-dimensional observations.
One-Bit Normalized Scatter Matrix Estimation For Complex Elliptically Symmetric Distributions
One-bit quantization has attracted attention in massive MIMO, radar, and array processing, due to its simplicity, low cost, and capability of parameter estimation. Specifically, if the unquantized data is Gaussian, the shape of its covariance can be estimated from one-bit data via the arcsine law. In practice, however, the Gaussian assumption is often violated because of outliers. It is known from the literature that outliers can be modeled by complex elliptically symmetric (CES) distributions with heavy tails. This paper shows that the arcsine law remains applicable to CES distributions. Therefore, the normalized scatter matrix of the unquantized data can be readily estimated from one-bit samples derived from CES distributions. The proposed estimator is not only computationally fast but also robust to CES distributions with heavy tails. These attributes are demonstrated through numerical examples, in terms of computational time and estimation error. An application to DOA estimation with the MUSIC spectrum is also presented.
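As a rough illustration of the arcsine-law idea described above, the following sketch estimates a correlation (normalized scatter) matrix from sign-only data. It is not the paper's estimator: it assumes real-valued, zero-mean data rather than complex CES data, uses a multivariate t distribution as a stand-in heavy-tailed elliptical model, and the names `arcsine_law_estimate` and `demo` are hypothetical. It relies on the classical relation E[sign(x_i) sign(x_j)] = (2/π) arcsin(ρ_ij), which depends only on the signs and is therefore unaffected by a common positive random scaling of the samples.

```python
import numpy as np


def arcsine_law_estimate(one_bit):
    """Estimate a correlation matrix from one-bit (+1/-1) samples.

    one_bit: array of shape (n_samples, dim) with entries in {-1, +1}.
    Assumes real-valued, zero-mean, elliptically distributed source data.
    """
    n_samples = one_bit.shape[0]
    sign_corr = one_bit.T @ one_bit / n_samples   # sample correlation of the signs
    est = np.sin(0.5 * np.pi * sign_corr)         # invert (2/pi) * arcsin(rho)
    np.fill_diagonal(est, 1.0)                    # unit diagonal by construction
    return est


def demo(n_samples=200_000, dof=3.0, seed=0):
    """Draw heavy-tailed multivariate t samples, quantize to one bit, estimate."""
    rng = np.random.default_rng(seed)
    true_corr = np.array([[1.0, 0.6, -0.3],
                          [0.6, 1.0, 0.2],
                          [-0.3, 0.2, 1.0]])
    chol = np.linalg.cholesky(true_corr)
    gauss = rng.standard_normal((n_samples, 3)) @ chol.T
    # A shared positive random scale per sample gives heavy (Student-t) tails.
    scale = np.sqrt(rng.chisquare(dof, size=(n_samples, 1)) / dof)
    heavy = gauss / scale
    one_bit = np.where(heavy >= 0, 1.0, -1.0)     # zero-threshold quantizer
    return true_corr, arcsine_law_estimate(one_bit)


if __name__ == "__main__":
    true_corr, est = demo()
    print(np.round(est, 3))
```

Because the quantizer discards amplitude, only the shape (correlation structure) of the scatter matrix is recoverable, not its overall scale; this is consistent with the abstract's focus on the normalized scatter matrix.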
- Award ID(s):
- 1712633
- PAR ID:
- 10275633
- Date Published:
- Journal Name:
- Proc. IEEE Int. Conf. Acoust. Speech, and Signal Proc
- Page Range / eLocation ID:
- 9130 to 9134
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- One-bit quantization has attracted considerable attention in signal processing for communications and sensing. The arcsine law is a useful relation often used to estimate the normalized covariance matrix of zero-mean stationary input signals when they are sampled by one-bit analog-to-digital converters (ADCs), which in practice compare the signals with a given threshold level. This relation, however, only considers a zero threshold, which can cause a remarkable information loss. For the first time in the literature, this paper introduces an approach to extending the arcsine law to the case where one-bit ADCs apply time-varying thresholds. In particular, the proposed method is shown to accurately recover the variance and autocorrelation of the stationary signals of interest.
- Trust in data collected by and passing through Internet of Things (IoT) networks is paramount. The quality of decisions made based on this collected data is highly dependent upon the accuracy of the data. Currently, most trust assessment methodologies assume that collected data follows a stationary Gaussian distribution. Often, a trust score is estimated based upon the deviation from this distribution. However, the underlying state of a system monitored by an IoT network can change over time, and the data collected from the network may not consistently follow a Gaussian distribution. Further, faults that occur within the estimated Gaussian distribution may go undetected. In this study, we present a model-based trust estimation system that allows for concept drift or distributions that can change over time. The presented methodology uses data-driven models to estimate the value of the data produced by a sensor using the data produced by the other sensors in the network. We assume that an untrustworthy piece of data falls in the tails of the residual distribution, and we use this concept to assign a trust score. The method is evaluated on a smart home data set consisting of temperature, humidity, and energy sensors.
- It is well known that microbiome data are riddled with outliers and have heavy distribution tails, but the impact of outliers and heavy-tailedness has yet to be examined systematically. This paper investigates the impact of outliers and heavy-tailedness on differential abundance analysis (DAA) using the linear models for the differential abundance analysis (LinDA) method and proposes effective strategies to mitigate their influence. The presence of outliers and heavy-tailedness can significantly decrease the power of LinDA. We investigate various techniques to address outliers and heavy-tailedness, including generalizing LinDA into a more flexible framework that allows for the use of robust regression and winsorizing the data before applying LinDA. Our extensive numerical experiments and real-data analyses demonstrate that robust Huber regression has the best overall performance in addressing outliers and heavy-tailedness.
- The eukaryotic cell's cytoskeleton is a prototypical example of an active material: objects embedded within it are driven by molecular motors acting on the cytoskeleton, leading to anomalous diffusive behavior. Experiments tracking the behavior of cell-attached objects have observed anomalous diffusion with a distribution of displacements that is non-Gaussian, with heavy tails. This has been attributed to “cytoquakes” or other spatially extended collective effects. We show, using simulations and analytical theory, that a simple continuum active gel model driven by fluctuating force dipoles naturally creates heavy power-law tails in cytoskeletal displacements. We predict that this power law exponent should depend on the geometry and dimensionality of where force dipoles are distributed through the cell; we find qualitatively different results for force dipoles in a 3D cytoskeleton and a quasi-two-dimensional cortex. We then discuss potential applications of this model both in cells and in synthetic active gels.