Title: Optimal Moving Average Estimation of Noisy Random Walks using Allan Variance-informed Window Length
Moving averages are widely used to estimate time-varying parameters, especially when the underlying dynamic model is unknown or uncertain. However, the selection of the optimal window length over which to evaluate the moving average remains an unresolved issue in the field. In this paper, we demonstrate the use of Allan variance to identify the characteristic timescales of a noisy random walk from historical measurements. Further, we provide a closed-form analytical result showing that the Allan variance-informed averaging window length is indeed the optimal averaging window length for moving average estimation of noisy random walks. We complement the analytical proof with supporting numerical results, which are also reflected in the authors' related works. This systematic methodology for selecting the optimal averaging window length using Allan variance is expected to benefit practitioners in a diverse array of fields that use moving average estimation for noisy random walk signals.
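As a rough illustration of the workflow described in the abstract, the Python sketch below computes a non-overlapping Allan variance curve for a simulated noisy random walk, selects the candidate window length that minimizes it, and applies a moving average of that length. The function names, signal parameters, and candidate window grid are illustrative assumptions; the paper's closed-form optimality result is not reproduced here.

```python
import numpy as np

def allan_variance(x, m):
    """Non-overlapping Allan variance of signal x for an averaging window of m samples."""
    n_clusters = len(x) // m
    y = x[: n_clusters * m].reshape(n_clusters, m).mean(axis=1)  # per-window averages
    return 0.5 * np.mean(np.diff(y) ** 2)

def avar_optimal_window(x, candidate_windows):
    """Candidate window length (in samples) that minimizes the Allan variance curve."""
    avars = [allan_variance(x, m) for m in candidate_windows]
    return candidate_windows[int(np.argmin(avars))]

# Illustrative noisy random walk: random-walk state plus white measurement noise
rng = np.random.default_rng(0)
state = np.cumsum(0.01 * rng.standard_normal(20_000))
measurements = state + 0.5 * rng.standard_normal(state.size)

candidates = np.unique(np.logspace(0, 3, 40).astype(int))
m_star = avar_optimal_window(measurements, candidates)
estimate = np.convolve(measurements, np.ones(m_star) / m_star, mode="valid")
```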
Award ID(s):
1932138
NSF-PAR ID:
10379088
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
2022 American Control Conference (ACC), 2022
Page Range / eLocation ID:
1646 to 1651
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    To decrease query response time with limited main memory and storage space, data reduction techniques that preserve data quality are needed. Existing data reduction techniques, however, are often computationally expensive and rely on heuristics for deciding how to split or reduce the original dataset. In this paper, we propose an effective granular data reduction technique for temporal databases, based on Allan variance (AVAR). AVAR is used to systematically determine the temporal window length over which data remains relevant. The entire dataset to be reduced is then separated into granules with size equal to the AVAR-determined window length, and data reduction is achieved by generating aggregated information for each such granule. The proposed method is tested using a large database containing temporal vehicular data, and comparison experiments against three clustering-based data reduction methods illustrate its runtime performance. The results demonstrate that the proposed Allan variance-based technique efficiently generates a reduced representation of the original data without losing data quality, while significantly reducing computation time.
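    A minimal sketch of the granulation step, under the assumption that the AVAR-determined window is found by minimizing a non-overlapping Allan variance curve; the column name, candidate granule sizes, and aggregation functions below are hypothetical rather than taken from the paper.

```python
import numpy as np
import pandas as pd

def avar(x, m):
    # Non-overlapping Allan variance for a granule of m consecutive samples
    y = x[: (len(x) // m) * m].reshape(-1, m).mean(axis=1)
    return 0.5 * np.mean(np.diff(y) ** 2)

def reduce_with_avar(df, column, candidate_sizes):
    """Split a temporal table into AVAR-sized granules and aggregate each one."""
    x = df[column].to_numpy()
    m = min(candidate_sizes, key=lambda size: avar(x, size))  # AVAR-determined granule length
    granule = np.arange(len(df)) // m                         # granule label per row
    return df.groupby(granule).agg(["mean", "std", "count"])  # reduced representation

# Hypothetical vehicular time series, one row per sample
rng = np.random.default_rng(0)
df = pd.DataFrame({"speed": np.cumsum(rng.standard_normal(10_000)) + rng.standard_normal(10_000)})
reduced = reduce_with_avar(df, "speed", candidate_sizes=[10, 20, 50, 100, 200, 500])
```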
  2. Many two-level nested simulation applications involve the conditional expectation of some response variable, where the expected response is the quantity of interest, and the expectation is with respect to the inner-level random variables, conditioned on the outer-level random variables. The latter typically represent random risk factors, and risk can be quantified by estimating the probability density function (pdf) or cumulative distribution function (cdf) of the conditional expectation. Much prior work has considered a naïve estimator that uses the empirical distribution of the sample averages across the inner-level replicates. This results in a biased estimator, because the distribution of the sample averages is over-dispersed relative to the distribution of the conditional expectation when the number of inner-level replicates is finite. Whereas most prior work has focused on allocating the numbers of outer- and inner-level replicates to balance the bias/variance tradeoff, we develop a bias-corrected pdf estimator. Our approach is based on the concept of density deconvolution, which is widely used to estimate densities with noisy observations but has not previously been considered for nested simulation problems. For a fixed computational budget, the bias-corrected deconvolution estimator allows more outer-level and fewer inner-level replicates to be used, which substantially improves the efficiency of the nested simulation. 
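    The toy sketch below sets up a two-level nested simulation and forms the naive estimator from the inner-level sample averages, making the over-dispersion described above visible; the bias-corrected deconvolution estimator itself is not reproduced, and the distributions and replicate counts are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_outer, n_inner = 2_000, 10

# Outer level: risk factor Z, with quantity of interest E[Y | Z] = Z
Z = rng.standard_normal(n_outer)
# Inner level: conditionally independent responses with mean Z and unit noise
Y = Z[:, None] + rng.standard_normal((n_outer, n_inner))

# Naive estimator: empirical distribution of the inner-level sample averages.
# Its variance exceeds Var(E[Y | Z]) by about Var(noise) / n_inner, which is the
# over-dispersion that a deconvolution-based estimator would remove.
Y_bar = Y.mean(axis=1)
print("variance of conditional expectation:", Z.var())
print("variance of naive sample averages:  ", Y_bar.var())
```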
  3. Thomson, Robert (Ed.)
    Abstract Genome sequencing projects routinely generate haploid consensus sequences from diploid genomes, which are effectively chimeric sequences with the phase at heterozygous sites resolved at random. The impact of phasing errors on phylogenomic analyses under the multispecies coalescent (MSC) model is largely unknown. Here, we conduct a computer simulation to evaluate the performance of four phase-resolution strategies (the true phase resolution, the diploid analytical integration algorithm which averages over all phase resolutions, computational phase resolution using the program PHASE, and random resolution) on estimation of the species tree and evolutionary parameters in analysis of multilocus genomic data under the MSC model. We found that species tree estimation is robust to phasing errors when species divergences are much older than average coalescent times but may be affected by phasing errors when the species tree is shallow. Estimation of parameters under the MSC model with and without introgression is affected by phasing errors. In particular, random phase resolution causes serious overestimation of population sizes for modern species and biased estimation of cross-species introgression probability. In general, the impact of phasing errors is greater when the mutation rate is higher, the data include more samples per species, and the species tree is shallower with recent divergences. Use of phased sequences inferred by the PHASE program produced small biases in parameter estimates. We analyze two real data sets, one of East Asian brown frogs and another of Rocky Mountains chipmunks, to demonstrate that heterozygote phase-resolution strategies have similar impacts on practical data analyses. We suggest that genome sequencing projects should produce unphased diploid genotype sequences if fully phased data are too challenging to generate, and avoid haploid consensus sequences, which have heterozygous sites phased at random. If the analytical integration algorithm is computationally infeasible, computational phasing prior to population genomic analyses is an acceptable alternative. [BPP; introgression; multispecies coalescent; phase; species tree.]
    Abstract The extraordinary physical resolution afforded by the Event Horizon Telescope has opened a window onto the astrophysical phenomena unfolding on horizon scales in two known black holes, M87* and Sgr A*. However, with this leap in resolution has come a new set of practical complications. Sgr A* exhibits intraday variability that violates the assumptions underlying Earth aperture synthesis, limiting traditional image reconstruction methods to short timescales and data sets with very sparse (u, v) coverage. We present a new set of tools to detect and mitigate this variability. We develop a data-driven, model-agnostic procedure to detect and characterize the spatial structure of intraday variability. This method is calibrated against a large set of mock data sets, producing an empirical estimator of the spatial power spectrum of the brightness fluctuations. We present a novel Bayesian noise modeling algorithm that simultaneously reconstructs an average image and a statistical measure of the fluctuations about it, using a parameterized form for the excess variance in the complex visibilities not otherwise explained by the statistical errors. These methods are validated using a variety of simulated data, including general relativistic magnetohydrodynamic simulations appropriate for Sgr A* and M87*. We find that the reconstructed source structure and variability are robust to changes in the underlying image model. We apply these methods to the 2017 EHT observations of M87*, finding evidence for variability across the EHT observing campaign. The variability mitigation strategies presented are widely applicable to very long baseline interferometry observations of variable sources generally, for which they provide a data-informed averaging procedure and natural characterization of inter-epoch image consistency.
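    A minimal sketch of the noise-inflation idea, not the algorithm in the paper: a Gaussian visibility log-likelihood whose variance budget augments the thermal errors with a parameterized excess-variance term. The fractional-plus-floor parameterization, the simplified real-Gaussian form, and all argument names are assumptions made for illustration.

```python
import numpy as np

def inflated_loglike(v_obs, v_model, sigma_th, frac, floor):
    """Gaussian log-likelihood for visibility residuals with excess variance.

    Assumed parameterization: total variance = thermal variance
    + (frac * |model visibility|)**2 + floor**2. A simplified real-Gaussian
    form over residual amplitudes is used here for brevity.
    """
    var = sigma_th**2 + (frac * np.abs(v_model))**2 + floor**2
    resid = np.abs(v_obs - v_model)**2
    return -0.5 * np.sum(resid / var + np.log(2.0 * np.pi * var))
```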
  5. Abstract

    Cluster-randomized experiments are widely used due to their logistical convenience and policy relevance. To analyse them properly, we must address the fact that the treatment is assigned at the cluster level instead of the individual level. Standard analytic strategies are regressions based on individual data, cluster averages and cluster totals, which differ when the cluster sizes vary. These methods are often motivated by models with strong and unverifiable assumptions, and the choice among them can be subjective. Without any outcome modelling assumption, we evaluate these regression estimators and the associated robust standard errors from the design-based perspective where only the treatment assignment itself is random and controlled by the experimenter. We demonstrate that regression based on cluster averages targets a weighted average treatment effect, regression based on individual data is suboptimal in terms of efficiency and regression based on cluster totals is consistent and more efficient with a large number of clusters. We highlight the critical role of covariates in improving estimation efficiency and illustrate the efficiency gain via both simulation studies and data analysis. The asymptotic analysis also reveals the efficiency-robustness trade-off by comparing the properties of various estimators using data at different levels with and without covariate adjustment. Moreover, we show that the robust standard errors are convenient approximations to the true asymptotic standard errors under the design-based perspective. Our theory holds even when the outcome models are misspecified, so it is model-assisted rather than model-based. We also extend the theory to a wider class of weighted average treatment effects.
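    As a purely illustrative sketch, and not the paper's estimators or asymptotic theory, the Python snippet below simulates a cluster-randomized experiment with varying cluster sizes and computes difference-in-means regressions on individual data, cluster averages, and cluster totals rescaled by the mean cluster size; the data-generating process and all constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n_clusters = 200
sizes = rng.integers(5, 50, n_clusters)                       # varying cluster sizes
treat = rng.permutation(np.repeat([0, 1], n_clusters // 2))   # cluster-level assignment
tau = 1.0                                                     # assumed constant effect

# Individual-level outcomes: cluster random effect + treatment effect + noise
cluster_id = np.repeat(np.arange(n_clusters), sizes)
y = (rng.standard_normal(n_clusters)[cluster_id]
     + tau * treat[cluster_id]
     + rng.standard_normal(sizes.sum()))

# Regression on individual data (difference in means over individuals)
est_indiv = y[treat[cluster_id] == 1].mean() - y[treat[cluster_id] == 0].mean()

# Regression on cluster averages (targets a cluster-weighted effect)
ybar = np.array([y[cluster_id == g].mean() for g in range(n_clusters)])
est_avg = ybar[treat == 1].mean() - ybar[treat == 0].mean()

# Regression on cluster totals, rescaled by the mean cluster size
ytot = np.array([y[cluster_id == g].sum() for g in range(n_clusters)])
est_tot = (ytot[treat == 1].mean() - ytot[treat == 0].mean()) / sizes.mean()

print(est_indiv, est_avg, est_tot)
```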

     