Title: Estimating a Cosmological Mass Bias Parameter with Bootstrap Bandwidth Selection
Summary

We focus on selecting optimal bandwidths for non-parametric estimation of the two-point correlation function of a point pattern. We obtain these optimal bandwidths by using a bootstrap approach to select the bandwidth that minimizes the integrated squared error. The variance term is estimated by using a non-parametric spatial bootstrap, whereas the bias term is estimated with a plug-in approach using a pilot estimator of the two-point correlation function based on a parametric model. The choice of parametric model for the pilot estimator is very flexible: depending on the application, parametric point process models, physical models or functional models can be used. We also explore the use of the procedure for selecting adaptive optimal bandwidths. We investigate the performance of the bandwidth selection procedure by using a simulation study. In our data example, we apply our method to a Sloan Digital Sky Survey galaxy cluster catalogue by using a pilot estimator based on the power-law functional model in cosmology. The resulting non-parametric two-point correlation function estimate is then used to estimate a cosmological mass bias parameter that describes the relationship between the galaxy mass distribution and the underlying matter distribution.
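To make the selection criterion concrete, here is a minimal Python sketch (not the authors' implementation): the integrated squared error at each candidate bandwidth is estimated as the bootstrap variance plus a squared plug-in bias term computed from a power-law pilot. The Gaussian kernel, the grid-based smoother, and the values of r0 and gamma are illustrative assumptions.

```python
import numpy as np

def power_law_pilot(r, r0=5.0, gamma=1.8):
    # Power-law functional model xi(r) = (r / r0)^(-gamma); the values
    # of r0 and gamma here are illustrative, not taken from the paper.
    return (r / r0) ** (-gamma)

def kernel_smooth(values, r_grid, h):
    # Gaussian-kernel smoothing of a function tabulated on r_grid.
    w = np.exp(-0.5 * ((r_grid[:, None] - r_grid[None, :]) / h) ** 2)
    return (w * values[None, :]).sum(axis=1) / w.sum(axis=1)

def estimated_ise(h, r_grid, xi_boot, pilot):
    # xi_boot: (B, len(r_grid)) kernel estimates of xi at bandwidth h from
    # B spatial bootstrap resamples; pilot: pilot xi values on r_grid.
    variance = xi_boot.var(axis=0)                  # bootstrap variance term
    bias = kernel_smooth(pilot, r_grid, h) - pilot  # plug-in bias term
    return np.trapz(variance + bias ** 2, r_grid)   # integrate over r

def select_bandwidth(bandwidths, r_grid, boot_estimator, pilot):
    # boot_estimator(h) returns the (B, len(r_grid)) bootstrap array at h.
    scores = [estimated_ise(h, r_grid, boot_estimator(h), pilot)
              for h in bandwidths]
    return bandwidths[int(np.argmin(scores))]
```

The pilot enters only through the bias term, which is why the choice of parametric model can be so flexible.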

 
NSF-PAR ID: 10401985
Author(s) / Creator(s): ;
Publisher / Repository: Oxford University Press
Date Published:
Journal Name: Journal of the Royal Statistical Society Series C: Applied Statistics
Volume: 59
Issue: 5
ISSN: 0035-9254
Format(s): Medium: X
Size(s): p. 761-779
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract The traditional approach to obtaining valid confidence intervals for non-parametric quantities is to select a smoothing parameter such that the bias of the estimator is negligible relative to its standard deviation. While this approach is apparently simple, it has two drawbacks: first, the question of optimal bandwidth selection is no longer well defined, as it is not clear what ratio of bias to standard deviation should be considered negligible; second, since the bandwidth choice necessarily deviates from the optimal (mean-squared-error minimizing) bandwidth, such a confidence interval is very inefficient. To address these issues, we construct valid confidence intervals that account for the presence of a non-negligible bias and thus make it possible to perform inference with the mean-squared-error minimizing bandwidth. The key difficulty lies in finding a strict, yet feasible, bound on the bias of a non-parametric estimator. It is well known that it is not possible to consistently estimate the pointwise bias of an optimal non-parametric estimator (otherwise one could subtract it and obtain a faster convergence rate, violating Stone's bounds on the optimal convergence rates). Nevertheless, we find that, under minimal primitive assumptions, it is possible to consistently estimate an upper bound on the magnitude of the bias, which is sufficient to deliver a valid confidence interval whose length decreases at the optimal rate and which does not contradict Stone's results.
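As a sketch of how such an interval can be formed once an upper bound on the bias magnitude is available (a generic, conservative construction; the bias_bound input is a hypothetical stand-in for the paper's estimated bound):

```python
from scipy.stats import norm

def bias_aware_ci(estimate, std_error, bias_bound, alpha=0.05):
    # Widen the usual Wald interval by the estimated upper bound on |bias|,
    # so nominal coverage survives an MSE-optimal (bias-incurring) bandwidth.
    z = norm.ppf(1 - alpha / 2)
    half_width = z * std_error + bias_bound
    return estimate - half_width, estimate + half_width
```

Under MSE-optimal smoothing the bias bound shrinks at the same rate as the standard error, which is consistent with the abstract's claim that the interval length decreases at the optimal rate.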
  2. Summary

    Panel count data arise when the number of recurrent events experienced by each subject is observed intermittently at discrete examination times. The examination time process can be informative about the underlying recurrent event process even after conditioning on covariates. We consider a semiparametric accelerated mean model for the recurrent event process and allow the two processes to be correlated through a shared frailty. The regression parameters have a simple marginal interpretation: they modify the time scale of the cumulative mean function of the event process. A novel estimation procedure for the regression parameters and the baseline rate function is proposed based on a conditioning technique. In contrast to existing methods, the proposed method is robust in the sense that it requires neither the strong Poisson-type assumption for the underlying recurrent event process nor a parametric assumption on the distribution of the unobserved frailty. Moreover, the distribution of the examination time process is left unspecified, allowing for arbitrary dependence between the two processes. Asymptotic consistency of the estimator is established, and the variance of the estimator is estimated by a model-based smoothed bootstrap procedure. Numerical studies demonstrate that the proposed point estimator and variance estimator perform well with practical sample sizes. The methods are applied to data from a skin cancer chemoprevention trial.
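For reference, a standard formulation of the semiparametric accelerated mean model for recurrent events (notation ours, not quoted from the paper) is

$$E\{N(t) \mid Z\} = \mu_0\left(t\, e^{\beta^\top Z}\right),$$

so the covariates $Z$ act by rescaling the time axis of the baseline cumulative mean function $\mu_0$; this is the marginal time-scale interpretation of the regression parameters mentioned above.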

     
  3. ABSTRACT

    Galaxy clustering measurements are a key probe of the matter density field in the Universe. With the era of precision cosmology upon us, surveys rely on precise measurements of the clustering signal for meaningful cosmological analysis. However, the presence of systematic contaminants can bias the observed galaxy number density and thereby the galaxy two-point statistics. As the statistical uncertainties shrink, correcting for these systematic contaminants becomes increasingly important for unbiased cosmological analysis. We present and validate a new method for understanding and mitigating both additive and multiplicative systematics in galaxy clustering measurements (two-point function) by joint inference of contaminants in the galaxy overdensity field (one-point function) using a maximum-likelihood estimator (MLE). We test this methodology on Kilo-Degree Survey-like mock galaxy catalogues and synthetic systematic template maps. We estimate the cosmological impact of such mitigation by quantifying uncertainties and possible biases in the inferred relationship between the observed and the true galaxy clustering signal. Our method robustly corrects the clustering signal to the sub-per cent level and reduces the impact of numerous additive and multiplicative systematics from $1.5\sigma$ to less than $0.1\sigma$ in the scenarios we tested. In addition, we provide an empirical approach to identifying the functional form (additive, multiplicative, or other) through which specific systematics contaminate the galaxy number density. Although this approach is tested on, and geared towards, systematics contaminating the galaxy number density, the methods can be extended to systematics mitigation for other two-point correlation measurements.
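A minimal sketch of the kind of one-point inference involved, under an assumed contamination model in which template maps enter both multiplicatively and additively (the model, the Gaussian likelihood, and all names here are illustrative, not the paper's estimator):

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_like(params, delta_obs, templates, sigma2):
    # Assumed contamination model:
    #   delta_obs = (1 + S @ a) * delta_true + S @ b,  delta_true ~ N(0, sigma2),
    # where the columns of S (= templates) are systematic template maps,
    # a holds multiplicative coefficients and b additive ones.
    k = templates.shape[1]
    a, b = params[:k], params[k:]
    mult = 1.0 + templates @ a
    resid = delta_obs - templates @ b
    var = sigma2 * mult ** 2 + 1e-12  # small floor avoids division by zero
    return 0.5 * np.sum(resid ** 2 / var + np.log(var))

def clean_overdensity(delta_obs, templates, sigma2):
    # Fit (a, b) by maximum likelihood, then invert the contamination
    # to recover an estimate of the true overdensity field.
    k = templates.shape[1]
    fit = minimize(neg_log_like, np.zeros(2 * k),
                   args=(delta_obs, templates, sigma2), method="L-BFGS-B")
    a, b = fit.x[:k], fit.x[k:]
    return (delta_obs - templates @ b) / (1.0 + templates @ a)
```

Cleaning the one-point field before measuring the two-point function is what lets a single fit address additive and multiplicative contamination at once.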

     
  4. Abstract

    The nonlinear responses of species to environmental variability can play an important role in the maintenance of ecological diversity. Nonetheless, many models use parametric nonlinear terms which pre-determine the ecological conclusions. Motivated by this concern, we study estimation of the second derivative (curvature) of the link function in a functional single index model. Since the coefficient function and the link function are both unknown, the estimator is expressed as a nested optimization. We first estimate the coefficient function by minimizing squared error, where the link function is estimated with a Nadaraya-Watson estimator for each candidate coefficient function. The first and second derivatives of the link function are then estimated via local-quadratic regression using the estimated coefficient function. In this paper, we derive a convergence rate for the curvature of the nonlinear response. In addition, we prove that the argument of the linear predictor can be estimated root-$n$ consistently. However, practical implementation of the method requires solving a nonlinear optimization problem, and our results show that the estimates of the link function and the coefficient function are quite sensitive to the choices of starting values.
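A compact sketch of the nested optimization just described, with the functional covariates and the coefficient function discretized onto a common grid so that the index reduces to an inner product (the kernel, bandwidth, and optimizer are illustrative choices):

```python
import numpy as np
from scipy.optimize import minimize

def profile_sse(c, X, y, h=0.3):
    # Inner step: for a candidate coefficient function c (tabulated on the
    # same grid as the rows of X), form the index u = X @ c and estimate
    # the link by Nadaraya-Watson regression of y on u.
    u = X @ (c / np.linalg.norm(c))    # normalize c for identifiability
    w = np.exp(-0.5 * ((u[:, None] - u[None, :]) / h) ** 2)
    np.fill_diagonal(w, 0.0)           # leave-one-out: no self-fitting
    g_hat = (w @ y) / w.sum(axis=1)
    return np.sum((y - g_hat) ** 2)

def fit_single_index(X, y, c0):
    # Outer step: minimize the profiled squared error over c. As noted
    # above, the solution can be sensitive to the starting value c0.
    res = minimize(profile_sse, c0, args=(X, y), method="Nelder-Mead")
    return res.x / np.linalg.norm(res.x)
```

With the coefficient function in hand, the first and second derivatives of the link can then be estimated by local-quadratic regression on the fitted index.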

     
  5. We develop and analyse the HulC, an intuitive and general method for constructing confidence sets using the convex hull of estimates computed from subsets of the data. Unlike classical methods, which are based on estimating the (limiting) distribution of an estimator, the HulC is often simpler to use and effectively bypasses this step. In comparison to the bootstrap, the HulC requires fewer regularity conditions and succeeds in many examples where the bootstrap provably fails. Unlike sub-sampling, the HulC does not require knowledge of the rate of convergence of the estimators on which it is based. The validity of the HulC requires knowledge of the (asymptotic) median bias of the estimators. We further analyse a variant of our basic method, called the Adaptive HulC, which is fully data-driven and estimates the median bias using sub-sampling. We discuss these methods in the context of several challenging inferential problems which arise in parametric, semi-parametric, and non-parametric inference. Although our focus is on validity under weak regularity conditions, we also provide some general results on the width of the HulC confidence sets, showing that in many cases the HulC confidence sets have near-optimal width.
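A minimal sketch of the basic (non-adaptive) construction for a median-unbiased scalar estimator, assuming data is a NumPy array and estimator is a user-supplied callback (the randomized interpolation between B and B+1 splits used in the full method is omitted):

```python
import numpy as np

def hulc_ci(data, estimator, alpha=0.05, seed=None):
    # Split the sample into B batches, estimate on each batch, and return
    # the convex hull (min, max) of the estimates. The interval fails only
    # when all B estimates land on the same side of the target, which for
    # a median-unbiased estimator has probability 2 * (1/2)**B, so B is
    # the smallest integer with 2 ** (1 - B) <= alpha.
    B = int(np.ceil(np.log2(2.0 / alpha)))
    idx = np.random.default_rng(seed).permutation(len(data))
    estimates = [estimator(data[batch]) for batch in np.array_split(idx, B)]
    return min(estimates), max(estimates)
```

For alpha = 0.05 this gives B = 6 splits, so the interval width is driven by estimates computed on n/6 observations each.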