Modern empirical work in regression discontinuity (RD) designs often employs local polynomial estimation and inference with a mean square error (MSE) optimal bandwidth choice. This bandwidth yields an MSE-optimal RD treatment effect estimator, but is by construction invalid for inference. Robust bias-corrected (RBC) inference methods are valid when using the MSE-optimal bandwidth, but we show that they yield suboptimal confidence intervals in terms of coverage error. We establish valid coverage error expansions for RBC confidence interval estimators and use these results to propose new inference-optimal bandwidth choices for forming these intervals. We find that the standard MSE-optimal bandwidth for the RD point estimator is too large when the goal is to construct RBC confidence intervals with the smaller coverage error rate. We further optimize the constant terms behind the coverage error to derive new optimal choices for the auxiliary bandwidth required for RBC inference. Our expansions also establish that RBC inference yields higher-order refinements (relative to traditional undersmoothing) in the context of RD designs. Our main results cover sharp and sharp kink RD designs under conditional heteroskedasticity, and we discuss extensions to fuzzy and other RD designs, clustered sampling, and pre-intervention covariate adjustments. The theoretical findings are illustrated with a Monte Carlo experiment and an empirical application, and the main methodological results are available in R and Stata packages.
 Award ID(s):
 1659334
 NSFPAR ID:
 10188182
 Date Published:
 Journal Name:
 The Review of Economic Studies
 ISSN:
 0034-6527
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this

Summary We focus on selecting optimal bandwidths for nonparametric estimation of the two-point correlation function of a point pattern. We obtain these optimal bandwidths by using a bootstrap approach to select a bandwidth that minimizes the integrated squared error. The variance term is estimated by using a nonparametric spatial bootstrap, whereas the bias term is estimated with a plug-in approach using a pilot estimator of the two-point correlation function based on a parametric model. The choice of parametric model for the pilot estimator is very flexible. Depending on the application, parametric statistical point models, physical models or functional models can be used. We also explore the use of the procedure for selecting adaptive optimal bandwidths. We investigate the performance of the bandwidth selection procedure by using a simulation study. In our data example, we apply our method to a Sloan Digital Sky Survey galaxy cluster catalogue by using a pilot estimator based on the power law functional model in cosmology. The resulting nonparametric two-point correlation function estimate is then used to estimate a cosmological mass bias parameter that describes the relationship between the galaxy mass distribution and the underlying matter distribution.

Location estimation is one of the most basic questions in parametric statistics. Suppose we have a known distribution density f, and we get n i.i.d. samples from f(x − μ) for some unknown shift μ. The task is to estimate μ to high accuracy with high probability. The maximum likelihood estimator (MLE) is known to be asymptotically optimal as n → ∞, but what is possible for finite n? In this paper, we give two location estimators that are optimal under different criteria: 1) an estimator that has minimax-optimal estimation error subject to succeeding with probability 1 − δ, and 2) a confidence interval estimator which, subject to its output interval containing μ with probability at least 1 − δ, has the minimum expected squared interval width among all shift-invariant estimators. The latter construction can be generalized to minimizing the expectation of any loss function on the interval width.

Summary We derive nonparametric confidence intervals for the eigenvalues of the Hessian at modes of a density estimate. This provides information about the strength and shape of modes and can also be used as a significance test. We use a data-splitting approach in which potential modes are identified by using the first half of the data and inference is done with the second half of the data. To obtain valid confidence sets for the eigenvalues, we use a bootstrap based on an elementary symmetric polynomial transformation. This leads to valid bootstrap confidence sets regardless of any multiplicities in the eigenvalues. We also suggest a new method for bandwidth selection, namely choosing the bandwidth to maximize the number of significant modes. We show by example that this method works well. Even when the true distribution is singular, and hence does not have a density (in which case cross-validation chooses a zero bandwidth), our method chooses a reasonable bandwidth.

We consider the problem of constructing asymptotically valid confidence intervals for the change point in a high-dimensional covariance shift setting. A novel estimator for the change point parameter is developed, and its asymptotic distribution under high-dimensional scaling is obtained. We establish that the proposed estimator exhibits a sharp O_p(ψ^{−2}) rate of convergence, wherein ψ represents the jump size between model parameters before and after the change point. Further, the form of the asymptotic distributions under both a vanishing and a non-vanishing regime of the jump size are characterized. In the former case, it corresponds to the argmax of an asymmetric Brownian motion, while in the latter case to the argmax of an asymmetric random walk. We then obtain the relationship between these distributions, which allows construction of regime (vanishing vs non-vanishing) adaptive confidence intervals. Easy-to-implement algorithms for the proposed methodology are developed and their performance illustrated on synthetic and real data sets.