We develop a general nonparametric approach to the analysis of clustered data via random effects. Assuming only that the link function is known, the regression functions and the distributions of both cluster means and observation errors are treated nonparametrically. Our argument proceeds by viewing the observation error at the cluster mean level as though it were a measurement error in an errors-in-variables problem, and using a deconvolution argument to access the distribution of the cluster mean. A Fourier deconvolution approach could be used if the distribution of the error-in-variables were known. In practice it is unknown, of course, but it can be estimated from repeated measurements, and in this way deconvolution can be achieved in an approximate sense. This argument might be interpreted as implying that large numbers of replicates are necessary for each cluster mean distribution, but that is not so; we avoid this requirement by incorporating statistical smoothing over values of nearby explanatory variables. Empirical rules are developed for the choice of smoothing parameter. Numerical simulations, and an application to real data, demonstrate small sample performance for this package of methodology. We also develop theory establishing statistical consistency.
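The step of estimating the error distribution from repeated measurements can be illustrated with a minimal sketch. It is not the estimator developed in the paper (which also smooths over nearby explanatory variables). It shows only the standard replicate-difference idea, under the assumptions that each unit has two replicates W1 = X + U1 and W2 = X + U2, and that the error U is symmetric with a characteristic function that does not vanish. The function name `error_cf_from_replicates` and the simulated data are hypothetical.

```python
import numpy as np

def error_cf_from_replicates(w1, w2, t):
    """Estimate the characteristic function of a symmetric error U
    from replicate pairs W1 = X + U1 and W2 = X + U2 (illustrative only)."""
    d = np.asarray(w1) - np.asarray(w2)  # X cancels, leaving U1 - U2
    # For independent, symmetric errors: E cos(t (U1 - U2)) = phi_U(t)^2
    sq = np.mean(np.cos(np.outer(t, d)), axis=1)
    # Clip small negative sampling noise before taking the square root
    return np.sqrt(np.clip(sq, 0.0, None))

# Simulated example (assumed setup): exponential signal, Gaussian error
rng = np.random.default_rng(0)
n, sigma = 5000, 0.5
x = rng.exponential(size=n)
w1 = x + rng.normal(scale=sigma, size=n)
w2 = x + rng.normal(scale=sigma, size=n)

t = np.linspace(0.0, 2.0, 5)
phi_hat = error_cf_from_replicates(w1, w2, t)
phi_true = np.exp(-0.5 * (sigma * t) ** 2)  # Gaussian characteristic function
max_abs_err = float(np.max(np.abs(phi_hat - phi_true)))
```

In a Fourier deconvolution, the empirical characteristic function of the observations would then be divided by an estimate such as `phi_hat`, with a smoothing or truncation rule controlling the high-frequency range where the estimate is unreliable.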
Award ID(s): 1814840
NSFPAR ID: 10336403
Date Published:
Journal Name: ACM Transactions on Modeling and Computer Simulation
Volume: 31
Issue: 4
ISSN: 1049-3301
Page Range / eLocation ID: 1 to 36
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this

Summary 
In a chance constrained program (CCP), decision makers seek the best decision whose probability of violating the uncertainty constraints is within the prespecified risk level. As a CCP is often nonconvex and is difficult to solve to optimality, much effort has been devoted to developing convex inner approximations for a CCP, among which the conditional value-at-risk (CVaR) has been known to be the best for more than a decade. This paper studies and generalizes the ALSO-X, originally proposed by Ahmed, Luedtke, SOng, and Xie in 2017, for solving a CCP. We first show that the ALSO-X resembles a bilevel optimization, where the upper-level problem is to find the best objective function value and enforce the feasibility of a CCP for a given decision from the lower-level problem, and the lower-level problem is to minimize the expectation of constraint violations subject to the upper bound of the objective function value provided by the upper-level problem. This interpretation motivates us to prove that when uncertain constraints are convex in the decision variables, ALSO-X always outperforms the CVaR approximation. We further show (i) sufficient conditions under which ALSO-X can recover an optimal solution to a CCP; (ii) an equivalent bilinear programming formulation of a CCP, inspiring us to enhance ALSO-X with a convergent alternating minimization method (ALSO-X+); and (iii) an extension of ALSO-X and ALSO-X+ to distributionally robust chance constrained programs (DRCCPs) under the ∞-Wasserstein ambiguity set. Our numerical study demonstrates the effectiveness of the proposed methods.

This paper provides a general derivative identity for the conditional mean estimator of an arbitrary vector signal in Gaussian noise with an arbitrary covariance matrix. This new identity is used to recover and generalize many known identities in the literature and derive some new identities. For example, a new identity is discovered, which shows that an arbitrary higher-order conditional moment is completely determined by the first conditional moment. Several applications of the identities are shown. For instance, by using one of the identities, a simple proof of the uniqueness of the conditional mean estimator as a function of the distribution of the signal is shown. Moreover, one of the identities is used to extend the notion of empirical Bayes to higher-order conditional moments. Specifically, based on a random sample of noisy observations, a consistent estimator for a conditional expectation of any order is derived.

Abstract Epidemiologic studies of the short‐term effects of ambient particulate matter (PM) on the risk of acute cardiovascular or cerebrovascular events often use data from administrative databases in which only the date of hospitalization is known. A common study design for analyzing such data is the case‐crossover design, in which exposure at a time when a patient experiences an event is compared to exposure at times when the patient did not experience an event within a case‐control paradigm. However, the time of true event onset may precede hospitalization by hours or days, which can yield attenuated effect estimates. In this article, we consider a marginal likelihood estimator, a regression calibration estimator, and a conditional score estimator, as well as parametric bootstrap versions of each, to correct for this bias. All considered approaches require validation data on the distribution of the delay times. We compare the performance of the approaches in realistic scenarios via simulation, and apply the methods to analyze data from a Boston‐area study of the association between ambient air pollution and acute stroke onset. Based on both simulation and the case study, we conclude that a two‐stage regression calibration estimator with a parametric bootstrap bias correction is an effective method for correcting bias in health effect estimates arising from delayed onset in a case‐crossover study.

Abstract Unfolding is an ill-posed inverse problem in particle physics aiming to infer a true particle-level spectrum from smeared detector-level data. For computational and practical reasons, these spaces are typically discretized using histograms, and the smearing is modeled through a response matrix corresponding to a discretized smearing kernel of the particle detector. This response matrix depends on the unknown shape of the true spectrum, leading to a fundamental systematic uncertainty in the unfolding problem. To handle the ill-posed nature of the problem, common approaches regularize the problem either directly via methods such as Tikhonov regularization, or implicitly by using wide bins in the true space that match the resolution of the detector. Unfortunately, both of these methods lead to a nontrivial bias in the unfolded estimator, thereby hampering frequentist coverage guarantees for confidence intervals constructed from these methods. We propose two new approaches to addressing the bias in the wide-bin setting through methods called One-at-a-time Strict Bounds (OSB) and Prior-Optimized (PO) intervals. The OSB intervals are a bin-wise modification of an existing guaranteed-coverage procedure, while the PO intervals are based on a decision-theoretic view of the problem. Importantly, both approaches provide well-calibrated frequentist confidence intervals even in constrained and rank-deficient settings. These methods are built upon a more general answer to the wide-bin bias problem, involving unfolding with fine bins first, followed by constructing confidence intervals for linear functionals of the fine-bin counts. We test and compare these methods to other available methodologies in a wide-bin deconvolution example and a realistic particle physics simulation of unfolding a steeply falling particle spectrum.