We develop a general non-parametric approach to the analysis of clustered data via random effects. Assuming only that the link function is known, the regression functions and the distributions of both cluster means and observation errors are treated non-parametrically. Our argument proceeds by viewing the observation error at the cluster mean level as though it were a measurement error in an errors-in-variables problem, and using a deconvolution argument to access the distribution of the cluster mean. A Fourier deconvolution approach could be used if the distribution of the error-in-variables were known. In practice it is unknown, of course, but it can be estimated from repeated measurements, and in this way deconvolution can be achieved in an approximate sense. This argument might be interpreted as implying that large numbers of replicates are necessary for each cluster mean distribution, but that is not so; we avoid this requirement by incorporating statistical smoothing over values of nearby explanatory variables. Empirical rules are developed for the choice of smoothing parameter. Numerical simulations, and an application to real data, demonstrate small sample performance for this package of methodology. We also develop theory establishing statistical consistency.
- Award ID(s):
- NSF-PAR ID:
- Date Published:
- Journal Name:
- ACM Transactions on Modeling and Computer Simulation
- Page Range / eLocation ID:
- 1 to 36
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
In a chance constrained program (CCP), decision makers seek the best decision whose probability of violating the uncertainty constraints is within the prespecified risk level. As a CCP is often nonconvex and is difficult to solve to optimality, much effort has been devoted to developing convex inner approximations for a CCP, among which the conditional value-at-risk (CVaR) has been known to be the best for more than a decade. This paper studies and generalizes the ALSO-X, originally proposed by Ahmed, Luedtke, SOng, and Xie in 2017 , for solving a CCP. We first show that the ALSO-X resembles a bilevel optimization, where the upper-level problem is to find the best objective function value and enforce the feasibility of a CCP for a given decision from the lower-level problem, and the lower-level problem is to minimize the expectation of constraint violations subject to the upper bound of the objective function value provided by the upper-level problem. This interpretation motivates us to prove that when uncertain constraints are convex in the decision variables, ALSO-X always outperforms the CVaR approximation. We further show (i) sufficient conditions under which ALSO-X can recover an optimal solution to a CCP; (ii) an equivalent bilinear programming formulation of a CCP, inspiring us to enhance ALSO-X with a convergent alternating minimization method (ALSO-X+); and (iii) an extension of ALSO-X and ALSO-X+ to distributionally robust chance constrained programs (DRCCPs) under the ∞−Wasserstein ambiguity set. Our numerical study demonstrates the effectiveness of the proposed methods.more » « less
This paper provides a general derivative identity for the conditional mean estimator of an arbitrary vector signal in Gaussian noise with an arbitrary covariance matrix. This new identity is used to recover and generalize many known identities in the literature and derive some new identities. For example, a new identity is discovered, which shows that an arbitrary higher-order conditional moment is completely determined by the first conditional moment.Several applications of the identities are shown. For instance, by using one of the identities, a simple proof of the uniqueness of the conditional mean estimator as a function of the distribution of the signal is shown. Moreover, one of the identities is used to extend the notion of empirical Bayes to higher-order conditional moments. Specifically, based on a random sample of noisy observations, a consistent estimator for a conditional expectation of any order is derived.more » « less
Epidemiologic studies of the short‐term effects of ambient particulate matter (PM) on the risk of acute cardiovascular or cerebrovascular events often use data from administrative databases in which only the date of hospitalization is known. A common study design for analyzing such data is the case‐crossover design, in which exposure at a time when a patient experiences an event is compared to exposure at times when the patient did not experience an event within a case‐control paradigm. However, the time of true event onset may precede hospitalization by hours or days, which can yield attenuated effect estimates. In this article, we consider a marginal likelihood estimator, a regression calibration estimator, and a conditional score estimator, as well as parametric bootstrap versions of each, to correct for this bias. All considered approaches require validation data on the distribution of the delay times. We compare the performance of the approaches in realistic scenarios via simulation, and apply the methods to analyze data from a Boston‐area study of the association between ambient air pollution and acute stroke onset. Based on both simulation and the case study, we conclude that a two‐stage regression calibration estimator with a parametric bootstrap bias correction is an effective method for correcting bias in health effect estimates arising from delayed onset in a case‐crossover study.
Abstract Unfolding is an ill-posed inverse problem in particle physics aiming to infer a true particle-level spectrum from smeared detector-level data. For computational and practical reasons, these spaces are typically discretized using histograms, and the smearing is modeled through a response matrix corresponding to a discretized smearing kernel of the particle detector. This response matrix depends on the unknown shape of the true spectrum, leading to a fundamental systematic uncertainty in the unfolding problem. To handle the ill-posed nature of the problem, common approaches regularize the problem either directly via methods such as Tikhonov regularization, or implicitly by using wide-bins in the true space that match the resolution of the detector. Unfortunately, both of these methods lead to a non-trivial bias in the unfolded estimator, thereby hampering frequentist coverage guarantees for confidence intervals constructed from these methods. We propose two new approaches to addressing the bias in the wide-bin setting through methods called One-at-a-time Strict Bounds (OSB) and Prior-Optimized (PO) intervals. The OSB intervals are a bin-wise modification of an existing guaranteed-coverage procedure, while the PO intervals are based on a decision-theoretic view of the problem. Importantly, both approaches provide well-calibrated frequentist confidence intervals even in constrained and rank-deficient settings. These methods are built upon a more general answer to the wide-bin bias problem, involving unfolding with fine bins first, followed by constructing confidence intervals for linear functionals of the fine-bin counts. We test and compare these methods to other available methodologies in a wide-bin deconvolution example and a realistic particle physics simulation of unfolding a steeply falling particle spectrum.more » « less