Title: Partial Identifiability in Discrete Data with Measurement Error
When data contains measurement errors, it is necessary to make modeling assumptions relating the error-prone measurements to the unobserved true values. Work on measurement error has largely focused on models that fully identify the parameter of interest. As a result, many practically useful models that result in bounds on the target parameter -- known as partial identification -- have been neglected. In this work, we present a method for partial identification in a class of measurement error models involving discrete variables. We focus on models that impose linear constraints on the target parameter, allowing us to compute partial identification bounds using off-the-shelf LP solvers. We show how several common measurement error assumptions can be composed with an extended class of instrumental variable-type models to create such linear constraint sets. We further show how this approach can be used to bound causal parameters, such as the average treatment effect, when treatment or outcome variables are measured with error. Using data from the Oregon Health Insurance Experiment, we apply this method to estimate bounds on the effect Medicaid enrollment has on depression when depression is measured with error.
Award ID(s):
1942239
NSF-PAR ID:
10329246
Author(s) / Creator(s):
Editor(s):
Cassio de Campos; Marloes H. Maathuis
Date Published:
Journal Name:
Proceedings of the Thirty Seventh Conference on Uncertainty in Artificial Intelligence
Volume:
161
Page Range / eLocation ID:
1798-1808
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
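As a rough sketch of the LP formulation described in the abstract above (not the paper's actual constraint sets): once the measurement-error assumptions are written as linear constraints on the unobserved discrete distribution, the partial identification bounds are the optimal values of a pair of linear programs. Everything below -- the constraint matrix, the margins, the misclassification rate -- is an invented placeholder.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical setup: q is the pmf over K cells of the unobserved discrete
# variables, and the target parameter is the linear functional c @ q.
K = 4
c = np.array([0.0, 1.0, 1.0, 0.0])     # hypothetical target functional

# Linear constraints A @ q = b encoding observed (error-prone) margins and
# an assumed misclassification rate; all numbers here are illustrative.
A = np.array([[1.0, 1.0, 1.0, 1.0],    # q must sum to 1
              [0.9, 0.1, 0.9, 0.1],    # error-prone margin, 10% flip rate
              [1.0, 1.0, 0.0, 0.0]])   # a correctly measured margin
b = np.array([1.0, 0.55, 0.3])

bnds = [(0, 1)] * K                    # q is a probability vector
lo = linprog(c,  A_eq=A, b_eq=b, bounds=bnds)   # lower bound: min c @ q
hi = linprog(-c, A_eq=A, b_eq=b, bounds=bnds)   # upper bound: max c @ q
print(f"partial identification bounds: [{lo.fun:.4f}, {-hi.fun:.4f}]")
```

When the constraints point-identify the target, the two optima coincide; otherwise their gap is the identified interval.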
More Like this
  1. Summary

    Many clinical studies of non-mortality outcomes such as quality of life suffer from the problem that the non-mortality outcome can be censored by death, i.e., the non-mortality outcome cannot be measured if the subject dies before the time of measurement. To address the fact that this censoring by death is informative, it is of interest to consider the average effect of the treatment on the non-mortality outcome among subjects whose measurement would not be censored under either treatment or control, called the survivor average causal effect (SACE). The SACE is not point identified under the usual assumptions, but bounds can be constructed. The previous literature on bounding the SACE uses only the survival information before the measurement of the non-mortality outcome, yet survival information after the measurement could also be informative. For randomized trials, we propose a set of ranked average score assumptions that make use of survival information both before and after the measurement of the non-mortality outcome, are plausibly satisfied in many studies, and admit a two-step linear programming approach that yields closed-form bounds on the SACE. We also extend our method to randomized trials with non-compliance, or to observational studies with a valid instrumental variable, to obtain bounds on the complier SACE; this extension is presented in the online supplementary material. We apply our method to a randomized trial of mechanical ventilation with lower versus traditional tidal volume for acute lung injury patients. Our bounds on the SACE are much narrower than those obtained using only the survival information before the measurement of the non-mortality outcome.
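    For context on what such bounds look like, here is a minimal sketch of the classic trimming-style bounds on the SACE under randomization and an assumed monotonicity condition (treatment never harms survival); the paper's two-step linear programming approach, which additionally exploits post-measurement survival information, tightens bounds of this kind. All inputs below are hypothetical.

```python
import numpy as np

def sace_trimming_bounds(y1, y0_mean, p_s1_z0, p_s1_z1):
    # pi = fraction of treated survivors who are always-survivors;
    # under monotonicity every control survivor is an always-survivor.
    pi = p_s1_z0 / p_s1_z1
    y = np.sort(np.asarray(y1, dtype=float))
    k = int(np.ceil(pi * len(y)))           # treated survivors to keep
    lower = y[:k].mean() - y0_mean          # keep the smallest pi-fraction
    upper = y[-k:].mean() - y0_mean         # keep the largest pi-fraction
    return lower, upper

rng = np.random.default_rng(1)
y1 = rng.normal(1.0, 1.0, size=400)         # outcomes of treated survivors
lb, ub = sace_trimming_bounds(y1, y0_mean=0.4, p_s1_z0=0.6, p_s1_z1=0.8)
print(f"SACE bounds: [{lb:.3f}, {ub:.3f}]")
```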

     
  2. Abstract

    Structured demographic models are among the most common and useful tools in population biology. However, the introduction of integral projection models (IPMs) has caused a profound shift in the way many demographic models are conceptualized. Some researchers have argued that IPMs, by explicitly representing demographic processes as continuous functions of state variables such as size, are more statistically efficient, biologically realistic, and accurate than classic matrix projection models, calling into question the usefulness of the many studies based on matrix models. Here, we evaluate how IPMs and matrix models differ, as well as the extent to which these differences matter for estimation of key model outputs, including population growth rates, sensitivity patterns, and life spans. First, we detail the steps in constructing and using each type of model. Second, we present a review of published demographic models, concentrating on size‐based studies, which shows significant overlap in the way IPMs and matrix models are constructed and analyzed. Third, to assess the impact of various modeling decisions on demographic predictions, we ran a series of simulations based on size‐based demographic data sets for five biologically diverse species. We found little evidence that discrete vital rate estimation is less accurate than continuous functions across a wide range of sample sizes or size classes (equivalently bin numbers or mesh points). Most model outputs quickly converged with modest class numbers (≥10), regardless of most other modeling decisions. Another surprising result was that the most commonly used method to discretize growth rates for IPM analyses can introduce substantial error into model outputs. Finally, we show that empirical sample sizes generally matter more than modeling approach for the accuracy of demographic outputs. Based on these results, we provide specific recommendations to those constructing and evaluating structured population models. Both our literature review and simulations question the treatment of IPMs as a clearly distinct modeling approach or one that is inherently more accurate than classic matrix models. Importantly, this suggests that matrix models, representing the vast majority of past demographic analyses available for comparative and conservation work, continue to be useful and important sources of demographic information.
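    To make the construction steps concrete, here is a minimal sketch of the common midpoint-rule discretization of an IPM: hypothetical vital-rate functions are combined into an n x n kernel matrix whose dominant eigenvalue approximates the population growth rate, so convergence can be checked by varying the number of mesh points. The functional forms and parameters are invented for illustration and are not from the study.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical size-dependent vital rates for a size-structured population.
def survival(z):
    return 1.0 / (1.0 + np.exp(-(z - 2.0)))          # logistic survival

def growth_kernel(z_new, z):
    return norm.pdf(z_new, loc=0.8 * z + 0.9, scale=0.3)  # growth density

def fecundity(z):
    return 0.4 * np.exp(0.3 * z)                     # offspring produced

def offspring_size(z_new):
    return norm.pdf(z_new, loc=1.0, scale=0.2)       # recruit size density

def ipm_lambda(n_bins, z_min=0.0, z_max=6.0):
    # Midpoint-rule discretization of the IPM kernel into an
    # n_bins x n_bins matrix; lambda is its dominant eigenvalue.
    h = (z_max - z_min) / n_bins
    z = z_min + h * (np.arange(n_bins) + 0.5)        # mesh (mid)points
    Znew, Z = np.meshgrid(z, z, indexing="ij")
    P = h * growth_kernel(Znew, Z) * survival(Z)     # survival-growth part
    F = h * offspring_size(Znew) * fecundity(Z)      # reproduction part
    return float(np.max(np.real(np.linalg.eigvals(P + F))))

# Model outputs converge quickly as the number of bins grows:
for n in (5, 10, 25, 100):
    print(n, round(ipm_lambda(n), 4))
```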

     
  3. We propose an algorithm to impute and forecast a time series by transforming the observed time series into a matrix, utilizing matrix estimation to recover missing values and de-noise observed entries, and performing linear regression to make predictions. At the core of our analysis is a representation result, which states that for a large class of models, the transformed time series matrix is (approximately) low-rank. In effect, this generalizes the widely used Singular Spectrum Analysis (SSA) in the time series literature and allows us to establish a rigorous link between time series analysis and matrix estimation. The key to establishing this link is constructing a Page matrix with non-overlapping entries rather than a Hankel matrix, as is commonly done in the literature (e.g., SSA). This particular matrix structure allows us to provide a finite sample analysis for imputation and prediction and to prove the asymptotic consistency of our method. Another salient feature of our algorithm is that it is model agnostic with respect to both the underlying time dynamics and the noise distribution in the observations. The noise-agnostic property of our approach allows us to recover the latent states when given access only to noisy and partial observations, as in a Hidden Markov Model; e.g., recovering the time-varying parameter of a Poisson process without knowing that the underlying process is Poisson. Furthermore, since our forecasting algorithm requires regression with noisy features, our approach suggests a matrix estimation-based method, coupled with a novel, non-standard matrix estimation error metric, to solve the error-in-variables regression problem, which could be of interest in its own right. Through synthetic and real-world datasets, we demonstrate that our algorithm outperforms standard software packages (including R libraries) in the presence of missing data as well as high levels of noise.
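    A minimal sketch of the pipeline described above -- Page matrix, SVD truncation, then linear regression on the de-noised rows -- assuming a toy series of two sinusoids in Gaussian noise; the window length, rank, and forecasting step are illustrative choices, not the authors' exact algorithm or tuning.

```python
import numpy as np

def page_matrix(x, L):
    # Stack the series into non-overlapping length-L windows (a Page
    # matrix), rather than the overlapping Hankel matrix of classic SSA.
    n = (len(x) // L) * L
    return x[:n].reshape(-1, L).T          # shape (L, n // L)

def low_rank_denoise(M, rank):
    # Hard-threshold the SVD at the leading `rank` singular values.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

rng = np.random.default_rng(0)
t = np.arange(600)
latent = np.sin(2 * np.pi * t / 50) + 0.5 * np.sin(2 * np.pi * t / 17)
x = latent + rng.normal(scale=0.5, size=t.size)      # noisy observations

L = 10
M_hat = low_rank_denoise(page_matrix(x, L), rank=4)  # two sinusoids -> rank 4

# Forecast: regress the last row of the de-noised matrix on the rows above
# it, then apply the learned weights to the newest (de-noised) window.
X, y = M_hat[:-1].T, M_hat[-1]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
forecast = M_hat[1:, -1] @ beta
truth = np.sin(2 * np.pi * 600 / 50) + 0.5 * np.sin(2 * np.pi * 600 / 17)
print(f"one-step forecast: {forecast:.3f}   noiseless truth: {truth:.3f}")
```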
  4. Summary

    Covariate-adaptive randomization is popular in clinical trials with sequentially arriving patients for balancing treatment assignments across prognostic factors that may influence the response. However, existing theory on tests for the treatment effect under covariate-adaptive randomization is limited to linear and generalized linear models, even though covariate-adaptive randomization has long been used in survival analysis. Often, practitioners simply adopt a conventional test to compare two treatments, which is controversial since tests derived under simple randomization may not be valid, in terms of type I error, under other randomization schemes. We derive the asymptotic distribution of the partial likelihood score function under covariate-adaptive randomization and a working model that is subject to possible misspecification. Using this general result, we prove that the partial likelihood score test, which is robust against model misspecification under simple randomization, is no longer robust but conservative under covariate-adaptive randomization. We also show that the unstratified log-rank test is conservative and the stratified log-rank test remains valid under covariate-adaptive randomization. We propose a modification to the variance estimation in the partial likelihood score test that leads to a score test that is valid and robust against arbitrary model misspecification under a large family of covariate-adaptive randomization schemes, including simple randomization. Furthermore, we show that the modified partial likelihood score test derived under a correctly specified model is more powerful than log-rank-type tests in terms of Pitman's asymptotic relative efficiency. Simulation studies of the type I error and power of various tests are presented under several popular randomization schemes.
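    For concreteness, a minimal NumPy sketch of the two-group log-rank statistic and its stratified version, which sums the observed-minus-expected counts and hypergeometric variances within each stratum; this illustrates the log-rank tests discussed above, not the authors' modified partial likelihood score test. The simulated data are placeholders.

```python
import numpy as np

def logrank_parts(time, event, group):
    # Observed-minus-expected deaths in group 1 and the hypergeometric
    # variance, accumulated over the distinct event times (two groups).
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n, n1 = at_risk.sum(), (at_risk & (group == 1)).sum()
        d = ((time == t) & (event == 1)).sum()
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e, var

def stratified_logrank(time, event, group, stratum):
    # Sum the per-stratum statistics, then form one chi-square statistic
    # (1 d.f.): the stratified log-rank test discussed above.
    parts = [logrank_parts(time[stratum == s], event[stratum == s],
                           group[stratum == s]) for s in np.unique(stratum)]
    num = sum(p[0] for p in parts)
    den = sum(p[1] for p in parts)
    return num ** 2 / den

rng = np.random.default_rng(2)
n = 200
stratum = rng.integers(0, 2, n)      # e.g., a binary prognostic factor
group = rng.integers(0, 2, n)        # treatment assignment
time = rng.exponential(1.0, n)       # event times (no treatment effect here)
event = rng.random(n) < 0.8          # roughly 20% censoring
print("stratified log-rank chi-square:",
      stratified_logrank(time, event, group, stratum))
```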

     