

Search for: All records

Award ID contains: 1711952


  1. Abstract

    Disease registries, surveillance data, and other datasets with extremely large sample sizes are increasingly available and provide population‐based information on disease incidence, survival probability, or other important public health characteristics. Such information can be leveraged in studies that collect detailed measurements but have smaller sample sizes. In contrast to recent proposals that formulate the additional information as constraints in optimization problems, we develop a general framework for constructing simple estimators that update the usual regression estimators with functionals of the data that incorporate the additional information. We consider general settings that include nuisance parameters in the auxiliary information, non‐i.i.d. data such as those from case‐control studies, and semiparametric models with infinite‐dimensional parameters common in survival analysis. Details of several important data and sampling settings are provided with numerical examples.

     
  2. In this paper, we propose a novel method for matrix completion under general non-uniform missing structures. By controlling an upper bound of a novel balancing error, we construct weights that can actively adjust for the non-uniformity in the empirical risk without explicitly modeling the observation probabilities, and that can be computed efficiently via convex optimization. The recovered matrix based on the proposed weighted empirical risk enjoys appealing theoretical guarantees. In particular, the proposed method achieves a stronger guarantee than existing work in terms of the scaling with respect to the observation probabilities under asymptotically heterogeneous missing settings (where entry-wise observation probabilities can be of different orders). These settings can be regarded as a better theoretical model of missing patterns with highly varying probabilities. We also provide a new minimax lower bound under a class of heterogeneous settings. Numerical experiments demonstrate the effectiveness of the proposed method.
  3. Natural mediation effects are often of interest when the goal is to understand a causal mechanism. However, most existing methods and their identification assumptions preclude treatment-induced confounders, which are often present in practice. To address this fundamental limitation, we provide a set of assumptions that identify the natural direct effect in the presence of treatment-induced confounders. Even when some of those assumptions are violated, the estimand still has an interventional direct effect interpretation. We derive the semiparametric efficiency bound for the estimand, which, unlike the usual expressions, contains conditional densities that are variationally dependent. We consider a reparameterization and propose a quadruply robust estimator that remains consistent under four types of possible misspecification and is also locally semiparametric efficient. We use simulation studies to demonstrate the proposed method and apply it to the 2017 Natality data to investigate the effect of prenatal care on preterm birth mediated by preeclampsia, with smoking status during pregnancy as a potential treatment-induced confounder. Supplementary materials for the article are available online.