Title: High-dimensional semi-supervised learning: in search of optimal inference of the mean
Summary A fundamental challenge in semi-supervised learning lies in the disparity between the size of the observed data and that of the much larger dataset collected with missing outcomes. An implicit understanding is that the dataset with missing outcomes, being significantly larger, ought to improve estimation and inference. However, it is unclear to what extent this is correct. We illustrate one clear benefit: root-$n$ inference of the outcome's mean is possible while requiring only consistent estimation of the outcome, possibly at a rate slower than root-$n$. This is achieved by a novel $k$-fold, cross-fitted, double robust estimator. We discuss both linear and nonlinear outcomes. Such an estimator is particularly suited to models that naturally do not admit root-$n$ consistency, such as high-dimensional, nonparametric or semiparametric models. We apply our methods to estimating heterogeneous treatment effects.
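The mechanics of such an estimator can be illustrated with a minimal sketch. The linear working model for the outcome and the constant (missing-completely-at-random) labeling propensity below are simple stand-ins for the flexible nuisance estimators the paper allows, and all function names are illustrative, not the paper's implementation:

```python
import numpy as np

def cross_fitted_dr_mean(X, Y, R, K=5, seed=0):
    """K-fold cross-fitted double robust estimate of E[Y].

    X : (n, p) covariates; R : (n,) labeling indicator (1 = outcome observed);
    Y : (n,) outcomes -- entries with R == 0 may hold any finite placeholder,
    since they are multiplied by R and never enter the fit.
    """
    n = len(Y)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % K          # balanced random fold labels
    pi_hat = R.mean()                       # constant (MCAR) propensity estimate
    psi = np.empty(n)
    for k in range(K):
        train, test = folds != k, folds == k
        lab = train & (R == 1)              # labeled points outside fold k
        Xl = np.column_stack([np.ones(lab.sum()), X[lab]])
        beta, *_ = np.linalg.lstsq(Xl, Y[lab], rcond=None)
        Xt = np.column_stack([np.ones(test.sum()), X[test]])
        m_hat = Xt @ beta                   # outcome model, fit out-of-fold
        psi[test] = m_hat + R[test] / pi_hat * (Y[test] - m_hat)
    return psi.mean()
```

Cross-fitting ensures the outcome model is always evaluated on data it was not trained on, which is the device that lets a slower-than-root-$n$ nuisance rate still yield root-$n$ inference for the mean.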
Award ID(s):
1712481
PAR ID:
10345775
Author(s) / Creator(s):
Date Published:
Journal Name:
Biometrika
Volume:
109
Issue:
2
ISSN:
0006-3444
Page Range / eLocation ID:
387 to 403
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We propose the use of U-statistics to reduce variance for gradient estimation in importance-weighted variational inference. The key observation is that, given a base gradient estimator that requires m > 1 samples and a total of n > m samples available for estimation, lower variance is achieved by averaging the base estimator over overlapping batches of size m than over disjoint batches, as is currently done. We use classical U-statistic theory to analyze the variance reduction, and propose novel approximations with theoretical guarantees to ensure computational efficiency. We find empirically that U-statistic variance reduction can lead to modest to significant improvements in inference performance on a range of models, with little computational cost.
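The overlapping-versus-disjoint batching idea can be shown concretely. As a stand-in for a base gradient estimator with m = 2, the sketch below uses the classical variance kernel $h(x_1, x_2) = (x_1 - x_2)^2 / 2$; function names are illustrative:

```python
import itertools
import numpy as np

def disjoint_estimate(x, h, m):
    """Baseline: average the m-sample kernel h over disjoint batches."""
    n = len(x)
    batches = [x[i:i + m] for i in range(0, n - n % m, m)]
    return np.mean([h(b) for b in batches])

def u_statistic(x, h, m):
    """U-statistic: average h over all overlapping size-m subsets."""
    return np.mean([h(np.asarray(s)) for s in itertools.combinations(x, m)])
```

Both estimators are unbiased for the same quantity; classical U-statistic theory says the overlapping average has variance no larger than any disjoint-batch average of the same kernel, which is the source of the reduction exploited here.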
  2.
    Summary The net reclassification improvement (NRI) and the integrated discrimination improvement (IDI) were originally proposed to characterize accuracy improvement in predicting a binary outcome when new biomarkers are added to regression models. These two indices have been extended from binary outcomes to multi-categorical and survival outcomes. Working on an AIDS study where the onset of cognitive impairment is subject to competing-risk censoring by death, we extend the NRI and the IDI to competing risk outcomes, by using cumulative incidence functions to quantify cumulative risks of competing events, and adopting the definitions of the two indices for multi-category outcomes. The “missing” category due to independent censoring is handled through inverse probability weighting. Various competing risk models are considered, such as the Fine and Gray, multistate, and multinomial logistic models. Estimation methods for the NRI and the IDI from competing risk data are presented. Inference for the NRI is constructed based on the asymptotic normality of its estimator, and the bias-corrected and accelerated bootstrap procedure is used for the IDI. Simulations demonstrate that the proposed inferential procedures perform very well. The Multicenter AIDS Cohort Study is used to illustrate the practical utility of the extended NRI and IDI for competing risk outcomes.
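For intuition, the category-based NRI that this extension builds on can be sketched as below, with optional inverse-probability weights standing in for the handling of the censored "missing" category. This is an illustrative sketch of the reclassification mechanics for a binary event, not the paper's full cumulative-incidence-based estimator:

```python
import numpy as np

def categorical_nri(event, old_cat, new_cat, weights=None):
    """Category-based net reclassification improvement for a binary event.

    event   : (n,) 0/1 event indicator
    old_cat : (n,) risk category under the old model (higher = riskier)
    new_cat : (n,) risk category under the new model
    weights : optional (n,) inverse-probability weights (e.g., to account
              for independent censoring); uniform if omitted.
    """
    event = np.asarray(event, float)
    up = (np.asarray(new_cat) > np.asarray(old_cat)).astype(float)
    down = (np.asarray(new_cat) < np.asarray(old_cat)).astype(float)
    w = np.ones_like(event) if weights is None else np.asarray(weights, float)
    we, wn = w * event, w * (1 - event)
    nri_event = (we * (up - down)).sum() / we.sum()       # events should move up
    nri_nonevent = (wn * (down - up)).sum() / wn.sum()    # non-events, down
    return nri_event + nri_nonevent
```

A new model that moves every event up one category and every non-event down one category attains the maximal NRI of 2; a model that reclassifies no one scores 0.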
  3. Abstract Semi-supervised (SS) inference has received much attention in recent years. Apart from a moderate-sized labeled dataset, $\mathcal{L}$, the SS setting is characterized by an additional, much larger unlabeled dataset, $\mathcal{U}$. The setting $|\mathcal{U}| \gg |\mathcal{L}|$ makes SS inference unique and different from standard missing data problems, owing to a natural violation of the so-called ‘positivity’ or ‘overlap’ assumption. However, most of the SS literature implicitly assumes $\mathcal{L}$ and $\mathcal{U}$ to be identically distributed, i.e., no selection bias in the labeling. Inferential challenges under missing-at-random-type labeling that allows for selection bias are inevitably exacerbated by the decaying nature of the propensity score (PS). We address this gap for a prototype problem, estimation of the response's mean. We propose a double robust SS mean estimator and give a complete characterization of its asymptotic properties. The proposed estimator is consistent as long as either the outcome or the PS model is correctly specified. When both models are correctly specified, we provide inference results with a nonstandard consistency rate that depends on the smaller size $|\mathcal{L}|$. The results are also extended to causal inference with imbalanced treatment groups. Further, we provide several novel choices of models and estimators of the decaying PS, including a novel offset logistic model and a stratified labeling model. We present their properties under both high- and low-dimensional settings; these may be of independent interest. Lastly, we present extensive simulations and a real data application.
  4. Summary For decades, $N$-of-1 experiments, where a unit serves as its own control and treatment in different time windows, have been used in certain medical contexts. However, due to effects that accumulate over long time windows and interventions that have complex evolution, a lack of robust inference tools has limited the widespread applicability of such $N$-of-1 designs. This work combines techniques from experimental design in causal inference and system identification from control theory to provide such an inference framework. We derive a model of the dynamic interference effect that arises in linear time-invariant dynamical systems. We show that a family of causal estimands analogous to those studied in potential outcomes are estimable via a standard estimator derived from the method of moments. We derive formulae for higher moments of this estimator and describe conditions under which $N$-of-1 designs may provide faster ways to estimate the effects of interventions in dynamical systems. We also provide conditions under which our estimator is asymptotically normal and derive valid confidence intervals for this setting.
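The flavor of such a method-of-moments estimator can be sketched in the simplest case: a linear time-invariant system driven by i.i.d. Bernoulli($p$) treatment assignments, where each impulse-response coefficient is identified by a covariance moment, $h_k = \mathrm{E}[y_t (u_{t-k} - p)] / \{p(1-p)\}$. This is an illustrative reduction under strong assumptions, not the paper's general framework:

```python
import numpy as np

def moment_impulse_response(y, u, p, max_lag):
    """Method-of-moments estimate of the impulse response of an LTI system
    driven by i.i.d. Bernoulli(p) treatments u, observed through outcomes y:
        h_k ~= mean( y_t * (u_{t-k} - p) ) / (p * (1 - p)).
    """
    h = np.empty(max_lag + 1)
    for k in range(max_lag + 1):
        yk = y[k:]                    # align y_t ...
        uk = u[: len(u) - k]          # ... with u_{t-k}
        h[k] = np.mean(yk * (uk - p)) / (p * (1 - p))
    return h
```

Because the treatments are independent across time, each lag-$k$ moment isolates $h_k$ even though earlier treatments' effects accumulate in $y_t$; this is the dynamic-interference point made by the abstract.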
  5. Abstract Calibration weighting has been widely used to correct selection biases in nonprobability sampling, missing data and causal inference. The main idea is to calibrate the biased sample to the benchmark by adjusting the subject weights. However, hard calibration can produce enormous weights when an exact calibration is enforced on a large set of extraneous covariates. This article proposes a soft calibration scheme, where the outcome and the selection indicator follow mixed-effect models. The scheme imposes an exact calibration on the fixed effects and an approximate calibration on the random effects. On the one hand, our soft calibration has an intrinsic connection with best linear unbiased prediction, which results in a more efficient estimation compared to hard calibration. On the other hand, soft calibration weighting estimation can be envisioned as penalized propensity score weight estimation, with the penalty term motivated by the mixed-effect structure. The asymptotic distribution and a valid variance estimator are derived for soft calibration. We demonstrate the superiority of the proposed estimator over other competitors in simulation studies and using a real-world data application on the effect of BMI screening on childhood obesity. 
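The contrast with hard calibration can be sketched as a constrained least-squares problem: exact calibration equations for the fixed-effect covariates, and a penalized (approximate) fit for the random-effect covariates, solved through its KKT system. This is a schematic of the idea with illustrative names, not the paper's mixed-model estimator:

```python
import numpy as np

def soft_calibration_weights(X_fix, X_rand, t_fix, t_rand, d, lam):
    """Soft calibration sketch: stay close to design weights d, calibrate
    X_fix exactly and X_rand only approximately.

    Solves   min_w ||w - d||^2 + (1/lam) * ||X_rand.T @ w - t_rand||^2
             s.t.  X_fix.T @ w = t_fix
    via the KKT linear system; lam > 0 relaxes the random-effect equations
    (lam -> 0 recovers hard calibration on X_rand, lam -> inf drops it).
    """
    n, q = X_fix.shape
    M = 2 * np.eye(n) + (2 / lam) * X_rand @ X_rand.T
    K = np.block([[M, X_fix],
                  [X_fix.T, np.zeros((q, q))]])     # saddle-point system
    rhs = np.concatenate([2 * d + (2 / lam) * X_rand @ t_rand, t_fix])
    return np.linalg.solve(K, rhs)[:n]
```

Relaxing the random-effect equations is what prevents the enormous weights that hard calibration can produce when many extraneous covariates are calibrated exactly.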