NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available free of charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

ARK: Robust knockoffs inference with coupling

https://doi.org/10.1214/24-AOS2480

Fan, Yingying; Gao, Lan; Lv, Jinchi (April 2025, The Annals of Statistics)

We investigate the robustness of the model-X knockoffs framework with respect to a misspecified or estimated feature distribution. We achieve this goal by theoretically studying the feature selection performance of a practically implemented knockoffs algorithm, which we call the approximate knockoffs (ARK) procedure, under the false discovery rate (FDR) and the k-familywise error rate (k-FWER). The approximate knockoffs procedure differs from the model-X knockoffs procedure only in that the former uses the misspecified or estimated feature distribution. A key technique in our theoretical analyses is to couple the approximate knockoffs procedure with the model-X knockoffs procedure so that random variables in these two procedures can be close in realizations. We prove that if such a coupled model-X knockoffs procedure exists, the approximate knockoffs procedure can achieve asymptotic FDR or k-FWER control at the target level. We showcase three specific constructions of such coupled model-X knockoff variables, verifying their existence and justifying the robustness of the model-X knockoffs framework. Additionally, we formally connect our concept of knockoff variable coupling to a type of Wasserstein distance.
Free, publicly-accessible full text available April 1, 2026
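The abstract does not give an implementation. As a rough illustration of the setting it analyzes, the Python sketch below (assuming NumPy) has three steps. First, it draws Gaussian model-X knockoffs using a correlation matrix *estimated* from the data rather than the true one, which is one simple instance of an approximate knockoffs procedure. Second, it computes a marginal-correlation feature statistic. Third, it applies the standard knockoff+ threshold for FDR control at level q. The equicorrelated construction, the choice of statistic, the assumption of standardized Gaussian features, and all function names are illustrative choices of this sketch, not the authors' code. The coupling argument itself is not implemented.

```python
import numpy as np


def equicorrelated_s(sigma):
    # Equicorrelated choice s_j = min(1, 2 * lambda_min(sigma)); assumes sigma
    # is a correlation matrix (unit diagonal), i.e. standardized features.
    lam_min = np.linalg.eigvalsh(sigma).min()
    return np.full(sigma.shape[0], min(1.0, 2.0 * lam_min))


def sample_gaussian_knockoffs(x, sigma, rng):
    """Draw knockoffs assuming the rows of x are N(0, sigma).

    Passing an *estimated* sigma gives approximate (ARK-style) knockoffs.
    """
    d = np.diag(equicorrelated_s(sigma))
    sigma_inv_d = np.linalg.solve(sigma, d)      # sigma^{-1} D
    mean = x - x @ sigma_inv_d                   # E[x_tilde | x], row form
    cov = 2.0 * d - d @ sigma_inv_d              # Cov[x_tilde | x]
    # Symmetric square root that tolerates a singular (PSD) covariance.
    evals, evecs = np.linalg.eigh((cov + cov.T) / 2.0)
    root = evecs * np.sqrt(np.clip(evals, 0.0, None))
    return mean + rng.standard_normal(x.shape) @ root.T


def marginal_stats(x, x_tilde, y):
    # Antisymmetric statistic W_j = |x_j' y| - |x_tilde_j' y|.
    return np.abs(x.T @ y) - np.abs(x_tilde.T @ y)


def knockoff_plus_threshold(w, q):
    # Smallest t > 0 with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q.
    for t in np.sort(np.abs(w[w != 0])):
        fdp_hat = (1 + np.sum(w <= -t)) / max(1, np.sum(w >= t))
        if fdp_hat <= q:
            return t
    return np.inf


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, rho = 500, 50, 0.3
    true_sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    x = rng.multivariate_normal(np.zeros(p), true_sigma, size=n)
    x = (x - x.mean(axis=0)) / x.std(axis=0)     # standardize features
    beta = np.zeros(p)
    beta[:10] = 0.5
    y = x @ beta + rng.standard_normal(n)

    # The knockoff sampler only sees the estimated correlation matrix.
    sigma_hat = np.corrcoef(x, rowvar=False)
    x_tilde = sample_gaussian_knockoffs(x, sigma_hat, rng)
    w = marginal_stats(x, x_tilde, y)
    t = knockoff_plus_threshold(w, q=0.2)
    print("selected features:", np.flatnonzero(w >= t))
```

Swapping `sigma_hat` for `true_sigma` in the demo turns the sketch into exact model-X knockoffs. That swap is the only difference between the two procedures, as the abstract describes.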
Data science in economics and finance: Introduction

https://doi.org/10.1016/j.jeconom.2023.105627

Cattaneo, Matias D.; Fan, Yingying; Li, Runze; Song, Rui (February 2024, Journal of Econometrics)

Full Text Available
A Complementary Pseudo-Resistor with Leakage Current Self-Compensation for Biopotential Amplifiers

https://doi.org/10.1109/BioCAS58349.2023.10389136

Topalli, Gerald; Xie, Chong; Fan, Yingying; Luan, Lan; Yin, Rongkang; Chi, Taiyun (October 2023, IEEE)

Full Text Available
Universal rank inference via residual subsampling with application to large networks

https://doi.org/10.1214/23-AOS2282

Han, Xiao; Yang, Qing; Fan, Yingying (June 2023, The Annals of Statistics)

Full Text Available
Optimal Nonparametric Inference with Two-Scale Distributional Nearest Neighbors

https://doi.org/10.1080/01621459.2022.2115375

Demirkaya, Emre; Fan, Yingying; Gao, Lan; Lv, Jinchi; Vossler, Patrick; Wang, Jingbo (October 2022, Journal of the American Statistical Association)

The weighted nearest neighbors (WNN) estimator has been widely used as a flexible and easy-to-implement nonparametric tool for mean regression estimation. The bagging technique is an elegant way to form WNN estimators with weights automatically assigned to the nearest neighbors (Steele, 2009; Biau et al., 2010); we call the resulting estimator the distributional nearest neighbors (DNN) for easy reference. Yet there is a lack of distributional results for such an estimator, limiting its application to statistical inference. Moreover, when the mean regression function has higher-order smoothness, DNN does not achieve the optimal nonparametric convergence rate, mainly because of the bias issue. In this work, we provide an in-depth technical analysis of the DNN, based on which we suggest a bias reduction approach for the DNN estimator by linearly combining two DNN estimators with different subsampling scales, resulting in the novel two-scale DNN (TDNN) estimator. The two-scale DNN estimator has an equivalent representation as a WNN with weights admitting explicit forms, some of them negative. We prove that, thanks to the use of negative weights, the two-scale DNN estimator enjoys the optimal nonparametric rate of convergence in estimating the regression function under the fourth-order smoothness condition. We further go beyond estimation and establish that the DNN and two-scale DNN are both asymptotically normal as the subsampling scales and sample size diverge to infinity. For practical implementation, we also provide variance estimators and a distribution estimator for the two-scale DNN using the jackknife and bootstrap techniques. These estimators can be exploited to construct valid confidence intervals for nonparametric inference of the regression function. The theoretical results and appealing finite-sample performance of the suggested two-scale DNN method are illustrated with several simulation examples and a real data application.
Full Text Available
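To make the DNN and two-scale DNN constructions described above concrete, here is a minimal Python sketch (assuming NumPy).

The DNN weights follow from a counting argument. The 1-NN of a random size-s subsample is the i-th nearest neighbor of the full sample exactly when the subsample contains point i, excludes points 1, ..., i-1, and draws its other s-1 points from the n-i farther ones. That happens in C(n-i, s-1) of the C(n, s) possible subsamples.

The two-scale weights rest on an assumption of this sketch: that the leading DNN bias is proportional to s^(-2/d). Under that assumption, the weights are chosen to sum to one and cancel that term, so the smaller-scale weight comes out negative. Check the paper for its exact weights. The function names are hypothetical, and the jackknife and bootstrap variance estimators are not sketched.

```python
import math

import numpy as np


def dnn_weights(n, s):
    """Weight C(n-i, s-1) / C(n, s) on the i-th nearest neighbour, i = 1..n."""
    total = math.comb(n, s)
    # math.comb returns 0 when s - 1 > n - i, so far neighbours get weight 0.
    return np.array([math.comb(n - i, s - 1) / total for i in range(1, n + 1)])


def dnn_estimate(x_train, y_train, x0, s):
    # Sort responses by Euclidean distance to the query point x0,
    # then take the DNN-weighted average.
    order = np.argsort(np.linalg.norm(x_train - x0, axis=1), kind="stable")
    return float(dnn_weights(len(y_train), s) @ y_train[order])


def tdnn_estimate(x_train, y_train, x0, s1, s2):
    # Assumed bias model: bias(s) ~ B * s^(-2/d). Choose w1 + w2 = 1 and
    # w1 * s1^(-2/d) + w2 * s2^(-2/d) = 0; with s1 < s2, w1 is negative.
    d = x_train.shape[1]
    ratio = (s1 / s2) ** (-2.0 / d)
    w1 = 1.0 / (1.0 - ratio)
    w2 = -ratio / (1.0 - ratio)
    return (w1 * dnn_estimate(x_train, y_train, x0, s1)
            + w2 * dnn_estimate(x_train, y_train, x0, s2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 500, 2
    x = rng.uniform(-1.0, 1.0, size=(n, d))
    y = x[:, 0] ** 2 + x[:, 1] + 0.1 * rng.standard_normal(n)
    x0 = np.array([0.5, 0.0])                  # true f(x0) = 0.25
    print("DNN  (s=50):       ", dnn_estimate(x, y, x0, 50))
    print("TDNN (s1=25, s2=50):", tdnn_estimate(x, y, x0, 25, 50))
```

Two boundary cases serve as sanity checks. With s = 1 every weight equals 1/n, giving the sample mean. With s = n all weight falls on the single nearest neighbor. The subsampling scale therefore moves the estimator between these two extremes.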
SIMPLE: Statistical inference on membership profiles in large networks

https://doi.org/10.1111/rssb.12505

Fan, Jianqing; Fan, Yingying; Han, Xiao; Lv, Jinchi (April 2022, Journal of the Royal Statistical Society: Series B (Statistical Methodology))

Full Text Available
Asymptotic distributions of high-dimensional distance correlation inference

https://doi.org/10.1214/20-aos2024

Gao, Lan; Fan, Yingying; Lv, Jinchi; Shao, Qi-Man (August 2021, The Annals of Statistics)

Full Text Available
Asymptotic Theory of Eigenvectors for Random Matrices With Diverging Spikes

https://doi.org/10.1080/01621459.2020.1840990

Fan, Jianqing; Fan, Yingying; Han, Xiao; Lv, Jinchi (January 2020, Journal of the American Statistical Association)

Full Text Available
Tuning-Free Heterogeneous Inference in Massive Networks

https://doi.org/10.1080/01621459.2018.1537920

Ren, Zhao; Kang, Yongjian; Fan, Yingying; Lv, Jinchi (October 2019, Journal of the American Statistical Association)

Full Text Available
DeepLINK: Deep learning inference using knockoffs with applications to genomics

https://doi.org/10.1073/pnas.2104683118

Zhu, Zifan; Fan, Yingying; Kong, Yinfei; Lv, Jinchi; Sun, Fengzhu (September 2021, Proceedings of the National Academy of Sciences)

Significance: Although practically attractive because of their high prediction and classification power, complicated learning methods often lack interpretability and reproducibility, limiting their scientific usage. A useful remedy is to select the truly important variables contributing to the response of interest. We develop a method for deep learning inference using knockoffs, DeepLINK, to achieve variable selection with a controlled error rate in deep learning models. We show that DeepLINK can also have high power in variable selection across a broad class of model designs. We then apply DeepLINK to three real datasets and produce statistical inference results that are both reproducible and biologically meaningful, demonstrating its promise for a broad range of scientific applications.
