NSF PAR Search | NSF Public Access Repository


DeepLINK: Deep learning inference using knockoffs with applications to genomics

https://doi.org/10.1073/pnas.2104683118

Zhu, Zifan; Fan, Yingying; Kong, Yinfei; Lv, Jinchi; Sun, Fengzhu (September 2021, Proceedings of the National Academy of Sciences)

Significance: Although practically attractive with high prediction and classification power, complicated learning methods often lack interpretability and reproducibility, limiting their scientific usage. A useful remedy is to select truly important variables contributing to the response of interest. We develop a method for deep learning inference using knockoffs, DeepLINK, to achieve the goal of variable selection with controlled error rate in deep learning models. We show that DeepLINK can also have high power in variable selection with a broad class of model designs. We then apply DeepLINK to three real datasets and produce statistical inference results with both reproducibility and biological meanings, demonstrating its promising usage to a broad range of scientific applications.
SIMPLE: Statistical inference on membership profiles in large networks

https://doi.org/10.1111/rssb.12505

Fan, Jianqing; Fan, Yingying; Han, Xiao; Lv, Jinchi (April 2022, Journal of the Royal Statistical Society: Series B (Statistical Methodology))

Full Text Available
Not Registered? Please Sign Up First: A Randomized Field Experiment on the Ex Ante Registration Request

https://doi.org/10.1287/isre.2021.0999

Huang, Ni; Mojumder, Probal; Sun, Tianshu; Lv, Jinchi; Golden, Joseph M. (September 2021, Information Systems Research)

Online commerce websites often request users to register in the online shopping process. Recognizing the challenges of user registration, many websites opt to delay their registration request until the end of the conversion funnel (i.e., ex post registration request). Our study explores an alternative approach by asking users to register with the website at the beginning of their shopping journey (i.e., ex ante registration request). Guided by a stylized analytical model, we conducted a large-scale randomized field experiment in partnership with an online retailer in the United States to examine how the ex ante request affects users’ registration decisions, short-term customer conversions, and long-term purchase behaviors. Specifically, we randomly assigned the new users in the website’s incoming traffic to one of two experimental groups: one with an ex ante registration request preceding the ex post request (treatment) and the other with only an ex post registration request (control). Our results show that the ex ante request leads to an increased probability of user registration; that is, the users in the treatment group, on average, are 58.08% relatively more likely to register with the website than those in the control group. Furthermore, the ex ante request leads to significant increases in customer purchases in the long run. Based on our estimation of the local average treatment effects, the ex ante registered users are 10.89% relatively more likely to make a purchase, place a 16.76% relatively greater number of orders, and generate 13.22% relatively higher total revenue for the firm in the long run. Finally, the ex ante request does not impact customer conversion in the short term. Further investigation into the long-term and short-term effects provides suggestive evidence on several potential mechanisms, such as firm-initiated interaction and screening of low-interest users. Our study provides managerial implications for e-commerce websites on customer acquisition and contributes to the research on IT artifact design.
Full Text Available
Asymptotic distributions of high-dimensional distance correlation inference

https://doi.org/10.1214/20-aos2024

Gao, Lan; Fan, Yingying; Lv, Jinchi; Shao, Qi-Man (August 2021, The Annals of Statistics)

Full Text Available
Large-scale model selection in misspecified generalized linear models

https://doi.org/10.1093/biomet/asab005

Demirkaya, Emre; Feng, Yang; Basu, Pallavi; Lv, Jinchi (January 2021, Biometrika)

Summary: Model selection is crucial both to high-dimensional learning and to inference for contemporary big data applications in pinpointing the best set of covariates among a sequence of candidate interpretable models. Most existing work implicitly assumes that the models are correctly specified or have fixed dimensionality, yet both model misspecification and high dimensionality are prevalent in practice. In this paper, we exploit the framework of model selection principles under the misspecified generalized linear models presented in Lv & Liu (2014), and investigate the asymptotic expansion of the posterior model probability in the setting of high-dimensional misspecified models. With a natural choice of prior probabilities that encourages interpretability and incorporates the Kullback–Leibler divergence, we suggest using the high-dimensional generalized Bayesian information criterion with prior probability for large-scale model selection with misspecification. Our new information criterion characterizes the impacts of both model misspecification and high dimensionality on model selection. We further establish the consistency of covariance contrast matrix estimation and the model selection consistency of the new information criterion in ultrahigh dimensions under some mild regularity conditions. Our numerical studies demonstrate that the proposed method enjoys improved model selection consistency over its main competitors.
Full Text Available
Nonsparse Learning with Latent Variables

https://doi.org/10.1287/opre.2020.2005

Zheng, Zemin; Lv, Jinchi; Lin, Wei (January 2021, Operations Research)
As a popular tool for producing meaningful and interpretable models, large-scale sparse learning works efficiently in many optimization applications when the underlying structures are sparse or close to sparse. However, naively applying the existing regularization methods can result in misleading outcomes because of model misspecification. In this paper, we consider nonsparse learning under the factors plus sparsity structure, which yields a joint modeling of sparse individual effects and common latent factors. A new methodology of nonsparse learning with latent variables (NSL) is proposed for joint estimation of the effects of two groups of features, one for individual effects and the other associated with the latent substructures, when the nonsparse effects are captured by the leading population principal component score vectors. We derive the convergence rates of both sample principal components and their score vectors that hold for a wide class of distributions. With the properly estimated latent variables, properties including model selection consistency and oracle inequalities under various prediction and estimation losses are established. Our new methodology and results are evidenced by simulation and real-data examples.
Full Text Available
