
Search for: All records

Award ID contains: 1953356


  1. We propose a deep learning–based knockoffs inference framework, DeepLINK, that guarantees false discovery rate (FDR) control in high-dimensional settings. DeepLINK is applicable to a broad class of covariate distributions described by possibly nonlinear latent factor models. It consists of two major parts: an autoencoder network for knockoff variable construction and a multilayer perceptron network for feature selection with FDR control. The empirical performance of DeepLINK is investigated through extensive simulation studies, where it is shown to achieve FDR control in feature selection with both high selection power and high prediction accuracy. We also apply DeepLINK to three real data applications to demonstrate its practical utility.

  2. Online commerce websites often ask users to register during the online shopping process. Recognizing the challenges of user registration, many websites opt to delay their registration request until the end of the conversion funnel (i.e., ex post registration request). Our study explores an alternative approach by asking users to register with the website at the beginning of their shopping journey (i.e., ex ante registration request). Guided by a stylized analytical model, we conducted a large-scale randomized field experiment in partnership with an online retailer in the United States to examine how the ex ante request affects users’ registration decisions, short-term customer conversions, and long-term purchase behaviors. Specifically, we randomly assigned the new users in the website’s incoming traffic to one of two experimental groups: one with an ex ante registration request preceding the ex post request (treatment) and the other with only an ex post registration request (control). Our results show that the ex ante request leads to an increased probability of user registration; that is, the users in the treatment group, on average, are 58.08% relatively more likely to register with the website than those in the control group. Furthermore, the ex ante request leads to significant increases in customer purchases in the long run. Based on our estimation of the local average treatment effects, the ex ante registered users are 10.89% relatively more likely to make a purchase, place a 16.76% relatively greater number of orders, and generate 13.22% relatively higher total revenue for the firm in the long run. Finally, the ex ante request does not affect customer conversion in the short term. Further investigation into the long-term and short-term effects provides suggestive evidence on several potential mechanisms, such as firm-initiated interaction and screening of low-interest users. Our study provides managerial implications for e-commerce websites on customer acquisition and contributes to the research on IT artifact design.
  3. Model selection is crucial both to high-dimensional learning and to inference for contemporary big data applications in pinpointing the best set of covariates among a sequence of candidate interpretable models. Most existing work implicitly assumes that the models are correctly specified or have fixed dimensionality, yet both model misspecification and high dimensionality are prevalent in practice. In this paper, we exploit the framework of model selection principles under the misspecified generalized linear models presented in Lv & Liu (2014), and investigate the asymptotic expansion of the posterior model probability in the setting of high-dimensional misspecified models. With a natural choice of prior probabilities that encourages interpretability and incorporates the Kullback–Leibler divergence, we suggest using the high-dimensional generalized Bayesian information criterion with prior probability for large-scale model selection with misspecification. Our new information criterion characterizes the impacts of both model misspecification and high dimensionality on model selection. We further establish the consistency of covariance contrast matrix estimation and the model selection consistency of the new information criterion in ultrahigh dimensions under some mild regularity conditions. Our numerical studies demonstrate that the proposed method enjoys improved model selection consistency over its main competitors.
  4. As a popular tool for producing meaningful and interpretable models, large-scale sparse learning works efficiently in many optimization applications when the underlying structures are sparse or close to sparse. However, naively applying the existing regularization methods can result in misleading outcomes because of model misspecification. In this paper, we consider nonsparse learning under the factors plus sparsity structure, which yields a joint modeling of sparse individual effects and common latent factors. A new methodology of nonsparse learning with latent variables (NSL) is proposed for joint estimation of the effects of two groups of features, one for individual effects and the other associated with the latent substructures, when the nonsparse effects are captured by the leading population principal component score vectors. We derive the convergence rates of both sample principal components and their score vectors that hold for a wide class of distributions. With the properly estimated latent variables, properties including model selection consistency and oracle inequalities under various prediction and estimation losses are established. Our new methodology and results are evidenced by simulation and real-data examples.
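
The first record above describes feature selection with FDR control through knockoffs. As a rough illustration of how a knockoff selection step can work, the sketch below applies the standard knockoff+ thresholding rule to a vector of feature statistics. This is an assumption made for illustration: the abstract does not say which thresholding rule DeepLINK uses, and the names `knockoff_plus_threshold`, `knockoff_select`, `w`, and `q` are hypothetical. Each entry of `w` is assumed to be large and positive when an original feature looks more important than its knockoff copy.

```python
import numpy as np


def knockoff_plus_threshold(w, q):
    """Return the smallest t > 0 whose estimated FDP is at most q.

    Assumed rule (knockoff+): FDP_hat(t) = (1 + #{W_j <= -t}) / max(1, #{W_j >= t}).
    Returns infinity when no candidate threshold meets the target level q.
    """
    w = np.asarray(w, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes of the statistics.
    candidates = np.sort(np.unique(np.abs(w[w != 0])))
    for t in candidates:
        fdp_hat = (1 + np.sum(w <= -t)) / max(1, np.sum(w >= t))
        if fdp_hat <= q:
            return float(t)
    return float("inf")


def knockoff_select(w, q):
    """Indices of features whose statistic reaches the knockoff+ threshold."""
    w = np.asarray(w, dtype=float)
    t = knockoff_plus_threshold(w, q)
    return np.flatnonzero(w >= t)


if __name__ == "__main__":
    # Made-up statistics, for illustration only.
    w_example = [5, 4, 3, 2.5, -1, 0.5, 2, -0.2, 6, 7]
    print(knockoff_select(w_example, q=0.2))
```

Negative statistics act as a stand-in count of false positives, which is why the number of entries below `-t` appears in the numerator of the estimated false discovery proportion.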