NSF PAR Search | NSF Public Access Repository

High‐dimensional robust inference for Cox regression models using desparsified Lasso

https://doi.org/10.1111/sjos.12543

Kong, Shengchun; Yu, Zhuqing; Zhang, Xianyang; Cheng, Guang (July 2021, Scandinavian Journal of Statistics)

Abstract We consider high‐dimensional inference for potentially misspecified Cox proportional hazard models based on low‐dimensional results by Lin and Wei (1989). A desparsified Lasso estimator is proposed based on the log partial likelihood function and shown to converge to a pseudo‐true parameter vector. Interestingly, the sparsity of the true parameter can be inferred from that of the above limiting parameter. Moreover, each component of the above (nonsparse) estimator is shown to be asymptotically normal with a variance that can be consistently estimated even under model misspecifications. In some cases, this asymptotic distribution leads to valid statistical inference procedures, whose empirical performances are illustrated through numerical examples.
2dFDR: a new approach to confounder adjustment substantially increases detection power in omics association studies

https://doi.org/10.1186/s13059-021-02418-8

Yi, Sangyoon; Zhang, Xianyang; Yang, Lu; Huang, Jinyan; Liu, Yuanhang; Wang, Chen; Schaid, Daniel J.; Chen, Jun (July 2021, Genome Biology)

Abstract One challenge facing omics association studies is the loss of statistical power when adjusting for confounders and multiple testing. The traditional statistical procedure involves fitting a confounder-adjusted regression model for each omics feature, followed by multiple testing correction. Here we show that the traditional procedure is not optimal and present a new approach, 2dFDR, a two-dimensional false discovery rate control procedure, for powerful confounder adjustment in multiple testing. Through extensive evaluation, we demonstrate that 2dFDR is more powerful than the traditional procedure, and in the presence of strong confounding and weak signals, the power improvement could be more than 100%.
D-MANOVA: fast distance-based multivariate analysis of variance for large-scale microbiome association studies

https://doi.org/10.1093/bioinformatics/btab498

Chen, Jun; Zhang, Xianyang (July 2021, Bioinformatics)
Schwartz, Russell (Ed.)
Abstract Summary: PERMANOVA (permutational multivariate analysis of variance based on distances) has been widely used for testing the association between the microbiome and a covariate of interest. Statistical significance is established by permutation, which is computationally intensive for large sample sizes. As large-scale microbiome studies, such as American Gut Project (AGP), become increasingly popular, a computationally efficient version of PERMANOVA is much needed. To achieve this end, we derive the asymptotic distribution of the PERMANOVA pseudo-F statistic and provide analytical P-value calculation based on chi-square approximation. We show that the asymptotic P-value is close to the PERMANOVA P-value even under a moderate sample size. Moreover, it is more accurate and an order-of-magnitude faster than the permutation-free method MDMR. We demonstrated the use of our procedure D-MANOVA on the AGP dataset.
Availability and implementation: D-MANOVA is implemented by the dmanova function in the CRAN package GUniFrac.
Supplementary information: Supplementary data are available at Bioinformatics online.
Detection of Local Differences in Spatial Characteristics Between Two Spatiotemporal Random Fields

https://doi.org/10.1080/01621459.2020.1775613

Yun, Sooin; Zhang, Xianyang; Li, Bo (January 2021, Journal of the American Statistical Association)
Leveraging biological and statistical covariates improves the detection power in epigenome-wide association testing

https://doi.org/10.1186/s13059-020-02001-7

Huang, Jinyan; Bai, Ling; Cui, Bowen; Wu, Liang; Wang, Liwen; An, Zhiyin; Ruan, Shulin; Yu, Yue; Zhang, Xianyang; Chen, Jun (December 2020, Genome Biology)

Covariate adaptive familywise error rate control for genome-wide association studies

https://doi.org/10.1093/biomet/asaa098

Zhou, Huijuan; Zhang, Xianyang; Chen, Jun (November 2020, Biometrika)
Summary The familywise error rate has been widely used in genome-wide association studies. With the increasing availability of functional genomics data, it is possible to increase detection power by leveraging these genomic functional annotations. Previous efforts to accommodate covariates in multiple testing focused on false discovery rate control, while covariate-adaptive procedures controlling the familywise error rate remain underdeveloped. Here, we propose a novel covariate-adaptive procedure to control the familywise error rate that incorporates external covariates which are potentially informative of either the statistical power or the prior null probability. An efficient algorithm is developed to implement the proposed method. We prove its asymptotic validity and obtain the rate of convergence through a perturbation-type argument. Our numerical studies show that the new procedure is more powerful than competing methods and maintains robustness across different settings. We apply the proposed approach to the UK Biobank data and analyse 27 traits with 9 million single-nucleotide polymorphisms tested for associations. Seventy-five genomic annotations are used as covariates. Our approach detects more genome-wide significant loci than other methods in 21 out of the 27 traits.