

Title: Speeding up Monte Carlo simulations for the adaptive sum of powered score test with importance sampling
Abstract: A central but challenging problem in genetic studies is to test for (usually weak) associations between a complex trait (e.g., a disease status) and sets of multiple genetic variants. Because no uniformly most powerful test exists, data‐adaptive tests such as the adaptive sum of powered score (aSPU) test are advantageous in maintaining high power against a wide range of alternatives. However, for many adaptive tests, including aSPU, there is often no closed form for accurately calculating the p‐values analytically, so Monte Carlo (MC) simulations are used instead. These can be very time‐consuming when the stringent significance level used in genome‐wide association studies (GWAS), e.g., 5e‐8, must be reached: estimating such a small p‐value requires a huge number of MC simulations (e.g., 1e+10). As an alternative, we propose using importance sampling to speed up such calculations. We develop theory to motivate the proposed algorithm for the aSPU test and show that it is computationally more efficient than standard MC simulations. Using both simulated and real data, we demonstrate the superior performance of the new method over standard MC simulations.
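The abstract does not spell out the proposed algorithm, so the following Python sketch only illustrates the general idea of turning the usual Monte Carlo recipe for an aSPU p-value into an importance-sampling estimator. It rests on assumptions that are ours, not necessarily the paper's: the score vector U is N(0, V) under the null; SPU(γ) = Σ_j U_j^γ, with γ = ∞ meaning max_j |U_j|; the aSPU statistic is the minimum over γ of the per-γ p-values; and the proposal is a variance-inflated Gaussian N(0, s²V), so each draw is weighted by f(U)/g(U) = s^k exp(−½ U′V⁻¹U (1 − 1/s²)). The function names, the γ grid, and the choice of proposal are hypothetical. With s = 1 every weight equals 1 and the code reduces to ordinary MC.

```python
import numpy as np


def spu_stats(U, gammas):
    """|SPU(gamma)| = |sum_j U_j**gamma| for each gamma; gamma = inf means max_j |U_j|.

    U holds one score vector per row; the result has one column per gamma.
    """
    U = np.atleast_2d(U)
    cols = []
    for g in gammas:
        if np.isinf(g):
            cols.append(np.max(np.abs(U), axis=1))
        else:
            cols.append(np.abs(np.sum(U ** g, axis=1)))
    return np.column_stack(cols)


def weighted_survival(null_vals, weights, query):
    """IS estimate of P_f(X >= q): (1/B) * total weight of draws with value >= q."""
    order = np.argsort(null_vals)
    sorted_vals = null_vals[order]
    # tail_w[i] = total weight of the draws at sorted positions i, i+1, ..., B-1
    tail_w = np.cumsum(weights[order][::-1])[::-1]
    tail_w = np.append(tail_w, 0.0)  # position B: no draw is >= q
    idx = np.searchsorted(sorted_vals, query, side="left")
    return tail_w[idx] / len(null_vals)


def aspu_pvalue_is(u_obs, V, gammas=(1, 2, 4, 8, np.inf), n_draws=100_000,
                   scale=1.0, seed=0):
    """Hypothetical sketch: aSPU p-value from draws of the proposal N(0, scale^2 V).

    Assumes the null is U ~ N(0, V). Returns (aSPU p-value, per-gamma p-values).
    """
    rng = np.random.default_rng(seed)
    k = len(u_obs)
    L = np.linalg.cholesky(V)              # V = L L^T
    Z = rng.standard_normal((n_draws, k))
    U = scale * Z @ L.T                    # rows ~ N(0, scale^2 V)
    # log f(U)/g(U), using U' V^{-1} U = scale^2 * |Z|^2
    log_w = k * np.log(scale) - 0.5 * np.sum(Z ** 2, axis=1) * (scale ** 2 - 1.0)
    w = np.exp(log_w)

    null_stats = spu_stats(U, gammas)          # shape (n_draws, G)
    obs_stats = spu_stats(u_obs, gammas)[0]    # shape (G,)
    p_null = np.empty_like(null_stats)
    p_obs = np.empty(len(gammas))
    for j in range(len(gammas)):
        # per-gamma p-values, all estimated from the same weighted null draws
        p_null[:, j] = weighted_survival(null_stats[:, j], w, null_stats[:, j])
        p_obs[j] = weighted_survival(null_stats[:, j], w, obs_stats[j])
    # aSPU statistic = min over gamma of the per-gamma p-values;
    # its p-value is the weighted share of null draws whose min-p is <= the observed one
    t_null = p_null.min(axis=1)
    t_obs = p_obs.min()
    return float(np.mean(w * (t_null <= t_obs))), p_obs
```

As a sanity check, take V = I₂ and only γ = 2: then SPU(2) = ‖U‖² follows a χ²₂ distribution, so for an observed ‖u‖² = 30 the exact p-value is exp(−15) ≈ 3.1e-7, and `aspu_pvalue_is(np.array([15**0.5, 15**0.5]), np.eye(2), gammas=(2,), n_draws=200_000, scale=15**0.5)` should return a value close to it. Plain MC would need on the order of 10⁷ draws just to see a few null statistics that extreme, whereas the proposal with s = √15 puts about 37% of its draws in that tail (exp(−1)), and the weights correct for the oversampling. Choosing a good proposal for the full multi-γ statistic is the harder problem; a single global scale is only one simple option.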
Award ID(s):
1846747 1712717 1659328
PAR ID:
10364635
Author(s) / Creator(s):
 ;  ;  ;  
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Biometrics
Volume:
78
Issue:
1
ISSN:
0006-341X
Format(s):
Medium: X
Size(s):
p. 261-273
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Optical absorption and scattering properties are often estimated from the diffusive reflection light intensity at only one distance from the material surface, which often leads to accuracy and convergence issues. In this work, a method was proposed to determine optical properties by using diffusive reflection light intensity profiles at multiple distances, which enhances data richness because the intensity profiles are linearly independent. In this method, five features of the light intensity profiles (contrast, correlation, energy, homogeneity, and second moment) were used to reduce the data dimensions. To demonstrate the effectiveness of the proposed method, Monte Carlo (MC) simulations were used to generate noisy diffusive reflection light intensity profiles at different distances for various combinations of four optical properties (absorption coefficient μa, scattering coefficient μs, isotropic coefficient g, and refractive index n). The five profile feature vectors were used as inputs and the four optical parameters as outputs to train and test a backpropagation (BP) neural network. The influences of the noise level and of the number of diffusive light intensity profiles on parameter estimation accuracy were investigated. The four optical parameters estimated by the BP network were compared with those estimated by the traditional least squares method, showing that the proposed method estimates the optical properties with higher accuracy and better convergence.
  2. The explosion of biobank data offers unprecedented opportunities for gene-environment interaction (G×E) studies of complex diseases because of the large sample sizes and the rich collection of genetic and non-genetic information. However, the extremely large sample sizes also introduce new computational challenges in G×E assessment, especially for set-based G×E variance component (VC) tests, a widely used strategy to boost overall G×E signals and to evaluate the joint G×E effect of multiple variants from a biologically meaningful unit (e.g., a gene). In this work, we focus on continuous traits and present SEAGLE, a Scalable Exact AlGorithm for Large-scale set-based G×E tests, to permit G×E VC tests on biobank-scale data. SEAGLE employs modern matrix computations to calculate the test statistic and p-value of the G×E VC test in a computationally efficient fashion, without imposing additional assumptions or relying on approximations. SEAGLE easily accommodates sample sizes on the order of 10⁵, runs on standard laptops, and does not require specialized computing equipment. We demonstrate the performance of SEAGLE using extensive simulations and illustrate its utility by conducting a genome-wide gene-based G×E analysis of the Taiwan Biobank data to explore the interaction of genes and physical activity status on body mass index.
  3. Summary Nonparametric covariate adjustment is considered for log-rank-type tests of the treatment effect with right-censored time-to-event data from clinical trials applying covariate-adaptive randomization. Our proposed covariate-adjusted log-rank test has a simple explicit formula and a guaranteed efficiency gain over the unadjusted test. We also show that our proposed test achieves universal applicability in the sense that the same formula of test can be universally applied to simple randomization and all commonly used covariate-adaptive randomization schemes such as the stratified permuted block and the Pocock–Simon minimization, which is not a property enjoyed by the unadjusted log-rank test. Our method is supported by novel asymptotic theory and empirical results for Type-I error and power of tests. 
  4. Abstract Mendelian randomization (MR) has been a popular method in genetic epidemiology to estimate the effect of an exposure on an outcome using genetic variants as instrumental variables (IV), with two‐sample summary‐data MR being the most popular form. Unfortunately, instruments in MR studies are often weakly associated with the exposure, which can bias effect estimates and inflate Type I errors. In this work, we propose test statistics that are robust under weak‐instrument asymptotics by extending the Anderson–Rubin test, the Kleibergen test, and the conditional likelihood ratio test from econometrics to two‐sample summary‐data MR. We also use the proposed Anderson–Rubin test to develop a point estimator and to detect invalid instruments. We conclude with a simulation and an empirical study and show that the proposed tests control size and have better power than existing methods when instruments are weak.
  5. Abstract We propose new tests for assessing whether covariates in a treatment group and a matched control group are balanced in observational studies. The tests exhibit high power under a wide range of multivariate alternatives, including some against which existing tests have little power. The asymptotic permutation null distributions of the proposed tests are studied, and the P‐values calculated from the asymptotic results work well in simulation studies, facilitating the application of the tests to large data sets. The tests are illustrated in a study of the effect of smoking on blood lead levels. The proposed tests are implemented in the R package BalanceCheck.