skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Penalized likelihood and multiple testing
Abstract The classical multiple testing model remains an important practical area of statistics with new approaches still being developed. In this paper we develop a new multiple testing procedure inspired by a method sometimes used in a problem with a different focus. Namely, the inference after model selection problem. We note that solutions to that problem are often accomplished by making use of a penalized likelihood function. A classic example is the Bayesian information criterion (BIC) method. In this paper we construct a generalized BIC method and evaluate its properties as a multiple testing procedure. The procedure is applicable to a wide variety of statistical models including regression, contrasts, treatment versus control, change point, and others. Numerical work indicates that, in particular, for sparse models the new generalized BIC would be preferred over existing multiple testing procedures.  more » « less
Award ID(s):
1712839
PAR ID:
10461982
Author(s) / Creator(s):
 ;  ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Biometrical Journal
Volume:
61
Issue:
1
ISSN:
0323-3847
Page Range / eLocation ID:
p. 62-72
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary We discover a connection between the Benjamini–Hochberg procedure and the e-Benjamini–Hochberg procedure (Wang & Ramdas, 2022) with a suitably defined set of e-values. This insight extends to Storey’s procedure and generalized versions of the Benjamini–Hochberg procedure and the model-free multiple testing procedure of Barber & Candés (2015) with a general form of rejection rules. We further summarize these findings in a unified form. These connections open up new possibilities for designing multiple testing procedures in various contexts by aggregating e-values from different procedures or assembling e-values from different data subsets. 
    more » « less
  2. Abstract This study introduces a novelp-value-based multiple testing approach tailored for generalized linear models. Despite the crucial role of generalized linear models in statistics, existing methodologies face obstacles arising from the heterogeneous variance of response variables and complex dependencies among estimated parameters. Our aim is to address the challenge of controlling the false discovery rate (FDR) amidst arbitrarily dependent test statistics. Through the development of efficient computational algorithms, we present a versatile statistical framework for multiple testing. The proposed framework accommodates a range of tools developed for constructing a new model matrix in regression-type analysis, including random row permutations and Model-X knockoffs. We devise efficient computing techniques to solve the encountered non-trivial quadratic matrix equations, enabling the construction of pairedp-values suitable for the two-step multiple testing procedure proposed by Sarkar and Tang (Biometrika 109(4): 1149–1155, 2022). Theoretical analysis affirms the properties of our approach, demonstrating its capability to control the FDR at a given level. Empirical evaluations further substantiate its promising performance across diverse simulation settings. 
    more » « less
  3. Abstract Factor analysis is a widely used statistical tool in many scientific disciplines, such as psychology, economics, and sociology. As observations linked by networks become increasingly common, incorporating network structures into factor analysis remains an open problem. In this paper, we focus on high-dimensional factor analysis involving network-connected observations, and propose a generalized factor model with latent factors that account for both the network structure and the dependence structure among high-dimensional variables. These latent factors can be shared by the high-dimensional variables and the network, or exclusively applied to either of them. We develop a computationally efficient estimation procedure and establish asymptotic inferential theories. Notably, we show that by borrowing information from the network, the proposed estimator of the factor loading matrix achieves optimal asymptotic variance under much milder identifiability constraints than existing literature. Furthermore, we develop a hypothesis testing procedure to tackle the challenge of discerning the shared and individual latent factors’ structure. The finite sample performance of the proposed method is demonstrated through simulation studies and a real-world dataset involving a statistician co-authorship network. 
    more » « less
  4. Abstract One challenge facing omics association studies is the loss of statistical power when adjusting for confounders and multiple testing. The traditional statistical procedure involves fitting a confounder-adjusted regression model for each omics feature, followed by multiple testing correction. Here we show that the traditional procedure is not optimal and present a new approach, 2dFDR, a two-dimensional false discovery rate control procedure, for powerful confounder adjustment in multiple testing. Through extensive evaluation, we demonstrate that 2dFDR is more powerful than the traditional procedure, and in the presence of strong confounding and weak signals, the power improvement could be more than 100%. 
    more » « less
  5. This paper presents a tool for creating student models in logistic regression. Creating student models has typically been done by expert selection of the appropriate terms, beginning with models as simple as IRT or AFM but more recently with highly complex models like BestLR. While alternative methods exist to select the appropriate predictors for the regression-based models (e.g., step-wise selection or LASSO), we are unaware of their application to student modeling. Such automatic methods of model creation offer the possibility of better student models with either reduced complexity or better fit, in addition to relieving experts from the burden of searching for better models by hand with possible human error. Our tool builds on top of the preexisting R package LKT. We explain our search methods with two datasets demonstrating the advantages of using the tool with stepwise regression and regularization (LASSO) methods to aid in feature selection. For the stepwise method using BIC, the models are simpler (due to the BIC penalty for parameters) than alternatives like BestLR with little lack of fit. For the LASSO method, the models can be made simpler due to the fitting procedure involving a regularization parameter that penalizes large absolute coefficient values. However, LASSO also offers the possibility of highly complex models with exceptional fit. 
    more » « less