Summary Model selection is crucial both to high-dimensional learning and to inference for contemporary big data applications in pinpointing the best set of covariates among a sequence of candidate interpretable models. Most existing work implicitly assumes that the models are correctly specified or have fixed dimensionality, yet both model misspecification and high dimensionality are prevalent in practice. In this paper, we exploit the framework of model selection principles under the misspecified generalized linear models presented in Lv & Liu (2014), and investigate the asymptotic expansion of the posterior model probability in the setting of high-dimensional misspecified models. With a natural choice of prior probabilities that encourages interpretability and incorporates the Kullback–Leibler divergence, we suggest using the high-dimensional generalized Bayesian information criterion with prior probability for large-scale model selection with misspecification. Our new information criterion characterizes the impacts of both model misspecification and high dimensionality on model selection. We further establish the consistency of covariance contrast matrix estimation and the model selection consistency of the new information criterion in ultrahigh dimensions under some mild regularity conditions. Our numerical studies demonstrate that the proposed method enjoys improved model selection consistency over its main competitors. 
                        more » 
                        « less   
                    This content will become publicly available on February 24, 2026
                            
                            Bayesian Variable Selection in Dynamic Item Response Theory Models
                        
                    
    
            The recent surge in computerized testing brings challenges in the analysis of testing data with classic item response theory (IRT) models. To handle individually varying and irregularly spaced longitudinal dichotomous responses, we adopt a dynamic IRT model framework and then extend the model to link with individual characteristics at a hierarchical level. Further, we have developed an algorithm to select important characteristics of individuals that can capture the growth changes of one’s ability under this multi-level dynamic IRT model, where we can compute the Bayes factor of the proposed model including different covariates using a single Markov chain Monte Carlo output from the full model. In addition, we have shown the model selection consistency under the modified Zellner–Siow prior, and we have conducted simulations to illustrate the properties of the model selection consistency in finite samples. Finally, we have applied our proposed model and computational algorithms to a real data application, called EdSphere dataset, in educational testing. 
        more » 
        « less   
        
    
                            - Award ID(s):
- 1848451
- PAR ID:
- 10573666
- Publisher / Repository:
- DOI PREFIX: 10.3102
- Date Published:
- Journal Name:
- Journal of Educational and Behavioral Statistics
- ISSN:
- 1076-9986
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            
- 
            Summary Precision medicine is an emerging scientific topic for disease treatment and prevention that takes into account individual patient characteristics. It is an important direction for clinical research, and many statistical methods have been proposed recently. One of the primary goals of precision medicine is to obtain an optimal individual treatment rule (ITR), which can help make decisions on treatment selection according to each patient's specific characteristics. Recently, outcome weighted learning (OWL) has been proposed to estimate such an optimal ITR in a binary treatment setting by maximizing the expected clinical outcome. However, for ordinal treatment settings, such as individualized dose finding, it is unclear how to use OWL. In this article, we propose a new technique for estimating ITR with ordinal treatments. In particular, we propose a data duplication technique with a piecewise convex loss function. We establish Fisher consistency for the resulting estimated ITR under certain conditions, and obtain the convergence and risk bound properties. Simulated examples and an application to a dataset from a type 2 diabetes mellitus observational study demonstrate the highly competitive performance of the proposed method compared to existing alternatives.more » « less
- 
            In this article, we investigate the problem of simultaneous change point inference and structure recovery in the context of high dimensional Gaussian graphical models with possible abrupt changes. In particular, motivated by neighborhood selection, we incorporate a threshold variable and an unknown threshold parameter into a joint sparse regression model which combines p l1-regularized node-wise regression problems together. The change point estimator and the corresponding estimated coefficients of precision matrices are obtained together. Based on that, a classifier is introduced to distinguish whether a change point exists. To recover the graphical structure correctly, a data-driven thresholding procedure is proposed. In theory, under some sparsity conditions and regularity assumptions, our method can correctly choose a homogeneous or heterogeneous model with high accuracy. Furthermore, in the latter case with a change point, we establish estimation consistency of the change point estimator, by allowing the number of nodes being much larger than the sample size. Moreover, it is shown that, in terms of structure recovery of Gaussian graphical models, the proposed thresholding procedure achieves model selection consistency and controls the number of false positives. The validity of our proposed method is justified via extensive numerical studies. Finally, we apply our proposed method to the S&P 500 dataset to show its empirical usefulness.more » « less
- 
            Spatial–temporal data arise frequently in biomedical, environmental, political and social science studies. Capturing dynamic changes of time-varying correlation structure is scientifically important in spatio-temporal data analysis. We approximate the time-varying empirical estimator of the spatial correlation matrix by groups of selected basis matrices representing substructures of the correlation matrix. After projecting the correlation structure matrix onto a space spanned by basis matrices, we also incorporate varying-coefficient model selection and estimation for signals associated with relevant basis matrices. The unique feature of the proposed method is that signals at local regions corresponding with time can be identified through the proposed penalized objective function. Theoretically, we show model selection consistency and the oracle property in detecting local signals for the varying-coefficient estimators. The proposed method is illustrated through simulation studies and brain fMRI data.more » « less
- 
            von Davier, Matthias (Ed.)Computerized assessment provides rich multidimensional data including trial-by-trial accuracy and response time (RT) measures. A key question in modeling this type of data is how to incorporate RT data, for example, in aid of ability estimation in item response theory (IRT) models. To address this, we propose a joint model consisting of a two-parameter IRT model for the dichotomous item response data, a log-normal model for the continuous RT data, and a normal model for corresponding paper-and-pencil scores. Then, we reformulate and reparameterize the model to capture the relationship between the model parameters, to facilitate the prior specification, and to make the Bayesian computation more efficient. Further, we propose several new model assessment criteria based on the decomposition of deviance information criterion (DIC) the logarithm of the pseudo-marginal likelihood (LPML). The proposed criteria can quantify the improvement in the fit of one part of the multidimensional data given the other parts. Finally, we have conducted several simulation studies to examine the empirical performance of the proposed model assessment criteria and have illustrated the application of these criteria using a real dataset from a computerized educational assessment program.more » « less
 An official website of the United States government
An official website of the United States government 
				
			 
					 
					
