Title: Markov neighborhood regression for statistical inference of high‐dimensional generalized linear models
High‐dimensional inference is one of the fundamental problems in modern biomedical studies, yet the existing methods do not perform satisfactorily. Based on the Markov property of graphical models and the likelihood ratio test, this article provides a simple justification for the Markov neighborhood regression method so that it can be applied to statistical inference for high‐dimensional generalized linear models with mixed features. The Markov neighborhood regression method is highly attractive in that it breaks a high‐dimensional inference problem into a series of low‐dimensional inference problems. The proposed method is applied to the cancer cell line encyclopedia data to identify the genes and mutations that are sensitive to the response of anti‐cancer drugs. The numerical results favor the Markov neighborhood regression method over the existing ones.
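To make the recipe concrete, here is a minimal Python sketch of the low-dimensional reduction the abstract describes: estimate the Markov neighborhood of the feature under test (here with a nodewise lasso, one common choice), restrict the generalized linear model to that small subset, and apply an ordinary likelihood ratio test. The function names, the logistic family, and all tuning choices are illustrative assumptions; in particular, the paper's full procedure also augments the conditioning set with variables screened for the response, which this sketch omits.

```python
# Minimal sketch, assuming a logistic GLM; names and tuning are hypothetical.
import numpy as np
from scipy.stats import chi2
from sklearn.linear_model import LassoCV
import statsmodels.api as sm

def markov_neighborhood(X, j, max_size=10):
    """Estimate the Markov neighborhood of feature j via a nodewise lasso."""
    others = np.delete(np.arange(X.shape[1]), j)
    fit = LassoCV(cv=5).fit(X[:, others], X[:, j])
    sel = np.abs(fit.coef_) > 1e-8
    nbrs = others[sel][np.argsort(-np.abs(fit.coef_[sel]))]
    return nbrs[:max_size]  # keep the subproblem low-dimensional

def mnr_test(X, y, j):
    """Likelihood ratio test for beta_j in a GLM restricted to
    {x_j} union the estimated Markov neighborhood of x_j."""
    nbrs = markov_neighborhood(X, j)
    fam = sm.families.Binomial()
    Z_full = sm.add_constant(X[:, np.r_[j, nbrs]])
    Z_null = sm.add_constant(X[:, nbrs])
    llf_full = sm.GLM(y, Z_full, family=fam).fit().llf
    llf_null = sm.GLM(y, Z_null, family=fam).fit().llf
    stat = 2.0 * (llf_full - llf_null)   # asymptotically chi2(1) under H0
    return stat, chi2.sf(stat, df=1)

rng = np.random.default_rng(0)
n, p = 200, 500                          # p >> n
X = rng.standard_normal((n, p))
y = (X[:, 0] - X[:, 1] + rng.logistic(size=n) > 0).astype(float)
stat, pval = mnr_test(X, y, j=0)
print(f"LRT statistic = {stat:.2f}, p-value = {pval:.4f}")
```

The point of the reduction is that each test involves only a handful of coordinates, so standard low-dimensional GLM asymptotics apply regardless of how large p is.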
Award ID(s):
2015498
PAR ID:
10445470
Author(s) / Creator(s):
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Statistics in Medicine
Volume:
41
Issue:
20
ISSN:
0277-6715
Format(s):
Medium: X
Size(s):
p. 4057-4078
Sponsoring Org:
National Science Foundation
More Like this
  1. This article presents a novel method for learning time‐varying dynamic Bayesian networks. The proposed method breaks the dynamic Bayesian network learning problem into a sequence of regression inference problems and tackles each one with the Markov neighborhood regression technique. Notably, the method scales well with the data dimensionality, accommodates time‐varying network structures, and naturally handles multi‐subject data. It is consistent and offers superior performance compared with existing methods in terms of estimation accuracy and computational efficiency, as supported by extensive numerical experiments. To showcase its effectiveness, we apply the proposed method to an fMRI study investigating the effective connectivity among various regions of interest (ROIs) during an emotion‐processing task. Our findings reveal the pivotal role of the subcortical‐cerebellum in emotion processing.
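As a rough illustration of the reduction from dynamic Bayesian network learning to node-wise regressions, the Python sketch below fits one sparse lagged regression per node within each time window. The sliding window, the plain lasso (standing in for the full Markov neighborhood regression step), and the names tv_dbn_edges, window, and alpha are all assumptions made for this demo, not details from the article.

```python
# Rough sketch: node-wise lagged lasso regressions per time window.
import numpy as np
from sklearn.linear_model import Lasso

def tv_dbn_edges(X, window=50, alpha=0.1):
    """X: (T, d) multivariate time series. Returns, for each window, a
    d x d matrix of lagged regression coefficients (edge j -> i)."""
    T, d = X.shape
    nets = []
    for start in range(0, T - window, window):
        past = X[start:start + window - 1]       # X_{t-1} within the window
        curr = X[start + 1:start + window]       # X_t within the window
        B = np.zeros((d, d))
        for i in range(d):                       # one sparse regression per node
            fit = Lasso(alpha=alpha).fit(past, curr[:, i])
            B[i] = fit.coef_                     # row i: estimated parents of node i
        nets.append(B)
    return nets

rng = np.random.default_rng(1)
X = rng.standard_normal((300, 8)).cumsum(axis=0)  # toy nonstationary series
nets = tv_dbn_edges(X)
print(f"{len(nets)} window-specific networks; the first has "
      f"{np.count_nonzero(nets[0])} edges")
```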
  2. In this article, we investigate the problem of simultaneous change point inference and structure recovery in the context of high-dimensional Gaussian graphical models with possible abrupt changes. In particular, motivated by neighborhood selection, we incorporate a threshold variable and an unknown threshold parameter into a joint sparse regression model that combines p ℓ1-regularized node-wise regression problems. The change point estimator and the corresponding estimated coefficients of the precision matrices are obtained together. Based on that, a classifier is introduced to distinguish whether a change point exists. To recover the graphical structure correctly, a data-driven thresholding procedure is proposed. In theory, under some sparsity conditions and regularity assumptions, our method can correctly choose a homogeneous or heterogeneous model with high accuracy. Furthermore, in the latter case with a change point, we establish estimation consistency of the change point estimator, allowing the number of nodes to be much larger than the sample size. Moreover, it is shown that, in terms of structure recovery of Gaussian graphical models, the proposed thresholding procedure achieves model selection consistency and controls the number of false positives. The validity of our proposed method is justified via extensive numerical studies. Finally, we apply our proposed method to the S&P 500 dataset to show its empirical usefulness.
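The following Python sketch conveys the flavor of change point estimation via node-wise regressions, under heavy simplification: for each candidate split it fits separate ℓ1-regularized neighborhood regressions on the two segments and keeps the split with the smallest total residual error. The joint threshold formulation, the classifier, and the data-driven thresholding step of the article are all omitted, and the function names and constants are illustrative.

```python
# Simplified sketch: change point search via segment-wise node-wise lasso.
import numpy as np
from sklearn.linear_model import Lasso

def nodewise_sse(X, alpha=0.1):
    """Total residual sum of squares over p node-wise lasso regressions."""
    n, p = X.shape
    sse = 0.0
    for j in range(p):
        others = np.delete(np.arange(p), j)
        fit = Lasso(alpha=alpha).fit(X[:, others], X[:, j])
        sse += np.sum((X[:, j] - fit.predict(X[:, others])) ** 2)
    return sse

def change_point(X, min_seg=30):
    """Pick the split minimizing the combined node-wise regression error."""
    n = X.shape[0]
    cands = list(range(min_seg, n - min_seg))
    scores = [nodewise_sse(X[:t]) + nodewise_sse(X[t:]) for t in cands]
    return cands[int(np.argmin(scores))]

rng = np.random.default_rng(2)
A = rng.standard_normal((100, 5))                 # segment 1: independent nodes
B = (rng.standard_normal((100, 1)) * np.ones(5)   # segment 2: strongly correlated
     + 0.3 * rng.standard_normal((100, 5)))
X = np.vstack([A, B])
print("estimated change point:", change_point(X))  # true change point is 100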
  3. Abstract We propose a very fast approximate Markov chain Monte Carlo sampling framework that is applicable to a large class of sparse Bayesian inference problems. The computational cost per iteration in several regression models is of order O(n(s+J)), where n is the sample size, s is the underlying sparsity of the model, and J is the size of a randomly selected subset of regressors. This cost can be further reduced by data sub-sampling when stochastic gradient Langevin dynamics are employed. The algorithm is an extension of the asynchronous Gibbs sampler of Johnson et al. [(2013). Analyzing Hogwild parallel Gaussian Gibbs sampling. In Proceedings of the 26th International Conference on Neural Information Processing Systems (NIPS'13) (Vol. 2, pp. 2715–2723)], but can be viewed from a statistical perspective as a form of Bayesian iterated sure independence screening [Fan, J., Samworth, R., & Wu, Y. (2009). Ultrahigh dimensional feature selection: Beyond the linear model. Journal of Machine Learning Research, 10, 2013–2038]. We show that in high-dimensional linear regression problems, the Markov chain generated by the proposed algorithm admits an invariant distribution that correctly recovers the main signal with high probability under some statistical assumptions. Furthermore, we show that its mixing time is at most linear in the number of regressors. We illustrate the algorithm with several models.
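A didactic stand-in for the random-subset flavor of such an algorithm is sketched below in Python: a Gibbs sampler for sparse Bayesian linear regression that refreshes only J randomly chosen regressors per iteration, which is what keeps the per-iteration cost of order n(s+J). The spike-and-slab prior, the hyperparameters, and the function names are assumptions for this demo; this is not the paper's asynchronous sampler.

```python
# Toy sketch; the prior, hyperparameters, and names are illustrative.
import numpy as np

def subset_gibbs(X, y, J=20, iters=2000, q=0.05, tau2=1.0, sigma2=1.0, seed=0):
    """Spike-and-slab Gibbs that updates only J random coordinates per sweep."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.astype(float).copy()               # residual for beta = 0
    xx = np.einsum('ij,ij->j', X, X)             # precomputed x_j' x_j
    keep = np.zeros(p)
    for it in range(iters):
        for j in rng.choice(p, size=J, replace=False):  # random-subset update
            resid += X[:, j] * beta[j]           # remove j's contribution
            v = 1.0 / (xx[j] / sigma2 + 1.0 / tau2)
            m = v * (X[:, j] @ resid) / sigma2
            # posterior log-odds of inclusion under the spike-and-slab prior
            log_odds = (np.log(q / (1 - q)) + 0.5 * np.log(v / tau2)
                        + 0.5 * m * m / v)
            if rng.random() < 1.0 / (1.0 + np.exp(-np.clip(log_odds, -50, 50))):
                beta[j] = m + np.sqrt(v) * rng.standard_normal()
            else:
                beta[j] = 0.0
            resid -= X[:, j] * beta[j]           # restore with the new value
        if it >= iters // 2:
            keep += beta != 0                    # post-burn-in inclusion counts
    return keep / (iters - iters // 2)

rng = np.random.default_rng(3)
n, p = 200, 1000
X = rng.standard_normal((n, p))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.standard_normal(n)
probs = subset_gibbs(X, y)
print("top features by inclusion probability:", np.argsort(-probs)[:5])
```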
  4. Abstract Bayesian data analysis is about more than just computing a posterior distribution, and Bayesian visualization is about more than trace plots of Markov chains. Practical Bayesian data analysis, like all data analysis, is an iterative process of model building, inference, model checking and evaluation, and model expansion. Visualization is helpful in each of these stages of the Bayesian workflow, and it is indispensable when drawing inferences from the types of modern, high-dimensional models that are used by applied researchers.
  5. Accurate yield prediction in integrated circuit manufacturing enables accurate estimation of production cost and early detection of processing problems. It is known that defects tend to be clustered and that a chip is likely to be defective if its neighbors are defective. This neighborhood effect is not well captured in traditional yield modeling approaches. We propose a new yield prediction model, called adjacency-clustering, which addresses, for the first time, the neighborhood effect and delivers prediction results that are significantly better than state-of-the-art methods. The adjacency-clustering (AC) model is a form of the Markov random field (MRF) minimum-energy model that is primarily known in the context of image segmentation. The AC model is a novel use of MRFs for identifying defect patterns that enable diagnosis of failure causes in the manufacturing process. In this paper we utilize the defect patterns obtained by the AC model for yield prediction. We compare the performance of the AC model to that of leading yield prediction models, including the Poisson, negative binomial, Poisson regression, and negative binomial regression models, on real and simulated data sets. The results demonstrate that the adjacency-clustering model captures the neighborhood effect and delivers superior prediction accuracy. Moreover, the concept and methodology of adjacency-clustering are not limited to integrated circuit manufacturing; rather, they are applicable in any context where a neighborhood effect is present, such as disease risk mapping and energy consumption prediction. The e-companion is available at https://doi.org/10.1287/opre.2018.1741 .
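To illustrate how a minimum-energy MRF captures the neighborhood effect, here is a toy Python sketch that smooths a binary wafer defect map with an Ising-style energy, solved by iterated conditional modes (ICM). The energy weights, the ICM solver, and the names are illustrative choices, not the adjacency-clustering formulation of the paper.

```python
# Toy Ising-style MRF smoothing of a wafer defect map via ICM.
import numpy as np

def icm_denoise(defects, lam=1.2, sweeps=10):
    """defects: binary (H, W) map of observed chip failures.
    Minimizes sum_i |x_i - d_i| - lam * sum_{i~j} 1[x_i == x_j] over labels x."""
    x = defects.copy().astype(int)
    H, W = x.shape
    for _ in range(sweeps):
        for i in range(H):
            for j in range(W):
                nbrs = [x[a, b] for a, b in
                        ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                        if 0 <= a < H and 0 <= b < W]
                # data term (agree with observation) + smoothness term
                # (agree with 4-neighbors), evaluated for labels 0 and 1
                costs = [abs(lab - defects[i, j])
                         - lam * sum(n == lab for n in nbrs)
                         for lab in (0, 1)]
                x[i, j] = int(np.argmin(costs))
    return x

rng = np.random.default_rng(4)
wafer = np.zeros((20, 20), dtype=int)
wafer[5:12, 5:12] = 1                              # a true defect cluster
noisy = np.where(rng.random(wafer.shape) < 0.1, 1 - wafer, wafer)  # 10% flips
cleaned = icm_denoise(noisy)
print("noisy errors:", np.sum(noisy != wafer),
      "-> cleaned errors:", np.sum(cleaned != wafer))
```

The smoothness term is what lets a chip's label borrow strength from its neighbors, which is precisely the clustering behavior the abstract says traditional Poisson-type yield models miss.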