We address the challenge of estimating regression coefficients and selecting relevant predictors in high-dimensional mixed linear regression, where the number of predictors greatly exceeds the sample size. Recent advances in this field have centered on incorporating sparsity-inducing penalties into the expectation-maximization (EM) algorithm, which seeks to maximize the conditional likelihood of the response given the predictors. However, existing procedures often treat predictors as fixed or overlook their inherent variability. In this paper, we leverage the independence between the predictors and the latent indicator variable of the mixture to facilitate efficient computation and to achieve synergistic variable selection across all mixture components. We establish the non-asymptotic convergence rate of the proposed fast group-penalized EM estimator to the true regression parameters. The effectiveness of our method is demonstrated through extensive simulations and an application to the Cancer Cell Line Encyclopedia dataset for the prediction of anticancer drug sensitivity.
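To make the procedure concrete, here is a minimal sketch, in Python, of a group-penalized EM for a two-component Gaussian mixture of linear regressions. It is not the paper's released implementation: the shared noise variance, the proximal-gradient M-step, and names such as `group_penalized_em` are our own simplifying choices. The group penalty ties each predictor's two coefficients together, so the row-wise soft-thresholding selects or drops a predictor in both components at once, which is what makes the variable selection synergistic.

```python
import numpy as np

def group_soft_threshold(B, thresh):
    """Row-wise group soft-thresholding: each predictor's pair of
    coefficients (one per mixture component) is shrunk jointly, so a
    predictor enters or leaves both components together."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - thresh / np.maximum(norms, 1e-12))
    return scale * B

def group_penalized_em(X, y, lam, n_iter=100, m_steps=50):
    """Hypothetical sketch of a group-penalized EM for a two-component
    Gaussian mixture of linear regressions with a shared noise variance."""
    n, p = X.shape
    rng = np.random.default_rng(0)
    B = rng.normal(scale=0.1, size=(p, 2))          # columns hold beta_1, beta_2
    pi, sigma2 = 0.5, float(np.var(y))
    step = n / np.linalg.norm(X, 2) ** 2            # 1/L for the smooth part
    for _ in range(n_iter):
        # E-step: component responsibilities from the current fits
        r = np.stack([y - X @ B[:, k] for k in (0, 1)], axis=1)
        logw = -0.5 * r**2 / sigma2 + np.log([pi, 1.0 - pi])
        logw -= logw.max(axis=1, keepdims=True)     # stabilize the softmax
        gamma = np.exp(logw)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: proximal gradient on the penalized weighted least squares
        for _ in range(m_steps):
            grad = np.stack(
                [-X.T @ (gamma[:, k] * (y - X @ B[:, k])) / n for k in (0, 1)],
                axis=1)
            B = group_soft_threshold(B - step * grad, step * lam)
        pi = float(gamma[:, 0].mean())
        r = np.stack([y - X @ B[:, k] for k in (0, 1)], axis=1)
        sigma2 = float((gamma * r**2).sum() / n)
    return B, pi, sigma2
```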
NSF-PAR ID: 10535431
Publisher / Repository: https://jmlr.org/
Journal Name: Journal of Machine Learning Research
ISSN: 1533-7928
Sponsoring Org: National Science Foundation
More Like this

ABSTRACT
This paper addresses patient heterogeneity associated with prediction problems in biomedical applications. We propose a systematic hypothesis testing approach to determine the existence of patient subgroup structure and, if subgroups exist, the number of subgroups in the patient population. A mixture of generalized linear models is used to model the relationship between the disease outcome and patient characteristics and clinical factors, including targeted biomarker profiles. We construct a test statistic based on the expectation-maximization (EM) algorithm and derive its asymptotic distribution under the null hypothesis. An important computational advantage of the test is that the parameter estimates involved under the complex alternative hypothesis can be obtained through a small number of EM iterations rather than by fully optimizing the objective function. We demonstrate the finite-sample performance of the proposed test in terms of type I error rate and power using extensive simulation studies. The applicability of the proposed method is illustrated through an application to a multicenter prostate cancer study.
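To illustrate the computational point, the following is a hedged sketch of such an EM-test statistic for the special case of a two-subgroup Gaussian linear model (the paper treats general GLMs). The grid of initial mixing proportions, the fixed small number of EM iterations, the perturbed starting values, and keeping the mixing proportion fixed at its grid value are our illustrative simplifications, not the paper's exact specification.

```python
import numpy as np

def em_test_statistic(X, y, n_em=3, pi_grid=(0.1, 0.3, 0.5)):
    """Illustrative sketch of an EM-test for a two-subgroup mixture of
    Gaussian linear models: run a small, fixed number of EM iterations
    from each initial mixing proportion, then report twice the best
    log-likelihood gain over the one-group (null) fit."""
    n, p = X.shape
    beta0, *_ = np.linalg.lstsq(X, y, rcond=None)   # null: single linear model
    resid = y - X @ beta0
    sigma2 = resid @ resid / n
    ll_null = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    best = -np.inf
    for pi in pi_grid:
        b1, b2 = beta0 + 0.1, beta0 - 0.1           # perturbed starting values
        s2 = sigma2
        for _ in range(n_em):                       # a few EM iterations only
            r1, r2 = y - X @ b1, y - X @ b2
            l1 = pi * np.exp(-0.5 * r1**2 / s2)
            l2 = (1 - pi) * np.exp(-0.5 * r2**2 / s2)
            g = l1 / (l1 + l2 + 1e-300)             # subgroup responsibilities
            sw1, sw2 = np.sqrt(g), np.sqrt(1 - g)   # weighted least-squares M-step
            b1 = np.linalg.lstsq(X * sw1[:, None], y * sw1, rcond=None)[0]
            b2 = np.linalg.lstsq(X * sw2[:, None], y * sw2, rcond=None)[0]
            r1, r2 = y - X @ b1, y - X @ b2
            s2 = np.mean(g * r1**2 + (1 - g) * r2**2)
        dens = (pi * np.exp(-0.5 * r1**2 / s2)
                + (1 - pi) * np.exp(-0.5 * r2**2 / s2)) / np.sqrt(2 * np.pi * s2)
        best = max(best, np.log(dens + 1e-300).sum())
    return 2 * (best - ll_null)
```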

Scalas, Enrico (Ed.)
Gaussian processes offer a flexible kernel method for regression. While Gaussian processes have many useful theoretical properties and have proven practically useful, they suffer from poor scaling in the number of observations. In particular, the cubic time complexity of updating standard Gaussian process models can be a limiting factor in applications. We propose an algorithm for sequentially partitioning the input space and fitting a localized Gaussian process to each disjoint region. The algorithm is shown to have superior time and space complexity to existing methods, and its sequential nature allows the model to be updated efficiently. The algorithm constructs a model for which the time complexity of updating is tightly bounded above by a pre-specified parameter. To the best of our knowledge, the model is the first local Gaussian process regression model to achieve linear memory complexity. Theoretical continuity properties of the model are proven. We demonstrate the efficacy of the resulting model on several multidimensional regression tasks.
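A minimal sketch of the general idea, assuming a fixed grid partition of the input space rather than the paper's sequential partitioning rule: each cell keeps its own exact GP whose size is capped, so the memory per cell and the cost of an update or prediction are bounded by a pre-specified parameter. The class name and the first-in-first-out eviction rule below are our own illustrative choices, and this naive version does not enjoy the continuity guarantees the paper proves.

```python
import numpy as np

class LocalGPRegressor:
    """Hedged sketch of localized GP regression: the input space is split
    into disjoint grid cells, each holding its own exact GP whose size is
    capped so that per-update and per-prediction costs stay bounded."""
    def __init__(self, cell_width=1.0, max_points=100, ls=1.0, noise=1e-2):
        self.cells = {}                     # cell index -> (X list, y list)
        self.w, self.cap = cell_width, max_points
        self.ls, self.noise = ls, noise

    def _key(self, x):
        return tuple(np.floor(np.asarray(x) / self.w).astype(int))

    def _kern(self, A, B):
        # squared-exponential kernel between row sets A and B
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / self.ls**2)

    def update(self, x, y):
        """Add one observation; cost is bounded by the cell cap."""
        key = self._key(x)
        X, Y = self.cells.setdefault(key, ([], []))
        if len(X) >= self.cap:              # simple eviction keeps the cap
            X.pop(0); Y.pop(0)
        X.append(np.asarray(x, float)); Y.append(float(y))

    def predict(self, x):
        key = self._key(x)
        if key not in self.cells or not self.cells[key][0]:
            return 0.0                      # prior mean outside fitted cells
        X = np.stack(self.cells[key][0]); Y = np.array(self.cells[key][1])
        K = self._kern(X, X) + self.noise * np.eye(len(X))
        k = self._kern(np.asarray(x, float)[None, :], X)[0]
        return k @ np.linalg.solve(K, Y)
```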

The Expectation-Maximization (EM) algorithm is a widely used method for maximum likelihood estimation in models with latent variables. For estimating mixtures of Gaussians, its iteration can be viewed as a soft version of the k-means clustering algorithm. Despite its wide use and applications, there are essentially no known convergence guarantees for this method. We provide global convergence guarantees for mixtures of two Gaussians with known covariance matrices. We show that the population version of EM, where the algorithm is given access to infinitely many samples from the mixture, converges geometrically to the correct mean vectors, and we provide simple, closed-form expressions for the convergence rate. As a simple illustration, we show that, in one dimension, ten steps of the EM algorithm initialized at infinity result in less than 1% error in estimating the means. In the finite-sample regime, we show that, under a random initialization, Õ(d/ε²) samples suffice to compute the unknown vectors to within ε in Mahalanobis distance, where d is the dimension. In particular, the error rate of the EM-based estimator is Õ(√(d/n)), where n is the number of samples, which is optimal up to logarithmic factors.
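For a balanced mixture of two Gaussians with means ±μ and identity covariance, this EM iteration has a particularly clean form, μ ← (1/n) Σᵢ tanh(⟨xᵢ, μ⟩) xᵢ, since the E-step weight of each sample reduces to tanh(⟨xᵢ, μ⟩). A minimal sketch under those assumptions (balanced weights and identity covariance are our simplifications):

```python
import numpy as np

def em_two_gaussians(X, n_iter=10, mu0=None):
    """EM for a balanced mixture of two Gaussians with means +/-mu and
    identity covariance: each step replaces mu with the tanh-weighted
    (soft, sign-weighted) average of the samples."""
    n, d = X.shape
    mu = X[0].copy() if mu0 is None else np.asarray(mu0, float)
    for _ in range(n_iter):
        w = np.tanh(X @ mu)                 # soft assignment in [-1, 1]
        mu = (w[:, None] * X).mean(axis=0)  # M-step: weighted mean
    return mu
```

On data drawn from such a mixture, a handful of iterations from a rough initializer typically stabilizes `mu`, in the spirit of the ten-step illustration above.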

Mixture-of-Experts (MoE) is a widely popular model for ensemble learning and is a basic building block of highly successful modern neural networks, as well as a component in Gated Recurrent Units (GRU) and Attention networks. However, present algorithms for learning MoE, including the EM algorithm and gradient descent, are known to get stuck in local optima. From a theoretical viewpoint, finding an efficient and provably consistent algorithm to learn the parameters has remained a long-standing open problem for more than two decades. In this paper, we introduce the first algorithm that learns the true parameters of a MoE model for a wide class of nonlinearities with global consistency guarantees. While existing algorithms jointly or iteratively estimate the expert parameters and the gating parameters of the MoE, we propose a novel algorithm that breaks this deadlock and directly estimates the expert parameters by sensing their echo in a carefully designed cross-moment tensor between the inputs and the output. Once the experts are known, recovering the gating parameters still requires an EM algorithm; however, we show that the EM algorithm for this simplified problem, unlike the joint EM algorithm, converges to the true parameters. We empirically validate our algorithm on both synthetic and real datasets in a variety of settings and show superior performance over standard baselines.
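The two-stage structure is easy to illustrate for the simplest case. Below is a hedged sketch of the second stage only: with the expert parameters treated as known (in the paper they come from the cross-moment tensor step), EM recovers a sigmoid gating parameter for a two-expert MoE with linear-Gaussian experts. All names, the fixed noise variance, and the gradient-ascent M-step are our own illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def em_gating(X, y, a1, a2, sigma2=1.0, n_iter=50, lr=0.5):
    """Sketch of EM for the gating parameter w of a two-expert MoE with
    linear-Gaussian experts a1, a2 treated as known; the E-step computes
    routing posteriors and the M-step fits a weighted logistic model."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        # E-step: posterior probability each sample was routed to expert 1
        g = sigmoid(X @ w)
        l1 = g * np.exp(-0.5 * (y - X @ a1) ** 2 / sigma2)
        l2 = (1 - g) * np.exp(-0.5 * (y - X @ a2) ** 2 / sigma2)
        p = l1 / (l1 + l2 + 1e-300)
        # M-step: one gradient-ascent step on the weighted logistic
        # log-likelihood sum_i p_i log g_i + (1 - p_i) log(1 - g_i)
        w += lr * X.T @ (p - sigmoid(X @ w)) / n
    return w
```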