Motivated by mobile devices that record data at a high frequency, we propose a new methodological framework for analyzing a semi-parametric regression model that allow us to study a nonlinear relationship between a scalar response and multiple functional predictors in the presence of scalar covariates. Utilizing functional principal component analysis (FPCA) and the least-squares kernel machine method (LSKM), we are able to substantially extend the framework of semi-parametric regression models of scalar responses on scalar predictors by allowing multiple functional predictors to enter the nonlinear model. Regularization is established for feature selection in the setting of reproducing kernel Hilbert spaces. Our method performs simultaneously model fitting and variable selection on functional features. For the implementation, we propose an effective algorithm to solve related optimization problems in that iterations take place between both linear mixed-effects models and a variable selection method (e.g., sparse group lasso). We show algorithmic convergence results and theoretical guarantees for the proposed methodology. We illustrate its performance through simulation experiments and an analysis of accelerometer data.
more »
« less
Bayesian network-guided sparse regression with flexible varying effects
ABSTRACT In this paper, we propose Varying Effects Regression with Graph Estimation (VERGE), a novel Bayesian method for feature selection in regression. Our model has key aspects that allow it to leverage the complex structure of data sets arising from genomics or imaging studies. We distinguish between the predictors, which are the features utilized in the outcome prediction model, and the subject-level covariates, which modulate the effects of the predictors on the outcome. We construct a varying coefficients modeling framework where we infer a network among the predictor variables and utilize this network information to encourage the selection of related predictors. We employ variable selection spike-and-slab priors that enable the selection of both network-linked predictor variables and covariates that modify the predictor effects. We demonstrate through simulation studies that our method outperforms existing alternative methods in terms of both feature selection and predictive accuracy. We illustrate VERGE with an application to characterizing the influence of gut microbiome features on obesity, where we identify a set of microbial taxa and their ecological dependence relations. We allow subject-level covariates, including sex and dietary intake variables to modify the coefficients of the microbiome predictors, providing additional insight into the interplay between these factors.
more »
« less
- PAR ID:
- 10547740
- Publisher / Repository:
- Oxford University Press
- Date Published:
- Journal Name:
- Biometrics
- Volume:
- 80
- Issue:
- 4
- ISSN:
- 0006-341X
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
This article seeks to investigate the impact of aging on functional connectivity across different cognitive control scenarios, particularly emphasizing the identification of brain regions significantly associated with early aging. By conceptualizing functional connectivity within each cognitive control scenario as a graph, with brain regions as nodes, the statistical challenge revolves around devising a regression framework to predict a binary scalar outcome (aging or normal) using multiple graph predictors. Popular regression methods utilizing multiplex graph predictors often face limitations in effectively harnessing information within and across graph layers, leading to potentially less accurate inference and predictive accuracy, especially for smaller sample sizes. To address this challenge, we propose the Bayesian Multiplex Graph Classifier (BMGC). Accounting for multiplex graph topology, our method models edge coefficients at each graph layer using bilinear interactions between the latent effects associated with the two nodes connected by the edge. This approach also employs a variable selection framework on node-specific latent effects from all graph layers to identify influential nodes linked to observed outcomes. Crucially, the proposed framework is computationally efficient and quantifies the uncertainty in node identification, coefficient estimation, and binary outcome prediction. BMGC outperforms alternative methods in terms of the aforementioned metrics in simulation studies. An additional BMGC validation was completed using an fMRI study of brain networks in adults. The proposed BMGC technique identified that sensory motor brain network obeys certain lateral symmetries, whereas the default mode network exhibits significant brain asymmetries associated with early aging.more » « less
-
Abstract With advances in biomedical research, biomarkers are becoming increasingly important prognostic factors for predicting overall survival, while the measurement of biomarkers is often censored due to instruments' lower limits of detection. This leads to two types of censoring: random censoring in overall survival outcomes and fixed censoring in biomarker covariates, posing new challenges in statistical modeling and inference. Existing methods for analyzing such data focus primarily on linear regression ignoring censored responses or semiparametric accelerated failure time models with covariates under detection limits (DL). In this paper, we propose a quantile regression for survival data with covariates subject to DL. Comparing to existing methods, the proposed approach provides a more versatile tool for modeling the distribution of survival outcomes by allowing covariate effects to vary across conditional quantiles of the survival time and requiring no parametric distribution assumptions for outcome data. To estimate the quantile process of regression coefficients, we develop a novel multiple imputation approach based on another quantile regression for covariates under DL, avoiding stringent parametric restrictions on censored covariates as often assumed in the literature. Under regularity conditions, we show that the estimation procedure yields uniformly consistent and asymptotically normal estimators. Simulation results demonstrate the satisfactory finite‐sample performance of the method. We also apply our method to the motivating data from a study of genetic and inflammatory markers of Sepsis.more » « less
-
We propose time‐varying coefficient model selection and estimation based on the spline approach, which is capable of capturing time‐dependent covariate effects. The new penalty function utilizes local‐region information for varying‐coefficient estimation, in contrast to the traditional model selection approach focusing on the entire region. The proposed method is extremely useful when the signals associated with relevant predictors are time‐dependent, and detecting relevant covariate effects in the local region is more scientifically relevant than those of the entire region. Our simulation studies indicate that the proposed model selection incorporating local features outperforms the global feature model selection approaches. The proposed method is also illustrated through a longitudinal growth and health study from National Heart, Lung, and Blood Institute.more » « less
-
null (Ed.)Abstract Background Drug sensitivity prediction and drug responsive biomarker selection on high-throughput genomic data is a critical step in drug discovery. Many computational methods have been developed to serve this purpose including several deep neural network models. However, the modular relations among genomic features have been largely ignored in these methods. To overcome this limitation, the role of the gene co-expression network on drug sensitivity prediction is investigated in this study. Methods In this paper, we first introduce a network-based method to identify representative features for drug response prediction by using the gene co-expression network. Then, two graph-based neural network models are proposed and both models integrate gene network information directly into neural network for outcome prediction. Next, we present a large-scale comparative study among the proposed network-based methods, canonical prediction algorithms (i.e., Elastic Net, Random Forest, Partial Least Squares Regression, and Support Vector Regression), and deep neural network models for drug sensitivity prediction. All the source code and processed datasets in this study are available at https://github.com/compbiolabucf/drug-sensitivity-prediction . Results In the comparison of different feature selection methods and prediction methods on a non-small cell lung cancer (NSCLC) cell line RNA-seq gene expression dataset with 50 different drug treatments, we found that (1) the network-based feature selection method improves the prediction performance compared to Pearson correlation coefficients; (2) Random Forest outperforms all the other canonical prediction algorithms and deep neural network models; (3) the proposed graph-based neural network models show better prediction performance compared to deep neural network model; (4) the prediction performance is drug dependent and it may relate to the drug’s mechanism of action. Conclusions Network-based feature selection method and prediction models improve the performance of the drug response prediction. The relations between the genomic features are more robust and stable compared to the correlation between each individual genomic feature and the drug response in high dimension and low sample size genomic datasets.more » « less
An official website of the United States government
