The aim of this paper is to systematically investigate merging and ensembling methods for spatially varying coefficient mixed effects models (SVCMEM) in order to carry out integrative learning of neuroimaging data obtained from multiple biomedical studies. The ”merged” approach involves training a single learning model using a comprehensive dataset that encompasses information from all the studies. Conversely, the ”ensemble” approach involves creating a weighted average of distinct learning models, each developed from an individual study. We systematically investigate the prediction accuracy of the merged and ensemble learners under the presence of different degrees of interstudy heterogeneity. Additionally, we establish asymptotic guidelines for making strategic decisions about when to employ either of these models in different scenarios, along with deriving optimal weights for the ensemble learner. To validate our theoretical results, we perform extensive simulation studies. The proposed methodology is also applied to 3 large-scale neuroimaging studies.
Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
ABSTRACT -
Free, publicly-accessible full text available January 2, 2025
-
Free, publicly-accessible full text available January 2, 2025
-
Abstract With the advent of high‐throughput sequencing, an efficient computing strategy is required to deal with large genomic data sets. The challenge of estimating a large precision matrix has garnered substantial research attention for its direct application to discriminant analyses and graphical models. Most existing methods either use a lasso‐type penalty that may lead to biased estimators or are computationally intensive, which prevents their applications to very large graphs. We propose using an
penalty to estimate an ultra‐large precision matrix (L 0scalnetL0 ). We applyscalnetL0 to RNA‐seq data from breast cancer patients represented in The Cancer Genome Atlas and find improved accuracy of classifications for survival times. The estimated precision matrix provides information about a large‐scale co‐expression network in breast cancer. Simulation studies demonstrate thatscalnetL0 provides more accurate and efficient estimators, yielding shorter CPU time and less Frobenius loss on sparse learning for large‐scale precision matrix estimation. -
Abstract Many biomedical studies have identified important imaging biomarkers that are associated with both repeated clinical measures and a survival outcome. The functional joint model (FJM) framework, proposed by Li and Luo in 2017, investigates the association between repeated clinical measures and survival data, while adjusting for both high‐dimensional images and low‐dimensional covariates based on the functional principal component analysis (FPCA). In this paper, we propose a novel algorithm for the estimation of FJM based on the functional partial least squares (FPLS). Our numerical studies demonstrate that, compared to FPCA, the proposed FPLS algorithm can yield more accurate and robust estimation and prediction performance in many important scenarios. We apply the proposed FPLS algorithm to a neuroimaging study. Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database.
-
Summary We consider a functional linear Cox regression model for characterizing the association between time-to-event data and a set of functional and scalar predictors. The functional linear Cox regression model incorporates a functional principal component analysis for modeling the functional predictors and a high-dimensional Cox regression model to characterize the joint effects of both functional and scalar predictors on the time-to-event data. We develop an algorithm to calculate the maximum approximate partial likelihood estimates of unknown finite and infinite dimensional parameters. We also systematically investigate the rate of convergence of the maximum approximate partial likelihood estimates and a score test statistic for testing the nullity of the slope function associated with the functional predictors. We demonstrate our estimation and testing procedures by using simulations and the analysis of the Alzheimer's Disease Neuroimaging Initiative (ADNI) data. Our real data analyses show that high-dimensional hippocampus surface data may be an important marker for predicting time to conversion to Alzheimer's disease. Data used in the preparation of this article were obtained from the ADNI database (adni.loni.usc.edu).