High‐dimensional data with censored outcomes are prevalent in medical research. To analyze such data, the regularized Buckley–James estimator has been successfully applied to build accurate predictive models and conduct variable selection. In this paper, we consider parameter estimation and variable selection for the semiparametric accelerated failure time model with high‐dimensional, block‐missing multimodal neuroimaging data and censored outcomes. We propose a penalized Buckley–James method that simultaneously handles block‐wise missing covariates and censored outcomes while performing variable selection. The proposed method is evaluated in simulations and applied to a multimodal neuroimaging dataset, where it obtains meaningful results.
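For readers unfamiliar with the Buckley–James idea, the sketch below illustrates only its classical imputation step: a censored log-time is replaced by the linear predictor plus the conditional mean of the residual beyond the censoring point, taken under a Kaplan–Meier estimate of the residual distribution. It is not the authors' penalized method and does not address block-missing covariates. The function name `buckley_james_impute` and its arguments are hypothetical, and the code assumes no tied residuals.

```python
def buckley_james_impute(log_time, event, linear_pred):
    """Impute censored log-times (illustrative sketch, assumes no tied residuals).

    log_time    : observed log survival or censoring times
    event       : 1 if the outcome was observed, 0 if censored
    linear_pred : current fitted values x_i' beta
    """
    n = len(log_time)
    resid = [y - f for y, f in zip(log_time, linear_pred)]
    order = sorted(range(n), key=lambda i: resid[i])

    # Kaplan-Meier probability mass placed on each uncensored residual.
    mass = [0.0] * n
    surv = 1.0
    for rank, i in enumerate(order):
        at_risk = n - rank
        # Common convention: treat the largest residual as uncensored
        # so that the estimated masses sum to one.
        if event[i] or rank == n - 1:
            new_surv = surv * (1.0 - 1.0 / at_risk)
            mass[i] = surv - new_surv
            surv = new_surv

    imputed = list(log_time)
    for i in range(n):
        if event[i]:
            continue  # observed outcomes are kept as they are
        tail = [(mass[k], resid[k]) for k in range(n) if resid[k] > resid[i]]
        total = sum(w for w, _ in tail)
        if total > 0:
            # E[residual | residual > censored residual] under the KM estimate.
            cond_mean = sum(w * e for w, e in tail) / total
            imputed[i] = linear_pred[i] + cond_mean
    return imputed
```

In a regularized Buckley–James procedure, an imputation step of this kind would typically alternate with a penalized least-squares fit on the imputed outcomes until the coefficients stabilize; the specific penalty and stopping rule used in the paper are not stated in this abstract.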
- PAR ID:
- 10387849
- Publisher / Repository:
- Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name:
- Stat
- Volume:
- 11
- Issue:
- 1
- ISSN:
- 2049-1573
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Multi-modal data are prevalent in many scientific fields. In this study, we consider the parameter estimation and variable selection for a multi-response regression using block-missing multi-modal data. Our method allows the dimensions of both the responses and the predictors to be large, and the responses to be incomplete and correlated, a common practical problem in high-dimensional settings. Our proposed method uses two steps to make a prediction from a multi-response linear regression model with block-missing multi-modal predictors. In the first step, without imputing missing data, we use all available data to estimate the covariance matrix of the predictors and the cross-covariance matrix between the predictors and the responses. In the second step, we use these matrices and a penalized method to simultaneously estimate the precision matrix of the response vector, given the predictors, and the sparse regression parameter matrix. Lastly, we demonstrate the effectiveness of the proposed method using theoretical studies, simulated examples, and an analysis of a multi-modal imaging data set from the Alzheimer’s Disease Neuroimaging Initiative.
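The first step described above, estimating a covariance matrix from all available data without imputation, can be sketched with a pairwise available-case estimator. This is one common way to do it and is an assumption here, not necessarily the estimator used in the paper. The function name `available_case_cov` is hypothetical, and missing entries are represented as `None`.

```python
import math

def available_case_cov(rows):
    """Pairwise available-case covariance (illustrative sketch).

    rows : list of equal-length lists; a missing entry is None.
    Entry (j, k) uses only the rows where both variables are observed.
    Pairs with fewer than two jointly observed rows are left as NaN.
    """
    p = len(rows[0])
    cov = [[math.nan] * p for _ in range(p)]
    for j in range(p):
        for k in range(j, p):
            pairs = [(r[j], r[k]) for r in rows
                     if r[j] is not None and r[k] is not None]
            m = len(pairs)
            if m < 2:
                continue
            mean_j = sum(a for a, _ in pairs) / m
            mean_k = sum(b for _, b in pairs) / m
            s = sum((a - mean_j) * (b - mean_k) for a, b in pairs) / (m - 1)
            cov[j][k] = cov[k][j] = s
    return cov
```

A matrix assembled this way is not guaranteed to be positive semidefinite, because different entries are computed from different subsets of subjects. That is one reason a second, penalized estimation step of the kind described in the abstract is useful.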
-
With advancements in technology, the collection of multiple types of measurements on a common set of subjects is becoming routine in science. Some notable examples include multimodal neuroimaging studies for the simultaneous investigation of brain structure and function and multi‐omics studies for combining genetic and genomic information. Integrative analysis of multimodal data allows scientists to interrogate new mechanistic questions. However, data collection and the generation of integrative hypotheses are outpacing the available methodology for joint analysis of multimodal measurements. In this article, we study high‐dimensional multimodal data integration in the context of mediation analysis. We aim to understand the roles that different data modalities play as possible mediators in the pathway between an exposure variable and an outcome. We propose a mediation model framework with two data types serving as separate sets of mediators and develop a penalized optimization approach for parameter estimation. We study both the theoretical properties of the estimator through an asymptotic analysis and its finite‐sample performance through simulations. We illustrate our method with a multimodal brain pathway analysis having both structural and functional connectivity as mediators in the association between sex and language processing.
-
Multimodality or multiconstruct data arise increasingly in functional neuroimaging studies to characterize brain activity under different cognitive states. Relying on those high-resolution imaging collections, it is of great interest to identify predictive imaging markers and intermodality interactions with respect to behavior outcomes. Currently, most existing variable selection models do not consider predictive effects from interactions, and the desired higher-order terms can only be included in the predictive mechanism through a two-step procedure that suffers from potential misspecification. In this paper, we propose a unified Bayesian prior model to simultaneously identify main effect features and intermodality interactions within the same inference platform in the presence of high-dimensional data. To accommodate the brain topological information and correlation between modalities, our prior is designed by compiling the intermediate selection status of sequential partitions in light of the data structure and brain anatomical architecture, so that we can improve posterior inference and enhance biological plausibility. Through extensive simulations, we show the superiority of our approach in selecting main and interaction effects and in prediction with multimodality data. Applying the method to the Adolescent Brain Cognitive Development (ABCD) study, we characterize the brain functional underpinnings with respect to general cognitive ability under different memory load conditions.
-
Alzheimer’s Disease (AD) is a chronic neurodegenerative disease that causes severe problems in patients’ thinking, memory, and behavior. An early diagnosis is crucial to prevent AD progression; to this end, many algorithmic approaches have recently been proposed to predict cognitive decline. However, these predictive models often fail to integrate heterogeneous genetic and neuroimaging biomarkers and struggle to handle missing data. In this work we propose a novel objective function and an associated optimization algorithm to identify cognitive decline related to AD. Our approach is designed to incorporate dynamic neuroimaging data by way of a participant-specific augmentation combined with multimodal data integration aligned via a regression task. Our approach, in order to incorporate additional side-information, utilizes structured regularization techniques popularized in recent AD literature. Armed with the fixed-length vector representation learned from the multimodal dynamic and static modalities, conventional machine learning methods can be used to predict the clinical outcomes associated with AD. Our experimental results show that the proposed augmentation model improves the prediction performance on cognitive assessment scores for a collection of popular machine learning algorithms. The results of our approach are interpreted to validate existing genetic and neuroimaging biomarkers that have been shown to be predictive of cognitive decline.
-
The theoretical foundations of Big Data Science are not yet fully developed. This study proposes a new scalable framework for Big Data representation, high-throughput analytics (variable selection and noise reduction), and model-free inference. Specifically, we explore the core principles of distribution-free and model-agnostic methods for scientific inference based on Big Data sets. Compressive Big Data analytics (CBDA) iteratively generates random (sub)samples from a big and complex dataset. This subsampling with replacement is conducted on the feature and case levels and results in samples that are not necessarily consistent or congruent across iterations. The approach relies on an ensemble predictor where established model-based or model-free inference techniques are iteratively applied to preprocessed and harmonized samples. Repeating the subsampling and prediction steps many times yields derived likelihoods, probabilities, or parameter estimates, which can be used to assess the algorithm reliability and accuracy of findings via bootstrapping methods, or to extract important features via controlled variable selection. CBDA provides a scalable algorithm for addressing some of the challenges associated with handling and analyzing complex, incongruent, incomplete, and multi-source data. Although not yet fully developed, a CBDA mathematical framework will enable the study of the ergodic properties and the asymptotics of the specific statistical inference approaches via CBDA. We implemented the high-throughput CBDA method using pure R as well as via the graphical pipeline environment. To validate the technique, we used several simulated datasets as well as a real neuroimaging-genetics case study of Alzheimer’s disease. The CBDA approach may be customized to provide generic representation of complex multimodal datasets and to provide stable scientific inference for large, incomplete, and multisource datasets.
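The subsample, predict, and aggregate loop described above can be sketched as follows. This is a simplified illustration in Python, not the CBDA implementation (which the abstract says is written in R). The function name `cbda_style_ranking`, its parameters, and the caller-supplied `score_fn` are all hypothetical. As a simplifying assumption, cases are drawn with replacement while features are distinct within a single draw.

```python
import random

def cbda_style_ranking(rows, outcome, score_fn, n_iter=200, n_cases=None,
                       n_features=2, top_k=10, seed=0):
    """Rank features by how often they appear in the best-scoring subsamples.

    rows     : list of feature rows (one list per case)
    outcome  : list of outcomes, one per case
    score_fn : callable(sub_x, sub_y, feat_idx) -> float, higher is better;
               stands in for fitting and evaluating any predictive model
    """
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    n_cases = n_cases or n
    results = []
    for _ in range(n_iter):
        # Case-level subsample, drawn with replacement.
        case_idx = [rng.randrange(n) for _ in range(n_cases)]
        # Feature-level subsample, distinct features within one draw.
        feat_idx = rng.sample(range(p), n_features)
        sub_x = [[rows[i][j] for j in feat_idx] for i in case_idx]
        sub_y = [outcome[i] for i in case_idx]
        results.append((score_fn(sub_x, sub_y, feat_idx), feat_idx))

    # Keep the top_k subsamples and count feature appearances among them.
    results.sort(key=lambda r: r[0], reverse=True)
    counts = {j: 0 for j in range(p)}
    for _, feat_idx in results[:top_k]:
        for j in feat_idx:
            counts[j] += 1
    return counts
```

Here `score_fn` might, for example, fit a model on part of the subsample and return its held-out accuracy. Features with high counts are candidates for a final model, in the spirit of the feature-extraction step the abstract describes.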