Title: Decentralized Parallel Independent Component Analysis for Multimodal, Multisite Data
Large amounts of neuroimaging and omics data have been generated for studies of mental health. Collaborations among research groups that share data have shown increased power for new discoveries of brain abnormalities, genetic mutations, and associations among genetics, neuroimaging, and behavior. However, sharing raw data can be challenging for various reasons. A federated data analysis that allows collaboration without exposing each site's raw dataset is therefore ideal. Following this strategy, this study proposes a decentralized parallel independent component analysis (dpICA), an extension of the state-of-the-art Parallel ICA (pICA). pICA is an effective method for analyzing two data modalities simultaneously by jointly extracting independent components of each modality and maximizing connections between modalities. We evaluated the dpICA algorithm using neuroimaging and genetic data from patients with schizophrenia and healthy controls, and compared its performance under various conditions with that of the centralized pICA. The results showed that dpICA is robust to sample distribution across sites as long as the number of samples at each site is sufficient. It can produce the same imaging and genetic components and the same connections between those components as the centralized pICA. Thus, our study supports dpICA as an accurate and effective decentralized algorithm for extracting connections from two data modalities.
Award ID(s):
2112455
PAR ID:
10569588
Author(s) / Creator(s):
Publisher / Repository:
IEEE
Date Published:
ISBN:
979-8-3503-2447-1
Page Range / eLocation ID:
1 to 4
Format(s):
Medium: X
Location:
Sydney, Australia
Sponsoring Org:
National Science Foundation
More Like this
  1. Recent studies have shown that working with neuroimaging data collected from different research facilities or locations may introduce additional source dependency, affecting the overall statistical power. This problem can be mitigated with data harmonization approaches. Recently, the ComBat method has become commonly adopted for various neuroimaging modalities. While open neuroimaging datasets are becoming more common, a substantial amount of data still cannot be shared for various reasons. In addition, current approaches require moving all the data to a central location, which requires additional resources and creates redundant copies of the same datasets. To address these issues, we propose a decentralized harmonization approach that does not create redundant copies of the original datasets and performs remote operations on the datasets separately without sharing any individual subject data, ensuring a certain level of privacy and reducing regulatory hurdles. This novel approach, called “Decentralized ComBat,” can harmonize datasets separately without combining them. We tested our model by harmonizing functional network connectivity datasets from two traumatic brain injury studies in a decentralized way. We also used simulations to analyze the performance and scalability of our model as the number of data collection sites increases. We compare the output with centralized ComBat and show that the proposed approach produces similar results, increasing the sensitivity of the functional network connectivity analysis and validating our approach. Simulations show that our model can be easily scaled to many more datasets as required. In sum, we believe this provides a powerful tool that further complements open data and allows public and private datasets to be integrated.
  2. Decentralized optimization has recently attracted much attention in machine learning because it is more communication-efficient than centralized optimization. Quantization is a promising method for reducing communication cost by cutting down the budget of each single communication through gradient compression. To further improve communication efficiency, some quantized decentralized algorithms have recently been studied. However, work on quantized decentralized algorithms for nonconvex constrained machine learning problems is still limited. The Frank-Wolfe (a.k.a. conditional gradient or projection-free) method is very efficient for solving many constrained optimization tasks, such as training low-rank or sparsity-constrained models. In this paper, to fill the gap in decentralized quantized constrained optimization, we propose a novel communication-efficient Decentralized Quantized Stochastic Frank-Wolfe (DQSFW) algorithm for nonconvex constrained learning models. We first design a new counterexample to show that the vanilla decentralized quantized stochastic Frank-Wolfe algorithm usually diverges. We therefore propose the DQSFW algorithm with the gradient tracking technique to guarantee that the method converges to a stationary point of the nonconvex optimization problem. In our theoretical analysis, we prove that, to reach a stationary point, our DQSFW algorithm achieves the same gradient complexity as the standard stochastic Frank-Wolfe and centralized Frank-Wolfe algorithms, but at much lower communication cost. Experiments on matrix completion and model compression applications demonstrate the efficiency of our new algorithm.
  3. The United States of America has a diverse collection of freshwater mussels comprising 301 species distributed among 59 genera and two families (Margaritiferidae and Unionidae), each having a unique suite of traits. Mussels are among the most imperilled animals and are critical components of their ecosystems, and successful management, conservation and research require a cohesive and widely accessible data source. Although trait-based analysis for mussels has increased, only a small proportion of the traits reflecting mussel diversity in this region has been collated. Decentralized and non-standardized trait information impedes large-scale analysis. Assembling trait data in a synthetic dataset enables comparison across species and lineages and identification of data gaps. We collated data from the primary literature, books, state and federal reports, theses and dissertations, and museum collections into a centralized dataset covering information on taxonomy, morphology, reproductive ecology and life history, fish hosts, habitats, thermal tolerance, geographic distribution, available genetic information, and conservation status. By collating these traits, we aid researchers in assessing variation in mussel traits and modelling ecosystem change.
  4. With the increasing availability of large‐scale multimodal neuroimaging datasets, it is necessary to develop data fusion methods that can extract cross‐modal features. A general framework, multidataset independent subspace analysis (MISA), has been developed to encompass multiple blind source separation approaches and identify linked cross‐modal sources in multiple datasets. In this work, we used the multimodal independent vector analysis (MMIVA) model in MISA to directly identify meaningful linked features across three neuroimaging modalities (structural magnetic resonance imaging (MRI), resting state functional MRI, and diffusion MRI) in two large independent datasets, one comprising control subjects and the other including patients with schizophrenia. Results show several linked subject profiles (sources) that capture age‐associated decline, schizophrenia‐related biomarkers, sex effects, and cognitive performance. For sources associated with age, both shared and modality‐specific brain‐age deltas were evaluated for association with non‐imaging variables. In addition, each set of linked sources reveals a corresponding set of cross‐modal spatial patterns that can be studied jointly. We demonstrate that the MMIVA fusion model can identify linked sources across multiple modalities, and that at least one set of linked, age‐related sources replicates across two independent and separately analyzed datasets. The same set also presented age‐adjusted group differences, with schizophrenia patients showing lower multimodal source levels. Linked sets associated with sex and cognition are also reported for the UK Biobank dataset.
  5. Alzheimer's disease (AD) is a serious neurodegenerative condition that affects millions of people across the world. Recently, machine learning models have been used to predict the progression of AD, although they frequently do not take advantage of the longitudinal and structural components associated with multi-modal medical data. To address this, we present a new algorithm that uses the multi-block alternating direction method of multipliers to optimize a novel objective that combines longitudinal clinical data of various modalities to simultaneously predict the cognitive scores and diagnoses of the participants in the Alzheimer's Disease Neuroimaging Initiative cohort. Our new model is designed to leverage the structure associated with clinical data, which standard machine learning optimization algorithms do not incorporate. This new approach shows state-of-the-art predictive performance and validates a collection of brain and genetic biomarkers that have been reported previously in the AD literature.