Title: The role of diversity in data‐driven analysis of multi‐subject fMRI data: Comparison of approaches based on independence and sparsity using global performance metrics
Abstract

Data‐driven methods have been widely used in functional magnetic resonance imaging (fMRI) data analysis. They generally extract latent factors through the use of a simple generative model. Independent component analysis (ICA) and dictionary learning (DL) are two popular data‐driven methods that are based on two different forms of diversity—statistical properties of the data—statistical independence for ICA and sparsity for DL. Despite their popularity, the comparative advantage of emphasizing one property over another in the decomposition of fMRI data is not well understood. Such a comparison is made harder by the differences in modeling assumptions between ICA and DL, as well as among ICA algorithms, where each algorithm exploits a different form of diversity. In this paper, we propose the use of objective global measures, such as the time course frequency power ratio, network connection summary, and graph‐theoretical metrics, to gain insight into the role that different types of diversity play in the analysis of fMRI data. Four ICA algorithms that account for different types of diversity and one DL algorithm are studied. We apply these algorithms to real fMRI data collected from patients with schizophrenia and healthy controls. Our results suggest that no single method performs best on all metrics, implying that the optimal method will change depending on the goal of the analysis. However, we note that in none of the scenarios we test does the highly popular Infomax algorithm provide the best performance, demonstrating the cost of exploiting a limited form of diversity.
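As a hedged illustration of the two model families being compared, the following sketch uses scikit-learn stand-ins (FastICA for an independence-based decomposition, DictionaryLearning for a sparsity-based one) on random data in place of preprocessed fMRI; the paper's specific ICA algorithms and group-analysis pipeline are not reproduced here, and all sizes are hypothetical.

```python
# Sketch: independence-based vs. sparsity-based factorization of a
# (time x voxels) matrix. Random data stands in for preprocessed fMRI.
import numpy as np
from sklearn.decomposition import FastICA, DictionaryLearning

rng = np.random.default_rng(0)
n_time, n_voxels, n_comp = 100, 2000, 20           # hypothetical sizes
X = rng.standard_normal((n_time, n_voxels))

# Independence: spatial ICA drives the spatial maps (voxel patterns)
# toward statistical independence.
ica = FastICA(n_components=n_comp, random_state=0)
maps_ica = ica.fit_transform(X.T)                  # (voxels, components)
tcs_ica = ica.mixing_                              # (time, components)

# Sparsity: dictionary learning drives the spatial codes toward sparsity.
dl = DictionaryLearning(n_components=n_comp, alpha=1.0,
                        max_iter=200, random_state=0)
maps_dl = dl.fit_transform(X.T)                    # (voxels, components), sparse
tcs_dl = dl.components_.T                          # (time, components)
```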

 
Award ID(s):
1631838 1618551 1921917
NSF-PAR ID:
10075808
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Human Brain Mapping
Volume:
40
Issue:
2
ISSN:
1065-9471
Page Range / eLocation ID:
p. 489-504
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Independent component analysis (ICA) has found wide application in a variety of areas, and analysis of functional magnetic resonance imaging (fMRI) data has been a particularly fruitful one. Maximum likelihood provides a natural formulation for ICA and allows one to take into account multiple statistical properties of the data—forms of diversity. While the use of multiple types of diversity allows for additional flexibility, it comes at a cost, leading to high variability in the solution space. In this paper, using simulated as well as fMRI-like data, we provide insight into the trade-offs between estimation accuracy and algorithmic consistency, with and without deviations from the assumed model and its assumptions, such as statistical independence. Additionally, we propose a new metric, cross inter-symbol interference, to quantify the consistency of an algorithm across different runs, and demonstrate its advantage over other metrics used for selecting a consistent run.
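The cross inter-symbol interference metric is not defined in this abstract; the sketch below shows one plausible construction under stated assumptions: the standard Amari ISI applied to a "global" matrix formed from the demixing matrices of two runs on the same data. The paper's exact definition may differ.

```python
# Sketch of an ISI-style cross-run consistency check. The Amari ISI is
# standard; forming G = W1 @ pinv(W2) from two runs' demixing matrices
# is an assumption, not necessarily the paper's exact cross-ISI.
import numpy as np

def amari_isi(G: np.ndarray) -> float:
    """0 when G is a scaled permutation, i.e., the runs agree perfectly."""
    A = np.abs(G)
    n = A.shape[0]
    rows = (A / A.max(axis=1, keepdims=True)).sum(axis=1) - 1
    cols = (A / A.max(axis=0, keepdims=True)).sum(axis=0) - 1
    return (rows.sum() + cols.sum()) / (2 * n * (n - 1))

def cross_isi(W1: np.ndarray, W2: np.ndarray) -> float:
    """Consistency of two demixing matrices estimated on the same data."""
    return amari_isi(W1 @ np.linalg.pinv(W2))

# Toy check: a permuted, rescaled copy of W should give cross-ISI ~ 0.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
W2 = np.diag(rng.uniform(0.5, 2.0, 8)) @ W[rng.permutation(8)]
print(f"cross-ISI = {cross_isi(W, W2):.2e}")
```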
  2.
    Monitoring of the fetal electrocardiogram (fECG) would provide useful information about fetal wellbeing as well as any abnormal development during pregnancy. Recent advances in flexible electronics and wearable technologies have enabled compact devices to acquire personal physiological signals in the home setting, including those of expectant mothers. However, the high noise levels of daily life make it challenging to extract the fECG from the combined fetal/maternal ECG signal recorded on the mother's abdomen. Thus, an efficient fECG extraction scheme is sorely needed. In this work, we extensively explored various extraction algorithms, including template subtraction (TS), independent component analysis (ICA), and the extended Kalman filter (EKF), using data from the PhysioNet 2013 Challenge. Furthermore, data modified by adding Gaussian and motion noise, mimicking a practical scenario, were used to examine the performance of the algorithms. Finally, we combined different algorithms, yielding promising results: the best performance, an F1 score of 92.61%, was achieved by an algorithm combining ICA and TS. On the data modified by adding different types of noise, the combination ICA–TS–ICA showed the highest F1 score, 85.4%. It should be noted that these combined approaches require higher computational complexity, including execution time and allocated memory, than other methods. Through a comprehensive examination of different extraction algorithms using various evaluation metrics, this study provides insights into the implementation and operation of state-of-the-art fetal and maternal monitoring systems in the era of mobile health.
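As a hedged illustration of the ICA step alone, the sketch below unmixes synthetic abdominal leads into maternal- and fetal-dominant sources; the signal model, sampling rate, and lead mixing are hypothetical stand-ins for the PhysioNet recordings, and the TS and EKF stages are not shown.

```python
# Sketch: ICA separation of synthetic maternal/fetal mixtures.
# All signals and mixing weights are hypothetical stand-ins.
import numpy as np
from sklearn.decomposition import FastICA

fs = 250                                  # assumed sampling rate, Hz
t = np.arange(0, 10, 1 / fs)

def ecg_like(t, rate_hz):
    """Crude periodic-spike stand-in for an ECG at a given heart rate."""
    phase = (t * rate_hz) % 1.0
    return np.exp(-((phase - 0.5) ** 2) / 0.001)

maternal = ecg_like(t, 1.2)               # ~72 bpm
fetal = 0.3 * ecg_like(t, 2.3)            # ~138 bpm, much weaker

rng = np.random.default_rng(0)
mixing = rng.uniform(0.5, 1.5, size=(4, 2))          # 4 abdominal leads
leads = (mixing @ np.stack([maternal, fetal])
         + 0.05 * rng.standard_normal((4, t.size)))  # additive sensor noise

ica = FastICA(n_components=2, random_state=0)
sources = ica.fit_transform(leads.T)      # columns: maternal- / fetal-dominant
```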
  3. Obeid, I.; Selesnik, I.; Picone, J. (Eds.)
    The Neuronix high-performance computing cluster allows us to conduct extensive machine learning experiments on big data [1]. This heterogeneous cluster uses innovative scheduling technology, Slurm [2], that manages a network of CPUs and graphics processing units (GPUs). The GPU farm consists of a variety of processors ranging from low-end consumer grade devices such as the Nvidia GTX 970 to higher-end devices such as the GeForce RTX 2080. These GPUs are essential to our research since they allow extremely compute-intensive deep learning tasks to be executed on massive data resources such as the TUH EEG Corpus [2]. We use TensorFlow [3] as the core machine learning library for our deep learning systems, and routinely employ multiple GPUs to accelerate the training process. Reproducible results are essential to machine learning research. Reproducibility in this context means the ability to replicate an existing experiment – performance metrics such as error rates should be identical and floating-point calculations should match closely. Three examples of ways we typically expect an experiment to be replicable are: (1) The same job run on the same processor should produce the same results each time it is run. (2) A job run on a CPU and GPU should produce identical results. (3) A job should produce comparable results if the data is presented in a different order. System optimization requires an ability to directly compare error rates for algorithms evaluated under comparable operating conditions. However, it is difficult to exactly reproduce the results of large, complex deep learning systems that often require more than a trillion calculations per experiment [5]. This is a fairly well-known issue and one we will explore in this poster. Researchers must be able to replicate results on a specific data set to establish the integrity of an implementation. They can then use that implementation as a baseline for comparison purposes. A lack of reproducibility makes it very difficult to debug algorithms and validate changes to the system. Equally important, since many results in deep learning research depend on the order in which the system is exposed to the data, the specific processors used, and even the order in which those processors are accessed, comparing two algorithms becomes challenging, since each system must be individually optimized for a specific data set or processor. This is extremely time-consuming for algorithm research in which a single run often taxes a computing environment to its limits. Well-known techniques such as cross-validation [5,6] can be used to mitigate these effects, but this is also computationally expensive. These issues are further compounded by the fact that most deep learning algorithms are susceptible to the way computational noise propagates through the system. GPUs are particularly notorious for this because, in a clustered environment, it becomes more difficult to control which processors are used at various points in time. Another equally frustrating issue is that upgrades to the deep learning package, such as the transition from TensorFlow v1.9 to v1.13, can also result in large fluctuations in error rates when re-running the same experiment. Since TensorFlow is constantly updating functions to support GPU use, maintaining a historical archive of experimental results that can be used to calibrate algorithm research is quite a challenge. This makes it very difficult to optimize the system or select the best configurations.
The overall impact of the issues described above is significant, as error rates can fluctuate by as much as 25% due to these types of computational issues. Cross-validation is one technique used to mitigate this, but it is expensive since it requires multiple runs over the data, which further taxes a computing infrastructure already running at maximum capacity. GPUs are preferred when training a large network since these systems train at least two orders of magnitude faster than CPUs [7]. Large-scale experiments are simply not feasible without GPUs. However, there is a tradeoff for this performance. Since all our GPUs use the NVIDIA CUDA® Deep Neural Network library (cuDNN) [8], a GPU-accelerated library of primitives for deep neural networks, an element of randomness is added to each experiment. When a GPU is used to train a network in TensorFlow, it automatically searches for a cuDNN implementation. NVIDIA's cuDNN implementation provides algorithms that increase performance and help the model train more quickly, but these algorithms are non-deterministic [9,10]. Since our networks have many complex layers, there is no easy way to avoid this randomness. Instead of comparing each epoch, we compare the average performance of the experiment, because it indicates how our model is performing per experiment and whether the changes we make are effective. In this poster, we will discuss a variety of issues related to reproducibility and introduce ways we mitigate these effects. For example, TensorFlow uses a random number generator (RNG) that is not seeded by default. TensorFlow determines the initialization point and how certain functions execute using the RNG. The solution is to seed all the necessary components before training the model. This forces TensorFlow to use the same initialization point and sets how certain layers work (e.g., dropout layers). However, seeding all the RNGs does not guarantee a controlled experiment. Other variables can affect the outcome of the experiment, such as training on GPUs, allowing multi-threading on CPUs, using certain layers, etc. To mitigate our problems with reproducibility, we first make sure that the data is processed in the same order during training. Therefore, we save the data ordering from the last experiment and make sure the newer experiment follows the same order. If we allow the data to be shuffled, performance can change depending on how the model was exposed to the data. We also specify the float data type to be 32-bit, since Python defaults to 64-bit. We avoid 64-bit precision because the numbers produced by a GPU can vary significantly depending on the GPU architecture [11-13]. Controlling precision somewhat reduces differences due to computational noise, even though technically it increases the amount of computational noise. We are currently developing more advanced techniques for preserving the efficiency of our training process while also maintaining the ability to reproduce models. In our poster presentation we will demonstrate these issues using novel visualization tools, present several examples of the extent to which these issues influence research results on electroencephalography (EEG) and digital pathology experiments, and introduce new ways to manage such computational issues.
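As a hedged illustration of the seeding and precision controls described above, the sketch below uses the TensorFlow 2.x API (the TF 1.9/1.13 versions discussed in the poster used tf.set_random_seed and, later, the TF_DETERMINISTIC_OPS environment variable); it sketches the mitigation steps, not the authors' actual configuration, and the seed value is hypothetical.

```python
# Sketch of the reproducibility controls described above (TF 2.x API).
import os
import random

import numpy as np
import tensorflow as tf

SEED = 1337  # hypothetical seed value

# Seed every RNG that can influence initialization, dropout, shuffling, etc.
os.environ["PYTHONHASHSEED"] = str(SEED)
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)

# Pin 32-bit floats (Python/NumPy default to 64-bit) to limit
# architecture-dependent numerical noise on GPUs.
tf.keras.backend.set_floatx("float32")

# Prefer deterministic (if slower) kernels, including cuDNN ones.
# Available since TF 2.9; earlier versions set TF_DETERMINISTIC_OPS=1.
tf.config.experimental.enable_op_determinism()

# Keep the data order fixed across runs: shuffle once with a seeded RNG
# instead of reshuffling every epoch.
dataset = tf.data.Dataset.range(1000).shuffle(
    1000, seed=SEED, reshuffle_each_iteration=False)
```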
  4. Background

    Cognitive training may partially reverse cognitive deficits in people with HIV (PWH). Previous functional MRI (fMRI) studies demonstrate that working memory training (WMT) alters brain activity during working memory tasks, but its effects on resting brain network organization remain unknown.

    Purpose

    To test whether WMT affects PWH brain functional connectivity in resting‐state fMRI (rsfMRI).

    Study Type

    Prospective.

    Population

    A total of 53 PWH (ages 50.7 ± 1.5 years, two women) and 53 HIV‐seronegative controls (SN; ages 49.5 ± 1.6 years, six women).

    Field Strength/Sequence

    Axial single‐shot gradient‐echo echo‐planar imaging at 3.0 T was performed at baseline (TL1) and at 1 month (TL2) and 6 months (TL3) after WMT.

    Assessment

    All participants had rsfMRI and clinical assessments (including neuropsychological tests) at TL1 before randomization to Cogmed WMT (adaptive training, n = 58: 28 PWH, 30 SN; nonadaptive training, n = 48: 25 PWH, 23 SN), 25 sessions over 5–8 weeks. All assessments were repeated at TL2 and at TL3. Functional connectivity, estimated by independent component analysis (ICA), and graph theory (GT) metrics (eigenvector centrality, etc.) at different link densities (LDs) were compared between the PWH and SN groups at TL1 and TL2 (a GT sketch is given after this abstract).

    Statistical Tests

    Two‐way analyses of variance (ANOVA) on GT metrics and two‐sample t‐tests on FC or GT metrics were performed. Cognitive (e.g., memory) measures were correlated with eigenvector centrality (eCent) using Pearson's correlations. The significance level was set at P < 0.05 after false discovery rate correction.

    Results

    The ventral default mode network (vDMN) eCent differed between the PWH and SN groups at TL1 but not at TL2 (P = 0.28). In PWH, changes in vDMN eCent correlated significantly with changes in memory ability (r = −0.62 at LD = 50%), and vDMN eCent before training correlated significantly with changes in memory performance (r = 0.53 at LD = 50%).

    Data Conclusion

    ICA and GT analyses showed that adaptive WMT normalized graph properties of the vDMN in PWH.

    Evidence Level

    1

    Technical Efficacy

    1

     
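As a hedged illustration of the GT analysis referenced above, the sketch below thresholds a synthetic functional-connectivity matrix to a target link density and computes eigenvector centrality with networkx; the node count and data are hypothetical stand-ins for the study's FC matrices.

```python
# Sketch: eigenvector centrality of a connectivity graph at a fixed
# link density (LD). Synthetic data stands in for the study's FC matrices.
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
n_nodes = 90                                        # hypothetical parcellation
fc = np.corrcoef(rng.standard_normal((n_nodes, 200)))
np.fill_diagonal(fc, 0)

def graph_at_density(fc, density):
    """Keep the strongest |FC| edges until the target link density is met."""
    iu = np.triu_indices(fc.shape[0], k=1)
    strength = np.abs(fc[iu])
    keep = np.argsort(strength)[::-1][:int(density * strength.size)]
    g = nx.Graph()
    g.add_nodes_from(range(fc.shape[0]))
    g.add_edges_from(zip(iu[0][keep], iu[1][keep]))
    return g

g = graph_at_density(fc, density=0.50)              # LD = 50%, as above
ecent = nx.eigenvector_centrality_numpy(g)          # node -> eCent value
```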
  5. Background

    In order to accurately accumulate delivered dose for head and neck cancer patients treated with the Adapt to Position workflow on the 1.5T magnetic resonance imaging (MRI)‐linear accelerator (MR‐linac), the low‐resolution T2‐weighted MRIs used for daily setup must be segmented to enable reconstruction of the delivered dose at each fraction.

    Purpose

    In this pilot study, we evaluate various autosegmentation methods for head and neck organs at risk (OARs) on on‐board setup MRIs from the MR‐linac for off‐line reconstruction of delivered dose.

    Methods

    Seven OARs (parotid glands, submandibular glands, mandible, spinal cord, and brainstem) were contoured on 43 images by seven observers each. Ground truth contours were generated using a simultaneous truth and performance level estimation (STAPLE) algorithm. Twenty autosegmentation methods in total were evaluated in ADMIRE: 1–9) atlas‐based autosegmentation using a population atlas library (PAL) of 5/10/15 patients with STAPLE, patch fusion (PF), or random forest (RF) label fusion; 10–19) autosegmentation using images from a patient's 1–4 prior fractions (individualized patient prior [IPP]) with STAPLE/PF/RF; 20) deep learning (DL) (3D ResUNet trained on 43 ground truth structure sets plus 45 contoured by one observer). Execution time was measured for each method. Autosegmented structures were compared to ground truth structures using the Dice similarity coefficient, mean surface distance (MSD), Hausdorff distance (HD), and Jaccard index (JI); the Dice and Jaccard computations are sketched in code after this abstract. For each metric and OAR, performance was compared to the inter‐observer variability using Dunn's test with control. Methods were compared pairwise using the Steel‐Dwass test for each metric pooled across all OARs. Further dosimetric analysis was performed on three high‐performing autosegmentation methods (DL, IPP with RF and 4 fractions [IPP_RF_4], and IPP with 1 fraction [IPP_1]) and one low‐performing method (PAL with STAPLE and 5 atlases [PAL_ST_5]). For five patients, delivered doses from clinical plans were recalculated on setup images with ground truth and autosegmented structure sets. Differences in maximum and mean dose to each structure between the ground truth and autosegmented structures were calculated and correlated with the geometric metrics.

    Results

    DL and IPP methods performed best overall, all significantly outperforming inter‐observer variability and with no significant differences between methods in pairwise comparison. PAL methods performed worst overall; most were not significantly different from the inter‐observer variability or from each other. DL was the fastest method (33 s per case) and PAL methods the slowest (3.7–13.8 min per case). Execution time increased with the number of prior fractions/atlases for IPP and PAL. For DL, IPP_1, and IPP_RF_4, the majority (95%) of dose differences were within ±250 cGy of ground truth, but outlier differences of up to 785 cGy occurred. Dose differences were much higher for PAL_ST_5, with outlier differences of up to 1920 cGy. Dose differences showed weak but significant correlations with all geometric metrics (R² between 0.030 and 0.314).

    Conclusions

    The autosegmentation methods offering the best combination of performance and execution time are DL and IPP_1. Dose reconstruction on on‐board T2‐weighted MRIs is feasible with autosegmented structures with minimal dosimetric variation from ground truth, but contours should be visually inspected prior to dose reconstruction in an end‐to‐end dose accumulation workflow.

     
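As a hedged illustration of two of the geometric metrics above, the sketch below computes the Dice similarity coefficient and Jaccard index on toy binary masks standing in for autosegmented and ground-truth structures; MSD and HD require surface extraction and are omitted.

```python
# Sketch: Dice similarity coefficient (DSC) and Jaccard index (JI)
# between binary segmentation masks. Toy masks stand in for real contours.
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def jaccard(a: np.ndarray, b: np.ndarray) -> float:
    a, b = a.astype(bool), b.astype(bool)
    return np.logical_and(a, b).sum() / np.logical_or(a, b).sum()

auto = np.zeros((64, 64, 32), dtype=bool)
auto[20:40, 20:40, 10:20] = True          # hypothetical autosegmented mask
truth = np.zeros_like(auto)
truth[22:42, 22:42, 10:20] = True         # hypothetical ground-truth mask

print(f"DSC = {dice(auto, truth):.3f}, JI = {jaccard(auto, truth):.3f}")
```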