Title: Multimodal Data Visualization and Denoising with Integrated Diffusion
We propose a method called integrated diffusion for combining multimodal data, gathered via different sensors on the same system, to create an integrated data diffusion operator. As real-world data suffers from both local and global noise, we introduce mechanisms to optimally calculate a diffusion operator that reflects the combined information in the data by maintaining low-frequency eigenvectors of each modality both globally and locally. We show the utility of this integrated operator in denoising and visualizing multimodal toy data as well as multi-omic data generated from blood cells, measuring both gene expression and chromatin accessibility. Our approach better visualizes the geometry of the integrated data and captures known cross-modality associations. More generally, integrated diffusion is broadly applicable to multimodal datasets generated by noisy sensors in a variety of fields.
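The construction above can be sketched in a few lines: build a row-stochastic diffusion operator per modality from a Gaussian kernel, then combine them by multiplying powered operators over the shared sample set. This is a minimal illustration, not the paper's full method; the diffusion times `t1`, `t2` and the kernel bandwidth `sigma` are placeholder choices here, whereas the paper selects them from the spectral properties of each operator.

```python
import numpy as np

def diffusion_operator(X, sigma=1.0):
    """Row-stochastic diffusion operator from a Gaussian affinity matrix."""
    # Pairwise squared Euclidean distances between samples
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma ** 2))       # Gaussian kernel affinities
    return K / K.sum(axis=1, keepdims=True)  # normalize rows to sum to 1

def integrated_diffusion(X1, X2, t1=2, t2=2, sigma=1.0):
    """Combine two modalities by multiplying powered diffusion operators.

    t1 and t2 are illustrative per-modality diffusion times; the paper
    derives them from the spectral decay of each modality's operator.
    """
    P1 = np.linalg.matrix_power(diffusion_operator(X1, sigma), t1)
    P2 = np.linalg.matrix_power(diffusion_operator(X2, sigma), t2)
    return P1 @ P2  # integrated operator over the shared sample set

rng = np.random.default_rng(0)
X1 = rng.normal(size=(20, 5))   # modality 1: 20 samples, 5 features
X2 = rng.normal(size=(20, 3))   # modality 2: same 20 samples, 3 features
P = integrated_diffusion(X1, X2)
print(P.shape, np.allclose(P.sum(axis=1), 1.0))
```

Because each factor is row-stochastic, their product is also row-stochastic, so `P` can be powered further for denoising or embedded (e.g. via its eigenvectors) for visualization.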
Journal Name:
IEEE Machine Learning for Signal Processing
Page Range / eLocation ID:
1 to 6
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

Boiling is a high-performance heat dissipation process that is central to electronics cooling and power generation. The past decades have witnessed significantly improved and better-controlled boiling heat transfer using structured surfaces, whereas the physical mechanisms that dominate structure-enhanced boiling remain contested. Experimental characterization of boiling has been challenging due to the high dimensionality, stochasticity, and dynamicity of the boiling process. To tackle these issues, this paper presents a coupled multimodal sensing and data fusion platform to characterize boiling states and heat fluxes and identify the key transport parameters in different boiling stages. Pool boiling tests of water on multi-tier copper structures are performed under both steady-state and transient heat loads, during which multimodal, multidimensional signals are recorded, including temperature profiles, optical imaging, and acoustic signals via contact acoustic emission (AE) sensors, hydrophones immersed in the liquid pool, and condenser microphones outside the boiling chamber. The physics-based analysis is focused on i) extracting dynamic characteristics of boiling from time lags between acoustic-optical-thermal signals, ii) analyzing energy balance between thermal diffusion, bubble growth, and acoustic dissipation, and iii) decoupling the response signals for different physical processes, e.g., low-to-mid-frequency AE induced by thermal expansion of liquids and bubble ebullition. Separate multimodal sensing tests, namely a single-phase liquid test and a single-bubble-dynamics test, are performed to reinforce the analysis, which confirms an AE peak at 1.5 kHz corresponding to bubble ebullition. The data-driven analysis is focused on enabling the early fusion of acoustic and optical signals for improved boiling state and flux predictions.
Unlike single-modality analysis or commonly used late fusion algorithms that concatenate processed signals in dense layers, the current work performs the fusion process in the deep feature domain using a multi-layer perceptron regression model. This early fusion algorithm is shown to lead to more accurate and robust predictions. The coupled multimodal sensing and data fusion platform promises to enable reliable thermal monitoring and advance the understanding of dominant transport mechanisms during boiling.

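The early-fusion idea can be sketched as follows: concatenate per-modality deep features before the regressor, rather than merging per-modality predictions afterwards. The feature dimensions, network sizes, and synthetic target below are illustrative placeholders, not the paper's actual encoders or heat-flux data.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n = 200
acoustic_feat = rng.normal(size=(n, 16))  # placeholder deep acoustic features
optical_feat = rng.normal(size=(n, 32))   # placeholder deep optical features
heat_flux = acoustic_feat[:, 0] + optical_feat[:, 0]  # synthetic target

# Early fusion: concatenate in the deep feature domain, then regress with
# a single multi-layer perceptron over the joint representation.
X = np.concatenate([acoustic_feat, optical_feat], axis=1)
model = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000, random_state=0)
model.fit(X[:150], heat_flux[:150])
score = model.score(X[150:], heat_flux[150:])
print(X.shape, round(score, 2))
```

A late-fusion baseline would instead train one model per modality and combine their outputs; the abstract reports that fusing in the feature domain yields more accurate and robust predictions.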
3. Brain signals can be measured using multiple imaging modalities, such as magnetic resonance imaging (MRI)-based techniques. Different modalities convey distinct yet complementary information; thus, their joint analyses can provide valuable insight into how the brain functions in both healthy and diseased conditions. Data-driven approaches have proven most useful for multimodal fusion as they minimize assumptions imposed on the data, and a number of methods have been developed to uncover relationships across modalities. However, none of these methods, to the best of our knowledge, can discover "one-to-many associations", meaning one component from one modality is linked with more than one component from another modality. Such "one-to-many associations" are likely to exist, since the same brain region can be involved in multiple neurological processes. Additionally, most existing data fusion methods require the signal subspace order to be identical for all modalities—a severe restriction for real-world data of different modalities. Here, we propose a new fusion technique—the consecutive independence and correlation transform (C-ICT) model—which successively performs independent component analysis and independent vector analysis and is uniquely flexible in terms of the number of datasets, signal subspace order, and the opportunity to find "one-to-many associations". We apply C-ICT to fuse diffusion MRI, structural MRI, and functional MRI datasets collected from healthy controls (HCs) and patients with schizophrenia (SZs). We identify six interpretable triplets of components, each of which consists of three associated components from the three modalities. In addition, components from these triplets that show significant group differences between the HCs and SZs are identified, which could be seen as putative biomarkers in schizophrenia.
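The two-stage structure can be sketched with off-the-shelf pieces: run ICA per modality (allowing different subspace orders), then associate components across modalities by correlating their subject-level loadings, so one component from modality A may link to several from modality B. This is a simplified stand-in: the correlation-matching step below replaces the paper's independent vector analysis stage, and the data are synthetic.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(1)
n_subjects = 100
s = rng.laplace(size=(n_subjects, 3))        # shared subject-level sources
mod_a = s @ rng.normal(size=(3, 50))         # modality A: order 3, 50 features
mod_b = s[:, :2] @ rng.normal(size=(2, 40))  # modality B: order 2, 40 features

# Stage 1 (ICA): per-modality components; subspace orders differ (3 vs. 2).
ica_a = FastICA(n_components=3, random_state=0).fit_transform(mod_a)
ica_b = FastICA(n_components=2, random_state=0).fit_transform(mod_b)

# Stage 2 (simplified stand-in for IVA): correlate subject loadings across
# modalities; a single A-component may exceed the threshold against several
# B-components, i.e. a "one-to-many association".
corr = np.corrcoef(ica_a.T, ica_b.T)[:3, 3:]
links = np.abs(corr) > 0.5
print(corr.shape)
```

In the real pipeline the cross-modality association is estimated jointly by IVA rather than by thresholded correlations, which is what lets C-ICT handle an arbitrary number of datasets.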
  3. Abstract

This work proposes a novel generative multimodal approach to jointly analyze multimodal data while linking the multimodal information to colors. We apply our proposed framework, which disentangles multimodal data into private and shared sets of features, to pairs of structural (sMRI), functional (sFNC and ICA), and diffusion MRI data (FA maps). With our approach, we find that heterogeneity in schizophrenia is potentially a function of modality pairs. Results show (1) schizophrenia is highly multimodal and includes changes in specific networks, (2) non‐linear relationships with schizophrenia are observed when interpolating among shared latent dimensions, and (3) we observe a decrease in the modularity of functional connectivity and decreased visual‐sensorimotor connectivity for schizophrenia patients for the FA‐sFNC and sMRI‐sFNC modality pairs, respectively. Additionally, our results generally indicate decreased fractional anisotropy in the corpus callosum, and decreased spatial ICA map and voxel‐based morphometry strength in the superior frontal lobe, as found in the FA‐sFNC, sMRI‐FA, and sMRI‐ICA modality-pair clusters. In sum, we introduce a powerful new multimodal neuroimaging framework designed to provide a rich and intuitive understanding of the data, which we hope challenges the reader to think differently about how modalities interact.

4. Our study is motivated by robotics: when dealing with robots or other physical systems, we often need to balance the competing concerns of relying on complex, multimodal data coming from a variety of sensors against a general lack of large representative datasets. Despite the complexity of modern robotic platforms and the need for multimodal interaction, there has been little research on integrating more than two modalities in a low-data regime under the real-world constraint that sensors fail due to obstructions or adverse conditions. In this work, we consider a case in which natural language is used as a retrieval query against objects, represented across multiple modalities, in a physical environment. We introduce extended multimodal alignment (EMMA), a method that learns to select the appropriate object while jointly refining modality-specific embeddings through a geometric (distance-based) loss. In contrast to prior work, our approach is able to incorporate an arbitrary number of views (modalities) of a particular piece of data. We demonstrate the efficacy of our model on a grounded language object retrieval scenario. We show that our model outperforms state-of-the-art baselines when little training data is available. Our code is available at 
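A geometric, distance-based alignment loss over an arbitrary number of views might look like the following minimal sketch: for each modality, pull the language-query embedding toward the target object's embedding and push it away from a distractor's, triplet-style. The function name, margin value, and toy embeddings are illustrative assumptions, not EMMA's actual formulation.

```python
import numpy as np

def alignment_loss(lang, views_pos, views_neg, margin=1.0):
    """Triplet-style geometric loss over an arbitrary number of views.

    lang: (d,) language-query embedding; views_pos / views_neg: lists of
    (d,) embeddings of the target and a distractor object, one per modality.
    A failed sensor is handled by simply omitting that modality's view.
    """
    loss = 0.0
    for pos, neg in zip(views_pos, views_neg):
        d_pos = np.linalg.norm(lang - pos)   # pull the correct object closer
        d_neg = np.linalg.norm(lang - neg)   # push the distractor away
        loss += max(0.0, margin + d_pos - d_neg)
    return loss / len(views_pos)

lang = np.zeros(4)
pos_views = [np.full(4, 0.1), np.full(4, 0.2), np.full(4, 0.1)]  # e.g. RGB, depth, audio
neg_views = [np.full(4, 2.0)] * 3
well_aligned = alignment_loss(lang, pos_views, neg_views)
misaligned = alignment_loss(lang, neg_views, pos_views)
print(well_aligned, misaligned)
```

Because the loss is a sum over however many views are present, dropping a modality (a failed sensor) changes only the number of terms, which matches the abstract's emphasis on an arbitrary number of views.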
5. In this paper, we present ViTag to associate user identities across multimodal data, particularly those obtained from cameras and smartphones. ViTag associates a sequence of vision-tracker-generated bounding boxes with Inertial Measurement Unit (IMU) data and Wi-Fi Fine Time Measurements (FTM) from smartphones. We formulate the problem as association by sequence-to-sequence (seq2seq) translation. In this two-step process, our system first performs cross-modal translation using a multimodal LSTM encoder-decoder network (X-Translator) that translates one modality to another, e.g. reconstructing IMU and FTM readings purely from camera bounding boxes. Second, an association module finds identity matches between camera and phone domains, where the translated modality is matched with the observed data from the same modality. In contrast to existing works, our proposed approach can associate identities in multi-person scenarios where all users may be performing the same activity. Extensive experiments in real-world indoor and outdoor environments demonstrate that online association on camera and phone data (IMU and FTM) achieves an average Identity Precision Accuracy (IDP) of 88.39% over 1-to-3-second windows, outperforming the state-of-the-art Vi-Fi (82.93%). Further study on modalities within the phone domain shows that FTM can improve association performance by 12.56% on average. Finally, results from our sensitivity experiments demonstrate the robustness of ViTag under different noise and environment variations.
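The second (association) step can be sketched independently of the learned translator: given camera tracks already translated into IMU space, compute a pairwise reconstruction-error cost against the observed phone streams and solve the identity matching with the Hungarian algorithm. The noisy synthetic signals below stand in for X-Translator outputs; the windowing and FTM channel are omitted.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
T, d = 30, 6                           # window length, IMU feature dimension
true_imu = rng.normal(size=(3, T, d))  # observed phone IMU streams, 3 users

# Stand-in for X-Translator output: camera tracks translated into IMU space,
# modeled here as the true signals plus noise, in shuffled identity order.
perm = np.array([2, 0, 1])
translated = true_imu[perm] + 0.05 * rng.normal(size=(3, T, d))

# Association: cost = mean squared error between each translated track and
# each observed phone stream; the Hungarian algorithm yields the matching.
cost = ((translated[:, None] - true_imu[None, :]) ** 2).mean(axis=(2, 3))
rows, cols = linear_sum_assignment(cost)
print(cols)  # recovered camera-to-phone identity mapping
```

With low noise the recovered mapping `cols` reproduces the shuffle `perm`; in the real system the quality of the matching hinges on how faithfully the seq2seq network translates between modalities.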