Title: Beyond Additive Fusion: Learning Non-Additive Multimodal Interactions
Multimodal fusion addresses the problem of analyzing spoken words in the multimodal context, including visual expressions and prosodic cues. Even when multimodal models lead to performance improvements, it is often unclear whether bimodal and trimodal interactions are learned or whether modalities are processed independently of each other. We propose Multimodal Residual Optimization (MRO) to separate unimodal, bimodal, and trimodal interactions in a multimodal model. This improves interpretability as the multimodal interaction can be quantified. Inspired by Occam’s razor, the main intuition of MRO is that (simpler) unimodal contributions should be learned before learning (more complex) bimodal and trimodal interactions. For example, bimodal predictions should learn to correct the mistakes (residuals) of unimodal predictions, thereby letting the bimodal predictions focus on the remaining bimodal interactions. Empirically, we observe that MRO successfully separates unimodal, bimodal, and trimodal interactions while not degrading predictive performance. We complement our empirical results with a human perception study and observe that MRO learns multimodal interactions that align with human judgments.
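The residual idea in the abstract (unimodal contributions first, then a bimodal term that corrects what they missed) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not describe the model or the optimization, so the sketch swaps in ordinary linear least squares, uses only two modalities, and runs on synthetic data. The function name `fit_residual_stages`, the feature names `x_a` and `x_b`, and all coefficients are assumptions made for illustration.

```python
import numpy as np


def fit_residual_stages(x_a, x_b, y):
    """Fit unimodal terms first, then a bimodal term on the residuals.

    Illustrative only: linear least squares stands in for whatever
    model and optimizer the paper actually uses.
    """
    # Stage 1: additive (unimodal) model y ~ w0 + w_a * x_a + w_b * x_b.
    uni_features = np.column_stack([np.ones_like(x_a), x_a, x_b])
    w_uni, *_ = np.linalg.lstsq(uni_features, y, rcond=None)
    uni_pred = uni_features @ w_uni

    # Stage 2: the bimodal term only sees what stage 1 failed to explain.
    residual = y - uni_pred
    bi_features = (x_a * x_b)[:, None]
    w_bi, *_ = np.linalg.lstsq(bi_features, residual, rcond=None)
    bi_pred = bi_features @ w_bi
    return w_uni, w_bi, uni_pred, bi_pred


# Synthetic data (hypothetical): an additive part plus one interaction term.
rng = np.random.default_rng(0)
n = 5000
x_a = rng.normal(size=n)
x_b = rng.normal(size=n)
y = 2.0 * x_a - 1.0 * x_b + 0.5 * x_a * x_b + 0.1 * rng.normal(size=n)

w_uni, w_bi, uni_pred, bi_pred = fit_residual_stages(x_a, x_b, y)
uni_mse = np.mean((y - uni_pred) ** 2)
full_mse = np.mean((y - uni_pred - bi_pred) ** 2)
print("unimodal weights:", w_uni)
print("bimodal weight:", w_bi)
print("MSE unimodal only:", uni_mse, "MSE with bimodal correction:", full_mse)
```

Because the bimodal stage is fit only to the residuals, its share of the prediction (`bi_pred`) can be read off separately from the unimodal share (`uni_pred`), which is the kind of quantification of interactions the abstract refers to.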
Award ID(s):
1750439
NSF-PAR ID:
10404844
Journal Name:
Findings of the Association for Computational Linguistics: EMNLP 2022
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Total ice water content (IWC) derived from an isokinetic evaporator probe and ice crystal particle size distributions (PSDs) measured by a two-dimensional stereo probe and precipitation imaging probe installed on an aircraft during the 2014 European High Altitude Ice Crystals–North American High IWC field campaign (HAIC/HIWC) were used to characterize regions of high IWC consisting mainly of small ice crystals (HIWC_S) with IWC ≥ 1.0 g m⁻³ and median mass diameter (MMD) < 0.5 mm. A novel fitting routine developed to automatically determine whether a unimodal, bimodal, or trimodal gamma distribution best fits a PSD was used to compare characteristics of HIWC_S and other PSDs (e.g., multimodality, gamma fit parameters) for HIWC_S simulations. The variation of these characteristics and bulk properties (MMD, IWC) was regressed with temperature, IWC, and vertical velocity. HIWC_S regions were most pronounced in updraft cores. The three modes of the PSD reveal different dominant processes contributing to ice growth: nucleation for maximum dimension D < 0.15 mm, diffusion for 0.15 < D < 1.0 mm, and aggregation for D > 1.0 mm. The frequency of trimodal distributions increased with temperature. The volumes of equally plausible parameters derived in the phase space of gamma fit parameters increased with temperature for unimodal distributions and, for temperatures less than −27°C, for multimodal distributions. Bimodal distributions with 0.4 mm in the larger mode were most common in updraft cores and HIWC_S regions; bimodal distributions with 0.4 mm in the smaller mode were least common in convective cores.

     
  2. Abstract This paper reports a study on the effects of particle size distribution (tuned by mixing different-sized powders) on density of a densely packed powder, powder bed density, and sintered density in binder jetting additive manufacturing. An analytical model was used first to study the mixture packing density. Analytical results showed that multimodal (bimodal or trimodal) mixtures could achieve a higher packing density than their component powders and there existed an optimal mixing fraction to achieve the maximum mixture packing density. Both a lower component particle size ratio (fine to coarse) and a larger component packing density ratio (fine to coarse) led to a larger maximum mixture packing density. A threshold existed for the component packing density ratio, below which the mixing method was not effective for density improvement. Its relationship to the component particle size ratio was calculated and plotted. In addition, the dependence of the optimal mixing fraction and maximum mixture packing density on the component particle size ratio and component packing density ratio was calculated and plotted. These plots can be used as theoretical tools to select parameters for the mixing method. Experimental results of tap density were consistent with the above-mentioned analytical predictions. Also, experimental measurements showed that powders with multimodal particle size distributions achieved a higher tap density, powder bed density, and sintered density in most cases.
  3. Agents must monitor their partners' affective states continuously in order to understand and engage in social interactions. However, methods for evaluating affect recognition do not account for changes in classification performance that may occur during occlusions or transitions between affective states. This paper addresses temporal patterns in affect classification performance in the context of an infant-robot interaction, where infants’ affective states contribute to their ability to participate in a therapeutic leg movement activity. To support robustness to facial occlusions in video recordings, we trained infant affect recognition classifiers using both facial and body features. Next, we conducted an in-depth analysis of our best-performing models to evaluate how performance changed over time as the models encountered missing data and changing infant affect. During time windows when features were extracted with high confidence, a unimodal model trained on facial features achieved the same optimal performance as multimodal models trained on both facial and body features. However, multimodal models outperformed unimodal models when evaluated on the entire dataset. Additionally, model performance was weakest when predicting an affective state transition and improved after multiple predictions of the same affective state. These findings emphasize the benefits of incorporating body features in continuous affect recognition for infants. Our work highlights the importance of evaluating variability in model performance both over time and in the presence of missing data when applying affect recognition to social interactions. 
  4. Abstract Motor proteins, also known as biological molecular motors, play important roles in various intracellular processes. Experimental investigations suggest that molecular motors interact with each other during cellular transport, but the nature of such interactions remains poorly understood. Stimulated by these observations, we present a theoretical study aimed at understanding the effect of the range of interactions on the dynamics of interacting molecular motors. For this purpose, we develop a new version of the totally asymmetric simple exclusion process in which nearest-neighbor as well as next-nearest-neighbor interactions are taken into account in a thermodynamically consistent way. A theoretical framework based on a cluster mean-field approximation, which partially takes correlations into account, is developed to evaluate the stationary properties of the system. It is found that fundamental current–density relations in the system strongly depend on the strength and the sign of interactions, as well as on the range of interactions. For repulsive interactions stronger than some critical value, a mean-field theoretical approach predicts that increasing the range of interactions might lead to a change from unimodal to trimodal dependence in the flux-density fundamental diagram. However, this prediction is not fully supported by extensive Monte Carlo computer simulations that test the theoretical predictions. Although in most ranges of parameters a reasonable agreement between theoretical calculations and computer simulations is observed, there are situations when the cluster mean-field approach fails to properly describe the dynamics in the system. Theoretical arguments to explain these observations are presented. Our theoretical analysis clarifies the microscopic picture of how the range of interactions influences the dynamics of interacting molecular motors.
  5. Abstract

    Although multisensory integration is crucial for sensorimotor function, it is unclear how visual and proprioceptive sensory cues are combined in the brain during motor behaviors. Here we characterized the effects of multisensory interactions on local field potential (LFP) activity obtained from the superior parietal lobule (SPL) as non-human primates performed a reaching task with either unimodal (proprioceptive) or bimodal (visual-proprioceptive) sensory feedback. Based on previous analyses of spiking activity, we hypothesized that evoked LFP responses would be tuned to arm location but would be suppressed on bimodal trials, relative to unimodal trials. We also expected to see a substantial number of recording sites with enhanced beta band spectral power for only one set of feedback conditions (e.g. unimodal or bimodal), as was previously observed for spiking activity. We found that evoked activity and beta band power were tuned to arm location at many individual sites, though this tuning often differed between unimodal and bimodal trials. Across the population, both evoked and beta activity were consistent with feedback-dependent tuning to arm location, while beta band activity also showed evidence of response suppression on bimodal trials. The results suggest that multisensory interactions can alter the tuning and gain of arm position-related LFP activity in the SPL.

     