

Title: Multi-task Multimodal Learning for Disaster Situation Assessment
During disaster events, emergency response teams need to draw up response plans at the earliest possible stage. Social media platforms contain rich information that can help assess the current situation. In this paper, a novel multi-task multimodal deep learning framework with automatic loss weighting is proposed. The framework captures the correlation among different concepts and data modalities, and the automatic loss weighting avoids the tedious manual weight-tuning process while improving model performance. Extensive experiments are conducted on a large-scale multimodal disaster dataset from Twitter to identify post-disaster humanitarian categories and infrastructure damage levels. The results show that, by learning a shared latent space across tasks with loss weighting, the model outperforms all single-task baselines.
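
The abstract does not spell out the exact weighting scheme, so the following is only a minimal sketch of automatic loss weighting for two classification heads (humanitarian category and infrastructure damage level), assuming an uncertainty-style formulation with one learnable log-variance per task; all module and variable names are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskHead(nn.Module):
    """Two task-specific heads over a shared multimodal representation."""
    def __init__(self, shared_dim, n_humanitarian, n_damage):
        super().__init__()
        self.humanitarian = nn.Linear(shared_dim, n_humanitarian)
        self.damage = nn.Linear(shared_dim, n_damage)
        # One learnable log-variance per task; the optimizer adapts the
        # task weights instead of a manual grid search.
        self.log_vars = nn.Parameter(torch.zeros(2))

    def forward(self, shared_features):
        return self.humanitarian(shared_features), self.damage(shared_features)

    def weighted_loss(self, logits_h, logits_d, y_h, y_d):
        losses = torch.stack([F.cross_entropy(logits_h, y_h),
                              F.cross_entropy(logits_d, y_d)])
        # total = sum_t exp(-s_t) * L_t + s_t, with s_t = log sigma_t^2
        return (torch.exp(-self.log_vars) * losses + self.log_vars).sum()
```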
Award ID(s):
1937019
NSF-PAR ID:
10275318
Author(s) / Creator(s):
Date Published:
Journal Name:
2020 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)
Page Range / eLocation ID:
209 to 212
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. ABSTRACT

    Introduction

    Remote military operations require rapid response times for effective relief and critical care. Yet, the military theater is under austere conditions, so communication links are unreliable and subject to physical and virtual attacks and degradation at unpredictable times. Immediate medical care at these austere locations requires semi-autonomous teleoperated systems, which enable the completion of medical procedures even under interrupted networks while isolating the medics from the dangers of the battlefield. However, to achieve autonomy for complex surgical and critical care procedures, robots require extensive programming or massive libraries of surgical skill demonstrations to learn effective policies using machine learning algorithms. Although such datasets are achievable for simple tasks, providing a large number of demonstrations for surgical maneuvers is not practical. This article presents a method for learning from demonstration, combining knowledge from demonstrations to eliminate reward shaping in reinforcement learning (RL). In addition to reducing the data required for training, the self-supervised nature of RL, in conjunction with expert knowledge-driven rewards, produces more generalizable policies tolerant to dynamic environment changes. A multimodal representation for interaction enables learning complex contact-rich surgical maneuvers. The effectiveness of the approach is shown using the cricothyroidotomy task, as it is a standard procedure seen in critical care to open the airway. In addition, we also provide a method for segmenting the teleoperator’s demonstration into subtasks and classifying the subtasks using sequence modeling.

    Materials and Methods

    A database of demonstrations for the cricothyroidotomy task was collected, comprising six fundamental maneuvers referred to as surgemes. The dataset was collected by teleoperating a collaborative robotic platform—SuperBaxter, with modified surgical grippers. Two learning models were then developed for processing the dataset: one for automatic segmentation of the task demonstrations into a sequence of surgemes and a second for classifying each segment into labeled surgemes. Finally, a multimodal off-policy RL approach with rewards learned from demonstrations was developed to learn surgeme execution from these demonstrations.
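
    As an illustration of the segmentation bookkeeping only (the article's segmentation model itself is a learned sequence model not detailed here), a sketch of grouping per-timestep surgeme predictions into contiguous segments might look like the following; the labels and function name are hypothetical.

```python
from typing import List, Tuple

def group_surgemes(frame_labels: List[int]) -> List[Tuple[int, int, int]]:
    """Group consecutive identical labels into (surgeme_id, start, end) spans."""
    segments = []
    start = 0
    for t in range(1, len(frame_labels) + 1):
        if t == len(frame_labels) or frame_labels[t] != frame_labels[start]:
            segments.append((frame_labels[start], start, t))
            start = t
    return segments

# Example: an eight-frame demonstration collapses to three surgeme segments.
print(group_surgemes([0, 0, 0, 2, 2, 5, 5, 5]))
# -> [(0, 0, 3), (2, 3, 5), (5, 5, 8)]
```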

    Results

    The task segmentation model has an accuracy of 98.2%. The surgeme classification model using the proposed interaction features achieved a classification accuracy of 96.25% averaged across all surgemes compared to 87.08% without these features and 85.4% using a support vector machine classifier. Finally, the robot execution achieved a task success rate of 93.5% compared to baselines of behavioral cloning (78.3%) and a twin-delayed deep deterministic policy gradient with shaped rewards (82.6%).

    Conclusions

    Results indicate that the proposed interaction features for the segmentation and classification of surgical tasks improve classification accuracy. The proposed method for learning surgemes from demonstrations exceeds popular methods for skill learning. The effectiveness of the proposed approach demonstrates the potential for future remote telemedicine on battlefields.

     
  2. In recent years, semi-supervised learning has been widely explored and shows excellent data efficiency for 2D data. There is an emerging need to improve data efficiency for 3D tasks due to the scarcity of labeled 3D data. This paper explores how the coherence of different modalities of 3D data (e.g., point cloud, image, and mesh) can be used to improve data efficiency for both 3D classification and retrieval tasks. We propose a novel multimodal semi-supervised learning framework by introducing an instance-level consistency constraint and a novel multimodal contrastive prototype (M2CP) loss. The instance-level consistency constraint forces the network to generate consistent representations for multimodal data of the same object regardless of its modality. The M2CP loss maintains a multimodal prototype for each class and learns features with small intra-class variation by minimizing the feature distance of each object to its prototype while maximizing the distance to the other prototypes. The proposed framework significantly outperforms all state-of-the-art counterparts for both classification and retrieval tasks by a large margin on the ModelNet10 and ModelNet40 datasets.
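
    As a rough sketch of the prototype-based part of such an objective, one could pull each embedding toward its class prototype and push it away from the other prototypes with a softmax over prototype similarities. The temperature, normalization, and prototype update rule below are assumptions for illustration, not the paper's M2CP specification.

```python
import torch
import torch.nn.functional as F

def prototype_contrastive_loss(features, labels, prototypes, temperature=0.1):
    """features: (B, D) embeddings from any modality of the object.
    labels: (B,) class indices.  prototypes: (C, D) one vector per class."""
    features = F.normalize(features, dim=1)
    prototypes = F.normalize(prototypes, dim=1)
    logits = features @ prototypes.t() / temperature  # (B, C) similarities
    # Cross-entropy treats the object's own class prototype as the positive
    # and all other prototypes as negatives.
    return F.cross_entropy(logits, labels)
```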
  3. Timely, flexible, and accurate information dissemination can make a life-and-death difference in managing disasters. Complex command structures and information organization make such dissemination challenging. Thus, it is vital to have an architecture with appropriate naming frameworks, adaptable to the changing roles of participants and focused on content rather than network addresses. To address this, we propose POISE, a name-based and recipient-based publish/subscribe architecture for efficient content dissemination in disaster management. POISE proposes an information layer that improves on state-of-the-art Information-Centric Networking (ICN) solutions such as Named Data Networking (NDN) in two major ways: 1) support for complex graph-based namespaces, and 2) automatic name-based load-splitting. To capture the complexity and dynamicity of disaster response command chains and information flows, POISE introduces a graph-based naming framework, leveraged in a dissemination protocol that exploits information-layer rendezvous points (RPs) that perform name expansions. For improved robustness and scalability, POISE allows load-sharing via multiple RPs, each managing a subset of the namespace graph. However, excessive workload on one RP may turn it into a “hot spot”, impeding performance and reliability. To eliminate such traffic concentration, we propose an automatic load-splitting mechanism consisting of a namespace graph partitioning complemented by a seamless, lossless core migration procedure. Due to the nature of our graph partitioning and its complex objectives, off-the-shelf graph partitioners such as METIS are inadequate, so we propose a hybrid partitioning solution consisting of an initial phase and a refinement phase. Our simulation results show that POISE outperforms state-of-the-art solutions, demonstrating its effectiveness in timely delivery and load-sharing.
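
    To make the load-splitting idea concrete, here is only a toy illustration of dividing namespace entries among RPs by greedy load balancing. POISE's actual mechanism is a hybrid graph partitioning with a refinement phase and lossless core migration; the function name, example names, and weights below are hypothetical.

```python
import heapq

def split_namespace(name_loads, n_rps):
    """name_loads: dict mapping a name (e.g. '/disaster/medical/triage')
    to an observed request load. Returns one set of names per RP."""
    rps = [(0, i, set()) for i in range(n_rps)]        # (load, rp_id, names)
    heapq.heapify(rps)
    # Assign the heaviest names first, always to the least-loaded RP.
    for name, load in sorted(name_loads.items(), key=lambda kv: -kv[1]):
        total, i, names = heapq.heappop(rps)
        names.add(name)
        heapq.heappush(rps, (total + load, i, names))
    return [names for _, _, names in sorted(rps, key=lambda r: r[1])]

# Example: split four namespace entries across two RPs.
print(split_namespace({"/medical": 40, "/shelter": 25,
                       "/logistics": 20, "/media": 10}, n_rps=2))
```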
  4. A major challenge in many machine learning tasks is that a model's expressive power depends on its size. Low-rank tensor methods are an efficient tool for handling the curse of dimensionality in many large-scale machine learning models. The major challenges in training a tensor learning model include how to process high-volume data, how to determine the tensor rank automatically, and how to estimate the uncertainty of the results. While existing tensor learning methods focus on a specific task, this paper proposes a generic Bayesian framework that can be employed to solve a broad class of tensor learning problems such as tensor completion, tensor regression, and tensorized neural networks. We develop a low-rank tensor prior for automatic rank determination in nonlinear problems. Our method is implemented with both stochastic gradient Hamiltonian Monte Carlo (SGHMC) and Stein Variational Gradient Descent (SVGD), and we compare the automatic rank determination and uncertainty quantification of these two solvers. We demonstrate that the proposed method can determine the tensor rank automatically and quantify the uncertainty of the obtained results. We validate the framework on tensor completion and tensorized neural network training tasks.
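
    The basic modeling idea can be sketched as a CP-format tensor with a per-component shrinkage prior, so that unnecessary rank components are pushed toward zero (automatic rank determination). The prior form, dimensions, and the plain MAP optimization loop below are assumptions standing in for the paper's actual prior and its SGHMC/SVGD solvers.

```python
import torch

# Over-specified CP rank R; the shrinkage prior should drive unused
# components toward zero, which is the rank-determination idea.
I, J, K, R = 20, 20, 20, 8
A = torch.randn(I, R, requires_grad=True)
B = torch.randn(J, R, requires_grad=True)
C = torch.randn(K, R, requires_grad=True)
log_lam = torch.zeros(R, requires_grad=True)      # per-component log-precision

def neg_log_posterior(Y, mask, noise_var=0.1):
    Y_hat = torch.einsum('ir,jr,kr->ijk', A, B, C)          # CP reconstruction
    data = ((Y - Y_hat)[mask] ** 2).sum() / (2.0 * noise_var)
    lam = log_lam.exp()
    # Gaussian ARD-style prior on each rank-one component of A, B, C.
    prior = 0.5 * (lam * ((A**2).sum(0) + (B**2).sum(0) + (C**2).sum(0))).sum() \
            - 0.5 * (I + J + K) * log_lam.sum()
    return data + prior

# Plain MAP optimization stands in for the paper's sampling-based solvers.
Y = torch.randn(I, J, K)
mask = torch.rand(I, J, K) < 0.3                  # 30% observed entries
opt = torch.optim.Adam([A, B, C, log_lam], lr=0.05)
for _ in range(500):
    opt.zero_grad()
    loss = neg_log_posterior(Y, mask)
    loss.backward()
    opt.step()
```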
  5. RapidLiq is a Windows software program for predicting liquefaction-induced ground failure using geospatial models, which are particularly suited for regional-scale applications such as (i) loss estimation and disaster simulation; (ii) city planning and policy development; (iii) emergency response; and (iv) post-event reconnaissance (e.g., to remotely identify sites of interest). RapidLiq v1.0 includes four such models. One is a logistic regression model developed by Rashidian and Baise (2020), which has been adopted into United States Geological Survey (USGS) post-earthquake data products but is not often implemented by individuals owing to the geospatial variables that must be compiled. The other three are machine and deep learning (ML/DL) models proposed by Geyin et al. (2021). These models are driven by algorithmic learning (benefiting from ML/DL insights) but pinned to a physical framework (benefiting from mechanics and the knowledge of regression modelers). While liquefaction is a physical phenomenon best predicted by mechanics, subsurface traits lack theoretical links to above-ground parameters but correlate in complex, interconnected ways - a prime problem for ML/DL. All four models are described in an accompanying manuscript. All necessary predictor variables are compiled within RapidLiq, making user implementation trivial. The only required input is a ground motion raster, easily downloaded within minutes of an earthquake or available for enumerable future earthquake scenarios. This gives the software near-real-time capabilities, such that ground failure can be predicted at regional scale within minutes of an earthquake. The software outputs GeoTIFF files mapping the probabilities of liquefaction-induced ground failure. These files may be viewed within the software or explored in greater detail using GIS or one of many free GeoTIFF web explorers (e.g., http://app.geotiff.io/). The software also allows for tabular input, should a user wish to enter specific sites of interest and ground-motion parameters at those sites rather than study the regional effects of an earthquake. RapidLiq v1.0 operates in the contiguous U.S. and completes predictions within 10 seconds for most events.
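
    Since the outputs are ordinary GeoTIFF rasters, they can also be inspected programmatically. This is only a hedged usage sketch assuming the rasterio package and a hypothetical output file name; the actual output naming and layout are determined by the software.

```python
import rasterio
import numpy as np

# Hypothetical output file name; RapidLiq writes GeoTIFF probability rasters.
with rasterio.open("rapidliq_ground_failure_probability.tif") as src:
    prob = src.read(1)                      # band 1: probability of ground failure
    print("CRS:", src.crs, "extent:", src.bounds)
    print("cells with P > 0.5:", int(np.sum(prob > 0.5)))
```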