skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: V-Mail: 3D-Enabled Correspondence About Spatial Data on (Almost) All Your Devices
We present V-Mail, a framework of cross-platform applications, interactive techniques, and communication protocols for improved multi-person correspondence about spatial 3D datasets. Inspired by the daily use of e-mail, V-Mail seeks to enable a similar style of rapid, multi-person communication accessible on any device; however, it aims to do this in the new context of spatial 3D communication, where limited access to 3D graphics hardware typically prevents such communication. The approach integrates visual data storytelling with data exploration, spatial annotations, and animated transitions. V-Mail “data stories” are exported in a standard video file format to establish a common baseline level of access on (almost) any device. The V-Mail framework also includes a series of complementary client applications and plugins that enable different degrees of story co-authoring and data exploration, adjusted automatically to match the capabilities of various devices. A lightweight, phone-based V-Mail app makes it possible to annotate data by adding captions to the video. These spatial annotations are then immediately accessible to team members running high-end 3D graphics visualization systems that also include a V-Mail client, implemented as a plugin. Results and evaluation from applying V-Mail to assist communication within an interdisciplinary science team studying Antarctic ice sheets confirm the utility of the asynchronous, cross-platform collaborative framework while also highlighting some current limitations and opportunities for future work.  more » « less
Award ID(s):
1704604
PAR ID:
10541278
Author(s) / Creator(s):
; ;
Publisher / Repository:
IEEE
Date Published:
Journal Name:
IEEE Transactions on Visualization and Computer Graphics
Volume:
30
Issue:
4
ISSN:
1077-2626
Page Range / eLocation ID:
1853 to 1867
Subject(s) / Keyword(s):
Data visualization Three-dimensional displays Annotations Spatial databases Collaboration Software Task analysis
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Successful collaboration requires team members to stay aligned, especially in complex sequential tasks. Team members must dynamically coordinate which subtasks to perform and in what order. However, real-world constraints like partial observability and limited communication bandwidth often lead to suboptimal collaboration. Even among expert teams, the same task can be executed in multiple ways. To develop multi-agent systems and human-AI teams for such tasks, we are interested in data-driven learning of multimodal team behaviors. Multi-Agent Imitation Learning (MAIL) provides a promising framework for data-driven learning of team behavior from demonstrations, but existing methods struggle with heterogeneous demonstrations, as they assume that all demonstrations originate from a single team policy. Hence, in this work, we introduce DTIL: a hierarchical MAIL algorithm designed to learn multimodal team behaviors in complex sequential tasks. DTIL represents each team member with a hierarchical policy and learns these policies from heterogeneous team demonstrations in a factored manner. By employing a distribution-matching approach, DTIL mitigates compounding errors and scales effectively to long horizons and continuous state representations. Experimental results show that DTIL outperforms MAIL baselines and accurately models team behavior across a variety of collaborative scenarios. 
    more » « less
  2. Egocentric and exocentric perspectives of human action differ significantly, yet overcoming this extreme viewpoint gap is critical in augmented reality and robotics. We propose VIEWPOINTROSETTA, an approach that unlocks large-scale unpaired ego and exo video data to learn clip-level viewpoint-invariant video representations. Our framework introduces (1) a diffusion-based Rosetta Stone Translator (RST), which, leveraging a moderate amount of synchronized multi-view videos, serves as a translator in feature space to decipher the alignment between unpaired ego and exo data, and (2) a dual encoder that aligns unpaired data representations through contrastive learning with RST-based synthetic feature augmentation and soft alignment. To evaluate the learned features in a standardized setting, we construct a new cross-view benchmark using Ego-Exo4D, covering cross-view retrieval, action recognition, and skill assessment tasks. Our framework demonstrates superior cross-view understanding compared to previous view-invariant learning and ego video representation learning approaches, and opens the door to bringing vast amounts of traditional third-person video to bear on the more nascent first-person setting. 
    more » « less
  3. In-person human interaction relies on our spatial perception of each other and our surroundings. Current remote communication tools partially address each of these aspects. Video calls convey real user representations but without spatial interactions. Augmented and Virtual Reality (AR/VR) experiences are immersive and spatial but often use virtual environments and characters instead of real-life representations. Bridging these gaps, we introduce DualStream, a system for synchronous mobile AR remote communication that captures, streams, and displays spatial representations of users and their surroundings. DualStream supports transitions between user and environment representations with different levels of visuospatial fidelity, as well as the creation of persistent shared spaces using environment snapshots. We demonstrate how DualStream can enable spatial communication in real-world contexts, and support the creation of blended spaces for collaboration. A formative evaluation of DualStream revealed that users valued the ability to interact spatially and move between representations, and could see DualStream fitting into their own remote communication practices in the near future. Drawing from these findings, we discuss new opportunities for designing more widely accessible spatial communication tools, centered around the mobile phone. 
    more » « less
  4. Oblivious Random Access Machine (ORAM) allows a client to hide the access pattern when accessing sensitive data on a remote server. It is known that there exists a logarithmic communication lower bound on any passive ORAM construction, where the server only acts as the storage service. This overhead, however, was shown costly for some applications. Several active ORAM schemes with server computation have been proposed to overcome this limitation. However, they mostly rely on costly homomorphic encryptions, whose performance is worse than passive ORAM. In this article, we propose S3ORAM, a new multi-server ORAM framework, which featuresO(1) client bandwidth blowup and low client storage without relying on costly cryptographic primitives. Our key idea is to harness Shamir Secret Sharing and a multi-party multiplication protocol on applicable binary tree-ORAM paradigms. This strategy allows the client to instruct the server(s) to perform secure and efficient computation on his/her behalf with a low intervention thereby, achieving a constant client bandwidth blowup and low server computational overhead. Our framework can also work atop a generalk-ary tree ORAM structure (k≥ 2). We fully implemented our framework, and strictly evaluated its performance on a commodity cloud platform (Amazon EC2). Our comprehensive experiments confirmed the efficiency of S3ORAM framework, where it is approximately 10× faster than the most efficient passive ORAM (i.e., Path-ORAM) for a moderate network bandwidth while being three orders of magnitude faster than active ORAM withO(1) bandwidth blowup (i.e., Onion-ORAM). We have open-sourced the implementation of our framework for public testing and adaptation. 
    more » « less
  5. null (Ed.)
    Human activity recognition (HAR) from wearable sensors data has become ubiquitous due to the widespread proliferation of IoT and wearable devices. However, recognizing human activity in heterogeneous environments, for example, with sensors of different models and make, across different persons and their on-body sensor placements introduces wide range discrepancies in the data distributions, and therefore, leads to an increased error margin. Transductive transfer learning techniques such as domain adaptation have been quite successful in mitigating the domain discrepancies between the source and target domain distributions without the costly target domain data annotations. However, little exploration has been done when multiple distinct source domains are present, and the optimum mapping to the target domain from each source is not apparent. In this paper, we propose a deep Multi-Source Adversarial Domain Adaptation (MSADA) framework that opportunistically helps select the most relevant feature representations from multiple source domains and establish such mappings to the target domain by learning the perplexity scores. We showcase that the learned mappings can actually reflect our prior knowledge on the semantic relationships between the domains, indicating that MSADA can be employed as a powerful tool for exploratory activity data analysis. We empirically demonstrate that our proposed multi-source domain adaptation approach achieves 2% improvement with OPPORTUNITY dataset (cross-person heterogeneity, 4 ADLs), whereas 13% improvement on DSADS dataset (cross-position heterogeneity, 10 ADLs and sports activities). 
    more » « less