

Search for: All records

Creators/Authors contains: "Gu, Y."


  1. This work-in-progress paper motivates and presents the design of a novel extended reality (XR) environment for artificial intelligence (AI) education, together with its first implementation. The learner is seated at a table and wears an XR headset that allows them to see both the real world and a visualization of a neural network. The visualization is adjustable: the learner can inspect each layer, each neuron, and each connection, and can also choose a different input image, or create their own image to feed to the network. Inference is computed on the headset, in real time. The neural network configuration and its weights are loaded from an ONNX file, which supports a variety of architectures as well as changing the weights to illustrate the training process.
    Free, publicly accessible full text available March 21, 2025
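    A minimal, illustrative sketch of the ONNX load-and-infer workflow this record describes, using onnxruntime in Python. The file name mnist_cnn.onnx and the 28x28 input shape are hypothetical stand-ins; the paper's on-headset runtime is not specified here, so this only shows the general pattern.

    ```python
    # Minimal sketch: load a classifier from an ONNX file and run one inference.
    # The file name "mnist_cnn.onnx" and the input shape are hypothetical; this
    # illustrates the ONNX workflow, not the paper's headset implementation.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession("mnist_cnn.onnx")
    input_name = session.get_inputs()[0].name

    # A stand-in for a learner-drawn 28x28 grayscale image.
    image = np.random.rand(1, 1, 28, 28).astype(np.float32)

    logits = session.run(None, {input_name: image})[0]
    print("predicted class:", int(np.argmax(logits)))
    ```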
  2. The High Luminosity upgrade of the Large Hadron Collider (HL-LHC) will produce particle collisions with up to 200 simultaneous proton-proton interactions. These unprecedented conditions will create a combinatorial complexity for charged-particle track reconstruction whose computational cost is expected to surpass the projected computing budget using conventional CPUs. Motivated by this, and taking into account the prevalence of heterogeneous computing in cutting-edge High Performance Computing centers, we propose an efficient, fast, and highly parallelizable bottom-up approach to track reconstruction for the HL-LHC, along with an associated implementation on GPUs, in the context of the Phase 2 CMS outer tracker. Our algorithm, called Segment Linking (or Line Segment Tracking), takes advantage of localized track stub creation, combining individual stubs to progressively form higher-level objects that are subject to kinematic and geometric requirements compatible with genuine physics tracks. The local nature of the algorithm makes it ideal for parallelization under the Single Instruction, Multiple Data paradigm, as hundreds of objects can be built simultaneously. The computing and physics performance of the algorithm has been tested on an NVIDIA Tesla V100 GPU, already yielding efficiency and timing measurements on par with the latest multi-CPU versions of existing CMS tracking algorithms.
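    To make the bottom-up idea concrete, here is a toy Python sketch of segment linking: two local track stubs are joined only if the line connecting them is compatible with each stub's measured direction. Every name and the angular cut are hypothetical assumptions, not the CMS GPU implementation; the point is that each candidate pair is evaluated independently, which is why the approach parallelizes so well.

    ```python
    # Toy sketch of bottom-up segment linking: pairs of "stubs" (local track
    # seeds) are joined into segments only if their directions are compatible.
    # Illustration only; names and the angular cut are hypothetical.
    import math

    def link_segments(stubs, max_angle_rad=0.05):
        """stubs: list of (x, y, direction_angle) tuples from different layers."""
        segments = []
        for i, inner in enumerate(stubs):
            for outer in stubs[i + 1:]:
                # Geometric compatibility: the line joining the two stubs must
                # roughly match each stub's locally measured direction.
                joining = math.atan2(outer[1] - inner[1], outer[0] - inner[0])
                if (abs(joining - inner[2]) < max_angle_rad
                        and abs(joining - outer[2]) < max_angle_rad):
                    segments.append((inner, outer))
        return segments

    stubs = [(0.0, 0.0, 0.79), (1.0, 1.0, 0.78), (2.0, 0.0, 2.0)]
    print(link_segments(stubs))  # only the first two stubs are compatible
    ```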
  3. The way an object looks and sounds provides complementary reflections of its physical properties. In many settings, cues from vision and audition arrive asynchronously but must be integrated, as when we hear an object dropped on the floor and then must find it. In this paper, we introduce a setting in which to study multi-modal object localization in 3D virtual environments. An object is dropped somewhere in a room. An embodied robot agent, equipped with a camera and microphone, must determine what object has been dropped, and where, by combining audio and visual signals with knowledge of the underlying physics. To study this problem, we have generated a large-scale dataset, the Fallen Objects dataset, comprising 8000 instances of 30 physical object categories in 64 rooms. The dataset uses the ThreeDWorld platform, which can simulate physics-based impact sounds and complex physical interactions between objects in a photorealistic setting. As a first step toward addressing this challenge, we develop a set of embodied agent baselines based on imitation learning, reinforcement learning, and modular planning, and perform an in-depth analysis of the challenge of this new task.
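    As a hedged sketch of how such a baseline might fuse the two modalities, the following PyTorch module encodes audio and visual features separately, concatenates them, and predicts action logits. The dimensions, architecture, and 9-way action space are illustrative assumptions, not the paper's actual baselines.

    ```python
    # Minimal sketch of an audio-visual fusion module such as an embodied
    # baseline might use: encode each modality, concatenate, predict an action.
    # Dimensions, architecture, and the 9-way action space are hypothetical.
    import torch
    import torch.nn as nn

    class AudioVisualPolicy(nn.Module):
        def __init__(self, audio_dim=128, visual_dim=512, n_actions=9):
            super().__init__()
            self.audio_enc = nn.Sequential(nn.Linear(audio_dim, 128), nn.ReLU())
            self.visual_enc = nn.Sequential(nn.Linear(visual_dim, 128), nn.ReLU())
            self.head = nn.Linear(256, n_actions)

        def forward(self, audio_feat, visual_feat):
            fused = torch.cat([self.audio_enc(audio_feat),
                               self.visual_enc(visual_feat)], dim=-1)
            return self.head(fused)  # action logits

    policy = AudioVisualPolicy()
    logits = policy(torch.randn(1, 128), torch.randn(1, 512))
    ```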
  4. Rafferty, A.; Whitehall, J.; Cristobal, R.; Cavalli-Sforza, V. (Eds.)
    We propose VarFA, a variational inference factor analysis framework that extends existing factor analysis models for educational data mining to efficiently output uncertainty estimates for the model's estimated factors. Such uncertainty information is useful, for example, in adaptive testing scenarios, where additional tests can be administered if the model is not sufficiently certain about a student's skill level estimate. Traditional Bayesian inference methods that produce such uncertainty information are computationally expensive and do not scale to large data sets. VarFA utilizes variational inference, which makes it possible to perform Bayesian inference efficiently even on very large data sets. We use the sparse factor analysis model as a case study and demonstrate the efficacy of VarFA on both synthetic and real data sets. VarFA is also very general and can be applied to a wide array of factor analysis models.
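    The core variational idea can be sketched in a few lines of PyTorch: an encoder maps a student's response vector to the mean and log-variance of a Gaussian over latent skills, and the reparameterization trick keeps sampling differentiable. All sizes are hypothetical and this is not the paper's actual VarFA model; the log-variance is what supplies the per-student uncertainty estimate described above.

    ```python
    # Minimal sketch of the variational idea behind VarFA: an encoder maps a
    # student's response vector to a Gaussian over latent skills, so the
    # variance serves as a per-student uncertainty estimate. All sizes are
    # hypothetical; this is not the paper's actual model.
    import torch
    import torch.nn as nn

    class SkillEncoder(nn.Module):
        def __init__(self, n_questions=50, n_skills=5):
            super().__init__()
            self.net = nn.Linear(n_questions, 2 * n_skills)

        def forward(self, responses):
            mu, log_var = self.net(responses).chunk(2, dim=-1)
            # Reparameterization trick: sample skills, stay differentiable.
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)
            return z, mu, log_var  # log_var quantifies estimation uncertainty

    enc = SkillEncoder()
    z, mu, log_var = enc(torch.rand(1, 50))
    ```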
  5. Devices that facilitate nonverbal communication typically require high computational loads or have rigid and bulky form factors that are unsuitable for use on the face or on other curvilinear body surfaces. Here, we report the design and pilot testing of an integrated system for decoding facial strains and for predicting facial kinematics. The system consists of mass-manufacturable, conformable piezoelectric thin films for strain mapping; multiphysics modelling for analysing the nonlinear mechanical interactions between the conformable device and the epidermis; and three-dimensional digital image correlation for reconstructing soft-tissue surfaces under dynamic deformations as well as for informing device design and placement. In healthy individuals and in patients with amyotrophic lateral sclerosis, we show that the piezoelectric thin films, coupled with algorithms for the real-time detection and classification of distinct skin-deformation signatures, enable the reliable decoding of facial movements. The integrated system could be adapted for use in clinical settings as a nonverbal communication technology or for use in the monitoring of neuromuscular conditions.
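    As a rough sketch of the "detection and classification of distinct skin-deformation signatures" step, the following Python snippet featurizes windowed strain signals and trains a lightweight classifier. The features, labels, and random stand-in data are hypothetical illustrations; the paper's actual pipeline is not reproduced here.

    ```python
    # Minimal sketch of classifying windowed piezoelectric strain signals into
    # facial-movement classes. The features (peak amplitude, time of peak,
    # energy) and the random training data are hypothetical stand-ins.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def featurize(window):
        """window: 1-D array of strain samples from one sensor."""
        return [np.max(np.abs(window)), np.argmax(np.abs(window)), np.sum(window**2)]

    rng = np.random.default_rng(0)
    X = np.array([featurize(rng.standard_normal(100)) for _ in range(200)])
    y = rng.integers(0, 3, size=200)  # e.g. smile / pucker / open-mouth

    clf = LogisticRegression(max_iter=1000).fit(X, y)
    print(clf.predict(X[:5]))
    ```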
  6. Simulation-to-real domain adaptation for semantic segmentation has been actively studied for applications such as autonomous driving. Existing methods mainly focus on a single-source setting, which cannot easily handle the more practical scenario of multiple sources with different distributions. In this paper, we propose to investigate multi-source domain adaptation for semantic segmentation. Specifically, we design a novel framework, termed the Multi-source Adversarial Domain Aggregation Network (MADAN), which can be trained in an end-to-end manner. First, we generate an adapted domain for each source with dynamic semantic consistency while aligning at the pixel level, cycle-consistently, towards the target. Second, we propose a sub-domain aggregation discriminator and a cross-domain cycle discriminator to make the different adapted domains more closely aggregated. Finally, feature-level alignment is performed between the aggregated domain and the target domain while training the segmentation network. Extensive experiments adapting from the synthetic GTA and SYNTHIA datasets to the real Cityscapes and BDDS datasets demonstrate that the proposed MADAN model outperforms state-of-the-art approaches. Our source code is released at: https://github.com/Luodian/MADAN.
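    The final feature-level alignment stage can be illustrated with a minimal adversarial sketch in PyTorch: a discriminator tries to distinguish aggregated-source features from target features, while the encoder is trained to fool it. Feature sizes and the loss bookkeeping are assumptions; MADAN's actual sub-domain aggregation and cycle discriminators are more involved.

    ```python
    # Minimal sketch of adversarial feature-level alignment (not MADAN's full
    # design): label aggregated-source features 1 and target features 0.
    # All shapes are hypothetical; a real loop alternates the two updates.
    import torch
    import torch.nn as nn

    disc = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))
    bce = nn.BCEWithLogitsLoss()

    source_feat = torch.randn(8, 256)  # features from the aggregated domain
    target_feat = torch.randn(8, 256)  # features from real target images

    # Discriminator step: tell the two domains apart.
    d_loss = (bce(disc(source_feat), torch.ones(8, 1))
              + bce(disc(target_feat), torch.zeros(8, 1)))

    # Encoder step: make target features indistinguishable from source.
    g_loss = bce(disc(target_feat), torch.ones(8, 1))
    print(d_loss.item(), g_loss.item())
    ```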