skip to main content


Search for: All records

Creators/Authors contains: "Gao, R."

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Free, publicly-accessible full text available December 1, 2025
  2. Human-Robot Collaboration (HRC) aims to create environments where robots can understand workspace dynamics and actively assist humans in operations, with the human intention recognition being fundamental to efficient and safe task fulfillment. Language-based control and communication is a natural and convenient way to convey human intentions. However, traditional language models require instructions to be articulated following a rigid, predefined syntax, which can be unnatural, inefficient, and prone to errors. This paper investigates the reasoning abilities that emerged from the recent advancement of Large Language Models (LLMs) to overcome these limitations, allowing for human instructions to be used to enhance human-robot communication. For this purpose, a generic GPT 3.5 model has been fine-tuned to interpret and translate varied human instructions into essential attributes, such as task relevancy and tools and/or parts required for the task. These attributes are then fused with perceived on-going robot action to generate a sequence of relevant actions. The developed technique is evaluated in a case study where robots initially misinterpreted human actions and picked up wrong tools and parts for assembly. It is shown that the fine-tuned LLM can effectively identify corrective actions across a diverse range of instructional human inputs, thereby enhancing the robustness of human-robot collaborative assembly for smart manufacturing. 
    more » « less
    Free, publicly-accessible full text available May 31, 2025
  3. Free, publicly-accessible full text available May 1, 2025
  4. Prediction of surface topography in milling usually requires complex kinematics and dynamics modeling of the milling process, plus solving physical models of surface generation is a daunting task. This paper presents a multimodal data-driven machine learning (ML) method to predict milled surface topography. The proposed method predicts the height map of the surface topography by fusing process parameters and in-process acoustic information as model inputs. This method has been validated by comparing the predicted surface topography with the measured data. 
    more » « less
    Free, publicly-accessible full text available June 1, 2025
  5. The activity of the grid cell population in the medial entorhinal cortex (MEC) of the mammalian brain forms a vector representation of the self-position of the animal. Recurrent neural networks have been proposed to explain the properties of the grid cells by updating the neural activity vector based on the velocity input of the animal. In doing so, the grid cell system effectively performs path integration. In this paper, we investigate the algebraic, geometric, and topological properties of grid cells using recurrent network models. Algebraically, we study the Lie group and Lie algebra of the recurrent transformation as a representation of self-motion. Geometrically, we study the conformal isometry of the Lie group representation where the local displacement of the activity vector in the neural space is proportional to the local displacement of the agent in the 2D physical space. Topologically, the compact abelian Lie group representation automatically leads to the torus topology commonly assumed and observed in neuroscience. We then focus on a simple non-linear recurrent model that underlies the continuous attractor neural networks of grid cells. Our numerical experiments show that conformal isometry leads to hexagon periodic patterns in the grid cell responses and our model is capable of accurate path integration. 
    more » « less
  6. With funding from a National Science Foundation (NSF) IUSE/PFE: Revolutionizing Engineering and Computer Science Departments (IUSE/PFE: RED) grant, our vision is to focus on faculty development and culture change to reduce the effort and risk experienced by faculty in implementing pedagogical changes and to increase iterative, data-driven changes in teaching. Our project, called Teams for Creating Opportunities for Revolutionizing the Preparation of Students (TCORPS), is an adaptation of the “Additive innovation” model proposed by Arizona State University [1]. The Department of Mechanical Engineering at Texas A&M University has a long legacy of individualistic and---in many cases---a fixed mindset [2] approach to teaching with the expectation of top-down management of change. The goal of our project is to evolve the departmental culture to a bottom-up team structure where the faculty embrace an innovative mindset and extend an iterative build-test-learn method of the maker culture [3] that was formalized by the Lean Startup [4] approach. Faculty already have investigative and experimentation-driven processes in place for research and a keen understanding of data to support their hypotheses. We aim to leverage this preexisting strength and knowledge by extending it to the faculty-led, small-scale, iterative improvement of curriculum and pedagogy 
    more » « less
  7. This paper proposes a representational model for image pairs such as consecutive video frames that are related by local pixel displacements, in the hope that the model may shed light on motion perception in primary visual cortex (V1). The model couples the following two components: (1) the vector representations of local contents of images and (2) the matrix representations of local pixel displacements caused by the relative motions between the agent and the objects in the 3D scene. When the image frame undergoes changes due to local pixel displacements, the vectors are multiplied by the matrices that represent the local displacements. Thus the vector representation is equivariant as it varies according to the local displacements. Our experiments show that our model can learn Gabor-like filter pairs of quadrature phases. The profiles of the learned filters match those of simple cells in Macaque V1. Moreover, we demonstrate that the model can learn to infer local motions in either a supervised or unsupervised manner. With such a simple model, we achieve competitive results on optical flow estimation. 
    more » « less
  8. Learning energy-based model (EBM) requires MCMC sampling of the learned model as an inner loop of the learning algorithm. However, MCMC sampling of EBMs in high-dimensional data space is generally not mixing, because the energy function, which is usually parametrized by deep network, is highly multi-modal in the data space. This is a serious handicap for both theory and practice of EBMs. In this paper, we propose to learn EBM with a flow-based model (or in general latent variable model) serving as a backbone, so that the EBM is a correction or an exponential tilting of the flow-based model. We show that the model has a particularly simple form in the space of the latent variables of the generative model, and MCMC sampling of the EBM in the latent space mixes well and traverses modes in the data space. This enables proper sampling and learning of EBMs. 
    more » « less
  9. Latent space Energy-Based Models (EBMs), also known as energy-based priors, have drawn growing interests in generative modeling. Fueled by its flexibility in the formulation and strong modeling power of the latent space, recent works built upon it have made interesting attempts aiming at the interpretability of text modeling. However, latent space EBMs also inherit some flaws from EBMs in data space; the degenerate MCMC sampling quality in practice can lead to poor generation quality and instability in training, especially on data with complex latent structures. Inspired by the recent efforts that leverage diffusion recovery likelihood learning as a cure for the sampling issue, we introduce a novel symbiosis between the diffusion models and latent space EBMs in a variational learning framework, coined as the latent diffusion energy-based model. We develop a geometric clustering-based regularization jointly with the information bottleneck to further improve the quality of the learned latent space. Experiments on several challenging tasks demonstrate the superior performance of our model on interpretable text modeling over strong counterparts. 
    more » « less