

Search for: All records

Creators/Authors contains: "Hong, Pengyu"


  1. Free, publicly-accessible full text available July 26, 2025
  2. Enhancing accurate molecular property prediction relies on effective and proficient representation learning. It is crucial to incorporate diverse molecular relationships characterized by multi-similarity (self-similarity and relative similarities) (Wang et al., 2019) between molecules. However, current molecular representation learning methods fall short in exploring multi-similarity and often underestimate the complexity of relationships between molecules. Additionally, previous multi-similarity approaches require the specification of positive and negative pairs to attribute distinct predefined weights to different relative similarities, which can introduce potential bias. In this work, we introduce the Graph Multi-Similarity Learning for Molecular Property Prediction (GraphMSL) framework, along with a novel approach to formulate a generalized multi-similarity metric without the need to define positive and negative pairs. In each of the chemical modality spaces (e.g., molecular depiction image, fingerprint, NMR, and SMILES) under consideration, we first define a self-similarity metric (i.e., similarity between an anchor molecule and another molecule), and then transform it into a generalized multi-similarity metric for the anchor through a pair weighting function. GraphMSL validates the efficacy of the multi-similarity metric across MoleculeNet datasets. Furthermore, these metrics of all modalities are integrated into a multimodal multi-similarity metric, which showcases the potential to improve the performance. Moreover, the focus of the model can be redirected or customized by altering the fusion function. Last but not least, GraphMSL proves effective in drug discovery evaluations through post-hoc analyses of the learnt representations.
    Free, publicly-accessible full text available July 26, 2025
  3. Nuclear magnetic resonance (NMR) spectroscopy plays an essential role in deciphering molecular structure and dynamic behaviors. While AI-enhanced NMR prediction models hold promise, challenges still persist in tasks such as molecular retrieval, isomer recognition, and peak assignment. In response, this paper introduces a novel solution, Knowledge-Guided Multi-Level Multimodal Alignment with Instance-Wise Discrimination (K-M3AID), which establishes correspondences between two heterogeneous modalities: molecular graphs and NMR spectra. K-M3AID employs a dual-coordinated contrastive learning architecture with three key modules: a graph-level alignment module, a node-level alignment module, and a communication channel. Notably, K-M3AID introduces knowledge-guided instance-wise discrimination into contrastive learning within the node-level alignment module. In addition, K-M3AID demonstrates that skills acquired during node-level alignment have a positive impact on graph-level alignment, acknowledging meta-learning as an inherent property. Empirical validation underscores the effectiveness of K-M3AID in multiple zero-shot tasks.
    Free, publicly-accessible full text available July 26, 2025
  4. A machine learning model for reliably calculating director fields from raw experimental images of active nematics. The model is accurate, robust to noise, and generalizable, enhancing analyses such as the detection and tracking of topological defects.
  5. Knowledge graph (KG) representation learning aims to encode entities and relations into dense continuous vector spaces such that the knowledge contained in a dataset can be consistently represented. Dense embeddings trained from KG datasets benefit a variety of downstream tasks such as KG completion and link prediction. However, existing KG embedding methods fall short of providing a systematic solution for the global consistency of knowledge representation. We developed a mathematical language for KGs based on an observation of their inherent algebraic structure, which we term Knowledgebra. By analyzing five distinct algebraic properties, we proved that the semigroup is the most reasonable algebraic structure for the relation embedding of a general knowledge graph. We implemented an instantiation model, SemE, using simple matrix semigroups, which exhibits state-of-the-art performance on standard datasets. Moreover, we proposed a regularization-based method to integrate chain-like logic rules derived from human knowledge into embedding training, which further demonstrates the power of the developed language. As far as we know, by applying abstract algebra in statistical learning, this work develops the first formal language for general knowledge graphs, and also sheds light on the problem of neural-symbolic integration from an algebraic perspective.
  6. Were astronauts forced to land on the surface of Mars using manual control of their vehicle, they would not have familiar gravitational cues because Mars’ gravity is only 0.38 g. They could become susceptible to spatial disorientation, potentially causing mission-ending crashes. In our earlier studies, we secured blindfolded participants into a Multi-Axis Rotation System (MARS) device that was programmed to behave like an inverted pendulum. Participants used a joystick to stabilize around the balance point. We created a spaceflight analog condition by having participants dynamically balance in the horizontal roll plane, where they did not tilt relative to the gravitational vertical and therefore could not use gravitational cues to determine their position. We found that 90% of participants in our spaceflight analog condition reported spatial disorientation and all of them showed it in their data. There was a high rate of crashing into boundaries that were set at ±60° from the balance point. Our goal was to see whether we could use deep learning to predict the occurrence of crashes before they happened. We used stacked gated recurrent units (GRU) to predict crash events 800 ms in advance with an AUC (area under the curve) value of 99%. When we prioritized reducing false negatives, we found that it resulted in more false positives. We found that false negatives occurred when participants made destabilizing joystick deflections that rapidly moved the MARS away from the balance point. These unpredictable destabilizing joystick deflections, which occurred in the period after the input data, are likely a result of spatial disorientation. We calculated that, if our model could work in real time, immediate human action would prevent 80.7% of crashes; however, after accounting for human reaction times (∼400 ms), only 30.3% of crashes could be prevented, suggesting that one solution could be for an AI to take temporary control of the spacecraft during these moments.
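
The trade-off noted in the last abstract above, where prioritizing fewer false negatives produced more false positives, can be illustrated with a decision threshold on predicted crash probabilities. The following is a minimal Python sketch, not the authors' model: the scores, labels, thresholds, and the `confusion_counts` helper are hypothetical and chosen only for illustration.

```python
def confusion_counts(scores, labels, threshold):
    """Return (false_negatives, false_positives) at a decision threshold.

    A sample is predicted as a crash when its score is >= threshold.
    """
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return fn, fp


# Hypothetical predicted crash probabilities and true outcomes (1 = crash).
scores = [0.95, 0.80, 0.55, 0.35, 0.60, 0.40, 0.20, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# A high threshold misses some crashes but raises no false alarms.
strict = confusion_counts(scores, labels, threshold=0.7)

# A low threshold catches every crash at the cost of false alarms.
lenient = confusion_counts(scores, labels, threshold=0.3)

print("threshold 0.7 (FN, FP):", strict)
print("threshold 0.3 (FN, FP):", lenient)
```

Lowering the threshold in this toy data removes the missed crashes and introduces false alarms, which matches the direction of the trade-off the abstract describes.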