skip to main content


Search for: All records

Award ID contains: 1841520

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Abstract Motivation

    Expanding our knowledge of small molecules beyond what is known in nature or designed in wet laboratories promises to significantly advance cheminformatics, drug discovery, biotechnology and material science. In silico molecular design remains challenging, primarily due to the complexity of the chemical space and the non-trivial relationship between chemical structures and biological properties. Deep generative models that learn directly from data are intriguing, but they have yet to demonstrate interpretability in the learned representation, so we can learn more about the relationship between the chemical and biological space. In this article, we advance research on disentangled representation learning for small molecule generation. We build on recent work by us and others on deep graph generative frameworks, which capture atomic interactions via a graph-based representation of a small molecule. The methodological novelty is how we leverage the concept of disentanglement in the graph variational autoencoder framework both to generate biologically relevant small molecules and to enhance model interpretability.

    Results

    Extensive qualitative and quantitative experimental evaluation in comparison with state-of-the-art models demonstrate the superiority of our disentanglement framework. We believe this work is an important step to address key challenges in small molecule generation with deep generative frameworks.

    Availability and implementation

    Training and generated data are made available at https://ieee-dataport.org/documents/dataset-disentangled-representation-learning-interpretable-molecule-generation. All code is made available at https://anonymous.4open.science/r/D-MolVAE-2799/.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
    more » « less
  2. Abstract

    Previous research has noted that many factors greatly influence the spread of COVID‐19. Contrary to explicit factors that are measurable, such as population density, number of medical staff, and the daily test rate, many factors are not directly observable, for instance, culture differences and attitudes toward the disease, which may introduce unobserved heterogeneity. Most contemporary COVID‐19 related research has focused on modeling the relationship between explicitly measurable factors and the response variable of interest (such as the infection rate or the death rate). The infection rate is a commonly used metric for evaluating disease progression and a state's mitigation efforts. Because unobservable sources of heterogeneity cannot be measured directly, it is hard to incorporate them into the quantitative assessment and decision‐making process. In this study, we propose new metrics to study a state's performance by adjusting the measurable county‐level covariates and unobservable state‐level heterogeneity through random effects. A hierarchical linear model (HLM) is postulated, and we calculate two model‐based metrics—the standardized infection ratio (SDIR) and the adjusted infection rate (AIR). This analysis highlights certain time periods when the infection rate for a state was high while their SDIR was low and vice versa. We show that trends in these metrics can give insight into certain aspects of a state's performance. As each state continues to develop their individualized COVID‐19 mitigation strategy and ultimately works to improve their performance, the SDIR and AIR may help supplement the crude infection rate metric to provide a more thorough understanding of a state's performance.

     
    more » « less
  3. Deep learning’s performance has been extensively recognized recently. Graph neural networks (GNNs) are designed to deal with graph-structural data that classical deep learning does not easily manage. Since most GNNs were created using distinct theories, direct comparisons are impossible. Prior research has primarily concentrated on categorizing existing models, with little attention paid to their intrinsic connections. The purpose of this study is to establish a unified framework that integrates GNNs based on spectral graph and approximation theory. The framework incorporates a strong integration between spatial- and spectral-based GNNs while tightly associating approaches that exist within each respective domain.

     
    more » « less
    Free, publicly-accessible full text available May 31, 2025
  4. With recent advancements, large language models (LLMs) such as ChatGPT and Bard have shown the potential to disrupt many industries, from customer service to healthcare. Traditionally, humans interact with geospatial data through software (e.g., ArcGIS 10.3) and programming languages (e.g., Python). As a pioneer study, we explore the possibility of using an LLM as an interface to interact with geospatial datasets through natural language. To achieve this, we also propose a framework to (1) train an LLM to understand the datasets, (2) generate geospatial SQL queries based on a natural language question, (3) send the SQL query to the backend database, (4) parse the database response back to human language. As a proof of concept, a case study was conducted on real-world data to evaluate its performance on various queries. The results show that LLMs can be accurate in generating SQL code for most cases, including spatial joins, although there is still room for improvement. As all geospatial data can be stored in a spatial database, we hope that this framework can serve as a proxy to improve the efficiency of spatial data analyses and unlock the possibility of automated geospatial analytics.

     
    more » « less
    Free, publicly-accessible full text available January 1, 2025
  5. Free, publicly-accessible full text available October 2, 2024
  6. Free, publicly-accessible full text available October 2, 2024
  7. Free, publicly-accessible full text available July 16, 2024
  8. Rapid Intensification (RI) in Tropical Cyclone (TC) development is one of the most difficult and still challenging tasks in weather forecasting. In addition to the dynamical numerical simulations, commonly used techniques for RI (as well as TC intensity changes) analysis and prediction are the composite analysis and statistical models based on features derived from the composite analysis. Quite a large number of such selected and pre-determined features related to TC intensity change and RI have been accumulated by the domain scientists, such as those in the widely used SHIPS (Statistical Hurricane Intensity Prediction Scheme) database. Moreover, new features are still being added with new algorithms and/or newly available datasets. However, there are very few unified frameworks for systematically distilling features from a comprehensive data source. One such unified Artificial Intelligence (AI) system was developed for deriving features from TC centers, and here, we expand that system to large-scale environmental condition. In this study, we implemented a deep learning algorithm, the Convolutional Neural Network (CNN), to the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-Interim reanalysis data and identified and refined potentially new features relevant to RI such as specific humidity in east or northeast, vorticity and horizontal wind in north and south relative to the TC centers, as well as ozone at high altitudes that could help the prediction and understanding of the occurrence of RI based on the deep learning network (named TCNET in this study). By combining the newly derived features and the features from the SHIPS database, the RI prediction performance can be improved by 43%, 23%, and 30% in terms of Kappa, probability of detection (POD), and false alarm rate (FAR) against the same modern classification model but with the SHIPS inputs only. 
    more » « less