Search for: All records

Award ID contains: 2311632

  1. Molecule generation is advancing rapidly in chemical discovery and drug design. Flow-matching methods have recently set the state of the art (SOTA) in unconditional molecule generation, surpassing score-based diffusion models. However, diffusion models still lead in property-guided generation. In this work, we introduce PropMolFlow, an approach for property-guided molecule generation based on geometry-complete SE(3)-equivariant flow matching. Integrating five different property embedding methods with a Gaussian expansion of scalar properties, PropMolFlow achieves competitive performance against previous SOTA diffusion models in conditional molecule generation while maintaining high structural stability and validity. Additionally, it enables higher sampling speed with fewer time steps compared with baseline models. We highlight the importance of validating the properties of generated molecules through density functional theory calculations. Furthermore, we introduce a task to assess the model’s ability to propose molecules with under-represented property values, which probes its capacity for out-of-distribution generalization. 
  2. Generative models for materials, especially inorganic crystals, hold potential to transform the theoretical prediction of novel compounds and structures. Advancement in this field depends critically on robust benchmarks and minimal, information-rich datasets that enable meaningful model evaluation. This paper critically examines common datasets and reported metrics for a crystal structure prediction task: generating the most likely structures given the chemical composition of a material. We focus on three key issues: First, materials datasets should contain unique crystal structures; for example, we show that the widely utilized carbon-24 dataset only contains % unique structures. Second, materials datasets should not be split randomly if polymorphs of many different compositions are numerous, which we find to be the case for the perov-5 and MP-20 datasets. Third, benchmarks can mislead if used uncritically, e.g., reporting a match rate metric without considering the structural variety exhibited by identical building blocks. To address these oft-overlooked issues, we introduce several fixes. We provide revised versions of the carbon-24 dataset: one with duplicates removed, one deduplicated and split by number of atoms, one with enantiomorphs, and two containing only identical structures but with different unit cells. We also propose new splits for datasets with polymorphs, ensuring that polymorphs are grouped within each split subset, setting a more sensible standard for benchmarking model performance. Finally, we present METRe and cRMSE, new model evaluation metrics that can correct existing issues with the match rate metric. 
  3. We introduce an innovative machine learning (ML)-based framework for multiscale molecular modeling in which the ML subsystem is treated as an electrostatic entity interacting with its molecular mechanics (MM) environment through classical electrostatics. The integration of ML accuracy with multiscale modeling is accomplished by leveraging the capabilities of the ANI neural networks to predict geometry-dependent atomic partial charges at the minimal basis iterative stockholder (MBIS) level, going beyond static mechanical embedding. This ML/MM approach can closely approximate state-of-the-art multiscale quantum-classical (QM/MM) methods while significantly lowering computational requirements, thereby facilitating more efficient and precise simulations in computational chemistry. The method requires no additional training beyond the initial model setup and is integrated into Amber, one of the most widely used software suites for molecular modeling, ensuring accessibility to the broader community. We validate its performance across a variety of challenging applications, including the solvation structure, vibrational spectra, torsion free energy profiles, and protein−ligand interactions, achieving excellent agreement with QM/MM benchmarks. This framework not only advances the frontiers of multiscale modeling but also showcases the potential of machine learning to achieve quantum-level accuracy with exceptional efficiency for complex chemical systems. 
  4. The discovery of new materials is essential for enabling technological advancements. Computational approaches for predicting novel materials must effectively learn the manifold of stable crystal structures within an infinite design space. We introduce Open Materials Generation (OMatG), a unifying framework for the generative design and discovery of inorganic crystalline materials. OMatG employs stochastic interpolants (SI) to bridge an arbitrary base distribution to the target distribution of inorganic crystals via a broad class of tunable stochastic processes, encompassing both diffusion models and flow matching as special cases. In this work, we adapt the SI framework by integrating an equivariant graph representation of crystal structures and extending it to account for periodic boundary conditions in unit cell representations. Additionally, we couple the SI flow over spatial coordinates and lattice vectors with discrete flow matching for atomic species. We benchmark OMatG's performance on two tasks: Crystal Structure Prediction (CSP) for specified compositions, and de novo generation (DNG) aimed at discovering stable, novel, and unique structures. In our ground-up implementation of OMatG, we refine and extend both CSP and DNG metrics compared to previous works. OMatG establishes a new state of the art in generative modeling for materials discovery, outperforming purely flow-based and diffusion-based implementations. These results underscore the importance of designing flexible deep learning frameworks to accelerate progress in materials science. The OMatG code is available at https://github.com/FERMat-ML/OMatG. 
  5. This work introduces LEGOLAS, a fully open source TorchANI-based neural network model designed to predict NMR chemical shifts for protein backbone atoms (N, Cα, Cβ, C′, HN, Hα). LEGOLAS has been designed to be fast without loss of accuracy, as our model is able to predict backbone chemical shifts with root-mean-square errors of 2.53 ppm for N, 0.91 ppm for Cα, 1.14 ppm for Cβ, 1.02 ppm for C′, 0.49 ppm for amide protons, and 0.27 ppm for Hα. The program predicts chemical shifts an order of magnitude faster than the widely utilized SHIFTX2 model. This breakthrough allows us to predict NMR chemical shifts for a very large number of input structures, such as frames from a molecular dynamics (MD) trajectory. In our simulation of the protein BBL from Escherichia coli, we observe that averaging the chemical shift predictions for a set of frames of an MD trajectory substantially improves the agreement with experiment with respect to using a single frame of the dynamics. We also show that LEGOLAS can be successfully applied to the problem of recognizing the native states of a protein among a set of decoys. 
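
The first abstract mentions a Gaussian expansion of scalar properties as part of the property embedding. The listing gives no details of how PropMolFlow implements it, so the following is only a minimal sketch of the general technique: a scalar is mapped to a vector of Gaussian basis-function values on evenly spaced centers. The function name `gaussian_expansion`, its parameters, and the default width are all assumptions made for illustration and are not taken from the paper.

```python
import math


def gaussian_expansion(value, start, stop, num_centers, width=None):
    """Expand a scalar into Gaussian basis features (illustrative sketch).

    Hypothetical helper: centers are evenly spaced on [start, stop], and
    the default width equals the center spacing. Neither choice is taken
    from the PropMolFlow paper.
    """
    if num_centers < 2:
        raise ValueError("num_centers must be at least 2")
    if stop <= start:
        raise ValueError("stop must be greater than start")

    step = (stop - start) / (num_centers - 1)
    if width is None:
        width = step  # assumed default: one center spacing

    centers = [start + i * step for i in range(num_centers)]
    # Each feature is exp(-(x - c)^2 / (2 * width^2)) for one center c.
    return [
        math.exp(-((value - c) ** 2) / (2.0 * width ** 2))
        for c in centers
    ]


# Example: expand a property value of 0.5 onto 5 centers in [0, 1].
features = gaussian_expansion(0.5, start=0.0, stop=1.0, num_centers=5)
```

A fixed-length vector of this kind gives a model a smooth, localized encoding of a scalar target. The feature for the center nearest the input value is the largest, and features decay with distance from it.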