NSF PAR Search | NSF Public Access Repository

PropMolFlow: property-guided molecule generation with geometry-complete flow matching

https://doi.org/10.1038/s43588-025-00946-y

Zeng, Cheng; Jin, Jirui; Ambrose, Connor; Karypis, George; Transtrum, Mark; Tadmor, Ellad B; Hennig, Richard G; Roitberg, Adrian; Martiniani, Stefano; Liu, Mingjie (March 2026, Nature Computational Science)

Molecule generation is advancing rapidly in chemical discovery and drug design. Flow-matching methods have recently set the state of the art (SOTA) in unconditional molecule generation, surpassing score-based diffusion models. However, diffusion models still lead in property-guided generation. In this work, we introduce PropMolFlow, an approach for property-guided molecule generation based on geometry-complete SE(3)-equivariant flow matching. Integrating five different property embedding methods with a Gaussian expansion of scalar properties, PropMolFlow achieves competitive performance against previous SOTA diffusion models in conditional molecule generation while maintaining high structural stability and validity. Additionally, it enables higher sampling speed with fewer time steps compared with baseline models. We highlight the importance of validating the properties of generated molecules through density functional theory calculations. Furthermore, we introduce a task to assess the model’s ability to propose molecules with under-represented property values, assessing its capacity for out-of-distribution generalization.
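A minimal sketch of what a Gaussian expansion of a scalar property can look like, under stated assumptions: the abstract does not give the centers, widths, or feature count, so the evenly spaced centers, the default width, and the function name `gaussian_expansion` are all hypothetical.

```python
import math

def gaussian_expansion(value, lo, hi, n_centers=8, width=None):
    """Expand a scalar into Gaussian basis features.

    Assumption: centers are evenly spaced on [lo, hi] and the width
    defaults to the center spacing. The paper may use other choices.
    """
    step = (hi - lo) / (n_centers - 1)
    if width is None:
        width = step
    centers = [lo + i * step for i in range(n_centers)]
    # Each feature measures how close the value is to one center.
    return [math.exp(-((value - c) ** 2) / (2.0 * width ** 2)) for c in centers]

# Toy usage: a property value of 0.5 on the range [0, 1] with 5 centers.
features = gaussian_expansion(0.5, 0.0, 1.0, n_centers=5)
```

The resulting vector is a smooth, fixed-length encoding of the scalar that could then be passed to a learned embedding layer.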
Full Text Available
All that structure matches does not glitter

Martirossyan, M M; Egg, T; Hollmer, P; Karypis, G; Transtrum, M; Roitberg, A; Liu, M; Hennig, R G; Tadmor, E B; Martiniani, S (September 2025, Proceedings of the Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS))

Generative models for materials, especially inorganic crystals, hold potential to transform the theoretical prediction of novel compounds and structures. Advancement in this field depends critically on robust benchmarks and minimal, information-rich datasets that enable meaningful model evaluation. This paper critically examines common datasets and reported metrics for a crystal structure prediction task—generating the most likely structures given the chemical composition of a material. We focus on three key issues: First, materials datasets should contain unique crystal structures; for example, we show that the widely-utilized carbon-24 dataset only contains % unique structures. Second, materials datasets should not be split randomly if polymorphs of many different compositions are numerous—which we find to be the case for the perov-5 and MP-20 datasets. Third, benchmarks can mislead if used uncritically, e.g., reporting a match rate metric without considering the structural variety exhibited by identical building blocks. To address these oft-overlooked issues, we introduce several fixes. We provide revised versions of the carbon-24 dataset: one with duplicates removed, one deduplicated and split by number of atoms, one with enantiomorphs, and two containing only identical structures but with different unit cells. We also propose new splits for datasets with polymorphs, ensuring that polymorphs are grouped within each split subset, setting a more sensible standard for benchmarking model performance. Finally, we present METRe and cRMSE, new model evaluation metrics that can correct existing issues with the match rate metric.
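The idea of keeping polymorphs together within one split subset can be sketched as a group-aware split. This is an illustration only, not the authors' procedure: the record layout, the `composition` key, the function name `grouped_split`, and the toy entries are assumptions, and the test fraction here applies to groups rather than to individual structures.

```python
import random
from collections import defaultdict

def grouped_split(records, key, test_fraction=0.2, seed=0):
    """Split records so that all records sharing a key land in the same subset."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    keys = sorted(groups)                  # deterministic order before shuffling
    random.Random(seed).shuffle(keys)
    n_test = max(1, round(test_fraction * len(keys)))
    test_keys = set(keys[:n_test])
    train = [r for k in keys if k not in test_keys for r in groups[k]]
    test = [r for k in keys if k in test_keys for r in groups[k]]
    return train, test

# Toy records (hypothetical layout): polymorphs share a composition.
records = [
    {"composition": "TiO2", "name": "rutile"},
    {"composition": "TiO2", "name": "anatase"},
    {"composition": "SiO2", "name": "quartz"},
    {"composition": "SiO2", "name": "cristobalite"},
    {"composition": "C", "name": "diamond"},
    {"composition": "C", "name": "graphite"},
    {"composition": "NaCl", "name": "rock salt"},
]
train, test = grouped_split(records, key=lambda r: r["composition"], test_fraction=0.25)
```

With a random per-record split, rutile could end up in training and anatase in testing; grouping by composition rules that out by construction.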
Full Text Available
Advancing Multiscale Molecular Modeling with Machine Learning-Derived Electrostatics

https://doi.org/10.1021/acs.jctc.4c01792

Semelak, Jonathan A; Pickering, Ignacio; Huddleston, Kate; Olmos, Justo; Grassano, Juan Santiago; Clemente, Camila Mara; Drusin, Salvador I; Marti, Marcelo; Gonzalez_Lebrero, Mariano Camilo; Roitberg, Adrian E; et al (May 2025, Journal of Chemical Theory and Computation)

We introduce an innovative machine learning (ML)-based framework for multiscale molecular modeling in which the ML subsystem is treated as an electrostatic entity interacting with its molecular mechanics (MM) environment through classical electrostatics. The integration of ML accuracy with multiscale modeling is accomplished by leveraging the capabilities of the ANI neural networks to predict geometry-dependent atomic partial charges at the minimal basis iterative stockholder (MBIS) level, going beyond static mechanical embedding. This ML/MM approach can closely approximate state-of-the-art multiscale quantum-classical (QM/MM) methods while significantly lowering computational requirements, thereby facilitating more efficient and precise simulations in computational chemistry. The method requires no additional training beyond the initial model setup and is integrated into Amber, one of the most widely used software suites for molecular modeling, ensuring accessibility to the broader community. We validate its performance across a variety of challenging applications, including the solvation structure, vibrational spectra, torsion free energy profiles, and protein−ligand interactions, achieving excellent agreement with QM/MM benchmarks. This framework not only advances the frontiers of multiscale modeling but also showcases the potential of machine learning to achieve quantum-level accuracy with exceptional efficiency for complex chemical systems.
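As a rough sketch of the classical electrostatic coupling described above, the interaction between ML-region partial charges and MM point charges can be written as a pairwise Coulomb sum. Assumptions: atomic units are used (charges in e, distances in bohr, energy in hartree, so the Coulomb constant is 1), the charges are given as inputs rather than predicted by a network, and `coulomb_energy` is a hypothetical name, not part of Amber or TorchANI.

```python
import math

def coulomb_energy(ml_atoms, mm_atoms):
    """Pairwise Coulomb energy between two sets of point charges.

    Each atom is a (charge, (x, y, z)) tuple. Atomic units are assumed,
    so the energy is simply the sum of q_i * q_j / r_ij.
    """
    energy = 0.0
    for q_i, r_i in ml_atoms:
        for q_j, r_j in mm_atoms:
            energy += q_i * q_j / math.dist(r_i, r_j)
    return energy

# Toy usage: one ML-region charge and one MM charge 2 bohr apart.
ml_atoms = [(1.0, (0.0, 0.0, 0.0))]
mm_atoms = [(-1.0, (2.0, 0.0, 0.0))]
energy = coulomb_energy(ml_atoms, mm_atoms)
```

In the framework described by the abstract, the ML-region charges depend on geometry, so they would be recomputed as the structure changes rather than held fixed as in this toy.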
Full Text Available
Open Materials Generation with Stochastic Interpolants

Höllmer, P; Egg, T; Martirossyan, M; Fuemmeler, E; Shui, Z; Gupta, A; Prakash, P; Roitberg, A; Liu, M; Karypis, G; et al (May 2025, Proceedings of the 42nd International Conference on Machine Learning (ICML). Proceedings of Machine Learning Research (PMLR).)

The discovery of new materials is essential for enabling technological advancements. Computational approaches for predicting novel materials must effectively learn the manifold of stable crystal structures within an infinite design space. We introduce Open Materials Generation (OMatG), a unifying framework for the generative design and discovery of inorganic crystalline materials. OMatG employs stochastic interpolants (SI) to bridge an arbitrary base distribution to the target distribution of inorganic crystals via a broad class of tunable stochastic processes, encompassing both diffusion models and flow matching as special cases. In this work, we adapt the SI framework by integrating an equivariant graph representation of crystal structures and extending it to account for periodic boundary conditions in unit cell representations. Additionally, we couple the SI flow over spatial coordinates and lattice vectors with discrete flow matching for atomic species. We benchmark OMatG's performance on two tasks: Crystal Structure Prediction (CSP) for specified compositions, and de novo generation (DNG) aimed at discovering stable, novel, and unique structures. In our ground-up implementation of OMatG, we refine and extend both CSP and DNG metrics compared to previous works. OMatG establishes a new state of the art in generative modeling for materials discovery, outperforming purely flow-based and diffusion-based implementations. These results underscore the importance of designing flexible deep learning frameworks to accelerate progress in materials science. The OMatG code is available at https://github.com/FERMat-ML/OMatG.
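A minimal sketch of one common stochastic interpolant, assuming the linear form x_t = (1 - t) x0 + t x1 + gamma(t) z with gamma(t) = sigma * sqrt(t (1 - t)). This particular choice, the function name `interpolate`, and the `sigma` parameter are assumptions for illustration; the sketch ignores periodic boundary conditions and is not taken from the OMatG code.

```python
import math
import random

def interpolate(x0, x1, t, sigma=0.0, rng=random):
    """Sample a point on a linear stochastic interpolant between x0 and x1.

    The noise scale gamma(t) vanishes at t = 0 and t = 1, so the path
    starts exactly at the base sample x0 and ends exactly at the target x1.
    """
    gamma = sigma * math.sqrt(t * (1.0 - t))
    return [
        (1.0 - t) * a + t * b + gamma * rng.gauss(0.0, 1.0)
        for a, b in zip(x0, x1)
    ]

# Toy usage: a base sample and a target sample in three dimensions.
x0 = [0.0, 0.0, 0.0]
x1 = [1.0, 2.0, 4.0]
midpoint = interpolate(x0, x1, 0.5)                              # deterministic path
noisy = interpolate(x0, x1, 0.5, sigma=0.1, rng=random.Random(0))  # stochastic path
```

With sigma set to 0 this reduces to the straight-line path often used in flow matching, while a nonzero sigma adds noise along the way; tuning such choices is the kind of flexibility the abstract attributes to the framework.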
Full Text Available
LEGOLAS: A Machine Learning Method for Rapid and Accurate Predictions of Protein NMR Chemical Shifts

https://doi.org/10.1021/acs.jctc.5c00026

Darrows, Mikayla Y; Kodituwakku, Dimuthu; Xue, Jinze; Pickering, Ignacio; Terrel, Nicholas S; Roitberg, Adrian E (April 2025, Journal of Chemical Theory and Computation)

This work introduces LEGOLAS, a fully open source TorchANI-based neural network model designed to predict NMR chemical shifts for protein backbone atoms (N, Cα, Cβ, C′, HN, Hα). LEGOLAS has been designed to be fast without loss of accuracy, as our model is able to predict backbone chemical shifts with root-mean-square errors of 2.53 ppm for N, 0.91 ppm for Cα, 1.14 ppm for Cβ, 1.02 ppm for C′, 0.49 ppm for amide protons, and 0.27 ppm for Hα. The program predicts chemical shifts an order of magnitude faster than the widely utilized SHIFTX2 model. This breakthrough allows us to predict NMR chemical shifts for a very large number of input structures, such as frames from a molecular dynamics (MD) trajectory. In our simulation of the protein BBL from Escherichia coli, we observe that averaging the chemical shift predictions for a set of frames of an MD trajectory substantially improves the agreement with experiment with respect to using a single frame of the dynamics. We also show that LEGOLAS can be successfully applied to the problem of recognizing the native states of a protein among a set of decoys.
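The frame-averaging step can be sketched as follows. The numbers are made-up toy values, not LEGOLAS predictions or experimental shifts, and the helper names `rmse` and `average_over_frames` are hypothetical; the sketch only shows the arithmetic of averaging per-frame predictions before comparing with a reference.

```python
import math

def rmse(pred, ref):
    """Root-mean-square error between predicted and reference values."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))

def average_over_frames(per_frame):
    """Average predictions atom by atom over a list of frames."""
    n_frames = len(per_frame)
    return [sum(column) / n_frames for column in zip(*per_frame)]

# Toy data (synthetic): three atoms, three MD frames, shifts in ppm.
reference = [120.0, 118.0, 122.0]
per_frame = [
    [121.0, 117.0, 123.5],
    [119.0, 119.5, 121.0],
    [120.6, 117.8, 121.8],
]
averaged = average_over_frames(per_frame)
single_frame_error = rmse(per_frame[0], reference)
averaged_error = rmse(averaged, reference)
```

In this toy case the per-frame deviations partly cancel, so the averaged prediction is closer to the reference than the first frame alone. Whether that holds for a real trajectory depends on the system and the sampling.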
Full Text Available
