De novo design of molecules with targeted properties represents a new frontier in molecule development. Despite enormous progress, two main challenges remain: (i) generating novel molecules conditioned on targeted, continuous property values; and (ii) obtaining molecules with property values beyond the range of the training data. To tackle these challenges, we propose a reinforced regressional and conditional generative adversarial network (RRCGAN) that generates chemically valid molecules with a targeted HOMO–LUMO energy gap ($$\Delta E_{\mathrm{H-L}}$$) as a proof-of-concept study. As validated by density functional theory (DFT) calculations, 75% of the generated molecules have a relative error (RE) below 20% with respect to the targeted $$\Delta E_{\mathrm{H-L}}$$ values. To bias generation toward $$\Delta E_{\mathrm{H-L}}$$ values beyond the range of the original training molecules, transfer learning was applied to iteratively retrain the RRCGAN model. After just two iterations, the mean $$\Delta E_{\mathrm{H-L}}$$ of the generated molecules increases to 8.7 eV, up from a mean of 5.9 eV in the initial training dataset. Qualitative and quantitative analyses reveal that the model has successfully captured the underlying structure–property relationship, which agrees well with established physical and chemical rules. These results present a trustworthy, purely data-driven methodology for the highly efficient generation of novel molecules with different targeted properties.
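The RRCGAN abstract reports accuracy as the share of generated molecules whose DFT-computed gap lies within 20% relative error of its target. A minimal sketch of that metric follows, assuming RE is defined as |computed − target| / |target| (the abstract does not state the formula); the function names and sample values are hypothetical, not the authors' code or data.

```python
def relative_error(computed: float, target: float) -> float:
    """Relative error of a computed value with respect to its target.

    Assumes RE = |computed - target| / |target|, so target must be nonzero.
    """
    if target == 0:
        raise ValueError("target must be nonzero to compute a relative error")
    return abs(computed - target) / abs(target)


def fraction_within(computed, targets, threshold=0.20):
    """Fraction of (computed, target) pairs whose relative error is below threshold."""
    if len(computed) != len(targets) or not computed:
        raise ValueError("need equal-length, non-empty sequences")
    hits = sum(relative_error(c, t) < threshold for c, t in zip(computed, targets))
    return hits / len(computed)


# Hypothetical example values (eV), not data from the RRCGAN study.
dft_gaps = [5.5, 7.9, 4.0, 6.1, 9.0]
target_gaps = [6.0, 8.0, 6.0, 6.0, 7.0]
print(fraction_within(dft_gaps, target_gaps))  # 3 of 5 pairs are within 20% -> 0.6
```

A strict "<" comparison is used because the abstract phrases the criterion as an error of less than 20%.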
                    This content will become publicly available on September 15, 2026
                            
                            Dreaming Up Novel Quantum Dyes using Inverse Machine Learning in MatFlow
                        
                    
    
            Discovering novel molecules with targeted properties remains a formidable challenge in materials science, often likened to finding a needle in a haystack. Traditional experimental approaches are slow, costly, and inefficient. In this study, we present an inverse design framework based on a molecular graph conditional variational autoencoder (CVAE) that generates new molecules with user-specified optical properties, in particular the molar extinction coefficient ($$\varepsilon$$). Our model encodes molecular graphs, derived from SMILES strings, into a structured latent space and then decodes them into valid molecular structures conditioned on a target $$\varepsilon$$ value. Trained on a curated dataset of known molecules with their corresponding extinction coefficients, the CVAE learns to generate chemically valid structures, as verified with RDKit. Subsequent density functional theory (DFT) simulations confirm that many of the generated molecules exhibit electronic structures similar to those of molecules with the desired $$\varepsilon$$ values. We also verified the $$\varepsilon$$ values of the generated molecules using a graph neural network (GNN) and assessed their synthesizability using the open-source tool ASKCOS. This approach demonstrates the potential of CVAEs to accelerate molecular discovery through user-guided, property-driven molecule generation, offering a scalable, data-driven alternative to traditional trial-and-error synthesis.
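The abstract describes a CVAE that maps molecular graphs to a latent space and decodes them conditioned on a target $$\varepsilon$$. The sketch below illustrates only the generic CVAE pieces that make this conditioning work: the reparameterization trick, the Gaussian KL term of the training loss, and appending the condition to the latent vector before decoding. It assumes a diagonal-Gaussian encoder and a scalar condition that has already been normalized (e.g., log-scaled $$\varepsilon$$); the graph encoder (which in a CVAE also receives the condition) and the graph decoder are omitted, and every name here is hypothetical rather than taken from MatFlow.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I).

    In an autodiff framework this keeps z differentiable with respect to mu and logvar.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), the regularization term of the CVAE loss."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def decoder_input(z, condition):
    """Condition the decoder by appending the (normalized) target property to z."""
    return list(z) + [condition]


def sample_for_target(latent_dim, condition, rng):
    """Generation: draw z from the prior N(0, I) and attach the desired condition."""
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    return decoder_input(z, condition)


rng = random.Random(0)
# Training-time path: encoder outputs (mu, logvar) -> sampled z -> conditioned decoder input.
z = reparameterize([0.3, -0.1], [0.0, -1.0], rng)
train_input = decoder_input(z, 0.8)
# Generation-time path: no encoder; the target condition alone steers the decoder.
gen_input = sample_for_target(2, 0.8, rng)
print(len(train_input), len(gen_input))  # both are latent_dim + 1 = 3
```

Concatenation is the simplest way to inject a condition; it lets the same decoder produce different molecules for different target values from the same prior sample.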
        
    
- Award ID(s): 2410668
- PAR ID: 10631954
- Publisher / Repository: IEEE
- Date Published:
- Format(s): Medium: X
- Location: Chicago, USA
- Sponsoring Org: National Science Foundation
More Like this
- Generative deep learning methods have recently been proposed for generating 3D molecules using equivariant graph neural networks (GNNs) within a denoising diffusion framework. However, such methods are unable to learn important geometric properties of 3D molecules, as they adopt molecule-agnostic and non-geometric GNNs as their 3D graph denoising networks, which notably hinders their ability to generate valid large 3D molecules. In this work, we address these gaps by introducing the Geometry-Complete Diffusion Model (GCDM) for 3D molecule generation, which outperforms existing 3D molecular diffusion models by significant margins across conditional and unconditional settings for the QM9 dataset and the larger GEOM-Drugs dataset, respectively. Importantly, we demonstrate that GCDM's generative denoising process enables the model to generate a significant proportion of valid and energetically stable large molecules at the scale of GEOM-Drugs, whereas previous methods fail to do so with the features they learn. Additionally, we show that extensions of GCDM can not only effectively design 3D molecules for specific protein pockets but also be repurposed to consistently optimize the geometry and chemical composition of existing 3D molecules for molecular stability and property specificity, demonstrating new versatility of molecular diffusion models. Code and data are freely available on GitHub.
- The discovery of novel thermoset shape memory polymers (TSMPs) for additive manufacturing can be accelerated through the use of a deep generative algorithm, minimizing the need for laborious traditional laboratory experiments. This study is the first to introduce an innovative approach that uses a deep generative learning model, namely the conditional variational autoencoder (CVAE), to discover novel TSMPs with lower glass transition temperature ($$T_g$$) and high recovery stress. In this study, specific chemical groups, such as epoxy, amine, thiol, and vinyl, are integrated as constraints to generate novel TSMPs while preserving the essential reaction properties. To address the challenges posed by a small dataset, the CVAE model is used with graph-extracted features. Unlike previous studies focused on single-polymer systems, this research extends to two-monomer samples, discovering 22 novel TSMPs. This approach has practical implications for discovering novel samples from limited data in additive manufacturing, biomedical devices, aerospace, and robotics.
- The discovery of functional dye materials with superior optical properties is crucial for advancing technologies in biomedical imaging, organic photovoltaics, and quantum information systems. Recent advancements highlight the need to accelerate this discovery process by integrating computational strategies with experimental methods. To this end, we have employed a computational approach that explores the latent space of dye materials, using swarm optimization to navigate complex chemical spaces efficiently and machine learning methods to identify optimal molecular-property values for targets such as high extinction coefficients ($$\varepsilon$$). The latent-space-based evaluation outperformed an evaluation using all available features of the domain. By implementing variational autoencoders (VAEs), this approach enhances inverse material design, systematically correlating molecular parameters with the desired optical characteristics. With target properties defined as inputs, the model determines the key molecular features needed to engineer high-performance dye compounds.
- Procedural modeling has produced amazing results, yet fundamental issues such as controllability and limited user guidance persist. We introduce a novel procedural system called PICO (Procedural Iterative Constrained Optimizer) using PICO-Graph, a procedural model designed with optimization in mind. PICO enables the exploration of generative designs by combining user and environmental constraints into a single framework and using optimization without the need to write procedural rules. The PICO-Graph is a data-flow procedural model consisting of a set of geometry-generating operation nodes. Forward generation is initiated by sending geometric objects from initial nodes; these objects travel through the graph, triggering the generation of more objects along the way. We combine the PICO-Graph with evolutionary optimization, which allows exploration of the generated models and generation of variants. The user defines the geometry-generating operations and the set of constraints, e.g., whether an existing object should be supported by the generated model or whether symmetries exist. PICO then generates geometric models that fulfill the constraints through optimization, allowing interactive user control of constraints. We demonstrate PICO on a variety of examples, including procedural chairs, support structures for 3D printing, and procedural terrains matching a given input.
 An official website of the United States government