The discovery of molecules with optimal functional properties is a central challenge across diverse fields such as energy storage, catalysis, and chemical sensing. However, molecular property optimization (MPO) remains difficult due to the combinatorial size of chemical space and the cost of acquiring property labels via simulations or wet-lab experiments. Bayesian optimization (BO) offers a principled framework for sample-efficient discovery in such settings, but its effectiveness depends critically on the quality of the molecular representation used to train the underlying probabilistic surrogate model. Existing approaches based on fingerprints, graphs, SMILES strings, or learned embeddings often struggle in low-data regimes due to high dimensionality or poorly structured latent spaces. Here, we introduce Molecular Descriptors with Actively Identified Subspaces (MolDAIS), a flexible molecular BO framework that adaptively identifies task-relevant subspaces within large descriptor libraries. Leveraging the sparse axis-aligned subspace (SAAS) prior introduced in recent BO literature, MolDAIS constructs parsimonious Gaussian process surrogate models that focus on task-relevant features as new data are acquired. In addition to validating this approach for descriptor-based MPO, we introduce two novel screening variants, which significantly reduce computational cost while preserving predictive accuracy and physical interpretability. We demonstrate that MolDAIS consistently outperforms state-of-the-art MPO methods across a suite of benchmark and real-world tasks, including single- and multi-objective optimization. Our results show that MolDAIS can identify near-optimal candidates from chemical libraries with over 100,000 molecules using fewer than 100 property evaluations, highlighting its promise as a practical tool for data-scarce molecular discovery.
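The loop described above has three steps: select a descriptor subspace from the labelled data, fit a GP surrogate on that subspace, and choose the next molecule to evaluate. The sketch below illustrates that loop under stated assumptions. It is **not** MolDAIS. MolDAIS uses the SAAS prior with fully Bayesian inference, and its screening variants are not specified here. This sketch swaps in a deliberately simple, hypothetical stand-in:

- correlation-based descriptor screening (`screen_descriptors`);
- a fixed-hyperparameter RBF Gaussian process;
- expected improvement over a finite library.

All function names are illustrative.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)  # elementwise error function for the normal CDF


def screen_descriptors(X, y, k):
    """Hypothetical screen: keep the k descriptors most correlated (|Pearson r|) with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    r = np.divide(Xc.T @ yc, denom, out=np.zeros(X.shape[1]), where=denom > 0)
    return np.argsort(-np.abs(r))[:k]


def rbf(A, B, lengthscale):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)


def gp_posterior(X_train, y_train, X_test, lengthscale=1.0, noise=1e-3):
    """Exact GP regression (RBF kernel, fixed hyperparameters) on standardized targets."""
    mean, scale = y_train.mean(), y_train.std() + 1e-12
    y = (y_train - mean) / scale
    L = np.linalg.cholesky(rbf(X_train, X_train, lengthscale) + noise * np.eye(len(X_train)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    K_s = rbf(X_test, X_train, lengthscale)
    v = np.linalg.solve(L, K_s.T)
    var = np.clip(1.0 - (v ** 2).sum(axis=0), 1e-12, None)
    return K_s @ alpha * scale + mean, np.sqrt(var) * scale


def expected_improvement(mu, sigma, best):
    """Closed-form EI for maximization under a Gaussian posterior."""
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mu - best) * cdf + sigma * pdf


def bo_loop(X_lib, oracle, n_init=10, n_iter=20, k=5, seed=0):
    """Screen -> fit GP on the screened subspace -> evaluate the unlabelled molecule with max EI."""
    rng = np.random.default_rng(seed)
    sd = X_lib.std(axis=0)
    sd[sd == 0] = 1.0
    Z = (X_lib - X_lib.mean(axis=0)) / sd                  # z-scored descriptor library
    idx = [int(i) for i in rng.choice(len(Z), size=n_init, replace=False)]
    ys = [oracle(i) for i in idx]
    for _ in range(n_iter):
        y = np.array(ys)
        cols = screen_descriptors(Z[idx], y, k)            # subspace re-selected every iteration
        pool = np.setdiff1d(np.arange(len(Z)), idx)        # molecules not yet evaluated
        mu, sigma = gp_posterior(Z[idx][:, cols], y, Z[pool][:, cols])
        nxt = int(pool[np.argmax(expected_improvement(mu, sigma, y.max()))])
        idx.append(nxt)
        ys.append(oracle(nxt))
    return idx, np.array(ys)
```

Here `oracle(i)` stands in for an expensive simulation or experiment on library molecule `i`. Re-screening after every new label mimics, very loosely, the idea of a subspace that adapts as data arrive. A SAAS prior achieves this through shrinkage of inverse lengthscales rather than hard selection.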
This content will become publicly available on June 23, 2026
Potency of Latent Spaces in Inverse Quantum Dye Design
The discovery of functional dye materials with superior optical properties is crucial for advancing technologies in biomedical imaging, organic photovoltaics, and quantum information systems. Recent advances highlight the need to accelerate this discovery process by integrating computational strategies with experimental methods. To this end, we have employed a computational approach that explores the latent space of dye materials. It uses swarm optimization to navigate complex chemical spaces efficiently, and machine learning models to identify molecular property values that meet targets such as a high extinction coefficient ($$\varepsilon$$). Evaluation based on the latent space outperformed evaluation based on all available domain features. By implementing variational autoencoders (VAEs), this approach enhances inverse material design through the systematic correlation of molecular parameters with desired optical characteristics. With target properties defined as inputs, the model determines the key molecular features needed to engineer high-performance dye compounds.
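The abstract does not name the swarm variant, the latent dimension, or the property predictor used to score latent points. The sketch below therefore makes three assumptions:

- canonical global-best particle swarm optimization (PSO) over a box in latent space;
- a hypothetical `toy_eps_predictor` standing in for the learned $$\varepsilon$$ model;
- the best latent vector would afterwards be passed to the VAE decoder (not shown) to obtain a candidate dye.

```python
import numpy as np


def pso_maximize(f, dim, n_particles=30, n_iter=100, bounds=(-3.0, 3.0),
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO: each particle is pulled toward its own best and the swarm's best."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, size=(n_particles, dim))   # particle positions (latent vectors)
    v = np.zeros_like(x)                                # particle velocities
    pbest = x.copy()
    pbest_val = np.array([f(p) for p in x])
    g = pbest[np.argmax(pbest_val)].copy()
    g_val = pbest_val.max()
    for _ in range(n_iter):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)  # inertia + cognitive + social
        x = np.clip(x + v, lo, hi)                              # stay inside the latent box
        vals = np.array([f(p) for p in x])
        improved = vals > pbest_val
        pbest[improved] = x[improved]
        pbest_val[improved] = vals[improved]
        if pbest_val.max() > g_val:
            g_val = pbest_val.max()
            g = pbest[np.argmax(pbest_val)].copy()
    return g, g_val


def toy_eps_predictor(z):
    """Hypothetical surrogate: predicted (scaled) epsilon peaks at z = 1 in every dimension."""
    return -np.sum((z - 1.0) ** 2)


z_best, score = pso_maximize(toy_eps_predictor, dim=4)
```

With `w = 0.7` and `c1 = c2 = 1.5`, the swarm settles onto the predictor's maximum instead of oscillating. On a real, multimodal $$\varepsilon$$ surrogate one would typically run several seeds and decode the top few latent points rather than only `z_best`.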
- PAR ID: 10631942
- Publisher / Repository: ACM
- Date Published:
- ISBN: 9798400714627
- Page Range / eLocation ID: 1 to 7
- Format(s): Medium: X
- Location: Columbus, USA
- Sponsoring Org: National Science Foundation
More Like this
- Discovering novel molecules with targeted properties remains a formidable challenge in materials science, often likened to finding a needle in a haystack. Traditional experimental approaches are slow, costly, and inefficient. In this study, we present an inverse design framework based on a molecular graph conditional variational autoencoder (CVAE) that generates new molecules with user-specified optical properties, particularly the molar extinction coefficient ($$\varepsilon$$). Our model encodes molecular graphs, derived from SMILES strings, into a structured latent space and then decodes them into valid molecular structures conditioned on a target $$\varepsilon$$ value. Trained on a curated dataset of known molecules with corresponding extinction coefficients, the CVAE learns to generate chemically valid structures, as verified by RDKit. Subsequent density functional theory (DFT) simulations confirm that many of the generated molecules exhibit electronic structures similar to those of molecules with the desired $$\varepsilon$$ values. We also verified the $$\varepsilon$$ values of the generated molecules with a graph neural network (GNN), and their synthesizability with the open-source tool ASKCOS. This approach demonstrates the potential of CVAEs to accelerate molecular discovery by enabling user-guided, property-driven molecule generation, offering a scalable, data-driven alternative to traditional trial-and-error synthesis.
- Graph neural networks (GNNs) have been used extensively to address problems in drug design and discovery. Both ligand and target molecules are represented as graphs, with node and edge features encoding information about atomic elements and bonds, respectively. Although existing deep learning models perform remarkably well at predicting physicochemical properties and binding affinities, generating new molecules with optimized properties remains challenging. Inherently, most GNNs perform poorly at whole-graph representation because of the limitations of the message-passing paradigm. Furthermore, step-by-step graph generation frameworks that use reinforcement learning or other sequential processing can be slow and can produce a high proportion of invalid molecules, requiring substantial post-processing to satisfy the principles of stoichiometry. To address these issues, we propose a representation-first approach to molecular graph generation. We guide the latent representation of an autoencoder by capturing graph-structure information with the geometric scattering transform and by applying penalties that also structure the representation according to molecular properties. We show that this highly structured latent space can be used directly for molecular graph generation with a GAN. We demonstrate that our architecture learns meaningful representations of drug datasets and provides a platform for goal-directed drug synthesis.
- Generating molecular structures with desired properties is a critical task with broad applications in drug discovery and materials design. We propose 3M-Diffusion, a novel multi-modal molecular graph generation method that generates diverse, ideally novel molecular structures with desired properties. 3M-Diffusion encodes molecular graphs into a graph latent space, which it then aligns with the text space learned by encoder-based LLMs from textual descriptions. Using the molecule decoder, it reconstructs the molecular structure and atomic attributes from the given text descriptions. It then learns a probabilistic mapping from the text space to the latent molecular graph space using a diffusion model. Extensive experiments on several datasets demonstrate that 3M-Diffusion can generate high-quality, novel, and diverse molecular graphs that semantically match the provided textual descriptions. The code is available on GitHub.
- This review provides an overview of the fabrication methods for Ti3C2Tx MXene-based hybrid photocatalysts and evaluates their role in degrading organic dye pollutants. Ti3C2Tx MXene has emerged as a promising material for hybrid photocatalysts because of its high metallic conductivity, excellent hydrophilicity, strong molecular adsorption, and efficient charge transfer. These properties facilitate faster charge separation and minimize electron–hole recombination, leading to exceptional photodegradation performance and long-term stability, and have drawn significant attention to these materials for dye degradation applications. Compared with conventional semiconducting materials, Ti3C2Tx MXene-based hybrid photocatalysts significantly improve dye degradation efficiency, as evidenced by higher percentage degradation and shorter degradation times. This review also highlights computational techniques used to assess and enhance the performance of Ti3C2Tx MXene-based hybrid photocatalysts for dye degradation. It identifies the challenges in Ti3C2Tx MXene-based hybrid photocatalyst research, proposes potential solutions, and outlines future research directions to address these obstacles effectively.
