NSF PAR Search | NSF Public Access Repository

Machine learning-guided co-optimization of fitness and diversity facilitates combinatorial library design in enzyme engineering

https://doi.org/10.1038/s41467-024-50698-y

Ding, Kerr; Chin, Michael; Zhao, Yunlong; Huang, Wei; Mai, Binh Khanh; Wang, Huanan; Liu, Peng; Yang, Yang; Luo, Yunan (December 2024, Nature Communications)

Abstract The effective design of combinatorial libraries to balance fitness and diversity facilitates the engineering of useful enzyme functions, particularly those that are poorly characterized or unknown in biology. We introduce MODIFY, a machine learning (ML) algorithm that learns from natural protein sequences to infer evolutionarily plausible mutations and predict enzyme fitness. MODIFY co-optimizes predicted fitness and sequence diversity of starting libraries, prioritizing high-fitness variants while ensuring broad sequence coverage. In silico evaluation shows that MODIFY outperforms state-of-the-art unsupervised methods in zero-shot fitness prediction and enables ML-guided directed evolution with enhanced efficiency. Using MODIFY, we engineer generalist biocatalysts derived from a thermostable cytochrome c to achieve enantioselective C-B and C-Si bond formation via a new-to-nature carbene transfer mechanism, leading to biocatalysts six mutations away from previously developed enzymes while exhibiting superior or comparable activities. These results demonstrate MODIFY’s potential in solving challenging enzyme engineering problems beyond the reach of classic directed evolution.
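The MODIFY abstract above describes co-optimizing predicted fitness and sequence diversity when designing a starting library. The abstract does not give MODIFY's objective, so the following is only a minimal sketch of that general trade-off under stated assumptions: a single mutable position, hypothetical per-residue fitness scores, and diversity measured as Shannon entropy weighted by a hypothetical parameter `lam`. Under these assumptions, the residue distribution that maximizes expected fitness plus `lam` times entropy is a softmax of fitness / `lam`.

```python
import math


def co_optimized_distribution(fitness, lam):
    """Distribution p over candidate residues that maximizes
    sum_a p[a] * fitness[a] + lam * H(p), where H is Shannon entropy.

    For lam > 0 the maximizer is softmax(fitness / lam). Illustrative only;
    this is not MODIFY's published objective or code.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    scaled = {a: f / lam for a, f in fitness.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    weights = {a: math.exp(s - top) for a, s in scaled.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def entropy(p):
    """Shannon entropy in nats."""
    return -sum(q * math.log(q) for q in p.values() if q > 0)


def objective(p, fitness, lam):
    """Expected fitness plus lam-weighted diversity (entropy)."""
    return sum(p[a] * fitness[a] for a in p) + lam * entropy(p)


# Hypothetical zero-shot fitness scores at one mutable position.
scores = {"A": 1.0, "V": 0.5, "G": -0.5}
for lam in (0.1, 1.0, 10.0):
    p = co_optimized_distribution(scores, lam)
    print(lam, {a: round(q, 3) for a, q in p.items()}, round(entropy(p), 3))
```

A small `lam` concentrates the library on the top-scoring residue (exploitation), while a large `lam` pushes it toward a uniform distribution (broad coverage); sweeping `lam` traces out the fitness-diversity trade-off.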
Supervised biological network alignment with graph neural networks

https://doi.org/10.1093/bioinformatics/btad241

Ding, Kerr; Wang, Sheng; Luo, Yunan (June 2023, Bioinformatics)

Abstract Motivation: Despite the advances in sequencing technology, a massive number of proteins with known sequences remain functionally unannotated. Biological network alignment (NA), which aims to find the node correspondence between species’ protein–protein interaction (PPI) networks, has been a popular strategy to uncover missing annotations by transferring functional knowledge across species. Traditional NA methods assumed that topologically similar proteins in PPIs are functionally similar. However, it was recently reported that functionally unrelated proteins can be as topologically similar as functionally related pairs, and a new data-driven or supervised NA paradigm has been proposed, which uses protein function data to discern which topological features correspond to functional relatedness. Results: Here, we propose GraNA, a deep learning framework for the supervised NA paradigm for the pairwise NA problem. Employing graph neural networks, GraNA utilizes within-network interactions and across-network anchor links for learning protein representations and predicting functional correspondence between across-species proteins. A major strength of GraNA is its flexibility to integrate multi-faceted non-functional relationship data, such as sequence similarity and ortholog relationships, as anchor links to guide the mapping of functionally related proteins across species. Evaluating GraNA on a benchmark dataset composed of several NA tasks between different pairs of species, we observed that GraNA accurately predicted the functional relatedness of proteins and robustly transferred functional annotations across species, outperforming a number of existing NA methods. When applied to a case study on a humanized yeast network, GraNA also successfully discovered functionally replaceable human–yeast protein pairs that were documented in previous studies. Availability and implementation: The code of GraNA is available at https://github.com/luo-group/GraNA.
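The GraNA abstract above says protein representations are learned from within-network interactions plus across-network anchor links. As a minimal sketch of why anchor links matter (not GraNA's architecture, which uses trained graph neural network layers), the code below assumes parameter-free mean aggregation over a joint graph, toy two-dimensional input features, and hypothetical node names. Without an anchor link the two networks never exchange information; with one, embeddings mix across species and candidate pairs can be scored by cosine similarity.

```python
import math


def mean_aggregate(features, edges, rounds=2):
    """Parameter-free message passing: each round, every node's vector becomes
    the mean of its own vector and its neighbors' vectors from the previous
    round. A stand-in for a trained GNN layer (no learned weights)."""
    neighbors = {n: set() for n in features}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    h = {n: list(vec) for n, vec in features.items()}
    for _ in range(rounds):
        new_h = {}
        for n in h:
            group = [h[n]] + [h[m] for m in neighbors[n]]
            new_h[n] = [sum(col) / len(group) for col in zip(*group)]
        h = new_h
    return h


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def score_pairs(features, within_edges, anchor_links, queries, rounds=2):
    """Embed both networks jointly over within-network edges plus
    across-network anchor links, then score query pairs by cosine similarity."""
    h = mean_aggregate(features, list(within_edges) + list(anchor_links), rounds)
    return {(u, v): cosine(h[u], h[v]) for u, v in queries}


# Toy data (hypothetical): two 3-node networks with orthogonal input features.
features = {"h1": [1.0, 0.0], "h2": [1.0, 0.0], "h3": [1.0, 0.0],
            "y1": [0.0, 1.0], "y2": [0.0, 1.0], "y3": [0.0, 1.0]}
within = [("h1", "h2"), ("h2", "h3"), ("y1", "y2"), ("y2", "y3")]
anchors = [("h1", "y1")]  # e.g. a sequence-similarity link
queries = [("h1", "y1"), ("h1", "y3")]
print(score_pairs(features, within, [], queries))
print(score_pairs(features, within, anchors, queries))
```

In this toy case the anchored pair (h1, y1) ends up more similar than (h1, y3), which is reached only indirectly; a trained model would additionally learn which features predict functional relatedness.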
Molecule Maker Lab Institute: Accelerating, advancing, and democratizing molecular innovation

https://doi.org/10.1002/aaai.12154

Burke, Martin D; Denmark, Scott E; Diao, Ying; Han, Jiawei; Switzky, Rachel; Zhao, Huimin (March 2024, AI Magazine)

Abstract Many of the greatest challenges facing society today likely have molecular solutions that await discovery. However, the process of identifying and manufacturing such molecules has remained slow and highly dependent on specialists. Interfacing the fields of artificial intelligence (AI) and synthetic organic chemistry has the potential to powerfully address both limitations. The Molecule Maker Lab Institute (MMLI) brings together a team of chemists, engineers, and AI experts from the University of Illinois Urbana‐Champaign (UIUC), Pennsylvania State University, and the Rochester Institute of Technology, with the goal of accelerating the discovery, synthesis and manufacture of complex organic molecules. Advanced AI and machine learning (ML) methods are deployed in four key thrusts: (1) AI‐enabled synthesis planning, (2) AI‐enabled catalyst development, (3) AI‐enabled molecule manufacturing, and (4) AI‐enabled molecule discovery. The MMLI's new AI‐enabled synthesis platform integrates chemical and enzymatic catalysis with literature mining and ML to predict the best way to make new molecules with desirable biological and material properties. The MMLI is transforming chemical synthesis and generating use‐inspired AI advances. Simultaneously, the MMLI is also acting as a training ground for the next generation of scientists with combined expertise in chemistry and AI. Outreach efforts aimed toward high school students and the public are being used to show how AI‐enabled tools can help to make chemical synthesis accessible to nonexperts.
Weakly Supervised Multi-Label Classification of Full-Text Scientific Papers

https://doi.org/10.1145/3580305.3599544

Zhang, Yu; Jin, Bowen; Chen, Xiusi; Shen, Yanzhen; Zhang, Yunyi; Meng, Yu; Han, Jiawei (August 2023, ACM)
Proc. 2023 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining
Instead of relying on human-annotated training samples to build a classifier, weakly supervised scientific paper classification aims to classify papers only using category descriptions (e.g., category names, category-indicative keywords). Existing studies on weakly supervised paper classification are less concerned with two challenges: (1) Papers should be classified into not only coarse-grained research topics but also fine-grained themes, and potentially into multiple themes, given a large and fine-grained label space; and (2) full text should be utilized to complement the paper title and abstract for classification. Moreover, instead of viewing the entire paper as a long linear sequence, one should exploit the structural information such as citation links across papers and the hierarchy of sections and paragraphs in each paper. To tackle these challenges, in this study, we propose FuTex, a framework that uses the cross-paper network structure and the in-paper hierarchy structure to classify full-text scientific papers under weak supervision. A network-aware contrastive fine-tuning module and a hierarchy-aware aggregation module are designed to leverage the two types of structural signals, respectively. Experiments on two benchmark datasets demonstrate that FuTex significantly outperforms competitive baselines and is on par with fully supervised classifiers that use 1,000 to 60,000 ground-truth training samples.
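The FuTex abstract above contrasts treating a paper as one long linear sequence with exploiting its section/paragraph hierarchy, and frames the task as multi-label classification against category descriptions. The abstract does not specify how the hierarchy-aware aggregation module works, so this is only a minimal sketch under stated assumptions: plain mean pooling at each level, toy two-dimensional embeddings, hypothetical label names, and a cosine-similarity threshold against label-description embeddings as a stand-in for the weakly supervised classifier.

```python
import math


def mean(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hierarchical_embedding(paper):
    """paper: list of sections, each a non-empty list of paragraph vectors.
    Pool paragraphs into section vectors, then sections into one paper vector,
    so each section carries equal weight regardless of its length."""
    return mean([mean(section) for section in paper])


def flat_embedding(paper):
    """Baseline: treat the paper as one linear sequence of paragraphs."""
    return mean([para for section in paper for para in section])


def predict_labels(paper_vec, label_vecs, threshold):
    """Multi-label output: every label whose description embedding has cosine
    similarity >= threshold with the paper, highest score first."""
    scores = {name: cosine(paper_vec, vec) for name, vec in label_vecs.items()}
    keep = [name for name, s in scores.items() if s >= threshold]
    return sorted(keep, key=lambda name: scores[name], reverse=True)


# Hypothetical embeddings: a long methods section and a short results section.
paper = [[[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0]]]
labels = {"methods-theme": [1.0, 0.0], "results-theme": [0.0, 1.0],
          "unrelated": [-1.0, 0.0]}
print(predict_labels(hierarchical_embedding(paper), labels, 0.5))
print(predict_labels(flat_embedding(paper), labels, 0.5))
```

With flat pooling the long section dominates and only one theme clears the threshold; pooling per section first lets the short section contribute a second label.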
A generalized platform for artificial intelligence-powered autonomous enzyme engineering

https://doi.org/10.1038/s41467-025-61209-y

Singh, Nilmani; Lane, Stephan; Yu, Tianhao; Lu, Jingxia; Ramos, Adrianna; Cui, Haiyang; Zhao, Huimin (December 2025, Nature Communications)

CatPred: a comprehensive framework for deep learning in vitro enzyme kinetic parameters

https://doi.org/10.1038/s41467-025-57215-9

Boorla, Veda Sheersh; Maranas, Costas D (December 2025, Nature Communications)

Automated Iterative N─C and C─C Bond Formation

https://doi.org/10.1002/anie.202509974

Tyrikos‐Ergas, Theodore; Agiakloglou, Sevasti; LaPorte, Antonio J; Wang, Wesley; Chan, Chieh‐Kai; Wells, Clare E; Rakowski, Christopher K; Hammond, Rachel I; Qiu, Jia; Raymond, Jonathan D; et al. (August 2025, Angewandte Chemie International Edition)

Small molecule solutions to many contemporary societal challenges await discovery, but the artisanal and manual process via which this class of chemical matter is typically accessed limits the discovery of new functions. Automated assembly of MIDA (N‐methyl iminodiacetic acid) or TIDA (tetramethyl N‐methyl iminodiacetic acid) boronate building blocks via iterative C─C bond formation, an approach we call “block chemistry”, alternatively enables generalized and automated preparation of many different types of small molecules in a modular fashion. But in its current form, this engine cannot also leverage nitrogen atoms as iteration handles. Here, we disclose a new iteration‐enabling group, CbzT (p‐TIDA boronate‐substituted carboxybenzyl), that reversibly attenuates the reactivity of nitrogen atoms and enables generalized catch‐and‐release purification. CbzT is leveraged to achieve the automated modular synthesis of Imatinib (Gleevec), an archetypical clinically approved kinase inhibitor, in which building blocks are iteratively linked by both N─C and C─C bonds. This work substantially expands the types of small molecules that can be iteratively assembled in an automated modular fashion. It also advances the concept of intentionally developing chemistry that machines can do.
novoStoic2.0: An integrated framework for pathway synthesis, thermodynamic evaluation, and enzyme selection

https://doi.org/10.1371/journal.pcbi.1012516

Upadhyay, Vikas; Anand, Mohit; Maranas, Costas D (August 2025, PLOS Computational Biology)
Raghunathan, Anu (Ed.)
Computational pathway design and retro-biosynthetic approaches can facilitate the development of innovative biochemical production routes, biodegradation strategies, and the funneling of multiple precursors into a single bioproduct. However, effective pathway design necessitates a comprehensive understanding of biochemistries, enzyme activities, and thermodynamic feasibility. Herein, we introduce novoStoic2.0, an integrated platform that combines tools for estimating overall stoichiometry, designing de novo synthesis pathways, assessing thermodynamic feasibility, and selecting enzymes. novoStoic2.0 offers a unified web-based interface as a part of the AlphaSynthesis platform (http://novostoic.platform.moleculemaker.org/) tailored for the synthesis of thermodynamically viable pathways as well as the selection of enzymes for re-engineering required for novel reaction steps. We exemplify the utility of the platform to identify novel pathways for hydroxytyrosol synthesis, which are shorter than the known pathways and require reduced cofactor usage. In summary, novoStoic2.0 aims to streamline the process of pathway design contributing to the development of sustainable biotechnological solutions.
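The novoStoic2.0 abstract above lists assessing thermodynamic feasibility as one of the platform's steps but does not state the criterion it uses. As a minimal sketch, assuming the standard relation ΔrG′ = ΔrG′° + RT ln Q, activities approximated by concentrations, and a simple rule that a pathway is feasible only if every step has ΔrG′ < 0, a step-wise check could look like this. The pathway, ΔrG′° values, and concentrations are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ/(mol*K)


def reaction_gibbs(dg0_prime, stoich, conc, temp_k=298.15):
    """Transformed reaction Gibbs energy dG' = dG'0 + RT ln Q (kJ/mol).

    stoich: metabolite -> stoichiometric coefficient (negative for substrates).
    conc:   metabolite -> concentration in mol/L (activities approximated by
            concentrations)."""
    ln_q = sum(nu * math.log(conc[m]) for m, nu in stoich.items())
    return dg0_prime + R_KJ * temp_k * ln_q


def pathway_feasible(steps, conc, temp_k=298.15):
    """steps: list of (dG'0 in kJ/mol, stoich dict). Returns (feasible, dGs),
    where feasible requires every step to have dG' < 0."""
    dgs = [reaction_gibbs(dg0, stoich, conc, temp_k) for dg0, stoich in steps]
    return all(dg < 0 for dg in dgs), dgs


# Hypothetical two-step pathway A -> B -> C with made-up dG'0 values.
steps = [(5.0, {"A": -1, "B": 1}), (-10.0, {"B": -1, "C": 1})]
print(pathway_feasible(steps, {"A": 1e-2, "B": 1e-5, "C": 1e-6}))
print(pathway_feasible(steps, {"A": 1e-5, "B": 1e-2, "C": 1e-6}))
```

The first step has a positive ΔrG′° yet becomes favorable when the substrate is kept well above the product, which is why feasibility checks depend on assumed concentration ranges and not only on standard Gibbs energies.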
Data-Driven Prediction of Enantioselectivity for the Sharpless Asymmetric Dihydroxylation: Model Development and Experimental Validation

https://doi.org/10.1021/acscentsci.5c00900

Ocampo, Blake E; Altundas, Bilal; Bock, Matthew J; Feiz, Sara; Denmark, Scott E (July 2025, ACS Central Science)

Geometry Informed Tokenization of Molecules for Language Model Generation

Li, Xiner; Wang, Limei; Luo, Youzhi; Edwards, Carl; Gui, Shurui; Lin, Yuchao; Ji, Heng; Ji, Shuiwang (July 2025, ICML)
