A text-guided protein design framework

Liu, Shengchao; Li, Yanjing; Li, Zhuoxinran; Gitter, Anthony; Zhu, Yutao; Lu, Jiarui; Xu, Zhao; Nie, Weili; Ramanathan, Arvind; Xiao, Chaowei; Tang, Jian; Guo, Hongyu; Anandkumar, Anima

doi:10.1038/s42256-025-01011-z

This content will become publicly available on March 27, 2026

Current AI-assisted protein design utilizes mainly protein sequential and structural information. Meanwhile, there exists tremendous knowledge curated by humans in text format describing proteins’ high-level functionalities, yet whether the incorporation of such text data can help in protein design tasks has not been explored. To bridge this gap, we propose ProteinDT, a multimodal framework that leverages textual descriptions for protein design. ProteinDT consists of three consecutive steps: ProteinCLAP, which aligns the representation of two modalities, a facilitator that generates the protein representation from the text modality and a decoder that creates the protein sequences from the representation. To train ProteinDT, we construct a large dataset, SwissProtCLAP, with 441,000 text and protein pairs. We quantitatively verify the effectiveness of ProteinDT on three challenging tasks: (1) over 90% accuracy for text-guided protein generation; (2) best hit ratio on 12 zero-shot text-guided protein editing tasks; (3) superior performance on four out of six protein property prediction benchmarks.
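
The alignment step described above can be illustrated with a small sketch. It assumes a CLIP-style symmetric contrastive objective over paired text and protein embeddings; the abstract only says that ProteinCLAP aligns the two modalities, so the loss form, the function names and the temperature value here are illustrative assumptions, not the authors' implementation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def mean_diagonal_cross_entropy(logits):
    """Average cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def contrastive_alignment_loss(text_embs, protein_embs, temperature=0.1):
    """Hypothetical symmetric contrastive loss over paired embeddings.

    text_embs[i] and protein_embs[i] are assumed to describe the same
    protein; every other pairing in the batch is treated as a negative.
    """
    text = [l2_normalize(v) for v in text_embs]
    prot = [l2_normalize(v) for v in protein_embs]
    # Cosine similarity of every text embedding with every protein embedding.
    logits = [
        [sum(a * b for a, b in zip(t, p)) / temperature for p in prot]
        for t in text
    ]
    logits_t = [list(col) for col in zip(*logits)]  # protein-to-text view
    return 0.5 * (
        mean_diagonal_cross_entropy(logits)
        + mean_diagonal_cross_entropy(logits_t)
    )


# Toy check: correctly paired embeddings should score a lower loss
# than the same embeddings with the pairing swapped.
aligned = contrastive_alignment_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]])
shuffled = contrastive_alignment_loss([[1, 0], [0, 1]], [[0, 1], [1, 0]])
print(aligned < shuffled)
```

In this toy setup the loss is low when each text embedding is closest to its own protein embedding and high when the pairs are mismatched, which is the behavior an alignment objective of this kind is meant to reward.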

Award ID(s): 2226451

PAR ID: 10636667

Author(s) / Creator(s): Liu, Shengchao; Li, Yanjing; Li, Zhuoxinran; Gitter, Anthony; Zhu, Yutao; Lu, Jiarui; Xu, Zhao; Nie, Weili; Ramanathan, Arvind; Xiao, Chaowei; Tang, Jian; Guo, Hongyu; Anandkumar, Anima

Publisher / Repository: Nature Publishing Group

Date Published: 2025-03-27

Journal Name: Nature Machine Intelligence

Volume: 7

Issue: 4

ISSN: 2522-5839

Page Range / eLocation ID: 580 to 591

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Journal Article:
https://doi.org/10.1038/s42256-025-01011-z
