Abstract Deep learning approaches like AlphaFold 2 (AF2) have revolutionized structural biology by accurately predicting the ground state structures of proteins. Recently, clustering and subsampling techniques that manipulate multiple sequence alignment (MSA) inputs into AlphaFold to generate conformational ensembles of proteins have also been proposed. Although many of these techniques have been made open source, they often require integrating multiple packages and can be challenging for researchers who have a limited programming background to employ. This is especially true when researchers are interested in subsampling to produce predictions of protein conformational ensembles, which require multiple computational steps. This manuscript introduces FastConformation, a Python-based application that integrates MSA generation, structure prediction via AF2, and interactive analysis of protein conformations and their distributions, all in one place. FastConformation is accessible through a user-friendly GUI suitable for non-programmers, allowing users to iteratively refine subsampling parameters based on their analyses to achieve diverse conformational ensembles. Starting from an amino acid sequence, users can make protein conformation predictions and analyze results in just a few hours on their local machines, which is significantly faster than traditional molecular dynamics (MD) simulations. Uniquely, by leveraging the subsampling of MSAs, our tool enables the generation of alternative protein conformations. We demonstrate the utility of FastConformation on proteins including the Abl1 kinase, LAT1 transporter, and CCR5 receptor, showcasing its ability to predict and analyze the protein conformational ensembles and effects of mutations on a variety of proteins. This tool enables a wide range of high-throughput applications in protein biochemistry, drug discovery, and protein engineering.
more »
« less
This content will become publicly available on November 26, 2025
Predicting multiple conformations of ligand binding sites in proteins suggests that AlphaFold2 may remember too much
The goal of this paper is predicting the conformational distributions of ligand binding sites using the AlphaFold2 (AF2) protein structure prediction program with stochastic subsampling of the multiple sequence alignment (MSA). We explored the opening of cryptic ligand binding sites in 16 proteins, where the closed and open conformations define the expected extreme points of the conformational variation. Due to the many structures of these proteins in the Protein Data Bank (PDB), we were able to study whether the distribution of X-ray structures affects the distribution of AF2 models. We have found that AF2 generates both a cluster of open and a cluster of closed models for proteins that have comparable numbers of open and closed structures in the PDB and not too many other conformations. This was observed even with default MSA parameters, thus without further subsampling. In contrast, with the exception of a single protein, AF2 did not yield multiple clusters of conformations for proteins that had imbalanced numbers of open and closed structures in the PDB, or had substantial numbers of other structures. Subsampling improved the results only for a single protein, but very shallow MSA led to incorrect structures. The ability of generating both open and closed conformations for six out of the 16 proteins agrees with the success rates of similar studies reported in the literature. However, we showed that this partial success is due to AF2 “remembering” the conformational distributions in the PDB and that the approach fails to predict rarely seen conformations.
more »
« less
- Award ID(s):
- 2200052
- PAR ID:
- 10620819
- Publisher / Repository:
- PNAS
- Date Published:
- Journal Name:
- Proceedings of the National Academy of Sciences
- Volume:
- 121
- Issue:
- 48
- ISSN:
- 0027-8424
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Proteins perform their biological functions through motion. Although high throughput prediction of the three-dimensional static structures of proteins has proved feasible using deep-learning-based methods, predicting the conformational motions remains a challenge. Purely data-driven machine learning methods encounter difficulty for addressing such motions because available laboratory data on conformational motions are still limited. In this work, we develop a method for generating protein allosteric motions by integrating physical energy landscape information into deep-learning-based methods. We show that local energetic frustration, which represents a quantification of the local features of the energy landscape governing protein allosteric dynamics, can be utilized to empower AlphaFold2 (AF2) to predict protein conformational motions. Starting from ground state static structures, this integrative method generates alternative structures as well as pathways of protein conformational motions, using a progressive enhancement of the energetic frustration features in the input multiple sequence alignment sequences. For a model protein adenylate kinase, we show that the generated conformational motions are consistent with available experimental and molecular dynamics simulation data. Applying the method to another two proteins KaiB and ribose-binding protein, which involve large-amplitude conformational changes, can also successfully generate the alternative conformations. We also show how to extract overall features of the AF2 energy landscape topography, which has been considered by many to be black box. Incorporating physical knowledge into deep-learning-based structure prediction algorithms provides a useful strategy to address the challenges of dynamic structure prediction of allosteric proteins.more » « less
-
G protein-coupled receptors (GPCRs) represent the largest group of membrane receptors for transmembrane signal transduction. Ligand-induced activation of GPCRs triggers G protein activation followed by various signaling cascades. Understanding the structural and energetic determinants of ligand binding to GPCRs and GPCRs to G proteins is crucial to the design of pharmacological treatments targeting specific conformations of these proteins to precisely control their signaling properties. In this study, we focused on interactions of a prototypical GPCR, beta-2 adrenergic receptor (β 2 AR), with its endogenous agonist, norepinephrine (NE), and the stimulatory G protein (G s ). Using molecular dynamics (MD) simulations, we demonstrated the stabilization of cationic NE, NE(+), binding to β 2 AR by G s protein recruitment, in line with experimental observations. We also captured the partial dissociation of the ligand from β 2 AR and the conformational interconversions of G s between closed and open conformations in the NE(+)–β 2 AR–G s ternary complex while it is still bound to the receptor. The variation of NE(+) binding poses was found to alter G s α subunit (G s α) conformational transitions. Our simulations showed that the interdomain movement and the stacking of G s α α1 and α5 helices are significant for increasing the distance between the G s α and β 2 AR, which may indicate a partial dissociation of G s α The distance increase commences when G s α is predominantly in an open state and can be triggered by the intracellular loop 3 (ICL3) of β 2 AR interacting with G s α, causing conformational changes of the α5 helix. Our results help explain molecular mechanisms of ligand and GPCR-mediated modulation of G protein activation.more » « less
-
Abstract This paper presents an innovative approach for predicting the relative populations of protein conformations using AlphaFold 2, an AI-powered method that has revolutionized biology by enabling the accurate prediction of protein structures. While AlphaFold 2 has shown exceptional accuracy and speed, it is designed to predict proteins’ ground state conformations and is limited in its ability to predict conformational landscapes. Here, we demonstrate how AlphaFold 2 can directly predict the relative populations of different protein conformations by subsampling multiple sequence alignments. We tested our method against nuclear magnetic resonance experiments on two proteins with drastically different amounts of available sequence data, Abl1 kinase and the granulocyte-macrophage colony-stimulating factor, and predicted changes in their relative state populations with more than 80% accuracy. Our subsampling approach worked best when used to qualitatively predict the effects of mutations or evolution on the conformational landscape and well-populated states of proteins. It thus offers a fast and cost-effective way to predict the relative populations of protein conformations at even single-point mutation resolution, making it a useful tool for pharmacology, analysis of experimental results, and predicting evolution.more » « less
-
Abstract With the progress of structural biology, the Protein Data Bank (PDB) has witnessed rapid accumulation of experimentally solved protein structures. Since many structures are determined with purification and crystallization additives that are unrelated to a protein's in vivo function, it is nontrivial to identify the subset of protein–ligand interactions that are biologically relevant. We developed the BioLiP2 database (https://zhanggroup.org/BioLiP) to extract biologically relevant protein–ligand interactions from the PDB database. BioLiP2 assesses the functional relevance of the ligands by geometric rules and experimental literature validations. The ligand binding information is further enriched with other function annotations, including Enzyme Commission numbers, Gene Ontology terms, catalytic sites, and binding affinities collected from other databases and a manual literature survey. Compared to its predecessor BioLiP, BioLiP2 offers significantly greater coverage of nucleic acid-protein interactions, and interactions involving large complexes that are unavailable in PDB format. BioLiP2 also integrates cutting-edge structural alignment algorithms with state-of-the-art structure prediction techniques, which for the first time enables composite protein structure and sequence-based searching and significantly enhances the usefulness of the database in structure-based function annotations. With these new developments, BioLiP2 will continue to be an important and comprehensive database for docking, virtual screening, and structure-based protein function analyses.more » « less
An official website of the United States government
