Title: Sequence-ensemble-function relationships for disordered proteins in live cells
ABSTRACT Intrinsically disordered protein regions (IDRs) are ubiquitous across all kingdoms of life and play a variety of essential cellular roles. IDRs exist in a collection of structurally distinct conformers known as an ensemble. An IDR’s amino acid sequence determines its ensemble, which in turn can play an important role in dictating molecular function. Yet a clear link connecting IDR sequence, its ensemble properties, and its molecular function in living cells has not been directly established. Here, we set out to test this sequence-ensemble-function paradigm using a novel computational method (GOOSE) that enables the rational design of libraries of IDRs by systematically varying specific sequence properties. Using ensemble FRET, we measured the ensemble dimensions of a library of rationally designed IDRs in human-derived cell lines, revealing how IDR sequence influences ensemble dimensions in situ. Furthermore, we show that the interplay between sequence and ensemble can tune an IDR’s ability to sense changes in cell volume - a de novo molecular function for these synthetic sequences. Our results establish biophysical rules for intracellular sequence-ensemble relationships, enable a new route for understanding how IDR sequences map to function in live cells, and set the ground for the design of synthetic IDRs with de novo function.
Award ID(s): 2128068, 2128067
PAR ID: 10528469
Author(s) / Creator(s):
Publisher / Repository: bioRxiv
Date Published:
Format(s): Medium: X
Institution: bioRxiv
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract Intrinsically disordered regions (IDRs) are ubiquitous across all domains of life and play a range of functional roles. While folded domains are generally well described by a stable three-dimensional structure, IDRs exist in a collection of interconverting states known as an ensemble. This structural heterogeneity means that IDRs are largely absent from the Protein Data Bank, contributing to a lack of computational approaches to predict ensemble conformational properties from sequence. Here we combine rational sequence design, large-scale molecular simulations and deep learning to develop ALBATROSS, a deep-learning model for predicting ensemble dimensions of IDRs, including the radius of gyration, end-to-end distance, polymer-scaling exponent and ensemble asphericity, directly from sequences at a proteome-wide scale. ALBATROSS is lightweight, easy to use and accessible as both a locally installable software package and a point-and-click-style interface via Google Colab notebooks. We first demonstrate the applicability of our predictors by examining the generalizability of sequence–ensemble relationships in IDRs. Then, we leverage the high-throughput nature of ALBATROSS to characterize the sequence-specific biophysical behavior of IDRs within and between proteomes. 
  2. Molecular descriptions of intrinsically disordered protein regions (IDRs) are fundamental to understanding their cellular functions and regulation. NMR spectroscopy has been a leading tool in characterizing IDRs at the atomic level. In this review, we highlight recent conceptual breakthroughs in the study of IDRs facilitated by NMR and discuss emerging NMR techniques that bridge molecular descriptions to cellular functions. First, we review the assemblies formed by IDRs at various scales, from one-to-one complexes to non-stoichiometric clusters and condensates, discussing how NMR characterizes their structural dynamics and molecular interactions. Next, we explore several unique interaction modes of IDRs that enable regulatory mechanisms such as selective transport and switch-like inhibition. Finally, we highlight recent progress in solid-state NMR and in-cell NMR on IDRs, discussing how these methods allow for atomic characterization of full-length IDR complexes in various phases and cellular environments. This review emphasizes recent conceptual and methodological advancements in IDR studies by NMR and offers future perspectives on bridging the gap between in vitro molecular descriptions and the cellular functions of IDRs. 
  3. Abstract Variational autoencoders are unsupervised learning models with generative capabilities; when applied to protein data, they classify sequences by phylogeny and generate de novo sequences that preserve statistical properties of protein composition. While previous studies focus on clustering and generative features, here we evaluate the underlying latent manifold in which sequence information is embedded. To investigate properties of the latent manifold, we utilize direct coupling analysis and a Potts Hamiltonian model to construct a latent generative landscape. We showcase how this landscape captures phylogenetic groupings and functional and fitness properties of several systems, including Globins, β-lactamases, ion channels, and transcription factors. We provide support on how the landscape helps us understand the effects of sequence variability observed in experimental data and provides insights on directed and natural protein evolution. We propose that combining generative properties and functional predictive power of variational autoencoders and coevolutionary analysis could be beneficial in applications for protein engineering and design.
  4. Abstract Novel proteins can originate de novo from non-coding DNA and contribute to species-specific adaptations. It is challenging to conceive how de novo emerging proteins may integrate pre-existing cellular systems to bring about beneficial traits, given that their sequences are previously unseen by the cell. To address this apparent paradox, we investigated 26 de novo emerging proteins previously associated with growth benefits in yeast. Microscopy revealed that these beneficial emerging proteins preferentially localize to the endoplasmic reticulum (ER). Sequence and structure analyses uncovered a common protein organization among all ER-localizing beneficial emerging proteins, characterized by a short hydrophobic C-terminus immediately preceded by a transmembrane domain. Using genetic and biochemical approaches, we showed that ER localization of beneficial emerging proteins requires the GET and SND pathways, both of which are evolutionarily conserved and known to recognize transmembrane domains to promote post-translational ER insertion. The abundance of ER-localizing beneficial emerging proteins was regulated by conserved proteasome- and vacuole-dependent processes, through mechanisms that appear to be facilitated by the emerging proteins’ C-termini. Consequently, we propose that evolutionarily conserved pathways can convergently govern the cellular processing of de novo emerging proteins with unique sequences, likely owing to common underlying protein organization patterns.
  5. Intrinsically disordered protein regions (IDRs) are well established as contributors to intermolecular interactions and the formation of biomolecular condensates. In particular, RNA-binding proteins (RBPs) often harbor IDRs in addition to folded RNA-binding domains that contribute to RBP function. To understand the dynamic interactions of an IDR–RNA complex, we characterized the RNA-binding features of a small (68 residues), positively charged IDR-containing protein, Small ERDK-Rich Factor (SERF). At high concentrations, SERF and RNA undergo charge-driven associative phase separation to form a protein- and RNA-rich dense phase. A key advantage of this model system is that this threshold for demixing is sufficiently high that we could use solution-state biophysical methods to interrogate the stoichiometric complexes of SERF with RNA in the one-phase regime. Herein, we describe our comprehensive characterization of SERF alone and in complex with a small fragment of the HIV-1 Trans-Activation Response (TAR) RNA with complementary biophysical methods and molecular simulations. We find that this binding event is not accompanied by the acquisition of structure by either molecule; however, we see evidence for a modest global compaction of the SERF ensemble when bound to RNA. This behavior likely reflects attenuated charge repulsion within SERF via binding to the polyanionic RNA and provides a rationale for the higher-order assembly of SERF in the context of RNA. We envision that the SERF–RNA system will lower the barrier to accessing the details that support IDR–RNA interactions and likewise deepen our understanding of the role of IDR–RNA contacts in complex formation and liquid–liquid phase separation. 