Title: Sequence-ensemble-function relationships for disordered proteins in live cells
ABSTRACT Intrinsically disordered protein regions (IDRs) are ubiquitous across all kingdoms of life and play a variety of essential cellular roles. IDRs exist in a collection of structurally distinct conformers known as an ensemble. An IDR’s amino acid sequence determines its ensemble, which in turn can play an important role in dictating molecular function. Yet a clear link connecting IDR sequence, its ensemble properties, and its molecular function in living cells has not been directly established. Here, we set out to test this sequence-ensemble-function paradigm using a novel computational method (GOOSE) that enables the rational design of libraries of IDRs by systematically varying specific sequence properties. Using ensemble FRET, we measured the ensemble dimensions of a library of rationally designed IDRs in human-derived cell lines, revealing how IDR sequence influences ensemble dimensions in situ. Furthermore, we show that the interplay between sequence and ensemble can tune an IDR’s ability to sense changes in cell volume - a de novo molecular function for these synthetic sequences. Our results establish biophysical rules for intracellular sequence-ensemble relationships, enable a new route for understanding how IDR sequences map to function in live cells, and set the ground for the design of synthetic IDRs with de novo function.
Award ID(s):
2128068 2128067
PAR ID:
10528469
Publisher / Repository:
bioRxiv
Date Published:
Format(s):
Medium: X
Institution:
bioRxiv
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Intrinsically disordered regions (IDRs) are ubiquitous across all domains of life and play a range of functional roles. While folded domains are generally well described by a stable three-dimensional structure, IDRs exist in a collection of interconverting states known as an ensemble. This structural heterogeneity means that IDRs are largely absent from the Protein Data Bank, contributing to a lack of computational approaches to predict ensemble conformational properties from sequence. Here we combine rational sequence design, large-scale molecular simulations and deep learning to develop ALBATROSS, a deep-learning model for predicting ensemble dimensions of IDRs, including the radius of gyration, end-to-end distance, polymer-scaling exponent and ensemble asphericity, directly from sequences at a proteome-wide scale. ALBATROSS is lightweight, easy to use and accessible as both a locally installable software package and a point-and-click-style interface via Google Colab notebooks. We first demonstrate the applicability of our predictors by examining the generalizability of sequence–ensemble relationships in IDRs. Then, we leverage the high-throughput nature of ALBATROSS to characterize the sequence-specific biophysical behavior of IDRs within and between proteomes. 
  2. Molecular descriptions of intrinsically disordered protein regions (IDRs) are fundamental to understanding their cellular functions and regulation. NMR spectroscopy has been a leading tool in characterizing IDRs at the atomic level. In this review, we highlight recent conceptual breakthroughs in the study of IDRs facilitated by NMR and discuss emerging NMR techniques that bridge molecular descriptions to cellular functions. First, we review the assemblies formed by IDRs at various scales, from one-to-one complexes to non-stoichiometric clusters and condensates, discussing how NMR characterizes their structural dynamics and molecular interactions. Next, we explore several unique interaction modes of IDRs that enable regulatory mechanisms such as selective transport and switch-like inhibition. Finally, we highlight recent progress in solid-state NMR and in-cell NMR on IDRs, discussing how these methods allow for atomic characterization of full-length IDR complexes in various phases and cellular environments. This review emphasizes recent conceptual and methodological advancements in IDR studies by NMR and offers future perspectives on bridging the gap between in vitro molecular descriptions and the cellular functions of IDRs. 
  3. Multiple biomolecular condensates coexist at the pre- and post- synapse to enable vesicle dynamics and controlled neurotransmitter release in the brain. In pre-synapses, intrinsically disordered regions (IDRs) of synaptic proteins are drivers of condensation that enable clustering of synaptic vesicles (SVs). Using computational analysis, we show that the IDRs of SV proteins feature evolutionarily conserved non-random compositional biases and sequence patterns. Synapsin-1 is essential for condensation of SVs, and its C-terminal IDR has been shown to be a key driver of condensation. Focusing on this IDR, we dissected the contributions of two conserved features namely the segregation of polar and proline residues along the linear sequence, and the compositional preference for arginine over lysine. Scrambling the blocks of polar and proline residues weakens the driving forces for forming micron-scale condensates. However, the extent of clustering in subsaturated solutions remains equivalent to that of the wild-type synapsin-1. In contrast, substituting arginine with lysine significantly weakens both the driving forces for condensation and the extent of clustering in subsaturated solutions. Co-expression of the scrambled variant of synapsin-1 with synaptophysin results in a gain-of-function phenotype in cells, whereas arginine to lysine substitutions eliminate condensation in cells. We report an emergent consequence of synapsin-1 condensation, which is the generation of interphase pH gradients that is realized via differential partitioning of protons between coexisting phases. This pH gradient is likely to be directly relevant for vesicular ATPase functions and the loading of neurotransmitters. Our studies highlight how conserved IDR grammars serve as drivers of synapsin-1 condensation. 
  4. Abstract Variational autoencoders are unsupervised learning models with generative capabilities; when applied to protein data, they classify sequences by phylogeny and generate de novo sequences which preserve statistical properties of protein composition. While previous studies focus on clustering and generative features, here, we evaluate the underlying latent manifold in which sequence information is embedded. To investigate properties of the latent manifold, we utilize direct coupling analysis and a Potts Hamiltonian model to construct a latent generative landscape. We showcase how this landscape captures phylogenetic groupings, functional and fitness properties of several systems including Globins, β-lactamases, ion channels, and transcription factors. We provide support on how the landscape helps us understand the effects of sequence variability observed in experimental data and provides insights on directed and natural protein evolution. We propose that combining generative properties and functional predictive power of variational autoencoders and coevolutionary analysis could be beneficial in applications for protein engineering and design. 
  5. Abstract Novel proteins can originate de novo from non-coding DNA and contribute to species-specific adaptations. It is challenging to conceive how de novo emerging proteins may integrate pre-existing cellular systems to bring about beneficial traits, given that their sequences are previously unseen by the cell. To address this apparent paradox, we investigated 26 de novo emerging proteins previously associated with growth benefits in yeast. Microscopy revealed that these beneficial emerging proteins preferentially localize to the endoplasmic reticulum (ER). Sequence and structure analyses uncovered a common protein organization among all ER-localizing beneficial emerging proteins, characterized by a short hydrophobic C-terminus immediately preceded by a transmembrane domain. Using genetic and biochemical approaches, we showed that ER localization of beneficial emerging proteins requires the GET and SND pathways, both of which are evolutionarily conserved and known to recognize transmembrane domains to promote post-translational ER insertion. The abundance of ER-localizing beneficial emerging proteins was regulated by conserved proteasome- and vacuole-dependent processes, through mechanisms that appear to be facilitated by the emerging proteins’ C-termini. Consequently, we propose that evolutionarily conserved pathways can convergently govern the cellular processing of de novo emerging proteins with unique sequences, likely owing to common underlying protein organization patterns. 