
Title: The LCD-Composer webserver: high-specificity identification and functional analysis of low-complexity domains in proteins
Abstract

Summary

Low-complexity domains (LCDs) in proteins are regions enriched in a small subset of amino acids. LCDs exist in all domains of life, often have unusual biophysical behavior, and function in both normal and pathological processes. We recently developed an algorithm to identify LCDs based predominantly on amino acid composition thresholds. Here, we have integrated this algorithm with a webserver and augmented it with additional analysis options. Specifically, users can (i) search for LCDs in whole proteomes by setting minimum composition thresholds for individual or grouped amino acids, (ii) submit a known LCD sequence to search for similar LCDs, (iii) search for and plot LCDs within a single protein, (iv) statistically test for enrichment of LCDs within a user-provided protein set and (v) specifically identify proteins with multiple types of LCDs.
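The composition-threshold idea in option (i) can be illustrated with a minimal sliding-window sketch. This is not the LCD-Composer implementation: the function name `find_lcds`, its parameters, and the default values are assumptions chosen for illustration, and the sketch ignores any criteria beyond composition that the published algorithm applies.

```python
from typing import List, Tuple


def find_lcds(
    sequence: str,
    residues: str = "Q",
    window: int = 20,
    min_fraction: float = 0.4,
) -> List[Tuple[int, int]]:
    """Return merged half-open (start, end) regions, 0-based, in which
    every contributing window has at least `min_fraction` of its positions
    occupied by any of the amino acids in `residues`.

    Hypothetical helper for illustration only; names and defaults are
    assumptions, not LCD-Composer parameters.
    """
    seq = sequence.upper()
    targets = set(residues.upper())
    if window <= 0 or len(seq) < window:
        return []

    # 1 where the residue belongs to the target group, 0 otherwise.
    hits = [1 if aa in targets else 0 for aa in seq]

    regions: List[Tuple[int, int]] = []
    count = sum(hits[:window])  # target residues in the first window
    for start in range(len(seq) - window + 1):
        if start > 0:
            # Slide the window one position: add the new residue, drop the old.
            count += hits[start + window - 1] - hits[start - 1]
        if count / window >= min_fraction:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlapping or adjacent window: extend the previous region.
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions


if __name__ == "__main__":
    # Toy sequence: a 20-residue glutamine tract flanked by alanines.
    toy = "A" * 30 + "Q" * 20 + "A" * 30
    print(find_lcds(toy, residues="Q", window=20, min_fraction=0.5))
```

Passing several letters in `residues` (for example `"QN"`) treats them as one group, which mirrors the "individual or grouped amino acids" wording above. Note that merged regions include the flanking positions of every window that met the threshold.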

Availability and implementation

The LCD-Composer server can be accessed at The corresponding command-line scripts can be accessed at

Publisher / Repository:
Oxford University Press
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Abstract Low complexity domains (LCDs) in proteins are regions predominantly composed of a small subset of the possible amino acids. LCDs are involved in a variety of normal and pathological processes across all domains of life. Existing methods define LCDs using information-theoretical complexity thresholds, sequence alignment with repetitive regions, or statistical overrepresentation of amino acids relative to whole-proteome frequencies. While these methods have proven valuable, they are all indirectly quantifying amino acid composition, which is the fundamental and biologically-relevant feature related to protein sequence complexity. Here, we present a new computational tool, LCD-Composer, that directly identifies LCDs based on amino acid composition and linear amino acid dispersion. Using LCD-Composer's default parameters, we identified simple LCDs across all organisms available through UniProt and provide the resulting data in an accessible form as a resource. Furthermore, we describe large-scale differences between organisms from different domains of life and explore organisms with extreme LCD content for different LCD classes. Finally, we illustrate the versatility and specificity achievable with LCD-Composer by identifying diverse classes of LCDs using both simple and multifaceted composition criteria. We demonstrate that the ability to dissect LCDs based on these multifaceted criteria enhances the functional mapping and classification of LCDs. 
  2. Abstract Background

    Biomolecular condensates are non-stoichiometric assemblies that are characterized by their capacity to spatially concentrate biomolecules and play a key role in cellular organization. Proteins that drive the formation of biomolecular condensates frequently contain oligomerization domains and intrinsically disordered regions (IDRs), both of which can contribute multivalent interactions that drive higher-order assembly. Our understanding of the relative and temporal contribution of oligomerization domains and IDRs to the material properties of in vivo biomolecular condensates is limited. Similarly, the spatial and temporal dependence of protein oligomeric state inside condensates has been largely unexplored in vivo.


    In this study, we combined quantitative microscopy with number and brightness analysis to investigate the aging, material properties, and protein oligomeric state of biomolecular condensates in vivo. Our work is focused on condensates formed by AUXIN RESPONSE FACTOR 19 (ARF19), a transcription factor integral to the auxin signaling pathway in plants. ARF19 contains a large central glutamine-rich IDR and a C-terminal Phox Bem1 (PB1) oligomerization domain and forms cytoplasmic condensates.


    Our results reveal that the IDR amino acid composition can influence the morphology and material properties of ARF19 condensates. In contrast, the distribution of oligomeric species within condensates appears insensitive to the IDR composition. In addition, we identified a relationship between the abundance of higher- and lower-order oligomers within individual condensates and their apparent fluidity.


    IDR amino acid composition affects condensate morphology and material properties. In ARF condensates, altering the amino acid composition of the IDR did not greatly affect the oligomeric state of proteins within the condensate.

  3. Abstract

    We have developed an algorithm, ParSe, which accurately identifies from the primary sequence those protein regions likely to exhibit physiological phase separation behavior. Originally, ParSe was designed to test the hypothesis that, for flexible proteins, phase separation potential is correlated to hydrodynamic size. While our results were consistent with that idea, we also found that many different descriptors could successfully differentiate between three classes of protein regions: folded, intrinsically disordered, and phase‐separating intrinsically disordered. Consequently, numerous combinations of amino acid property scales can be used to make robust predictions of protein phase separation. Built from that finding, ParSe 2.0 uses an optimal set of property scales to predict domain‐level organization and compute a sequence‐based prediction of phase separation potential. The algorithm is fast enough to scan the whole of the human proteome in minutes on a single computer and is as accurate as, or more accurate than, other published predictors in identifying proteins and regions within proteins that drive phase separation. Here, we describe a web application for ParSe 2.0 that may be accessed through a browser by visiting quickly identify phase‐separating proteins within large sequence sets, or by visiting evaluate individual protein sequences.

  4.
    It has become increasingly apparent that the lipid composition of cell membranes affects the function of transmembrane proteins such as ion channels. Here, we leverage the structural and functional diversity of small viral K+ channels to systematically examine the impact of bilayer composition on the pore module of single K+ channels. In vitro–synthesized channels were reconstituted into phosphatidylcholine bilayers ± cholesterol or anionic phospholipids (aPLs). Single-channel recordings revealed that a saturating concentration of 30% cholesterol had only minor and protein-specific effects on unitary conductance and gating. This indicates that channels have effective strategies for avoiding structural impacts of hydrophobic mismatches between proteins and the surrounding bilayer. In all seven channels tested, aPLs augmented the unitary conductance, suggesting that this is a general effect of negatively charged phospholipids on channel function. For one channel, we determined an effective half-maximal concentration of 15% phosphatidylserine, a value within the physiological range of aPL concentrations. The different sensitivity of two channel proteins to aPLs could be explained by the presence/absence of cationic amino acids at the interface between the lipid headgroups and the transmembrane domains. aPLs also affected gating in some channels, indicating that conductance and gating are uncoupled phenomena and that the impact of aPLs on gating is protein specific. In two channels, the latter can be explained by the altered orientation of the pore-lining transmembrane helix that prevents flipping of a phenylalanine side chain into the ion permeation pathway for long channel closings. Experiments with asymmetrical bilayers showed that this effect is leaflet specific and most effective in the inner leaflet, in which aPLs are normally present in plasma membranes. 
    The data underscore a general positive effect of aPLs on the conductance of K+ channels and a potential interaction of their negative headgroup with cationic amino acids in their vicinity.
  5.
    Abstract We present DescribePROT, the database of predicted amino acid-level descriptors of structure and function of proteins. DescribePROT delivers a comprehensive collection of 13 complementary descriptors predicted using 10 popular and accurate algorithms for 83 complete proteomes that cover key model organisms. The current version includes 7.8 billion predictions for close to 600 million amino acids in 1.4 million proteins. The descriptors encompass sequence conservation, position specific scoring matrix, secondary structure, solvent accessibility, intrinsic disorder, disordered linkers, signal peptides, MoRFs and interactions with proteins, DNA and RNAs. Users can search DescribePROT by the amino acid sequence and the UniProt accession number and entry name. The pre-computed results are made available instantaneously. The predictions can be accessed via an interactive graphical interface that allows simultaneous analysis of multiple descriptors and can also be downloaded in structured formats at the protein, proteome and whole database scale. The putative annotations included in DescribePROT are useful for a broad range of studies, including investigations of protein function, applied projects focusing on therapeutics and diseases, and the development of predictors for other protein sequence descriptors. Future releases will expand the coverage of DescribePROT. DescribePROT can be accessed at 