Title: The LCD-Composer webserver: high-specificity identification and functional analysis of low-complexity domains in proteins
Abstract:
Summary: Low-complexity domains (LCDs) in proteins are regions enriched in a small subset of amino acids. LCDs exist in all domains of life, often have unusual biophysical behavior, and function in both normal and pathological processes. We recently developed an algorithm to identify LCDs based predominantly on amino acid composition thresholds. Here, we have integrated this algorithm with a webserver and augmented it with additional analysis options. Specifically, users can (i) search for LCDs in whole proteomes by setting minimum composition thresholds for individual or grouped amino acids, (ii) submit a known LCD sequence to search for similar LCDs, (iii) search for and plot LCDs within a single protein, (iv) statistically test for enrichment of LCDs within a user-provided protein set and (v) specifically identify proteins with multiple types of LCDs.
Availability and implementation: The LCD-Composer server can be accessed at http://lcd-composer.bmb.colostate.edu. The corresponding command-line scripts can be accessed at https://github.com/RossLabCSU/LCD-Composer/tree/master/WebserverScripts.
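The idea of a minimum composition threshold for individual or grouped amino acids can be illustrated with a minimal sliding-window sketch. This is not the LCD-Composer implementation: the function name `find_lcds`, the 20-residue window, and the 40% threshold are hypothetical choices made only for illustration, and the sketch ignores any criterion other than composition.

```python
from typing import List, Tuple


def find_lcds(seq: str, residues: str = "Q", window: int = 20,
              min_fraction: float = 0.4) -> List[Tuple[int, int]]:
    """Return merged half-open (start, end) regions in which every
    contributing window has at least `min_fraction` of its positions
    occupied by any residue in `residues` (a group such as "QN" is
    treated as one combined class).

    Illustrative only: parameter values are assumptions, not
    LCD-Composer defaults.
    """
    targets = set(residues)
    n = len(seq)
    if n < window:
        return []

    regions: List[Tuple[int, int]] = []
    # Count of target residues in the first window.
    count = sum(1 for aa in seq[:window] if aa in targets)

    for start in range(n - window + 1):
        if start > 0:
            # Slide the window by one position: drop the residue that
            # left on the left, add the one that entered on the right.
            count -= seq[start - 1] in targets
            count += seq[start + window - 1] in targets

        if count / window >= min_fraction:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlapping or touching window: extend the last region.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


if __name__ == "__main__":
    toy = "A" * 30 + "Q" * 20 + "A" * 30
    print(find_lcds(toy))
```

Passing a multi-letter string such as `residues="QN"` treats the group as a single class, which corresponds to the "grouped amino acids" option mentioned in the abstract.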
Award ID(s):
1817622
PAR ID:
10378575
Author(s) / Creator(s):
; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Bioinformatics
ISSN:
1367-4803
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract: Low complexity domains (LCDs) in proteins are regions predominantly composed of a small subset of the possible amino acids. LCDs are involved in a variety of normal and pathological processes across all domains of life. Existing methods define LCDs using information-theoretical complexity thresholds, sequence alignment with repetitive regions, or statistical overrepresentation of amino acids relative to whole-proteome frequencies. While these methods have proven valuable, they all quantify amino acid composition only indirectly, even though composition is the fundamental and biologically relevant feature related to protein sequence complexity. Here, we present a new computational tool, LCD-Composer, that directly identifies LCDs based on amino acid composition and linear amino acid dispersion. Using LCD-Composer's default parameters, we identified simple LCDs across all organisms available through UniProt and provide the resulting data in an accessible form as a resource. Furthermore, we describe large-scale differences between organisms from different domains of life and explore organisms with extreme LCD content for different LCD classes. Finally, we illustrate the versatility and specificity achievable with LCD-Composer by identifying diverse classes of LCDs using both simple and multifaceted composition criteria. We demonstrate that the ability to dissect LCDs based on these multifaceted criteria enhances the functional mapping and classification of LCDs.
  2. Abstract: We have developed an algorithm, ParSe, which accurately identifies from the primary sequence those protein regions likely to exhibit physiological phase separation behavior. Originally, ParSe was designed to test the hypothesis that, for flexible proteins, phase separation potential is correlated to hydrodynamic size. While our results were consistent with that idea, we also found that many different descriptors could successfully differentiate between three classes of protein regions: folded, intrinsically disordered, and phase‐separating intrinsically disordered. Consequently, numerous combinations of amino acid property scales can be used to make robust predictions of protein phase separation. Built from that finding, ParSe 2.0 uses an optimal set of property scales to predict domain‐level organization and compute a sequence‐based prediction of phase separation potential. The algorithm is fast enough to scan the whole of the human proteome in minutes on a single computer and is as accurate as or more accurate than other published predictors in identifying proteins and regions within proteins that drive phase separation. Here, we describe a web application for ParSe 2.0 that may be accessed through a browser by visiting https://stevewhitten.github.io/Parse_v2_FASTA to quickly identify phase‐separating proteins within large sequence sets, or by visiting https://stevewhitten.github.io/Parse_v2_web to evaluate individual protein sequences.
  3. Abstract: Amino‐acid protein composition plays an important role in biology, medicine, and nutrition. Here, a groundbreaking protein analysis technique that quickly estimates amino acid composition and secondary structure across various protein sizes, while maintaining their natural states, is introduced and validated. This method combines multivariate statistics and the thermostable Raman interaction profiling (TRIP) technique, eliminating the need for complex preparations. To validate the approach, Raman spectra of seven proteins of varying sizes are constructed from their amino acid frequencies and the Raman spectra of individual amino acids. These constructed spectra closely resemble the actual measured Raman spectra. Specific vibrational modes tied to free amino and carboxyl termini of the amino acids disappear as signals linked to secondary structures emerge under TRIP conditions. Furthermore, the technique is used inversely to successfully estimate amino acid compositions and secondary structures of unknown proteins across a range of sizes, with root mean square errors (RMSE) ranging between 1.47% and 5.77%. These results extend the uses for TRIP beyond interaction profiling, to probe amino acid composition and structure.
  4. O'Toole, George (Ed.)
    Abstract: Members of the widely conserved progestin and adipoQ receptor (PAQR) family function to maintain membrane homeostasis: membrane fluidity and fatty acid composition in eukaryotes and membrane energetics and fatty acid composition in bacteria. All PAQRs consist of a core seven transmembrane domain structure and five conserved amino acids (three histidines, one serine, and one aspartic acid) predicted to form a hydrolase-like catalytic site. PAQR homologs in Bacteria (called TrhA, for transmembrane homeostasis protein A) maintain homeostasis of membrane charge gradients, like the membrane potential and proton gradient that comprise the proton motive force, but their molecular mechanisms are not yet understood. Here, we show that TrhA in Escherichia coli has a periplasmic C-terminus, which places the five conserved residues shared by all PAQRs at the cytoplasmic interface of the membrane. We characterize several conserved residues predicted to form an active site by site-directed mutagenesis. We also identify a specific role for TrhA in modulating unsaturated fatty acid biosynthesis, with conserved residues required to either promote or reduce the abundance of unsaturated fatty acids. We also identify distinct roles for the conserved residues in supporting TrhA’s role in maintaining membrane energetics homeostasis that suggest that both functions are intertwined and probably partly dependent on one another. An analysis of domain architecture of TrhA-like domains in Bacteria further supports a function of TrhA linking membrane energetics homeostasis with biosynthesis of unsaturated fatty acid in the membrane.
    Importance: Progestin and adipoQ receptor (PAQR) family proteins are evolutionarily conserved regulators of membrane homeostasis and have been best characterized in eukaryotes. Bacterial PAQR homologs, named TrhA (transmembrane homeostasis protein A), regulate membrane energetics homeostasis through an unknown mechanism. Here, we present evidence linking TrhA to both membrane energetics homeostasis and unsaturated fatty acid biosynthesis. Analysis of domain architecture together with experimental evidence suggests a model where TrhA activity on unsaturated fatty acid biosynthesis is regulated by changes in membrane energetics to dynamically adjust membrane homeostasis.
  5. Abstract: We present DescribePROT, the database of predicted amino acid-level descriptors of structure and function of proteins. DescribePROT delivers a comprehensive collection of 13 complementary descriptors predicted using 10 popular and accurate algorithms for 83 complete proteomes that cover key model organisms. The current version includes 7.8 billion predictions for close to 600 million amino acids in 1.4 million proteins. The descriptors encompass sequence conservation, position specific scoring matrix, secondary structure, solvent accessibility, intrinsic disorder, disordered linkers, signal peptides, MoRFs and interactions with proteins, DNA and RNAs. Users can search DescribePROT by the amino acid sequence and the UniProt accession number and entry name. The pre-computed results are made available instantaneously. The predictions can be accessed via an interactive graphical interface that allows simultaneous analysis of multiple descriptors and can also be downloaded in structured formats at the protein, proteome and whole database scale. The putative annotations included in DescribePROT are useful for a broad range of studies, including investigations of protein function, applied projects focusing on therapeutics and diseases, and the development of predictors for other protein sequence descriptors. Future releases will expand the coverage of DescribePROT. DescribePROT can be accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.