Title: The LCD-Composer webserver: high-specificity identification and functional analysis of low-complexity domains in proteins
Abstract Summary

Low-complexity domains (LCDs) in proteins are regions enriched in a small subset of amino acids. LCDs exist in all domains of life, often have unusual biophysical behavior, and function in both normal and pathological processes. We recently developed an algorithm to identify LCDs based predominantly on amino acid composition thresholds. Here, we have integrated this algorithm with a webserver and augmented it with additional analysis options. Specifically, users can (i) search for LCDs in whole proteomes by setting minimum composition thresholds for individual or grouped amino acids, (ii) submit a known LCD sequence to search for similar LCDs, (iii) search for and plot LCDs within a single protein, (iv) statistically test for enrichment of LCDs within a user-provided protein set and (v) specifically identify proteins with multiple types of LCDs.
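
A minimal sketch of the composition-threshold search in (i), assuming a fixed-length sliding window. The window size of 20, the threshold values and the function name `find_lcds` are illustrative assumptions, not LCD-Composer's actual defaults or code; the full tool also applies a linear-dispersion criterion, which is omitted here. Passing several residues (e.g. `"QN"`) treats them as one grouped composition criterion.

```python
from typing import List, Tuple


def find_lcds(seq: str, residues: str, min_fraction: float,
              window: int = 20) -> List[Tuple[int, int]]:
    """Return merged (start, end) spans (0-based, end-exclusive) covering every
    window whose combined fraction of `residues` is >= min_fraction.

    Hypothetical illustration only: window size and threshold are assumptions.
    """
    seq = seq.upper()
    targets = set(residues.upper())  # one residue, or a grouped set such as "QN"
    if len(seq) < window:
        return []

    # Count target residues in the first window, then update the count
    # incrementally as the window slides one position at a time.
    count = sum(1 for aa in seq[:window] if aa in targets)
    hits = []
    for start in range(len(seq) - window + 1):
        if start > 0:
            if seq[start - 1] in targets:           # residue leaving the window
                count -= 1
            if seq[start + window - 1] in targets:  # residue entering the window
                count += 1
        if count / window >= min_fraction:
            hits.append((start, start + window))

    # Merge overlapping passing windows into contiguous candidate domains.
    merged: List[Tuple[int, int]] = []
    for s, e in hits:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged
```

For example, `find_lcds("A" * 20 + "Q" * 20 + "A" * 20, "Q", 0.5)` returns `[(10, 50)]`: every 20-residue window starting at positions 10 through 30 is at least half Q, and those windows merge into a single span.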

Availability and implementation

The LCD-Composer server can be accessed at http://lcd-composer.bmb.colostate.edu. The corresponding command-line scripts can be accessed at https://github.com/RossLabCSU/LCD-Composer/tree/master/WebserverScripts.

Award ID(s):
1817622
NSF-PAR ID:
10378575
Publisher / Repository:
Oxford University Press
Journal Name:
Bioinformatics
ISSN:
1367-4803
Sponsoring Org:
National Science Foundation
More like this
  1. Abstract

    Low-complexity domains (LCDs) in proteins are regions predominantly composed of a small subset of the possible amino acids. LCDs are involved in a variety of normal and pathological processes across all domains of life. Existing methods define LCDs using information-theoretic complexity thresholds, sequence alignment with repetitive regions, or statistical overrepresentation of amino acids relative to whole-proteome frequencies. While these methods have proven valuable, they all quantify amino acid composition only indirectly, even though composition is the fundamental, biologically relevant feature underlying protein sequence complexity. Here, we present a new computational tool, LCD-Composer, that directly identifies LCDs based on amino acid composition and linear amino acid dispersion. Using LCD-Composer's default parameters, we identified simple LCDs across all organisms available through UniProt and provide the resulting data as an accessible resource. Furthermore, we describe large-scale differences between organisms from different domains of life and explore organisms with extreme LCD content for different LCD classes. Finally, we illustrate the versatility and specificity achievable with LCD-Composer by identifying diverse classes of LCDs using both simple and multifaceted composition criteria. We demonstrate that the ability to dissect LCDs based on these multifaceted criteria enhances the functional mapping and classification of LCDs.
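
Linear dispersion separates windows in which the target residues are spread evenly from windows in which they are bunched together. The sketch below shows one hypothetical way to score this, assuming a measure of 1 − σ/σ_max over the spacings between target residues, where σ_max is the spacing spread obtained when all targets are packed at one end of the window. It is not necessarily the exact formula LCD-Composer uses, and `linear_dispersion` is an illustrative name.

```python
from statistics import pstdev
from typing import Iterable, List


def _spacings(positions: Iterable[int], length: int) -> List[int]:
    """Gaps from the window start to the first target, between consecutive
    targets, and from the last target to the window end."""
    edges = [-1] + list(positions) + [length]
    return [b - a for a, b in zip(edges, edges[1:])]


def linear_dispersion(window: str, residues: str) -> float:
    """Hypothetical dispersion score in [0, 1]: 1 = evenly spread, 0 = fully clumped."""
    targets = set(residues.upper())
    window = window.upper()
    positions = [i for i, aa in enumerate(window) if aa in targets]
    k, n = len(positions), len(window)
    if k == 0:
        return 0.0
    sigma = pstdev(_spacings(positions, n))
    # Reference: the same number of targets packed at the start of the window.
    sigma_max = pstdev(_spacings(range(k), n))
    if sigma_max == 0:
        return 1.0  # every residue is a target, so spacing is trivially even
    return max(0.0, 1.0 - sigma / sigma_max)
```

With this definition, `linear_dispersion("QQQQQAAAAA", "Q")` is 0.0 (maximally clumped), while the alternating window `"QAQAQAQAQA"` scores about 0.8, even though both windows are 50% Q.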
  2. Abstract Background

    Biomolecular condensates are non-stoichiometric assemblies that are characterized by their capacity to spatially concentrate biomolecules and play a key role in cellular organization. Proteins that drive the formation of biomolecular condensates frequently contain oligomerization domains and intrinsically disordered regions (IDRs), both of which can contribute multivalent interactions that drive higher-order assembly. Our understanding of the relative and temporal contribution of oligomerization domains and IDRs to the material properties of in vivo biomolecular condensates is limited. Similarly, the spatial and temporal dependence of protein oligomeric state inside condensates has been largely unexplored in vivo.

    Methods

    In this study, we combined quantitative microscopy with number and brightness analysis to investigate the aging, material properties, and protein oligomeric state of biomolecular condensates in vivo. Our work is focused on condensates formed by AUXIN RESPONSE FACTOR 19 (ARF19), a transcription factor integral to the auxin signaling pathway in plants. ARF19 contains a large central glutamine-rich IDR and a C-terminal Phox Bem1 (PB1) oligomerization domain and forms cytoplasmic condensates.
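
Number and brightness analysis uses two per-pixel moments of an image time series. With mean intensity ⟨k⟩ and variance σ² over frames, the apparent brightness is B = σ²/⟨k⟩ and the apparent number is N = ⟨k⟩²/σ². The sketch below assumes an ideal photon-counting detector with no offset, dark-count or analog-gain correction, which real data usually need. It is not the authors' analysis pipeline, and `number_and_brightness` is an illustrative name.

```python
import numpy as np


def number_and_brightness(stack):
    """Per-pixel apparent brightness B and apparent number N from a
    (frames, rows, cols) intensity stack, assuming ideal photon counting."""
    stack = np.asarray(stack, dtype=float)
    mean = stack.mean(axis=0)  # <k> for each pixel
    var = stack.var(axis=0)    # sigma^2 for each pixel (population variance over frames)
    with np.errstate(divide="ignore", invalid="ignore"):
        brightness = np.where(mean > 0, var / mean, np.nan)  # B = sigma^2 / <k>
        number = np.where(var > 0, mean ** 2 / var, np.nan)  # N = <k>^2 / sigma^2
    return brightness, number
```

For pure shot noise (e.g. a Poisson-distributed stack with no mobile fluorescent particles), B is close to 1, so the molecular brightness ε = B − 1 is near zero. Oligomeric state is commonly inferred by comparing ε in a region of interest with ε measured for a known monomeric reference.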

    Results

    Our results reveal that the IDR amino acid composition can influence the morphology and material properties of ARF19 condensates. In contrast, the distribution of oligomeric species within condensates appears insensitive to IDR composition. In addition, we identified a relationship between the abundance of higher- and lower-order oligomers within individual condensates and their apparent fluidity.

    Conclusions

    IDR amino acid composition affects condensate morphology and material properties. In ARF condensates, altering the amino acid composition of the IDR did not greatly affect the oligomeric state of proteins within the condensate.

  3. Abstract Motivation

    The relative rates of amino acid interchanges over evolutionary time are likely to vary among proteins. Variation in those rates has the potential to reveal information about constraints on proteins. However, the most straightforward model that could be used to estimate relative rates of amino acid substitution is parameter-rich and therefore impractical for this purpose.

    Results

    A six-parameter model of amino acid substitution that incorporates information about the physicochemical properties of amino acids was developed. It showed that amino acid side chain volume, polarity and aromaticity have major impacts on protein evolution. It also revealed variation among proteins in the relative importance of those properties. The same general approach can be used to improve the fit of empirical models such as the commonly used PAM and LG models.
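
The general shape of a property-based substitution model can be sketched under loudly stated assumptions: the exchangeability s_ij between two amino acids is taken to decay exponentially with weighted differences in side chain volume, polarity and aromaticity, and the four-letter alphabet, its property values and the weights are all made up for illustration. This is not the published six-parameter model; it only shows how such exchangeabilities become a reversible rate matrix with Q_ij = s_ij π_j.

```python
import numpy as np

# Toy alphabet with ILLUSTRATIVE property values (not measured data).
AAS = ["A", "G", "F", "S"]
VOLUME = np.array([0.3, 0.1, 0.9, 0.3])
POLARITY = np.array([0.4, 0.5, 0.1, 0.6])
AROMATIC = np.array([0.0, 0.0, 1.0, 0.0])


def rate_matrix(weights, pi):
    """Hypothetical reversible rate matrix from weighted property differences.

    weights: (w_volume, w_polarity, w_aromaticity); pi: equilibrium frequencies.
    """
    w_vol, w_pol, w_aro = weights
    distance = (w_vol * np.abs(VOLUME[:, None] - VOLUME[None, :])
                + w_pol * np.abs(POLARITY[:, None] - POLARITY[None, :])
                + w_aro * np.abs(AROMATIC[:, None] - AROMATIC[None, :]))
    s = np.exp(-distance)           # symmetric; bigger property change -> lower exchangeability
    q = s * pi[None, :]             # Q_ij = s_ij * pi_j
    np.fill_diagonal(q, 0.0)
    np.fill_diagonal(q, -q.sum(axis=1))  # rows of a rate matrix sum to zero
    q /= -np.dot(pi, np.diag(q))    # scale to one expected substitution per unit time
    return q
```

Because s is symmetric, π_i Q_ij = π_j Q_ji holds for every pair (detailed balance), so the model is time-reversible. With weights `(1.0, 2.0, 3.0)`, the A→S rate exceeds the A→F rate because F differs from A in volume, polarity and aromaticity.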

    Availability and implementation

    Perl code and test data are available from https://github.com/ebraun68/sixparam.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

  4. Abstract

    The amino acid composition of proteins plays an important role in biology, medicine, and nutrition. Here, a protein analysis technique is introduced and validated that quickly estimates amino acid composition and secondary structure across a range of protein sizes while keeping the proteins in their native states. The method combines multivariate statistics with the thermostable Raman interaction profiling (TRIP) technique, eliminating the need for complex sample preparation. To validate the approach, Raman spectra of seven proteins of varying sizes are constructed from their amino acid frequencies and the Raman spectra of individual amino acids. These constructed spectra closely resemble the measured Raman spectra. Under TRIP conditions, specific vibrational modes tied to the free amino and carboxyl termini of the individual amino acids disappear, while signals linked to secondary structure emerge. The technique is then applied inversely to estimate the amino acid compositions and secondary structures of unknown proteins across a range of sizes, with root mean square errors (RMSE) between 1.47% and 5.77%. These results extend the uses of TRIP beyond interaction profiling to probing amino acid composition and structure.
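
The forward (construct) and inverse (estimate) steps can be illustrated with a simple linear-mixing sketch, assuming a protein's spectrum is approximately a composition-weighted sum of single-amino-acid spectra. Ordinary least squares with clipping stands in here for the authors' multivariate statistics, the basis spectra in the test are random placeholders rather than measured Raman data, and the sketch ignores the terminal-group and secondary-structure band changes that occur under TRIP conditions. All function names are illustrative.

```python
import numpy as np


def construct_spectrum(fractions, basis):
    """Forward model: weight each amino acid's spectrum (rows of basis,
    shape (n_amino_acids, n_wavenumbers)) by its fraction and sum."""
    return np.asarray(fractions) @ basis


def estimate_composition(spectrum, basis):
    """Inverse model: least-squares unmixing, clipped to >= 0 and
    renormalized so the estimated fractions sum to 1."""
    coeffs, *_ = np.linalg.lstsq(basis.T, spectrum, rcond=None)
    coeffs = np.clip(coeffs, 0.0, None)
    total = coeffs.sum()
    return coeffs / total if total > 0 else coeffs


def rmse_percent(estimated, true):
    """Root mean square error between two fraction vectors, in percentage points."""
    diff = np.asarray(estimated) - np.asarray(true)
    return 100.0 * np.sqrt(np.mean(diff ** 2))
```

With noiseless synthetic data the true composition is recovered essentially exactly; adding measurement noise to the constructed spectrum raises the RMSE, which is the quantity the abstract reports for real proteins.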

  5. Abstract

    We have developed an algorithm, ParSe, which accurately identifies from the primary sequence those protein regions likely to exhibit physiological phase separation behavior. Originally, ParSe was designed to test the hypothesis that, for flexible proteins, phase separation potential is correlated with hydrodynamic size. While our results were consistent with that idea, we also found that many different descriptors could successfully differentiate between three classes of protein regions: folded, intrinsically disordered, and phase‐separating intrinsically disordered. Consequently, numerous combinations of amino acid property scales can be used to make robust predictions of protein phase separation. Building on that finding, ParSe 2.0 uses an optimal set of property scales to predict domain‐level organization and compute a sequence‐based prediction of phase separation potential. The algorithm is fast enough to scan the whole human proteome in minutes on a single computer and is as accurate as or more accurate than other published predictors in identifying proteins, and regions within proteins, that drive phase separation. Here, we describe a web application for ParSe 2.0 that can be accessed through a browser by visiting https://stevewhitten.github.io/Parse_v2_FASTA to quickly identify phase‐separating proteins within large sequence sets, or by visiting https://stevewhitten.github.io/Parse_v2_web to evaluate individual protein sequences.
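
ParSe 2.0 combines amino acid property scales; a windowed average of a single scale is the basic building block of that kind of sequence scan. The sketch below uses the Kyte-Doolittle hydropathy scale and a 25-residue window purely as assumptions for illustration; ParSe 2.0's actual scales, window length and classification thresholds are not reproduced here, and `window_profile` and `flag_windows` are hypothetical names.

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_profile(seq: str, scale: Dict[str, float] = KYTE_DOOLITTLE,
                   window: int = 25) -> List[float]:
    """Mean scale value for every full window (empty if seq is shorter than
    the window). Raises KeyError on non-standard residue letters."""
    values = [scale[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def flag_windows(profile: List[float], cutoff: float) -> List[int]:
    """Start indices of windows whose mean falls below a hypothetical cutoff."""
    return [i for i, v in enumerate(profile) if v < cutoff]
```

For example, `window_profile("IIIKKK", window=3)` gives approximately `[4.5, 1.7, -1.1, -3.9]`, and flagging windows below 0.0 returns the starts `[2, 3]`, marking the hydrophilic lysine-rich end.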
