Title: LCD-Composer: an intuitive, composition-centric method enabling the identification and detailed functional mapping of low-complexity domains
Abstract: Low-complexity domains (LCDs) in proteins are regions predominantly composed of a small subset of the possible amino acids. LCDs are involved in a variety of normal and pathological processes across all domains of life. Existing methods define LCDs using information-theoretic complexity thresholds, sequence alignment with repetitive regions, or statistical overrepresentation of amino acids relative to whole-proteome frequencies. While these methods have proven valuable, they all quantify amino acid composition only indirectly, even though composition is the fundamental, biologically relevant feature underlying protein sequence complexity. Here, we present a new computational tool, LCD-Composer, that directly identifies LCDs based on amino acid composition and linear amino acid dispersion. Using LCD-Composer's default parameters, we identified simple LCDs across all organisms available through UniProt and provide the resulting data in an accessible form as a resource. Furthermore, we describe large-scale differences between organisms from different domains of life and explore organisms with extreme LCD content for different LCD classes. Finally, we illustrate the versatility and specificity achievable with LCD-Composer by identifying diverse classes of LCDs using both simple and multifaceted composition criteria. We demonstrate that the ability to dissect LCDs based on these multifaceted criteria enhances the functional mapping and classification of LCDs.
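The composition-plus-dispersion idea described in the abstract can be illustrated with a short sliding-window sketch. This is not LCD-Composer's code: the window size (20), the composition threshold (0.4), the dispersion threshold (0.5), the function names, and especially the dispersion score used here (1 / (1 + coefficient of variation of the spacings between residues of interest)) are hypothetical choices made only for illustration.

```python
from statistics import mean, pstdev


def composition(window, residues):
    """Fraction of positions in the window occupied by any residue of interest."""
    return sum(aa in residues for aa in window) / len(window)


def dispersion(window, residues):
    """HYPOTHETICAL evenness score in (0, 1]; not LCD-Composer's published formula.

    Computed as 1 / (1 + CV), where CV is the coefficient of variation of the
    spacings between consecutive residues of interest. Perfectly even spacing
    gives 1.0; residues bunched together at one end give lower values.
    """
    positions = [i for i, aa in enumerate(window) if aa in residues]
    if len(positions) < 3:
        return 0.0  # too few residues of interest to judge their spacing
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    cv = pstdev(gaps) / mean(gaps)  # mean(gaps) >= 1, so no division by zero
    return 1.0 / (1.0 + cv)


def find_lcds(seq, residues, window=20, min_comp=0.4, min_disp=0.5):
    """Slide a fixed-size window along seq, keep windows that pass both
    thresholds, and merge overlapping hits into (start, end) domains
    (0-based, end-exclusive). Threshold defaults are illustrative assumptions."""
    residues = set(residues)
    hits = []
    for start in range(len(seq) - window + 1):
        w = seq[start:start + window]
        if composition(w, residues) >= min_comp and dispersion(w, residues) >= min_disp:
            hits.append((start, start + window))
    domains = []
    for s, e in hits:
        if domains and s <= domains[-1][1]:
            # Overlapping or adjacent window: extend the current domain.
            domains[-1] = (domains[-1][0], max(domains[-1][1], e))
        else:
            domains.append((s, e))
    return domains
```

For example, `find_lcds("A" * 20 + "Q" * 20 + "A" * 20, "Q")` returns `[(8, 52)]`: the merged windows extend a few residues beyond the actual Q run (positions 20 to 39), so a practical tool would likely trim domain boundaries. A window such as `"Q" * 8 + "A" * 11 + "Q"` passes the composition threshold (0.45) but fails the dispersion threshold because its Q residues are clustered at one end.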
Journal Name: NAR Genomics and Bioinformatics
Sponsoring Org: National Science Foundation
More like this
  1. Abstract

    Summary

    Low-complexity domains (LCDs) in proteins are regions enriched in a small subset of amino acids. LCDs exist in all domains of life, often have unusual biophysical behavior, and function in both normal and pathological processes. We recently developed an algorithm to identify LCDs based predominantly on amino acid composition thresholds. Here, we have integrated this algorithm with a webserver and augmented it with additional analysis options. Specifically, users can (i) search for LCDs in whole proteomes by setting minimum composition thresholds for individual or grouped amino acids, (ii) submit a known LCD sequence to search for similar LCDs, (iii) search for and plot LCDs within a single protein, (iv) statistically test for enrichment of LCDs within a user-provided protein set and (v) specifically identify proteins with multiple types of LCDs.

    Availability and implementation

    The LCD-Composer server can be accessed at The corresponding command-line scripts can be accessed at

  2. Nature encodes the information required for life in two fundamental biopolymers: nucleic acids and proteins. Peptide nucleic acid (PNA), a synthetic analog composed of nucleobases arrayed along a pseudopeptide backbone, can combine the capacity of nucleic acids to encode information with the versatility of amino acids to encode structure and function. Historically, PNA has been perceived as a simple nucleic acid mimic with desirable properties such as high biostability and strong affinity for complementary nucleic acids. In this feature article, we aim to adjust this perception by highlighting the ability of PNA to act as a peptide mimic and showing the largely untapped potential to encode information in its amino acid sequence. First, we provide an introduction to PNA and discuss the use of conjugation to impart tunable properties to the biopolymer. Next, we describe the integration of functional groups directly into the PNA backbone to impart specific physical properties. Lastly, we highlight the use of these integrated amino acid side chains to encode peptide-like sequences in the PNA backbone, imparting novel activity and function and demonstrating the ability of PNA to simultaneously mimic both a peptide and a nucleic acid.
  3. Abstract

    Many microorganisms are auxotrophic: they are unable to synthesize compounds they require for growth. In this work, we quantify the prevalence of amino acid auxotrophies across a broad diversity of bacteria and habitats. We predicted the amino acid biosynthetic capabilities of 26,277 unique bacterial genomes spanning 12 phyla using a metabolic pathway model validated with empirical data. Amino acid auxotrophy is widespread across bacterial phyla, but we conservatively estimate that the majority of taxa (78.4%) are able to synthesize all amino acids. Our estimates indicate that amino acid auxotrophies are more prevalent among obligate intracellular parasites and in free-living taxa with genomic attributes characteristic of 'streamlined' life history strategies. To investigate environmental associations with auxotrophy, we predicted the amino acid biosynthetic capabilities of bacterial communities found in 12 unique habitats, using data compiled from 3813 samples spanning major aquatic, terrestrial, and engineered environments. Auxotrophic taxa were more abundant in host-associated environments (including the human oral cavity and gut) and in fermented food products, whereas they were relatively rare in soil and aquatic systems. Overall, this work contributes to a more complete understanding of amino acid auxotrophy across the bacterial tree of life and the ecological contexts in which auxotrophy can be a successful strategy.

  4. Abstract

    Alphaherpesviruses are a subfamily of herpesviruses that includes the significant human pathogens herpes simplex viruses (HSV) and varicella zoster virus (VZV). Glycoprotein K (gK), conserved in all alphaherpesviruses, is a multi-membrane-spanning virion glycoprotein essential for virus entry into neuronal axons, virion assembly, and pathogenesis. Despite these critical functions, little is known about which gK domains and residues are most important for maintaining these functions across all alphaherpesviruses. Herein, we employed phylogenetic and structural analyses, including a novel model for evolutionary rate variation across residues, to predict conserved gK functional domains. We found marked heterogeneity in the evolutionary rate at the level of both individual residues and domains, presumably as a result of varying selective constraints. To clarify the potential role of conserved sequence features, we predicted the structures of several gK orthologs. Congruent with our phylogenetic analysis, slowly evolving residues were identified at potentially structurally significant positions across domains. By combining a quantitative measure of amino acid rate variation with molecular modeling, we identified amino acids predicted to be critical for gK protein structure and function. This analysis yields targets for the design of anti-herpesvirus therapeutic strategies across all alphaherpesvirus species that would not emerge from more traditional analyses of conservation.

  5. Understanding how membrane-forming amphiphiles are synthesized and aggregate in prebiotic settings is required for understanding the origins of life on Earth 4 billion years ago. Amino acid decyl esters were prepared at two temperatures by dehydration reactions between decanol and amino acids, as a model for a plausible prebiotic reaction. Fifteen amino acids with a range of side-chain chemistries were tested to understand the role of amino acid identity in synthesis and membrane formation. Products were analyzed using LC-MS as well as microscopy. All amino acids tested produced decyl esters, and some of the products formed membranes when rehydrated in ultrapure water. Alanine, one of the most abundant prebiotic amino acids, readily generated abundant, uniform membranes, indicating that this could be a selection mechanism for both amino acids and their amphiphilic derivatives.
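Option (iv) of the LCD-Composer webserver in item 1 above, testing whether LCDs are enriched in a user-provided protein set, can be framed as a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The summary does not say which test the server actually uses, so the sketch below assumes that framing; the function name `enrichment_p_value` and the example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_proteome: int, n_lcd_proteome: int,
                       n_set: int, n_lcd_set: int) -> float:
    """One-sided hypergeometric tail probability P(X >= n_lcd_set).

    Assumed framing (hypothetical, not the server's documented method):
    n_set proteins are drawn at random, without replacement, from a proteome
    of n_proteome proteins, n_lcd_proteome of which contain an LCD of
    interest. X counts the LCD-containing proteins among the draws.
    """
    total = comb(n_proteome, n_set)        # number of possible sets of size n_set
    upper = min(n_set, n_lcd_proteome)     # X cannot exceed either bound
    tail = sum(
        comb(n_lcd_proteome, k) * comb(n_proteome - n_lcd_proteome, n_set - k)
        for k in range(n_lcd_set, upper + 1)
    )
    return tail / total                    # exact integer counts, one final division


# Hypothetical counts: 15 of 50 user-provided proteins contain an LCD,
# versus 600 of 6,000 proteins in the whole proteome.
print(enrichment_p_value(6000, 600, 50, 15))
```

A small value suggests that LCD-containing proteins are overrepresented in the user's set relative to the proteome; when many LCD classes are tested at once, a multiple-testing correction would normally be applied as well.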