Abstract Recognition of short linear motifs (SLiMs) or peptides by proteins is an important component of many cellular processes. However, due to limited and degenerate binding motifs, prediction of cellular targets is challenging. In addition, many of these interactions are transient and of relatively low affinity. Here, we focus on one of the largest families of SLiM‐binding domains in the human proteome, the PDZ domain. These domains bind the extreme C‐terminus of target proteins, and are involved in many signaling and trafficking pathways. To predict endogenous targets of PDZ domains, we developed MotifAnalyzer‐PDZ, a program that filters and compares all motif‐satisfying sequences in any publicly available proteome. This approach enables us to determine possible PDZ binding targets in humans and other organisms. Using this program, we predicted and biochemically tested novel human PDZ targets by looking for strong sequence conservation in evolution. We also identified three C‐terminal sequences in choanoflagellates that bind a choanoflagellate PDZ domain, the Monosiga brevicollis SHANK1 PDZ domain (mbSHANK1), with endogenously‐relevant affinities, despite a lack of conservation with the targets of a homologous human PDZ domain, SHANK1. All three are predicted to be signaling proteins, with strong sequence homology to cytosolic and receptor tyrosine kinases. Finally, we analyzed and compared the positional amino acid enrichments in PDZ motif‐satisfying sequences from over a dozen organisms. Overall, MotifAnalyzer‐PDZ is a versatile program to investigate potential PDZ interactions. This proof‐of‐concept work is poised to enable similar types of analyses for other SLiM‐binding domains (e.g., MotifAnalyzer‐Kinase). MotifAnalyzer‐PDZ is available at http://motifAnalyzerPDZ.cs.wwu.edu.
General features of transmembrane beta barrels from a large database
Large datasets contribute new insights to subjects formerly investigated by exemplars. We used coevolution data to create a large, high-quality database of transmembrane β-barrels (TMBB). By applying simple feature detection on generated evolutionary contact maps, our method (IsItABarrel) achieves 95.88% balanced accuracy when discriminating among protein classes. Moreover, comparison with IsItABarrel revealed a high rate of false positives in previous TMBB algorithms. In addition to being more accurate than previous datasets, our database (available online) contains 1,938,936 bacterial TMBB proteins from 38 phyla, making it 17 and 2.2 times larger than the previous sets TMBB-DB and OMPdb, respectively. We anticipate that due to its quality and size, the database will serve as a useful resource where high-quality TMBB sequence data are required. We found that TMBBs can be divided into 11 types, three of which have not been previously reported. We find tremendous variance in proteome percentage among TMBB-containing organisms, with some using 6.79% of their proteome for TMBBs and others using as little as 0.27%. The distribution of the lengths of the TMBBs is suggestive of previously hypothesized duplication events. In addition, we find that the C-terminal β-signal varies among different classes of bacteria, though its consensus sequence is LGLGYRF. However, this β-signal is only characteristic of prototypical TMBBs. The ten non-prototypical barrel types have other C-terminal motifs, and it remains to be determined if these alternative motifs facilitate TMBB insertion or perform any other signaling function.
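The abstract reports a balanced accuracy figure without saying how it was computed. As a minimal sketch, assuming the standard definition (the unweighted mean of per-class recall), the metric can be illustrated as follows. The function name `balanced_accuracy` and the toy labels are hypothetical and are not taken from IsItABarrel.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall (assumed standard definition)."""
    total = defaultdict(int)    # number of true examples per class
    correct = defaultdict(int)  # number of correctly predicted examples per class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[cls] / total[cls] for cls in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels: a rare "barrel" class and a common "other" class.
y_true = ["barrel"] * 2 + ["other"] * 8
y_pred = ["barrel", "other"] + ["other"] * 8

score = balanced_accuracy(y_true, y_pred)
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Because every class contributes equally, a classifier that misses half of a rare class is penalized more heavily than plain accuracy would suggest, which may be why such a metric suits imbalanced protein classes.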
- PAR ID: 10446616
- Date Published:
- Journal Name: Proceedings of the National Academy of Sciences
- Volume: 120
- Issue: 29
- ISSN: 0027-8424
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Protein farnesylation is a post-translational modification where a 15-carbon farnesyl isoprenoid is appended to the C-terminal end of a protein by farnesyltransferase (FTase). This process often causes proteins to associate with the membrane and participate in signal transduction pathways. The most common substrates of FTase are proteins that have C-terminal tetrapeptide CaaX box sequences where the cysteine is the site of modification. However, recent work has shown that five amino acid sequences can also be recognized, including the pentapeptides CMIIM and CSLMQ. In this work, peptide libraries were initially used to systematically vary the residues in those two parental sequences using an assay based on Matrix Assisted Laser Desorption Ionization–Mass Spectrometry (MALDI-MS). In addition, 192 pentapeptide sequences from the human proteome were screened using that assay to discover additional extended CaaaX-box motifs. Selected hits from that screening effort were rescreened using an in vivo yeast reporter protein assay. The X-ray crystal structure of CMIIM bound to FTase was also solved, showing that the C-terminal tripeptide of that sequence interacted with the enzyme in a similar manner as the C-terminal tripeptide of CVVM, suggesting that the tripeptide comprises a common structural element for substrate recognition in both tetrapeptide and pentapeptide sequences. Molecular dynamics simulation of CMIIM bound to FTase further shed light on the molecular interactions involved, showing that a putative catalytically competent Zn(II)-thiolate species was able to form. Bioinformatic predictions of tetrapeptide (CaaX-box) reactivity correlated well with the reactivity of pentapeptides obtained from in vivo analysis, reinforcing the importance of the C-terminal tripeptide motif. 
This analysis provides a structural framework for understanding the reactivity of extended CaaaX-box motifs and a method that may be useful for predicting the reactivity of additional FTase substrates bearing CaaaX-box sequences.
-
Elofsson, Arne (Ed.) As sequence and structure comparison algorithms gain sensitivity, the intrinsic interconnectedness of the protein universe has become increasingly apparent. Despite this general trend, β-trefoils have emerged as an uncommon counterexample: They are an isolated protein lineage for which few, if any, sequence or structure associations to other lineages have been identified. If β-trefoils are, in fact, remote islands in sequence-structure space, it implies that the oligomerizing peptide that founded the β-trefoil lineage itself arose de novo. To better understand β-trefoil evolution, and to probe the limits of fragment sharing across the protein universe, we identified both ‘β-trefoil bridging themes’ (evolutionarily-related sequence segments) and ‘β-trefoil-like motifs’ (structure motifs with a hallmark feature of the β-trefoil architecture) in multiple, ostensibly unrelated, protein lineages. The success of the present approach stems, in part, from considering β-trefoil sequence segments or structure motifs rather than the β-trefoil architecture as a whole, as has been done previously. The newly uncovered inter-lineage connections presented here suggest a novel hypothesis about the origins of the β-trefoil fold itself–namely, that it is a derived fold formed by ‘budding’ from an Immunoglobulin-like β-sandwich protein. These results demonstrate how the evolution of a folded domain from a peptide need not be a signature of antiquity and underpin an emerging truth: few protein lineages escape nature’s sewing table.
-
Abstract Biodiversity at larger spatial scales (γ) can be driven by within‐site partitions (α), with little variation in composition among locations, or can be driven by among‐site partitions (β) that signal the importance of spatial heterogeneity. For tropical elevational gradients, we determined the (a) extent to which variation in γ is driven by α‐ or β‐partitions; (b) elevational form of the relationship for each partition; and (c) extent to which elevational gradients are molded by zonation in vegetation or by gradual variation in climatic or abiotic characteristics. We sampled terrestrial gastropods along two transects in the Luquillo Mountains. One passed through multiple vegetation zones (tabonuco, palo colorado, and elfin forests), and one passed through only palm forest. We quantified variation in hierarchical partitions (α, β, and γ) of species richness, evenness, diversity, and dominance, as well as in the content and quality of litter. Total gastropod abundance linearly decreased with increasing elevation along both transects, but was consistently higher in palm than in other forest types. The gradual linear decline in γ‐richness was a consequence of opposing patterns with regard to α‐richness (monotonic decrease) and β‐richness (monotonic increase). For evenness, diversity, and dominance, α‐partitions and γ‐partitions evinced mid‐elevational peaks. The spatial organization of gastropod biodiversity did not mirror the zonation of vegetation. Rather, it was molded by: (a) elevational variation in productivity or nutrient characteristics, (b) the interspersion of palm forest within other forest types, and (c) the cloud condensation point acting as a transition between low and high elevation faunas. Abstract in Spanish is available with online material.
-
Abstract Stability of eukaryotic mRNAs is associated with their codon, amino acid, and GC content. Yet, coding sequence motifs that predictably alter mRNA stability in human cells remain poorly defined. Here, we develop a massively parallel assay to measure mRNA effects of thousands of synthetic and endogenous coding sequence motifs in human cells. We identify several families of simple dipeptide repeats whose translation triggers mRNA destabilization. Rather than individual amino acids, specific combinations of bulky and positively charged amino acids are critical for the destabilizing effects of dipeptide repeats. Remarkably, dipeptide sequences that form extended β strands in silico and in vitro slow down ribosomes and reduce mRNA levels in vivo. The resulting nascent peptide code underlies the mRNA effects of hundreds of endogenous peptide sequences in the human proteome. Our work suggests an intrinsic role for the ribosome as a selectivity filter against the synthesis of bulky and aggregation-prone peptides.

