Title: Signal Peptides Generated by Attention-Based Neural Networks
Signal peptides (SPs), short (15–30 residue) chains of amino acids at the amino termini of expressed proteins, specify secretion in living cells. We trained an attention-based neural network, the Transformer model, on data from all available organisms in Swiss-Prot to generate SP sequences. Experimental testing demonstrates that the model-generated SPs are functional: when appended to enzymes expressed in an industrial Bacillus subtilis strain, the SPs lead to secreted activity that is competitive with industrially used SPs. Additionally, the model-generated SPs are diverse in sequence, sharing as little as 58% sequence identity with the closest known native signal peptide and 73% ± 9% on average.
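The diversity figures above are percent sequence identities between each generated SP and its closest known native SP. The record does not say how identity was computed, so the following is only a minimal sketch of one common definition: a global (Needleman-Wunsch) alignment with identical columns divided by alignment length. The scoring values, the function names, and the toy sequences are all assumptions made for illustration; they are not taken from the paper.

```python
from statistics import mean, stdev


def percent_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Percent identity from a global alignment (assumed scoring, not the paper's)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal alignment, counting identical columns.
    i, j, matches, columns = n, m, 0, 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return 100.0 * matches / columns if columns else 0.0


def closest_identity(query: str, natives: list[str]) -> float:
    """Identity of a generated SP to its nearest native SP."""
    return max(percent_identity(query, native) for native in natives)


# Toy sequences, made up for illustration only (not real signal peptides).
natives = ["MKKLLAVAAS", "MRRFLLTALA"]
generated = ["MKKLLAVSAS", "MRKFLLSALA"]
per_sp = [closest_identity(g, natives) for g in generated]
print(min(per_sp), mean(per_sp), stdev(per_sp))
```

Other definitions of identity (for example, dividing by the shorter sequence length or using a substitution matrix such as BLOSUM62) give different percentages, so values from a sketch like this are not directly comparable to the reported 58% and 73% ± 9%.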
Award ID(s):
1937902
PAR ID:
10179180
Journal Name:
ACS Synthetic Biology
ISSN:
2161-5063
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Small proteins (SPs) are typically characterized as eukaryotic proteins shorter than 100 amino acids and prokaryotic proteins shorter than 50 amino acids. Historically, they were disregarded because of the arbitrary size thresholds to define proteins. However, recent research has revealed the existence of many SPs and their crucial roles. Despite this, the identification of SPs and the elucidation of their functions are still in their infancy. To pave the way for future SP studies, we briefly introduce the limitations and advancements in experimental techniques for SP identification. We then provide an overview of available computational tools for SP identification, their constraints, and their evaluation. Additionally, we highlight existing resources for SP research. This survey aims to initiate further exploration into SPs and encourage the development of more sophisticated computational tools for SP identification in prokaryotes and microbiomes. 
  2. Abstract The intricate landscape of tRNA modification presents persistent analytical challenges, which have impeded efforts to simultaneously resolve sequence, modification, and aminoacylation state at the level of individual tRNAs. To address these challenges, we introduce “aa-tRNA-seq”, an integrated method that uses chemical ligation to sandwich the amino acid of a charged tRNA in between the body of the tRNA and an adaptor oligonucleotide, followed by high throughput nanopore sequencing. Our approach reveals the identity of the amino acids attached to all tRNAs in a cellular sample, at the single molecule level. We describe machine learning models that enable the accurate identification of amino acid identities based on the unique signal distortions generated by the interactions between the amino acid in the RNA backbone and the nanopore motor protein and reader head. We apply aa-tRNA-seq to characterize the impact of the loss of specific tRNA modification enzymes, confirming the hypomodification-associated instability of specific tRNAs, and identifying additional candidate targets of modification. Our studies lay the groundwork for understanding the efficiency and fidelity of tRNA aminoacylation as a function of tRNA sequence, modification, and environmental conditions. 
  3. Agrobacterium effector protein VirE2 is important for plant transformation. VirE2 likely coats transferred DNA (T-DNA) in the plant cell and protects it from degradation. VirE2 localizes to the plant cytoplasm and interacts with several host proteins. Plant-expressed VirE2 can complement a virE2 mutant Agrobacterium strain to support transformation. We investigated whether VirE2 could facilitate transformation from a nuclear location by affixing to it a strong nuclear localization signal (NLS) sequence. Only cytoplasmic-, but not nuclear-localized, VirE2 could stimulate transformation. To investigate the ways VirE2 supports transformation, we generated transgenic Arabidopsis plants containing a virE2 gene under the control of an inducible promoter and performed RNA-seq and proteomic analyses before and after induction. Some differentially expressed plant genes were previously known to facilitate transformation. Knockout mutant lines of some other VirE2 differentially expressed genes showed altered transformation phenotypes. Levels of some proteins known to be important for transformation increased in response to VirE2 induction, but prior to or without induction of their corresponding mRNAs. Overexpression of some other genes whose proteins increased after VirE2 induction resulted in increased transformation susceptibility. We conclude that cytoplasmically localized VirE2 modulates both plant RNA and protein levels to facilitate transformation. 
  4. Abstract Designing protein-binding proteins is critical for drug discovery. However, artificial-intelligence-based design of such proteins is challenging due to the complexity of protein–ligand interactions, the flexibility of ligand molecules and amino acid side chains, and sequence–structure dependencies. We introduce PocketGen, a deep generative model that produces residue sequence and atomic structure of the protein regions in which ligand interactions occur. PocketGen promotes consistency between protein sequence and structure by using a graph transformer for structural encoding and a sequence refinement module based on a protein language model. The graph transformer captures interactions at multiple scales, including atom, residue and ligand levels. For sequence refinement, PocketGen integrates a structural adapter into the protein language model, ensuring that structure-based predictions align with sequence-based predictions. PocketGen can generate high-fidelity protein pockets with enhanced binding affinity and structural validity. It operates ten times faster than physics-based methods and achieves a 97% success rate, defined as the percentage of generated pockets with higher binding affinity than reference pockets. Additionally, it attains an amino acid recovery rate exceeding 63%. 
  5. Introduction: Some proteins, including yeast prion protein Sup35 (eRF3) are capable of both stress-induced liquid-liquid phase separation (LLPS) and formation of prion state, propagated via solid fibrous aggregates (amyloids). Relationships between these processes are still poorly understood. Previous literature data suggested that prion formation by Sup35 is sporadically distributed in fungal evolution and depends on amino acid composition of its prion domain (PrD), rather than on a specific sequence which is highly variable. Objectives: Identify sequence patterns that control LLPS and amyloid formation by Sup35 PrD, and trace their conservation in fungal evolution. Methods: Fungal Sup35 PrDs of various evolutionary origins, as well as artificially synthesized “scrambled” variants of Saccharomyces cerevisiae Sup35 PrD, having identical amino acid composition but different sequences, were fused to fluorophores and expressed in S. cerevisiae cells. LLPS and amyloid/prion formation were assessed by fluorescence microscopy and biochemical approaches. Amino acid sequences were analyzed by various computational algorithms. Results/Discussion: While propagation of prion state depends on evolutionary distance from the host, both LLPS and ability to form an amyloid are associated with specific patterns of PrD amino acid distribution, that are broadly conserved among fungi. PrDs of different origins are capable of colocalizing within liquid condensates and influencing amyloid conversion by each other. Conclusion: LLPS and amyloid properties depend on specific evolutionarily conserved sequence patterns, indicating possible important biological roles for these processes. These patterns could potentially be used to predict LLPS and prion potential in other sequence contexts. Funding: NSF grant 2345660 