Native chemical ligation (NCL) at proline has been limited by cost and synthetic access. In addition, prior examples of NCL using mercaptoproline have exhibited stalling of the reaction after thioester exchange, due to inefficient S→N acyl transfer. Herein, we develop methods, using inexpensive Boc-4R-hydroxyproline, for the solid-phase synthesis of peptides containing N-terminal 4R-mercaptoproline and 4R-selenoproline. The synthesis proceeds via proline editing on the N-terminus of fully synthesized peptides on the solid phase, converting an N-terminal Boc-4R-hydroxyproline to the 4S-bromoproline, followed by SN2 reaction with potassium thioacetate or selenobenzoic acid. After cleavage from the resin and deprotection, peptides with functionalized N-terminal proline amino acids were obtained. NCL reactions with mercaptoproline proceeded slowly under standard NCL conditions, with the S-acyl transthioesterification intermediate observed as a major species. Computational investigations indicated that the bicyclic intermediates and transition states for S→N acyl transfer are sufficiently low in energy (10–15 kcal mol⁻¹ above starting material) that ring strain cannot explain slow S→N acyl transfer. Instead, the bicyclic zwitterionic tetrahedral intermediate has a low barrier for reversion to the S-acyl intermediate, causing reversion to the thioester (reverse reaction) to occur preferentially over elimination to generate the amide (forward reaction). We hypothesized that a buffer capable of general acid and/or general base catalysis could promote S→N acyl transfer, and thus achieve greater efficiency in proline NCL. In the presence of 2 M imidazole at pH 6.8, NCL with mercaptoproline proceeded efficiently to generate the peptide with a native amide bond. NCL with selenoproline also proceeded efficiently to generate the desired products when a thiophenol thioester was employed as a ligation partner. After desulfurization or deselenization, the products obtained were identical to those synthesized directly, confirming that the solid-phase proline editing reactions proceeded stereospecifically and without epimerization.
Acyltransferase families that act on thioesters: Sequences, structures, and mechanisms
Abstract Acyltransferases (ATs) are enzymes that catalyze the transfer of an acyl group to a receptor molecule. This review focuses on ATs that act on thioester‐containing substrates. Although many ATs can recognize a wide variety of substrates, sequence similarity analysis allowed us to classify the ATs into fifteen distinct families. Each AT family originates from enzymes experimentally characterized to have AT activity, is classified according to sequence similarity, and is confirmed by tertiary structure similarity for families that have crystal structures available. All the sequences and structures of the AT families described here are present in the thioester‐active enzyme (ThYme) database. The AT sequences and structures classified into families and available in the ThYme database could contribute to the understanding of acyl transfer to thioester‐containing substrates, most commonly coenzyme A, which occurs in multiple metabolic pathways, mostly with fatty acids.
- Award ID(s): 2001385
- PAR ID: 10485710
- Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name: Proteins: Structure, Function, and Bioinformatics
- Volume: 92
- Issue: 2
- ISSN: 0887-3585
- Format(s): Medium: X
- Size(s): p. 157-169
- Sponsoring Org: National Science Foundation
More Like this
Metagenomes encode an enormous diversity of proteins, reflecting a multiplicity of functions and activities. Exploration of this vast sequence space has been limited to a comparative analysis against reference microbial genomes and protein families derived from those genomes. Here, to examine the scale of yet untapped functional diversity beyond what is currently possible through the lens of reference genomes, we develop a computational approach to generate reference-free protein families from the sequence space in metagenomes. We analyze 26,931 metagenomes and identify 1.17 billion protein sequences longer than 35 amino acids with no similarity to any sequences from 102,491 reference genomes or the Pfam database. Using massively parallel graph-based clustering, we group these proteins into 106,198 novel sequence clusters with more than 100 members, doubling the number of protein families obtained from the reference genomes clustered using the same approach. We annotate these families on the basis of their taxonomic, habitat, geographical, and gene neighborhood distributions and, where sufficient sequence diversity is available, predict protein three-dimensional models, revealing novel structures. Overall, our results uncover an enormously diverse functional space, highlighting the importance of further exploring the microbial functional dark matter.
Machine learning-guided acyl-ACP reductase engineering for improved in vivo fatty alcohol production
Abstract Alcohol-forming fatty acyl reductases (FARs) catalyze the reduction of thioesters to alcohols and are key enzymes for microbial production of fatty alcohols. Many metabolic engineering strategies utilize FARs to produce fatty alcohols from intracellular acyl-CoA and acyl-ACP pools; however, enzyme activity, especially on acyl-ACPs, remains a significant bottleneck to high-flux production. Here, we engineer FARs with enhanced activity on acyl-ACP substrates by implementing a machine learning (ML)-driven approach to iteratively search the protein fitness landscape. Over the course of ten design-test-learn rounds, we engineer enzymes that produce over twofold more fatty alcohols than the starting natural sequences. We characterize the top sequence and show that it has an enhanced catalytic rate on palmitoyl-ACP. Finally, we analyze the sequence-function data to identify features, like the net charge near the substrate-binding site, that correlate with in vivo activity. This work demonstrates the power of ML to navigate the fitness landscape of traditionally difficult-to-engineer proteins.
Abstract RNAcentral is a comprehensive database of non-coding RNA (ncRNA) sequences that provides a single access point to 44 RNA resources and >18 million ncRNA sequences from a wide range of organisms and RNA types. RNAcentral now also includes secondary (2D) structure information for >13 million sequences, making RNAcentral the world’s largest RNA 2D structure database. The 2D diagrams are displayed using R2DT, a new 2D structure visualization method that uses consistent, reproducible and recognizable layouts for related RNAs. The sequence similarity search has been updated with a faster interface featuring facets for filtering search results by RNA type, organism, source database or any keyword. This sequence search tool is available as a reusable web component, and has been integrated into several RNAcentral member databases, including Rfam, miRBase and snoDB. To allow for a more fine-grained assignment of RNA types and subtypes, all RNAcentral sequences have been annotated with Sequence Ontology terms. The RNAcentral database continues to grow and provide a central data resource for the RNA community. RNAcentral is freely available at https://rnacentral.org.
Abstract PULs (polysaccharide utilization loci) are discrete gene clusters of CAZymes (Carbohydrate Active EnZymes) and other genes that work together to digest and utilize carbohydrate substrates. While PULs have been extensively characterized in Bacteroidetes, there exist PULs from other bacterial phyla, as well as archaea and metagenomes, that remain to be catalogued in a database for efficient retrieval. We have developed an online database dbCAN-PUL (http://bcb.unl.edu/dbCAN_PUL/) to display experimentally verified CAZyme-containing PULs from literature with pertinent metadata, sequences, and annotation. Compared to other online CAZyme and PUL resources, dbCAN-PUL has the following new features: (i) Batch download of PUL data by target substrate, species/genome, genus, or experimental characterization method; (ii) Annotation for each PUL that displays associated metadata such as substrate(s), experimental characterization method(s) and protein sequence information; (iii) Links to external annotation pages for CAZymes (CAZy), transporters (UniProt) and other genes; (iv) Display of homologous gene clusters in GenBank sequences via integrated MultiGeneBlast tool; and (v) An integrated BLASTX service available for users to query their sequences against PUL proteins in dbCAN-PUL. With these features, dbCAN-PUL will be an important repository for CAZyme and PUL research, complementing our other web servers and databases (dbCAN2, dbCAN-seq).
