Type 1 polyketides are a major class of natural products used as antiviral, antibiotic, antifungal, antiparasitic, immunosuppressive, and antitumor drugs. Analysis of public microbial genomes has uncovered more than sixty thousand type 1 polyketide gene clusters. However, the molecular products of only about a hundred of these clusters have been characterized, leaving most metabolites unknown. Characterizing polyketides relies on bioactivity-guided purification, which is expensive and time-consuming. To address this, we present Seq2PKS, a machine learning algorithm that predicts chemical structures derived from Type 1 polyketide synthases. To improve the chance that the correct structure is among its predictions, Seq2PKS proposes numerous putative structures for each gene cluster. The correct structure is then identified using a variable mass spectral database search. Benchmarks show that Seq2PKS outperforms existing methods. Applying Seq2PKS to Actinobacteria datasets, we discover biosynthetic gene clusters for monazomycin, oasomycin A, and 2-aminobenzamide-actiphenol.
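The idea of narrowing many putative structures down with a variable mass search can be illustrated with a toy precursor-mass filter. This is a minimal sketch under stated assumptions, not the Seq2PKS implementation: the `Candidate` type, the function name, the tolerance values, and the masses are all hypothetical, and a real search would go on to score fragment peaks rather than stop at the precursor mass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A putative structure; the mass is an illustrative monoisotopic mass in Da."""
    name: str
    mass: float


def variable_mass_search(candidates, precursor_mass, max_offset=100.0, ppm=10.0):
    """Return (candidate, offset) pairs ordered by absolute mass offset.

    A candidate within `ppm` of the precursor mass counts as an exact hit
    (offset 0.0). A candidate whose mass differs by at most `max_offset` Da
    is kept as a "variable" hit, i.e. a possible unknown modification.
    Both thresholds are assumptions chosen for illustration.
    """
    tolerance = precursor_mass * ppm * 1e-6
    hits = []
    for cand in candidates:
        offset = precursor_mass - cand.mass
        if abs(offset) <= tolerance:
            hits.append((cand, 0.0))
        elif abs(offset) <= max_offset:
            hits.append((cand, offset))
    return sorted(hits, key=lambda hit: abs(hit[1]))


# Hypothetical candidates predicted for one gene cluster.
candidates = [
    Candidate("cand_A", 500.0),
    Candidate("cand_B", 514.02),
    Candidate("cand_C", 700.0),
]
hits = variable_mass_search(candidates, precursor_mass=500.002)
```

Here `cand_A` is retained as an exact hit, `cand_B` as a variable hit with a nonzero mass offset, and `cand_C` is discarded because its mass is too far from the precursor.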
-
Henderson, Thomas; Imputato, Pasquale; Liu, Yuchen; Gamess, Eric (Ed.)
Physical (PHY) layer abstraction is an effective method that reduces runtime relative to full link simulation while still accurately characterizing link performance. As a result, PHY layer abstraction for IEEE 802.11 WLAN and 3GPP LTE/5G has been widely adopted in network simulators such as ns-3, enabling faster system-level simulations that quantify network performance. The recent development of the first publicly accessible 5G NR Sidelink (SL) link simulator makes it possible to implement the first PHY layer abstraction for 5G NR SL. This work deploys an efficient PHY layer abstraction method (i.e., EESM-log-SGN) for 5G NR SL based on offline NR SL link simulation. The resulting abstraction, which is stored in ns-3 for use, targets the common 5G NR SL scenario of OFDM unicast with single-layer mapping over independent and identically distributed (i.i.d.) frequency-selective channels. We provide details about implementation, performance, and validation.
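The EESM part of the method name refers to exponential effective SINR mapping, which compresses per-subcarrier SINRs into a single effective SINR: SINR_eff = -beta * ln((1/N) * sum_i exp(-SINR_i / beta)). The sketch below shows only this standard mapping in a numerically stable form. It does not reproduce the log-SGN modeling step or the ns-3 code, and the function name and the `beta` value are assumptions for illustration (in practice `beta` is calibrated per modulation and coding scheme).

```python
import math


def eesm_effective_sinr(sinrs_linear, beta):
    """Map per-subcarrier SINRs (linear scale, not dB) to one effective SINR.

    Computes -beta * ln(mean(exp(-s / beta))). The minimum SINR is factored
    out of the exponentials so the mean can never underflow to zero.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    if not sinrs_linear:
        raise ValueError("need at least one subcarrier SINR")
    lowest = min(sinrs_linear)
    mean_exp = sum(math.exp(-(s - lowest) / beta) for s in sinrs_linear) / len(sinrs_linear)
    return lowest - beta * math.log(mean_exp)


# Hypothetical calibration value and channel realizations.
beta = 2.0
flat = eesm_effective_sinr([4.0, 4.0, 4.0], beta)
selective = eesm_effective_sinr([1.0, 4.0, 7.0], beta)
```

On a flat channel the effective SINR equals the common per-subcarrier SINR. On a frequency-selective channel it falls between the minimum and the arithmetic mean, so weak subcarriers weigh more heavily than strong ones.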
-
Recent analyses of public microbial genomes have found over a million biosynthetic gene clusters, most of whose natural products remain unknown. Additionally, GNPS harbors billions of mass spectra of natural products without known structures and biosynthetic genes. We bridge the gap between large-scale genome mining and mass spectral datasets for natural product discovery by developing HypoRiPPAtlas, an atlas of hypothetical natural product structures that is ready to use for in silico database search of tandem mass spectra. HypoRiPPAtlas is constructed by mining genomes using seq2ripp, a machine-learning tool for the prediction of ribosomally synthesized and post-translationally modified peptides (RiPPs). In HypoRiPPAtlas, we identify RiPPs in microbes and plants. HypoRiPPAtlas could be extended to other natural product classes in the future by implementing the corresponding biosynthetic logic. This study paves the way for large-scale explorations of biosynthetic pathways and chemical structures of microbial and plant RiPP classes.