- Award ID(s):
- 1934846
- NSF-PAR ID:
- 10357958
- Editor(s):
- Segata, Nicola
- Date Published:
- Journal Name:
- PLOS Computational Biology
- Volume:
- 18
- Issue:
- 3
- ISSN:
- 1553-7358
- Page Range / eLocation ID:
- e1009273
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
As transposon sequencing (TnSeq) assays have become prolific in the microbiology field, it is of interest to scrutinize their potential drawbacks. TnSeq data consist of millions of nucleotide sequence reads that are generated by PCR amplification of transposon-genomic junctions. Reads mapping to the junctions are enumerated thus providing information on the number of transposon insertion mutations in each individual gene. Here we explore the possibility that PCR amplification of transposon insertions in a TnSeq library skews the results by introducing bias into the detection and/or enumeration of insertions. We compared the detection and frequency of mapped insertions when altering the number of PCR cycles, and when including a nested PCR, in the enrichment step. Additionally, we present nCATRAs - a novel, amplification-free TnSeq method where the insertions are enriched via CRISPR/Cas9-targeted transposon cleavage and subsequent Oxford Nanopore MinION sequencing. nCATRAs achieved 54 and 23% enrichment of the transposons and transposon-genomic junctions, respectively, over background genomic DNA. These PCR-based and PCR-free experiments demonstrate that, overall, PCR amplification does not significantly bias the results of TnSeq insofar as insertions in the majority of genes represented in our library were similarly detected regardless of PCR cycle number and whether or not PCR amplification was employed. However, the detection of a small subset of genes which had been previously described as essential is sensitive to the number of PCR cycles. We conclude that PCR-based enrichment of transposon insertions in a TnSeq assay is reliable, but researchers interested in profiling putative essential genes should carefully weigh the number of amplification cycles employed in their library preparation protocols. 
In addition, nCATRAs is comparable to traditional PCR-based methods (Kendall’s correlation=0.896–0.897) although the latter remain superior owing to their accessibility and high sequencing depth.
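As an illustration of the kind of rank comparison reported above, the following sketch computes Kendall's tau-b between two vectors of per-gene insertion counts. It is a minimal example and not the authors' analysis: the function name `kendall_tau_b` and the count vectors `pcr_counts` and `ncatras_counts` are hypothetical, and the numbers are invented purely to show the call.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length sequences (O(n^2) pair scan)."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from both denominator terms
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + ties_x_only) * (untied + ties_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical per-gene insertion counts from two library preparations.
pcr_counts = [120, 45, 0, 300, 18, 77]
ncatras_counts = [15, 6, 0, 40, 4, 3]
tau = kendall_tau_b(pcr_counts, ncatras_counts)
```

A rank-based statistic is a natural choice here because the two methods differ greatly in sequencing depth, so absolute counts are not directly comparable while the ordering of genes is.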
-
Abstract Bacterial genomes evolve in complex ecosystems and are best understood in this natural context, but replicating such conditions in the lab is challenging. We used transposon sequencing to define the fitness consequences of gene disruption in the bacterium Caulobacter crescentus grown in natural freshwater, compared with axenic growth in common laboratory media. Gene disruptions in amino-acid and nucleotide sugar biosynthesis pathways and in metabolic substrate transport machinery impaired fitness in both lake water and defined minimal medium relative to complex peptone broth. Fitness in lake water was enhanced by insertions in genes required for flagellum biosynthesis and reduced by insertions in genes involved in biosynthesis of the holdfast surface adhesin. We further uncovered numerous hypothetical and uncharacterized genes for which disruption impaired fitness in lake water, defined minimal medium, or both. At the genome scale, the fitness profile of mutants cultivated in lake water was more similar to that in complex peptone broth than in defined minimal medium. Microfiltration of lake water did not significantly affect the terminal cell density or the fitness profile of the transposon mutant pool, suggesting that Caulobacter does not strongly interact with other microbes in this ecosystem on the measured timescale. Fitness of select mutants with defects in cell surface biosynthesis and environmental sensing was significantly more variable across days in lake water than in defined medium, presumably owing to day-to-day heterogeneity in the lake environment. This study reveals genetic interactions between Caulobacter and a natural freshwater environment, and provides a new avenue to study gene function in complex ecosystems.
-
Single-cell RNA-sequencing (scRNA-seq) enables high throughput measurement of RNA expression in individual cells. Due to technical limitations, scRNA-seq data often contain zero counts for many transcripts in individual cells. These zero counts, or dropout events, complicate the analysis of scRNA-seq data using standard analysis methods developed for bulk RNA-seq data. Current scRNA-seq analysis methods typically overcome dropout by combining information across cells, leveraging the observation that cells generally occupy a small number of RNA expression states. We introduce netNMF-sc, an algorithm for scRNA-seq analysis that leverages information across both cells and genes. netNMF-sc combines network-regularized non-negative matrix factorization with a procedure for handling zero inflation in transcript count matrices. The matrix factorization results in a low-dimensional representation of the transcript count matrix, which imputes gene abundance for both zero and non-zero entries and can be used to cluster cells. The network regularization leverages prior knowledge of gene-gene interactions, encouraging pairs of genes with known interactions to be close in the low-dimensional representation. We show that netNMF-sc outperforms existing methods on simulated and real scRNA-seq data, with increasing advantage at higher dropout rates (e.g. above 60%). Furthermore, we show that the results from netNMF-sc -- including estimation of gene-gene covariance -- are robust to choice of network, with more representative networks leading to greater performance gains.
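The general idea of network-regularized non-negative matrix factorization can be sketched as follows. This is not the netNMF-sc implementation: it is a generic graph-regularized NMF with multiplicative updates, and it assumes (for illustration only) that zero entries are simply masked out of the reconstruction error as a stand-in for dropout handling. The names `network_nmf`, `masked_loss`, `lam`, and the toy matrices are all hypothetical.

```python
import numpy as np


def masked_loss(X, M, W, H, A, lam):
    """Masked squared reconstruction error plus a graph-Laplacian penalty on W."""
    L = np.diag(A.sum(axis=1)) - A
    return float(np.sum(M * (X - W @ H) ** 2) + lam * np.trace(W.T @ L @ W))


def network_nmf(X, A, k, lam=1.0, n_iter=200, seed=0, eps=1e-9):
    """Factor X (genes x cells) as W @ H with non-negative W and H.

    A is a symmetric, non-negative gene-gene adjacency matrix. Zero entries
    of X are excluded from the fit (assumption: zeros may be dropouts).
    """
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_cells)) + eps
    M = (X > 0).astype(float)       # mask: 1 where a count was observed
    D = np.diag(A.sum(axis=1))      # degree matrix of the gene network
    history = []
    for _ in range(n_iter):
        WH = W @ H
        # The penalty pulls the rows of W for connected genes toward each other.
        W *= ((M * X) @ H.T + lam * (A @ W)) / ((M * WH) @ H.T + lam * (D @ W) + eps)
        WH = W @ H
        H *= (W.T @ (M * X)) / (W.T @ (M * WH) + eps)
        history.append(masked_loss(X, M, W, H, A, lam))
    return W, H, history


# Toy data: a rank-2 matrix with simulated dropout and two known gene-gene edges.
rng = np.random.default_rng(1)
X = rng.random((6, 2)) @ rng.random((2, 8))
X[rng.random(X.shape) < 0.3] = 0.0
A = np.zeros((6, 6))
A[0, 1] = A[1, 0] = 1.0
A[2, 3] = A[3, 2] = 1.0

W, H, history = network_nmf(X, A, k=2, lam=0.1)
imputed = W @ H  # dense estimate, including entries that were zero in X
```

The product `W @ H` gives values for both zero and non-zero entries, which is how a low-dimensional factorization can serve as an imputation, and the columns of `H` can be passed to a clustering routine to group cells.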
-
ABSTRACT Global transposon mutagenesis is a valuable tool for identifying genes required for cell viability. Here we present a global analysis of the orientation of viable Tn5-Puroʳ (Tn5-puromycin resistance) insertions into the near-minimal bacterial genome of JCVI-syn2.0. Sixteen of the 478 protein-coding genes show a noticeable asymmetry in the orientation of disrupting insertions of Tn5-Puroʳ. Ten of these are located in operons, upstream of essential or quasi-essential genes. Inserts transcribed in the same direction as the downstream gene are favored, permitting read-through transcription of the essential or quasi-essential gene. Some of these genes were classified as quasi-essential solely because of polar effects on the expression of downstream genes. Three genes showing asymmetry in Tn5-Puroʳ insertion orientation prefer the orientation that avoids collisions between read-through transcription of Tn5-Puroʳ and transcription of an adjacent gene. One gene (JCVISYN2_0132 [abbreviated here as “_0132”]) shows a strong preference for Tn5-Puroʳ insertions transcribed upstream, away from the downstream nonessential gene _0133. This suggested that expression of _0133 due to read-through from Tn5-Puroʳ is lethal when _0132 function is disrupted by transposon insertion. This led to the identification of genes _0133 and _0132 as a toxin-antitoxin pair. The three remaining genes show read-through transcription of Tn5-Puroʳ directed downstream and away from sizable upstream intergenic regions (199 bp to 363 bp), for unknown reasons. In summary, polar effects of transposon insertion can, in a few cases, affect the classification of genes as essential, quasi-essential, or nonessential and sometimes can give clues to gene function.
IMPORTANCE In studies of the minimal genetic requirements for life, we used global transposon mutagenesis to identify genes needed for a minimal bacterial genome. Transposon insertion can disrupt the function of a gene but can also have polar effects on the expression of adjacent genes. In the Tn5-Puroʳ construct used in our studies, read-through transcription from Tn5-Puroʳ can drive expression of downstream genes. This results in a preference for Tn5-Puroʳ insertions transcribed toward a downstream essential or quasi-essential gene within the same operon. Such polar effects can have an impact on the classification of genes as essential, quasi-essential, or nonessential, but this has been observed in only a few cases. Also, polar effects of Tn5-Puroʳ insertion can sometimes give clues to gene function.
-
Abstract Summary With the advancements of high-throughput single-cell RNA-sequencing protocols, there has been a rapid increase in the tools available to perform an array of analyses on the gene expression data that results from such studies. For example, there exist methods for pseudo-time series analysis, differential cell usage, cell-type detection, RNA-velocity in single cells, etc. Most analysis pipelines validate their results using known marker genes (which are not widely available for all types of analysis) and by using simulated data from gene-count-level simulators. Typically, the impact of using different read-alignment or unique molecular identifier (UMI) deduplication methods has not been widely explored. Assessments based on simulation tend to start at the level of assuming a simulated count matrix, ignoring the effect that different approaches for resolving UMI counts from the raw read data may produce. Here, we present minnow, a comprehensive sequence-level droplet-based single-cell RNA-sequencing (dscRNA-seq) experiment simulation framework. Minnow accounts for important sequence-level characteristics of experimental scRNA-seq datasets and models effects such as polymerase chain reaction amplification, cellular barcodes (CB) and UMI selection and sequence fragmentation and sequencing. It also closely matches the gene-level ambiguity characteristics that are observed in real scRNA-seq experiments. Using minnow, we explore the performance of some common processing pipelines to produce gene-by-cell count matrices from droplet-based scRNA-seq data, demonstrate the effect that realistic levels of gene-level sequence ambiguity can have on accurate quantification and show a typical use-case of minnow in assessing the output generated by different quantification pipelines on the simulated experiment.
Supplementary information Supplementary data are available at Bioinformatics online.