This content will become publicly available on March 7, 2023
- Editors:
- Segata, Nicola
- Award ID(s):
- 1934846
- Publication Date:
- NSF-PAR ID:
- 10357958
- Journal Name:
- PLOS Computational Biology
- Volume:
- 18
- Issue:
- 3
- Page Range or eLocation-ID:
- e1009273
- ISSN:
- 1553-7358
- Sponsoring Org:
- National Science Foundation
More Like this
-
Abstract. Background: Significant progress has been made in advancing and standardizing tools for human genomic and biomedical research. Yet the field of next-generation sequencing (NGS) analysis for microorganisms (including multiple pathogens) remains fragmented, lacks accessible and reusable tools, is hindered by local computational resource limitations, and does not offer widely accepted standards. One such “problem area” is the analysis of Transposon Insertion Sequencing (TIS) data. TIS allows probing of almost the entire genome of a microorganism by introducing random insertions of transposon-derived constructs. The impact of the insertions on survival and growth under specific conditions provides precise information about genes affecting specific phenotypic characteristics. A wide array of tools has been developed to analyze TIS data. Among the variety of options available, it is often difficult to identify which one can provide a reliable and reproducible analysis. Results: Here we sought to understand the challenges and propose reliable practices for the analysis of TIS experiments. Using data from two recent TIS studies, we have developed a series of workflows that include multiple tools for data de-multiplexing, promoter sequence identification, transposon flank alignment, and read count repartition across the genome. Particular attention was paid to quality control procedures, such as …
-
Diversification of Type VI Secretion System Toxins Reveals Ancient Antagonism among Bee Gut Microbes
Abstract. Microbial communities are shaped by interactions among their constituent members. Some Gram-negative bacteria employ type VI secretion systems (T6SSs) to inject protein toxins into neighboring cells. These interactions have been theorized to affect the composition of host-associated microbiomes, but the role of T6SSs in the evolution of gut communities is not well understood. We report the discovery of two T6SSs and numerous T6SS-associated Rhs toxins within the gut bacteria of honey bees and bumble bees. We sequenced the genomes of 28 strains of Snodgrassella alvi, a characteristic bee gut microbe, and found tremendous variability in their Rhs toxin complements: altogether, these strains appear to encode hundreds of unique toxins. Some toxins are shared with Gilliamella apicola, a coresident gut symbiont, implicating horizontal gene transfer as a source of toxin diversity in the bee gut. We use data from a transposon mutagenesis screen to identify toxins with antibacterial function in the bee gut and validate the function and specificity of a subset of these toxin and immunity genes in Escherichia coli. Using transcriptome sequencing, we demonstrate that S. alvi T6SSs and associated toxins are upregulated in the gut environment. We find that S. alvi Rhs loci have a conserved …
-
Single-cell RNA-sequencing (scRNA-seq) enables high-throughput measurement of RNA expression in individual cells. Due to technical limitations, scRNA-seq data often contain zero counts for many transcripts in individual cells. These zero counts, or dropout events, complicate the analysis of scRNA-seq data using standard analysis methods developed for bulk RNA-seq data. Current scRNA-seq analysis methods typically overcome dropout by combining information across cells, leveraging the observation that cells generally occupy a small number of RNA expression states. We introduce netNMF-sc, an algorithm for scRNA-seq analysis that leverages information across both cells and genes. netNMF-sc combines network-regularized non-negative matrix factorization with a procedure for handling zero inflation in transcript count matrices. The matrix factorization results in a low-dimensional representation of the transcript count matrix, which imputes gene abundance for both zero and non-zero entries and can be used to cluster cells. The network regularization leverages prior knowledge of gene-gene interactions, encouraging pairs of genes with known interactions to be close in the low-dimensional representation. We show that netNMF-sc outperforms existing methods on simulated and real scRNA-seq data, with increasing advantage at higher dropout rates (e.g., above 60%). Furthermore, we show that the results from netNMF-sc -- including estimation of gene-gene covariance -- …
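The network-regularization idea in this abstract can be illustrated with a small sketch. It assumes a squared-error objective, ||X - WH||_F^2 + lam * tr(W^T L W), where L is the graph Laplacian of a gene-gene network, and it uses standard multiplicative updates from the graph-regularized NMF literature. This is not the netNMF-sc implementation: the function names, the toy ring network, and the parameter values are hypothetical, and the zero-inflation handling mentioned in the abstract is omitted.

```python
import numpy as np

def network_regularized_nmf(X, A, rank=2, lam=1.0, n_iter=200, seed=0, eps=1e-10):
    """Factor a non-negative genes x cells matrix X as W @ H.

    A is a symmetric, non-negative genes x genes adjacency matrix (assumed
    prior network). The penalty lam * tr(W^T L W), with L = D - A, pulls the
    rows of W for connected genes toward each other.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    W = rng.random((n_genes, rank)) + eps
    H = rng.random((rank, n_cells)) + eps
    D = np.diag(A.sum(axis=1))  # degree matrix of the gene network
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative.
        W *= (X @ H.T + lam * (A @ W)) / (W @ (H @ H.T) + lam * (D @ W) + eps)
        H *= (W.T @ X) / ((W.T @ W) @ H + eps)
    return W, H

def objective(X, A, W, H, lam=1.0):
    """Squared reconstruction error plus the network (Laplacian) penalty."""
    L = np.diag(A.sum(axis=1)) - A
    return float(np.linalg.norm(X - W @ H, "fro") ** 2 + lam * np.trace(W.T @ L @ W))

# Toy data: 6 genes x 8 cells, with the genes connected in a ring (hypothetical).
rng = np.random.default_rng(1)
X = rng.random((6, 8))
A = np.zeros((6, 6))
for i in range(6):
    A[i, (i + 1) % 6] = A[(i + 1) % 6, i] = 1.0

W, H = network_regularized_nmf(X, A, rank=2, lam=1.0)
imputed = W @ H  # low-rank estimate for every entry, zero or not
```

In this sketch the product `W @ H` plays the role of the imputed matrix, and the rows of `W` give the low-dimensional gene representation that the network penalty acts on.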
-
Abstract. Background: Single-cell RNA-sequencing (scRNA-seq) technologies allow for the study of gene expression in individual cells. Often, it is of interest to understand how transcriptional activity is associated with cell-specific covariates, such as cell type, genotype, or measures of cell health. Traditional approaches for this type of association mapping assume independence between the outcome variables (or genes), and perform a separate regression for each. However, these methods are computationally costly and ignore the substantial correlation structure of gene expression. Furthermore, count-based scRNA-seq data pose challenges for traditional models based on Gaussian assumptions.
Results: We aim to resolve these issues by developing a reduced-rank regression model that identifies low-dimensional linear associations between a large number of cell-specific covariates and high-dimensional gene expression readouts. Our probabilistic model uses a Poisson likelihood in order to account for the unique structure of scRNA-seq counts. We demonstrate the performance of our model using simulations, and we apply our model to a scRNA-seq dataset, a spatial gene expression dataset, and a bulk RNA-seq dataset to show its behavior in three distinct analyses.
Conclusion: We show that our statistical modeling approach, which is based on reduced-rank regression, captures associations between gene expression and cell- and sample-specific covariates by leveraging low-dimensional …
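To make the combination of reduced-rank regression and a Poisson likelihood concrete, the sketch below factors the covariate-to-gene coefficient matrix as A @ B with a small inner dimension and evaluates a Poisson log-likelihood on simulated counts. The log link, the variable names, and the simulation settings are assumptions made for illustration; the abstract does not state the authors' exact parameterization or inference procedure.

```python
import numpy as np

def poisson_rrr_loglik(Y, X, A, B):
    """Poisson log-likelihood, up to the constant -sum(log(Y!)).

    Assumes a log link: log-rates are X @ A @ B, where A (covariates x rank)
    and B (rank x genes) form a coefficient matrix of rank at most `rank`.
    """
    log_rate = X @ A @ B
    return float(np.sum(Y * log_rate - np.exp(log_rate)))

# Simulated example (all sizes and scales are hypothetical).
rng = np.random.default_rng(0)
n_cells, n_covariates, n_genes, rank = 200, 5, 10, 2
X = rng.normal(size=(n_cells, n_covariates))
A_true = rng.normal(scale=0.4, size=(n_covariates, rank))
B_true = rng.normal(scale=0.4, size=(rank, n_genes))
Y = rng.poisson(np.exp(X @ A_true @ B_true))

# Compare the generating parameters against a model with no covariate effects.
ll_true = poisson_rrr_loglik(Y, X, A_true, B_true)
ll_null = poisson_rrr_loglik(Y, X, np.zeros_like(A_true), np.zeros_like(B_true))
```

The factorization needs only (covariates + genes) x rank parameters rather than covariates x genes, which is where the low-dimensional structure enters in this sketch.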
-
Abstract. Summary: With the advancements of high-throughput single-cell RNA-sequencing protocols, there has been a rapid increase in the tools available to perform an array of analyses on the gene expression data that results from such studies. For example, there exist methods for pseudo-time series analysis, differential cell usage, cell-type detection, RNA-velocity in single cells, etc. Most analysis pipelines validate their results using known marker genes (which are not widely available for all types of analysis) and by using simulated data from gene-count-level simulators. Typically, the impact of using different read-alignment or unique molecular identifier (UMI) deduplication methods has not been widely explored. Assessments based on simulation tend to start at the level of assuming a simulated count matrix, ignoring the effect that different approaches for resolving UMI counts from the raw read data may produce. Here, we present minnow, a comprehensive sequence-level droplet-based single-cell RNA-sequencing (dscRNA-seq) experiment simulation framework. Minnow accounts for important sequence-level characteristics of experimental scRNA-seq datasets and models effects such as polymerase chain reaction amplification, cellular barcode (CB) and UMI selection, and sequence fragmentation and sequencing. It also closely matches the gene-level ambiguity characteristics that are observed in real scRNA-seq experiments. Using minnow, we explore the performance …
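The relationship between PCR amplification and UMI deduplication described in this abstract can be shown with a toy example. The sketch below is not minnow and does not reproduce its amplification or sequencing models: it assumes error-free reads and a uniform number of PCR copies per molecule, and the barcodes, UMIs, gene names, and function names are hypothetical.

```python
import random
from collections import defaultdict

def simulate_reads(molecules, max_copies=5, seed=0):
    """Copy each (cell_barcode, umi, gene) molecule 1..max_copies times.

    A uniform copy number stands in for PCR amplification (an assumption).
    """
    rng = random.Random(seed)
    reads = []
    for molecule in molecules:
        reads.extend([molecule] * rng.randint(1, max_copies))
    rng.shuffle(reads)
    return reads

def dedup_counts(reads):
    """Count distinct UMIs per (cell barcode, gene), collapsing PCR duplicates."""
    umis = defaultdict(set)
    for cell_barcode, umi, gene in reads:
        umis[(cell_barcode, gene)].add(umi)
    return {key: len(umi_set) for key, umi_set in umis.items()}

molecules = [
    ("AAAC", "TTG", "geneA"),
    ("AAAC", "CCA", "geneA"),
    ("AAAC", "GGT", "geneB"),
    ("CCGT", "TTG", "geneA"),
]
reads = simulate_reads(molecules)
counts = dedup_counts(reads)
```

With error-free reads, exact deduplication recovers the original molecule counts no matter how many copies each molecule received. Sequencing errors in the UMI or reads that map ambiguously to several genes break that guarantee, which is the kind of effect a sequence-level simulator is meant to expose.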