-
de Visser, J. Arjan (Ed.) The rate of adaptive evolution depends on the rate at which beneficial mutations are introduced into a population and the fitness effects of those mutations. The rate of beneficial mutations and their expected fitness effects are often difficult to quantify empirically. As these 2 parameters determine the pace of evolutionary change in a population, the dynamics of adaptive evolution may enable inference of their values. Copy number variants (CNVs) are a pervasive source of heritable variation that can facilitate rapid adaptive evolution. Previously, we developed a locus-specific fluorescent CNV reporter to quantify CNV dynamics in evolving populations maintained in nutrient-limiting conditions using chemostats. Here, we use CNV adaptation dynamics to estimate the rate at which beneficial CNVs are introduced through de novo mutation and their fitness effects using simulation-based likelihood-free inference approaches. We tested the suitability of 2 evolutionary models: a standard Wright–Fisher model and a chemostat model. We evaluated 2 likelihood-free inference algorithms: the well-established Approximate Bayesian Computation with Sequential Monte Carlo (ABC-SMC) algorithm, and the recently developed Neural Posterior Estimation (NPE) algorithm, which applies an artificial neural network to directly estimate the posterior distribution. By systematically evaluating the suitability of different inference methods and models, we show that NPE has several advantages over ABC-SMC and that a Wright–Fisher evolutionary model suffices in most cases. Using our validated inference framework, we estimate the CNV formation rate at the GAP1 locus in the yeast Saccharomyces cerevisiae to be 10^-4.7 to 10^-4 CNVs per cell division and a fitness coefficient of 0.04 to 0.1 per generation for GAP1 CNVs in glutamine-limited chemostats. 
We experimentally validated our inference-based estimates using 2 distinct experimental methods (barcode lineage tracking and pairwise fitness assays), which provide independent confirmation of the accuracy of our approach. Our results are consistent with a beneficial CNV supply rate that is 10-fold greater than the estimated rates of beneficial single-nucleotide mutations, explaining the outsized importance of CNVs in rapid adaptive evolution. More generally, our study demonstrates the utility of novel neural network-based likelihood-free inference methods for inferring the rates and effects of evolutionary processes from empirical data, with possible applications ranging from tumor to viral evolution.
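To illustrate how a formation rate and a fitness coefficient jointly shape CNV dynamics, the following is a minimal sketch, not the authors' simulator. It assumes only two genotypes (ancestral and CNV-carrying), no other beneficial mutations, and the deterministic large-population limit of a Wright–Fisher model, so genetic drift is ignored. The function name, its parameters, and the example values (picked from within the ranges reported above) are illustrative assumptions.

```python
def cnv_frequency_trajectory(formation_rate, selection_coefficient,
                             generations, initial_frequency=0.0):
    """Return the CNV frequency at each generation (hypothetical helper).

    Assumptions: two genotypes only, non-negative selection coefficient,
    and an effectively infinite population (no drift).
    """
    frequency = initial_frequency
    trajectory = [frequency]
    for _ in range(generations):
        # Mutation step: a fraction of ancestral cells acquires a CNV.
        frequency = frequency + (1.0 - frequency) * formation_rate
        # Selection step: CNV carriers have relative fitness 1 + s,
        # ancestral cells have relative fitness 1.
        mean_fitness = 1.0 + selection_coefficient * frequency
        frequency = frequency * (1.0 + selection_coefficient) / mean_fitness
        trajectory.append(frequency)
    return trajectory


if __name__ == "__main__":
    # Example values chosen from within the reported ranges.
    trajectory = cnv_frequency_trajectory(
        formation_rate=10 ** -4.5,
        selection_coefficient=0.07,
        generations=250,
    )
    for generation in range(0, 251, 50):
        print(generation, round(trajectory[generation], 4))
```

In a likelihood-free inference setting, a simulator of this kind would be run many times with parameters drawn from a prior, and the simulated trajectories compared with the observed ones. The sketch covers only the forward simulation step.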
-
Townsend, Jeffrey (Ed.) Abstract Genetic variation is the raw material upon which selection acts. The majority of environmental conditions change over time and therefore may result in variable selective effects. How temporally fluctuating environments impact the distribution of fitness effects and in turn population diversity is an unresolved question in evolutionary biology. Here, we employed continuous culturing using chemostats to establish environments that switch periodically between different nutrient limitations and compared the dynamics of selection to static conditions. We used the pooled Saccharomyces cerevisiae haploid gene deletion collection as a synthetic model for populations comprising thousands of unique genotypes. Using barcode sequencing, we find that static environments are uniquely characterized by a small number of high-fitness genotypes that rapidly dominate the population leading to dramatic decreases in genetic diversity. By contrast, fluctuating environments are enriched in genotypes with neutral fitness effects and an absence of extreme fitness genotypes contributing to the maintenance of genetic diversity. We also identified a unique class of genotypes whose frequencies oscillate sinusoidally with a period matching the environmental fluctuation. Oscillatory behavior corresponds to large differences in short-term fitness that are not observed across long timescales pointing to the importance of balancing selection in maintaining genetic diversity in fluctuating environments. Our results are consistent with a high degree of environmental specificity in the distribution of fitness effects and the combined effects of reduced and balancing selection in maintaining genetic diversity in the presence of variable selection.
-
Abstract. Mangrove forests are ecosystems that constitute a large portion of the world's coastline and span tidal zones below, between, and above the waterline, and the ecosystem as a whole is defined by the health of these tidal microhabitats. However, we are only beginning to understand tidal-zone microbial biodiversity and the role of these microbiomes in nutrient cycling. While extensive research has characterized microbiomes in pristine vs. anthropogenically impacted mangroves, these studies have largely overlooked differences in tidal microhabitats (sublittoral, intertidal, and supralittoral). Unfortunately, the small number of studies that have sought to characterize mangrove tidal zones have occurred in impacted biomes, making interpretation of the results difficult. Here, we characterized prokaryotic populations and their involvement in nutrient cycling across the tidal zones of a pristine mangrove within a Brazilian Environmental Protection Area of the Atlantic Forest. We hypothesized that the tidal zones in pristine mangroves are distinct microhabitats, which we defined as distinct regions that present spatial variations in the water regime and other environmental factors, and as such, these are composed of different prokaryotic communities with distinct functional profiles. Samples were collected in triplicate from zones below, between, and above the tidal waterline. Using 16S ribosomal RNA (rRNA) gene amplicon sequencing, we found distinct prokaryotic communities with significantly diverse nutrient-cycling functions, as well as specific taxa with varying contributions to functional abundances between zones. Where previous research from anthropogenically impacted mangroves found the intertidal zone to have high prokaryotic diversity and be functionally enriched in nitrogen cycling, we find that the intertidal zone from pristine mangroves has the lowest diversity and no functional enrichment, relative to the other tidal zones. 
The main bacterial phyla in all samples were Firmicutes, Proteobacteria, and Chloroflexi, while the main archaeal phyla were Crenarchaeota and Thaumarchaeota. Our results differ slightly from other studies where Proteobacteria is the main phylum in mangrove sediments and Firmicutes makes up only a small percentage of the communities. Salinity and organic matter were the most relevant environmental factors influencing these communities. Bacillaceae was the most abundant family at each tidal zone and showed potential to drive a large proportion of the cycling of carbon, nitrogen, phosphorus, and sulfur. Our findings suggest that some aspects of mangrove tidal zonation may be compromised by human activity, especially in the intertidal zone.
-
Abstract Inverted duplicated DNA sequences are a common feature of structural variants (SVs) and copy number variants (CNVs). Analysis of CNVs containing inverted duplicated DNA sequences using nanopore sequencing identified recurrent aberrant behavior characterized by low confidence, incorrect and missed base calls. Inverted duplicate DNA sequences in both yeast and human samples were observed to have systematic elevation in the electrical current detected at the nanopore, increased translocation rates and decreased sampling rates. The coincidence of inverted duplicated DNA sequences with dramatically reduced sequencing accuracy and an increased translocation rate suggests that secondary DNA structures may interfere with the dynamics of transit of the DNA through the nanopore.
-
Abstract Motivation Gene regulatory networks define regulatory relationships between transcription factors and target genes within a biological system, and reconstructing them is essential for understanding cellular growth and function. Methods for inferring and reconstructing networks from genomics data have evolved rapidly over the last decade in response to advances in sequencing technology and machine learning. The scale of data collection has increased dramatically; the largest genome-wide gene expression datasets have grown from thousands of measurements to millions of single cells, and new technologies are on the horizon to increase to tens of millions of cells and above.
Results In this work, we present the Inferelator 3.0, which has been significantly updated to integrate data from distinct cell types to learn context-specific regulatory networks and aggregate them into a shared regulatory network, while retaining the functionality of the previous versions. The Inferelator is able to integrate the largest single-cell datasets and learn cell-type-specific gene regulatory networks. Compared to other network inference methods, the Inferelator learns new and informative Saccharomyces cerevisiae networks from single-cell gene expression data, measured by recovery of a known gold standard. We demonstrate its scaling capabilities by learning networks for multiple distinct neuronal and glial cell types in the developing Mus musculus brain at E18 from a large (1.3 million) single-cell gene expression dataset with paired single-cell chromatin accessibility data.
Availability and implementation The Inferelator software is available on GitHub (https://github.com/flatironinstitute/inferelator) under the MIT license and has been released as Python packages with associated documentation (https://inferelator.readthedocs.io/).
Supplementary information Supplementary data are available at Bioinformatics online.
-
Organisms switch their genes on and off to adapt to changing environments. This takes place thanks to complex networks of regulators that control which genes are actively ‘read’ by the cell to create the RNA molecules that are needed at the time. Piecing together these networks is key to fully understand the inner workings of living organisms, and how to potentially modify or artificially create them. Single-cell RNA sequencing is a powerful new tool that can measure which genes are turned on (or ‘expressed’) in an individual cell. Datasets with millions of gene expression profiles for individual cells now exist for organisms such as mice or humans. Yet, it is difficult to use these data to reconstruct networks of regulators; this is partly because scientists are not sure if the computational methods normally used to build these networks also work for single-cell RNA sequencing data. One way to check if this is the case is to use the methods on single-cell datasets from organisms where the networks of regulators are already known, and check whether the computational tools help to reach the same conclusion. Unfortunately, the regulatory networks in the organisms for which scientists have a lot of single-cell RNA sequencing data are still poorly known. There are living beings in which the networks are well characterised – such as yeast – but it has been difficult to do single-cell sequencing in them at the scale seen in other organisms. Jackson, Castro et al. first adapted a system for single-cell sequencing so that it would work in yeast. This generated a gene expression dataset of over 40,000 yeast cells. They then used a computational method (called the Inferelator) on these data to construct networks of regulators, and the results showed that the method performed well. This allowed Jackson, Castro et al. to start mapping how different networks connect, for example those that control the response to the environment and cell division. 
This is one of the benefits of single-cell RNA methods: cell division, for example, is not a process that can be examined at the level of a population, since the cells may all be at different life stages. In the future, the dataset will also be useful to scientists to benchmark a variety of single-cell computational tools.
-
Copy number variants (CNVs) are regions of the genome that vary in integer copy number. CNVs, which comprise both amplifications and deletions of DNA sequence, have been identified across all domains of life, from bacteria and archaea to plants and animals. CNVs are an important source of genetic diversity, and can drive rapid adaptive evolution and progression of heritable and somatic human diseases, such as cancer. However, despite their evolutionary importance and clinical relevance, CNVs remain understudied compared to single-nucleotide variants (SNVs). This is a consequence of the inherent difficulties in detecting CNVs at low-to-intermediate frequencies in heterogeneous populations of cells. Here, we discuss molecular methods used to detect CNVs, the limitations associated with using these techniques, and the application of new and emerging technologies that present solutions to these challenges. The goal of this short review and perspective is to highlight aspects of CNV biology that are understudied and define avenues for further research that address specific gaps in our knowledge of these complex alleles. We describe our recently developed method for CNV detection in which a fluorescent gene functions as a single-cell CNV reporter and present key findings from our evolution experiments in Saccharomyces cerevisiae. Using a CNV reporter, we found that CNVs are generated at a high rate and undergo selection with predictable dynamics across independently evolving replicate populations. Many CNVs appear to be generated through DNA replication-based processes that are mediated by the presence of short, interrupted, inverted-repeat sequences. Our results have important implications for the role of CNVs in evolutionary processes and the molecular mechanisms that underlie CNV formation. We discuss the possible extension of our method to other applications, including tracking the dynamics of CNVs in models of human tumors.