Title: Improving the Accuracy of Bulk Fitness Assays by Correcting Barcode Processing Biases
Measuring the fitnesses of genetic variants is a fundamental objective in evolutionary biology. A standard approach for measuring microbial fitnesses in bulk involves labeling a library of genetic variants with unique sequence barcodes, competing the labeled strains in batch culture, and using deep sequencing to track changes in the barcode abundances over time. However, idiosyncratic properties of barcodes can induce nonuniform amplification or uneven sequencing coverage that causes some barcodes to be over- or under-represented in samples. This systematic bias can result in erroneous read count trajectories and misestimates of fitness. Here, we develop a computational method, named REBAR (Removing the Effects of Bias through Analysis of Residuals), for inferring the effects of barcode processing bias by leveraging the structure of systematic deviations in the data. We illustrate this approach by applying it to two independent data sets, and demonstrate that this method estimates and corrects for bias more accurately than standard proxies, such as GC-based corrections. REBAR mitigates bias and improves fitness estimates in high-throughput assays without introducing additional complexity to the experimental protocols, with potential applications in a range of experimental evolution and mutation screening contexts.
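The standard approach summarized in the abstract (tracking barcode abundances over time) can be illustrated with a minimal sketch. This is not REBAR, whose details are not given in this record; it only shows the common baseline of estimating relative fitness as the slope of log barcode frequency over generations, measured against a reference barcode. The function names, the `pseudocount` parameter, and the toy read counts are all assumptions made for illustration.

```python
import math

def log_frequencies(counts, pseudocount=0.5):
    """Convert one time point's read counts {barcode: reads} to log frequencies.

    The pseudocount (an illustrative choice) avoids log(0) for dropout barcodes.
    """
    total = sum(counts.values()) + pseudocount * len(counts)
    return {bc: math.log((n + pseudocount) / total) for bc, n in counts.items()}

def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def relative_fitness(generations, counts_by_time, reference, pseudocount=0.5):
    """Per-generation fitness of each barcode relative to a reference barcode.

    generations:    sampling times, in generations
    counts_by_time: one {barcode: reads} dict per sampling time
    reference:      barcode assumed to mark a neutral (reference) strain
    """
    logf = [log_frequencies(c, pseudocount) for c in counts_by_time]
    ref_slope = slope(generations, [lf[reference] for lf in logf])
    return {
        bc: slope(generations, [lf[bc] for lf in logf]) - ref_slope
        for bc in logf[0]
    }

# Toy data (hypothetical): "wt" is the reference, "mutA" rises, "mutB" falls.
toy_counts = [
    {"wt": 1000, "mutA": 100, "mutB": 400},
    {"wt": 1000, "mutA": 200, "mutB": 200},
    {"wt": 1000, "mutA": 400, "mutB": 100},
]
toy_fitness = relative_fitness([0, 1, 2], toy_counts, reference="wt", pseudocount=0.0)
```

Any barcode-specific amplification or sequencing bias distorts the read counts fed into a calculation like this one, which is the problem the abstract says REBAR is designed to infer and correct.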
Award ID(s):
2310746
PAR ID:
10581087
Author(s) / Creator(s):
; ; ;
Editor(s):
Harris, Kelley
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Molecular Biology and Evolution
Volume:
41
Issue:
8
ISSN:
0737-4038
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Random DNA barcodes are a versatile tool for tracking cell lineages, with applications ranging from development to cancer to evolution. Here, we review and critically evaluate barcode designs as well as methods of barcode sequencing and initial processing of barcode data. We first demonstrate how various barcode design decisions affect data quality and propose a new design that balances all considerations that we are currently aware of. We then discuss various options for the preparation of barcode sequencing libraries, including inline indices and Unique Molecular Identifiers (UMIs). Finally, we test the performance of several established and new bioinformatic pipelines for the extraction of barcodes from raw sequencing reads and for error correction. We find that both alignment and regular expression-based approaches work well for barcode extraction, and that error-correction pipelines designed specifically for barcode data are superior to generic ones. Overall, this review will help researchers to approach their barcoding experiments in a deliberate and systematic way.
  2. Given the need to predict the outcomes of (co)evolution in host-associated microbiomes, whether microbial and host fitnesses tend to trade-off, generating conflict, remains a pressing question. Examining the relationships between host and microbe fitness proxies at both the phenotypic and genomic levels can illuminate the mechanisms underlying interspecies cooperation and conflict. We examined naturally occurring genetic variation in 191 strains of the model microbial symbiont Sinorhizobium meliloti, paired with each of two host Medicago truncatula genotypes in single- or multi-strain experiments to determine how multiple proxies of microbial and host fitness were related to one another and test key predictions about mutualism evolution at the genomic scale, while also addressing the challenge of measuring microbial fitness. We found little evidence for interspecies fitness conflict; loci tended to have concordant effects on both microbe and host fitnesses, even in environments with multiple co-occurring strains. Our results emphasize the importance of quantifying microbial relative fitness for understanding microbiome evolution and thus harnessing microbiomes to improve host fitness. Additionally, we find that mutualistic coevolution between hosts and microbes acts to maintain, rather than erode, genetic diversity, potentially explaining why variation in mutualism traits persists in nature.
  3. Knowledge of the distribution of fitness effects (DFE) of mutations is critical to the understanding of protein evolution. Here, we describe methods for large-scale, systematic measurements of the DFE using growth competition and deep mutational scanning. We discuss techniques for producing comprehensive libraries of gene variants as well as provide necessary considerations for designing these experiments. Using these methods, we have constructed libraries containing over 18,000 variants, measured fitness effects of these mutations by deep mutational scanning, and verified the presence of fitness effects in individual variants. Our methods provide a high-throughput protocol for measuring biological fitness effects of mutations and the dependence of fitness effects on the environment. 
  4. Townsend, Jeffrey (Ed.)
    Genetic variation is the raw material upon which selection acts. The majority of environmental conditions change over time and therefore may result in variable selective effects. How temporally fluctuating environments impact the distribution of fitness effects and in turn population diversity is an unresolved question in evolutionary biology. Here, we employed continuous culturing using chemostats to establish environments that switch periodically between different nutrient limitations and compared the dynamics of selection to static conditions. We used the pooled Saccharomyces cerevisiae haploid gene deletion collection as a synthetic model for populations comprising thousands of unique genotypes. Using barcode sequencing, we find that static environments are uniquely characterized by a small number of high-fitness genotypes that rapidly dominate the population leading to dramatic decreases in genetic diversity. By contrast, fluctuating environments are enriched in genotypes with neutral fitness effects and an absence of extreme fitness genotypes contributing to the maintenance of genetic diversity. We also identified a unique class of genotypes whose frequencies oscillate sinusoidally with a period matching the environmental fluctuation. Oscillatory behavior corresponds to large differences in short-term fitness that are not observed across long timescales pointing to the importance of balancing selection in maintaining genetic diversity in fluctuating environments. Our results are consistent with a high degree of environmental specificity in the distribution of fitness effects and the combined effects of reduced and balancing selection in maintaining genetic diversity in the presence of variable selection.
  5. de Visser, J. Arjan (Ed.)
    The rate of adaptive evolution depends on the rate at which beneficial mutations are introduced into a population and the fitness effects of those mutations. The rate of beneficial mutations and their expected fitness effects is often difficult to empirically quantify. As these 2 parameters determine the pace of evolutionary change in a population, the dynamics of adaptive evolution may enable inference of their values. Copy number variants (CNVs) are a pervasive source of heritable variation that can facilitate rapid adaptive evolution. Previously, we developed a locus-specific fluorescent CNV reporter to quantify CNV dynamics in evolving populations maintained in nutrient-limiting conditions using chemostats. Here, we use CNV adaptation dynamics to estimate the rate at which beneficial CNVs are introduced through de novo mutation and their fitness effects using simulation-based likelihood-free inference approaches. We tested the suitability of 2 evolutionary models: a standard Wright–Fisher model and a chemostat model. We evaluated 2 likelihood-free inference algorithms: the well-established Approximate Bayesian Computation with Sequential Monte Carlo (ABC-SMC) algorithm, and the recently developed Neural Posterior Estimation (NPE) algorithm, which applies an artificial neural network to directly estimate the posterior distribution. By systematically evaluating the suitability of different inference methods and models, we show that NPE has several advantages over ABC-SMC and that a Wright–Fisher evolutionary model suffices in most cases. Using our validated inference framework, we estimate the CNV formation rate at the GAP1 locus in the yeast Saccharomyces cerevisiae to be 10^-4.7 to 10^-4 CNVs per cell division and a fitness coefficient of 0.04 to 0.1 per generation for GAP1 CNVs in glutamine-limited chemostats. We experimentally validated our inference-based estimates using 2 distinct experimental methods (barcode lineage tracking and pairwise fitness assays), which provide independent confirmation of the accuracy of our approach. Our results are consistent with a beneficial CNV supply rate that is 10-fold greater than the estimated rates of beneficial single-nucleotide mutations, explaining the outsized importance of CNVs in rapid adaptive evolution. More generally, our study demonstrates the utility of novel neural network-based likelihood-free inference methods for inferring the rates and effects of evolutionary processes from empirical data with possible applications ranging from tumor to viral evolution.