Title: Deep Video Analysis for Bacteria Genotype Prediction
Abstract Genetic modification of microbes is central to many biotechnology fields, such as industrial microbiology, bioproduction, and drug discovery. Understanding how specific genetic modifications influence observable bacterial behaviors is crucial for advancing these fields. In this study, we propose a supervised model to classify bacteria harboring single gene modifications to draw connections between phenotype and genotype. In particular, we demonstrate that the spatiotemporal patterns of Vibrio cholerae growth, recorded as low-resolution bright-field microscopy videos, are highly predictive of the genotype class. Additionally, we introduce a weakly supervised approach to identify key moments in culture growth that significantly contribute to prediction accuracy. By focusing on the temporal expressions of bacterial behavior, our findings offer valuable insights into the underlying mechanisms and developmental stages by which specific genes control observable phenotypes. This research opens new avenues for automating the analysis of phenotypes, with potential applications in areas such as drug discovery and disease management. Furthermore, this work highlights the potential of using machine learning techniques to explore the functional roles of specific genes using a low-resolution light microscope.
Award ID(s):
2238093 2211597 2205148 2007595 1949629
PAR ID:
10612160
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
bioRxiv
Date Published:
Format(s):
Medium: X
Institution:
bioRxiv
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Correlation among multiple phenotypes across related individuals may reflect some pattern of shared genetic architecture: individual genetic loci affect multiple phenotypes (an effect known as pleiotropy), creating observable relationships between phenotypes. A natural hypothesis is that pleiotropic effects reflect a relatively small set of common “core” cellular processes: each genetic locus affects one or a few core processes, and these core processes in turn determine the observed phenotypes. Here, we propose a method to infer such structure in genotype–phenotype data. Our approach, sparse structure discovery (SSD), is based on a penalized matrix decomposition designed to identify latent structure that is low-dimensional (many fewer core processes than phenotypes and genetic loci), locus-sparse (each locus affects few core processes), and/or phenotype-sparse (each phenotype is influenced by few core processes). Our use of sparsity as a guide in the matrix decomposition is motivated by the results of a novel empirical test indicating evidence of sparse structure in several recent genotype–phenotype datasets. First, we use synthetic data to show that our SSD approach can accurately recover core processes if each genetic locus affects few core processes or if each phenotype is affected by few core processes. Next, we apply the method to three datasets spanning adaptive mutations in yeast, a genotoxin robustness assay in human cell lines, and genetic loci identified from a yeast cross, and evaluate the biological plausibility of the core processes identified. More generally, we propose sparsity as a guiding prior for resolving latent structure in empirical genotype–phenotype maps.
  2. Whiteson, Katrine (Ed.)
    ABSTRACT The opportunistic human pathogen Pseudomonas aeruginosa is naturally infected by a large class of temperate, transposable, Mu-like phages. We examined the genotypic and phenotypic diversity of P. aeruginosa PA14 lysogen populations as they resolve clustered regularly interspaced short palindromic repeat (CRISPR) autoimmunity, mediated by an imperfect CRISPR match to the Mu-like DMS3 prophage. After 12 days of evolution, we measured a decrease in spontaneous induction in both exponential and stationary phase growth. Co-existing variation in spontaneous induction rates in the exponential phase depended on the way the coexisting strains resolved genetic conflict. Multiple mutational modes to resolve genetic conflict between host and phage resulted in coexistence in evolved populations of single lysogens that maintained CRISPR immunity to other phages and polylysogens that lost immunity completely. This work highlights a new dimension of the role of lysogenic phages in the evolution of their hosts. IMPORTANCE: The chronic opportunistic multi-drug-resistant pathogen Pseudomonas aeruginosa is persistently infected by temperate phages. We assess the contribution of temperate phage infection to the evolution of the clinically relevant strain UCBPP-PA14. We found that a low level of clustered regularly interspaced short palindromic repeat (CRISPR)-mediated self-targeting resulted in polylysogeny evolution and large genome rearrangements in lysogens; we also found extensive diversification in CRISPR spacers and cas genes. These genomic modifications resulted in decreased spontaneous induction in both exponential and stationary phase growth, increasing lysogen fitness. This work shows the importance of considering latent phage infection in characterizing the evolution of bacterial populations.
  3. Abstract Migration is driven by a combination of environmental and genetic factors, but many questions remain about those drivers. Potential interactions between genetic and environmental variants associated with different migratory phenotypes are rarely the focus of study. We pair low coverage whole genome resequencing with a de novo genome assembly to examine population structure, inbreeding, and the environmental factors associated with genetic differentiation between migratory and resident breeding phenotypes in a species of conservation concern, the western burrowing owl (Athene cunicularia hypugaea). Our analyses reveal a dichotomy in gene flow depending on whether the population is resident or migratory, with the former being genetically structured and the latter exhibiting no signs of structure. Among resident populations, we observed significantly higher genetic differentiation, significant isolation‐by‐distance, and significantly elevated inbreeding. Among migratory breeding groups, on the other hand, we observed lower genetic differentiation, no isolation‐by‐distance, and substantially lower inbreeding. Using genotype–environment association analysis, we find significant evidence for relationships between migratory phenotypes (i.e., migrant versus resident) and environmental variation associated with cold temperatures during the winter and barren, open habitats. In the regions of the genome most differentiated between migrants and residents, we find significant enrichment for genes associated with the metabolism of fats. This may be linked to the increased pressure on migrants to process and store fats more efficiently in preparation for and during migration. Our results provide a significant contribution toward understanding the evolution of migratory behavior and vital insight into ongoing conservation and management efforts for the western burrowing owl. 
  4. Abstract Background: Predicting phenotypes from genetic variation is foundational for fields as diverse as bioengineering and global change biology, highlighting the importance of efficient methods to predict gene functions. Linking genetic changes to phenotypic changes has been a goal of decades of experimental work, especially for some model gene families including light-sensitive opsin proteins. Opsins can be expressed in vitro to measure light absorption parameters, including λmax - the wavelength of maximum absorbance - which strongly affects organismal phenotypes like color vision. Despite extensive research on opsins, the data remain dispersed, uncompiled, and often challenging to access, thereby precluding systematic and comprehensive analyses of the intricate relationships between genotype and phenotype. Results: Here, we report a newly compiled database of all heterologously expressed opsin genes with λmax phenotypes called the Visual Physiology Opsin Database (VPOD). VPOD_1.0 contains 864 unique opsin genotypes and corresponding λmax phenotypes collected across all animals from 73 separate publications. We use VPOD data and deepBreaks to show regression-based machine learning (ML) models often reliably predict λmax, account for non-additive effects of mutations on function, and identify functionally critical amino acid sites. Conclusion: The ability to reliably predict functions from gene sequences alone using ML will allow robust exploration of molecular-evolutionary patterns governing phenotype, will inform functional and evolutionary connections to an organism’s ecological niche, and may be used more broadly for de-novo protein design. Together, our database, phenotype predictions, and model comparisons lay the groundwork for future research applicable to families of genes with quantifiable and comparable phenotypes.
    Key Points: We introduce the Visual Physiology Opsin Database (VPOD_1.0), which includes 864 unique animal opsin genotypes and corresponding λmax phenotypes from 73 separate publications. We demonstrate that regression-based ML models can reliably predict λmax from gene sequence alone, predict non-additive effects of mutations on function, and identify functionally critical amino acid sites. We provide an approach that lays the groundwork for future robust exploration of molecular-evolutionary patterns governing phenotype, with potential broader applications to any family of genes with quantifiable and comparable phenotypes.
  5. Abstract Cryptic genetic variants exert minimal phenotypic effects alone but are hypothesized to form a vast reservoir of genetic diversity driving trait evolvability through epistatic interactions [1–3]. This classical theory has been reinvigorated by pan-genomics, which is revealing pervasive variation within gene families, cis-regulatory regions and regulatory networks [4–6]. Testing the ability of cryptic variation to fuel phenotypic diversification has been hindered by intractable genetics, limited allelic diversity and inadequate phenotypic resolution. Here, guided by natural and engineered cis-regulatory cryptic variants in a paralogous gene pair, we identified additional redundant trans regulators, establishing a regulatory network controlling tomato inflorescence architecture. By combining coding mutations with cis-regulatory alleles in populations segregating for all four network genes, we generated 216 genotypes spanning a wide spectrum of inflorescence complexity and quantified branching in over 35,000 inflorescences. Analysis of this high-resolution genotype–phenotype map using a hierarchical model of epistasis revealed a layer of dose-dependent interactions within paralogue pairs enhancing branching, culminating in strong, synergistic effects. However, we also identified a layer of antagonism between paralogue pairs, whereby accumulating mutations in one pair progressively diminished the effects of mutations in the other. Our results demonstrate how gene regulatory network architecture and complex dosage effects from paralogue diversification converge to shape phenotypic space, producing the potential for both strongly buffered phenotypes and sudden bursts of phenotypic change.