Two long-standing challenges in theoretical population genetics and evolution are predicting the distribution of phenotype diversity generated by mutation and available for selection, and determining the interaction of mutation, selection and drift to characterize evolutionary equilibria and dynamics. More fundamental for enabling such predictions is the current inability to causally link genotype to phenotype. There are three major mechanistic mappings required for such a linking – genetic sequence to kinetic parameters of the molecular processes, kinetic parameters to biochemical system phenotypes, and biochemical phenotypes to organismal phenotypes. This article introduces a theoretical framework, the Phenotype Design Space (PDS) framework, for addressing these challenges by focusing on the mapping of kinetic parameters to biochemical system phenotypes. It provides a quantitative theory whose key features include (1) a mathematically rigorous definition of phenotype based on biochemical kinetics, (2) enumeration of the full phenotypic repertoire, and (3) functional characterization of each phenotype independent of its context-dependent selection or fitness contributions. This framework is built on Design Space methods that relate system phenotypes to genetically determined parameters and environmentally determined variables. It also has the potential to automate prediction of phenotype-specific mutation rate constants and equilibrium distributions of phenotype diversity in microbial populations undergoing steady-state exponential growth, which provides an ideal reference to which more realistic cases can be compared. Although the framework is quite general and flexible, the details will undoubtedly differ for different functions, organisms and contexts. Here a hypothetical case study involving a small molecular system, a primordial circadian clock, is used to introduce this framework and to illustrate its use in a particular case. 
The framework is built on fundamental biochemical kinetics. Thus, the foundation is based on linear algebra and reasonable physical assumptions, which provide numerous opportunities for experimental testing and further elaboration to deal with complex multicellular organisms that are currently beyond its scope. The discussion provides a comparison of results from the PDS framework with those from other approaches in theoretical population genetics.
- Award ID(s):
- 1716833
- NSF-PAR ID:
- 10446894
- Publisher / Repository:
- Springer Science + Business Media
- Date Published:
- Journal Name:
- Journal of Molecular Evolution
- Volume:
- 91
- Issue:
- 5
- ISSN:
- 0022-2844
- Size(s):
- p. 687-710
- Sponsoring Org:
- National Science Foundation
More Like this
-
The central hypothesis of the genotype–phenotype relationship is that the phenotype of a developing organism (i.e., its set of observable attributes) depends on its genome and the environment. However, as we learn more about the genetics and biochemistry of living systems, our understanding does not fully extend to the complex multiscale nature of how cells move, interact, and organize; this gap in understanding is referred to as the genotype-to-phenotype problem. The physics of soft matter sets the background on which living organisms evolved, and the cell environment is a strong determinant of cell phenotype. This inevitably leads to challenges, as the full function of many genes and the diversity of cellular behaviors cannot be assessed without wide screens of environmental conditions. Cellular mechanobiology is an emerging field that provides methodologies to understand how cells integrate chemical and physical environmental stress and signals, and how they are transduced to control cell function. Biofilm-forming bacteria represent an attractive model because they are fast-growing and genetically malleable, and can display sophisticated self-organizing developmental behaviors similar to those found in higher organisms. Here, we propose mechanobiology as a new area of study in prokaryotic systems and describe its potential for unveiling new links between an organism's genome and phenome.
-
Variability in gene expression causes genetically identical cells to exhibit different phenotypes. One probable cause of this variability is transcriptional bursting, where the synthesis of RNA molecules randomly alternates with periods of silence in the transfer of genetic information. Yet, the molecular mechanisms behind this variability remain unclear. Experiments indicate that multiple biochemical states might be involved in the production of RNA molecules. Stimulated by these observations, we developed a theoretical framework to investigate the mechanisms of transcriptional bursting. It is based on a multi-state stochastic approach that provides a full quantitative description of the dynamic properties in the system. We found that the degree of stochastic fluctuations during transcription directly correlates with the number of biochemical states. This explains experimentally observed variability and fluctuations in the quantities of the produced RNA molecules. The procedure to estimate the number of relevant biochemical states participating in the transcription is outlined and applied for analysis of experimental results. We also developed a general dynamic phase diagram for the transcription process. The presented theoretical method clarifies physical-chemical aspects of the transcriptional bursting and presents a minimal chemical-kinetic description of the process.
-
Biobanks that collect deep phenotypic and genomic data across many individuals have emerged as a key resource in human genetics. However, phenotypes in biobanks are often missing across many individuals, limiting their utility. We propose AutoComplete, a deep learning-based imputation method to impute or ‘fill-in’ missing phenotypes in population-scale biobank datasets. When applied to collections of phenotypes measured across ~300,000 individuals from the UK Biobank, AutoComplete substantially improved imputation accuracy over existing methods. On three traits with notable amounts of missingness, we show that AutoComplete yields imputed phenotypes that are genetically similar to the originally observed phenotypes while increasing the effective sample size by about twofold on average. Further, genome-wide association analyses on the resulting imputed phenotypes led to a substantial increase in the number of associated loci. Our results demonstrate the utility of deep learning-based phenotype imputation to increase power for genetic discoveries in existing biobank datasets.
-
Understanding the interplay between environmental conditions and phenotypes is a fundamental goal of biology. Unfortunately, data that include observations on phenotype and environment are highly heterogeneous and thus difficult to find and integrate. One approach that is likely to improve the status quo involves the use of ontologies to standardize and link data about phenotypes and environments. Specifying and linking data through ontologies will allow researchers to increase the scope and flexibility of large-scale analyses aided by modern computing methods. Investments in this area would advance diverse fields such as ecology, phylogenetics, and conservation biology. While several biological ontologies are well-developed, using them to link phenotypes and environments is rare because of gaps in ontological coverage and limits to interoperability among ontologies and disciplines. In this manuscript, we present (1) use cases from diverse disciplines to illustrate questions that could be answered more efficiently using a robust linkage between phenotypes and environments, (2) two proof-of-concept analyses that show the value of linking phenotypes to environments in fishes and amphibians, and (3) two proposed example data models for linking phenotypes and environments using the extensible observation ontology (OBOE) and the Biological Collections Ontology (BCO); these provide a starting point for the development of a data model linking phenotypes and environments.
-
Correlation among multiple phenotypes across related individuals may reflect some pattern of shared genetic architecture: individual genetic loci affect multiple phenotypes (an effect known as pleiotropy), creating observable relationships between phenotypes. A natural hypothesis is that pleiotropic effects reflect a relatively small set of common “core” cellular processes: each genetic locus affects one or a few core processes, and these core processes in turn determine the observed phenotypes. Here, we propose a method to infer such structure in genotype–phenotype data. Our approach, sparse structure discovery (SSD), is based on a penalized matrix decomposition designed to identify latent structure that is low-dimensional (many fewer core processes than phenotypes and genetic loci), locus-sparse (each locus affects few core processes), and/or phenotype-sparse (each phenotype is influenced by few core processes). Our use of sparsity as a guide in the matrix decomposition is motivated by the results of a novel empirical test indicating evidence of sparse structure in several recent genotype–phenotype datasets. First, we use synthetic data to show that our SSD approach can accurately recover core processes if each genetic locus affects few core processes or if each phenotype is affected by few core processes. Next, we apply the method to three datasets spanning adaptive mutations in yeast, genotoxin robustness assay in human cell lines, and genetic loci identified from a yeast cross, and evaluate the biological plausibility of the core processes identified. More generally, we propose sparsity as a guiding prior for resolving latent structure in empirical genotype–phenotype maps.