Synopsis A major goal of research in evolution and genetics is linking genotype to phenotype. This work could be direct, such as determining the genetic basis of a phenotype by leveraging genetic variation or divergence in a developmental, physiological, or behavioral trait. It could also involve studying evolutionary phenomena (e.g., reproductive isolation, adaptation, sexual dimorphism, behavior) that reveal an indirect link between genotype and a trait of interest. When the phenotype diverges across evolutionarily distinct lineages, this genotype-to-phenotype problem can be addressed using phylogenetic genotype-to-phenotype (PhyloG2P) mapping, which uses genetic signatures and convergent phenotypes on a phylogeny to infer the genetic bases of traits. The PhyloG2P approach has proven powerful in revealing key genetic changes associated with diverse traits, including the mammalian transition to marine environments and transitions between major mechanisms of photosynthesis. However, several intermediate traits are layered between the genotype and the phenotype of interest, including but not limited to transcriptional profiles, chromatin states, protein abundances, structures, modifications, metabolites, and physiological parameters. Each intermediate trait is interesting and informative in its own right, but synthesis across data types holds great promise for providing a deep, integrated, and predictive understanding of how genotypes drive phenotypic differences and convergence. We argue that an expanded PhyloG2P framework (the PhyloG2P matrix) that explicitly considers intermediate traits, and imputes those that are prohibitively difficult to obtain, will enable a better mechanistic understanding of any trait of interest. This approach provides a proxy for functional validation and mechanistic understanding in organisms where laboratory manipulation is impractical.
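As a rough illustration of the kind of signal PhyloG2P methods look for, the sketch below asks whether one gene evolves faster on branches where a convergent trait arose. It is a simplified, hypothetical test: the function name `convergent_rate_shift`, the per-branch relative-rate input, and the one-sided permutation scheme are assumptions, not a specific published tool. Shuffling branch labels also ignores phylogenetic non-independence, which real PhyloG2P methods must account for.

```python
import random
from statistics import mean
from typing import Dict, Set, Tuple


def convergent_rate_shift(rates: Dict[str, float], foreground: Set[str],
                          n_perm: int = 10000, seed: int = 0) -> Tuple[float, float]:
    """Hypothetical one-sided permutation test for a rate shift on trait branches.

    rates: per-branch relative evolutionary rates for one gene (assumed input).
    foreground: branches on which the convergent trait arose.
    Returns (observed mean difference, permutation p-value).
    """
    missing = [b for b in foreground if b not in rates]
    if missing:
        raise ValueError(f"foreground branches without rates: {missing}")
    branches = list(rates)

    def diff(fg: Set[str]) -> float:
        # Mean rate on trait-bearing branches minus mean rate on all others.
        return mean(rates[b] for b in fg) - mean(rates[b] for b in branches if b not in fg)

    observed = diff(foreground)
    rng = random.Random(seed)
    k = len(foreground)
    # Count random relabelings at least as extreme as the observed shift.
    hits = sum(diff(set(rng.sample(branches, k))) >= observed for _ in range(n_perm))
    return observed, (hits + 1) / (n_perm + 1)
```

A gene that is strongly accelerated on all trait-bearing branches yields a large observed difference and a small p-value; scanning many genes would additionally need multiple-testing correction.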
Phenotype Design Space Provides a Mechanistic Framework Relating Molecular Parameters to Phenotype Diversity Available for Selection
Abstract Two long-standing challenges in theoretical population genetics and evolution are predicting the distribution of phenotype diversity generated by mutation and available for selection, and determining how mutation, selection and drift interact, in order to characterize evolutionary equilibria and dynamics. A more fundamental obstacle to such predictions is the current inability to causally link genotype to phenotype. Such a link requires three major mechanistic mappings: genetic sequence to kinetic parameters of the molecular processes, kinetic parameters to biochemical system phenotypes, and biochemical phenotypes to organismal phenotypes. This article introduces a theoretical framework, the Phenotype Design Space (PDS) framework, that addresses these challenges by focusing on the mapping of kinetic parameters to biochemical system phenotypes. It provides a quantitative theory whose key features include (1) a mathematically rigorous definition of phenotype based on biochemical kinetics, (2) enumeration of the full phenotypic repertoire, and (3) functional characterization of each phenotype independent of its context-dependent selection or fitness contributions. The framework is built on Design Space methods that relate system phenotypes to genetically determined parameters and environmentally determined variables. It also has the potential to automate the prediction of phenotype-specific mutation rate constants and of equilibrium distributions of phenotype diversity in microbial populations undergoing steady-state exponential growth, a case that provides an ideal reference against which more realistic cases can be compared. Although the framework is quite general and flexible, the details will undoubtedly differ among functions, organisms and contexts. Here a hypothetical case study involving a small molecular system, a primordial circadian clock, is used to introduce the framework and to illustrate its use in a particular case.
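To make the idea of phenotypes defined by dominant kinetic terms concrete, here is a minimal one-variable sketch, assuming a steady-state equation in which a sum of positive power-law terms (production) balances a sum of negative power-law terms (removal). Each pairing of one dominant production term with one dominant removal term is a candidate case; a case is kept only if those terms really dominate at the steady state it implies. The function name and the `(rate constant, kinetic order)` encoding are hypothetical. The published framework treats multi-variable systems, where, in the Design Space approach, each case reduces to a linear system in logarithmic coordinates.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical encoding: (c, e) represents the power-law term c * x**e, with c > 0.
Term = Tuple[float, float]


def design_space_cases(pos: List[Term], neg: List[Term],
                       tol: float = 1e-12) -> List[Tuple[int, int, float]]:
    """Enumerate valid dominant-term cases for sum(pos) = sum(neg) in one variable x.

    Returns (index of dominant positive term, index of dominant negative term,
    approximate steady state x) for every self-consistent case.
    """
    cases = []
    for (i, (ci, ei)), (j, (cj, ej)) in product(enumerate(pos), enumerate(neg)):
        if ei == ej:
            continue  # the two dominant terms scale identically: no unique steady state
        # Solve ci * x**ei = cj * x**ej for x > 0.
        x = (ci / cj) ** (1.0 / (ej - ei))
        # Keep the case only if the chosen terms are the largest in their groups.
        pos_ok = all(ci * x ** ei >= c * x ** e * (1 - tol) for c, e in pos)
        neg_ok = all(cj * x ** ej >= c * x ** e * (1 - tol) for c, e in neg)
        if pos_ok and neg_ok:
            cases.append((i, j, x))
    return cases
```

For example, `design_space_cases([(4.0, 0.0)], [(1.0, 1.0), (1.0, 2.0)])` encodes dx/dt = 4 - x - x²; only the case in which the quadratic removal term dominates is valid, giving x ≈ 2 as the dominant-term approximation of the exact root (about 1.56).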
The framework is built on fundamental biochemical kinetics, so its foundation rests on linear algebra and reasonable physical assumptions. These provide numerous opportunities for experimental testing and for further elaboration to handle complex multicellular organisms that are currently beyond its scope. The discussion compares results from the PDS framework with those from other approaches in theoretical population genetics.
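The abstract's mention of equilibrium distributions of phenotype diversity under steady-state exponential growth can be illustrated with a generic mutation-selection model. This is an assumption made for illustration, not necessarily the PDS paper's formulation: phenotype i grows at rate `growth[i]` and mutates to phenotype j at rate `mut[i][j]`. Once growth is balanced, and assuming a unique dominant eigenvalue, phenotype frequencies converge to the normalized dominant eigenvector of the growth-plus-mutation matrix. The sketch computes it by power iteration after a diagonal shift that makes the matrix nonnegative; the function name and inputs are hypothetical.

```python
from typing import List


def equilibrium_phenotypes(growth: List[float], mut: List[List[float]],
                           iters: int = 10000, tol: float = 1e-13) -> List[float]:
    """Equilibrium phenotype frequencies for dn/dt = A n (hypothetical model).

    growth[i]: exponential growth rate of phenotype i.
    mut[i][j]: rate at which phenotype i mutates to phenotype j (i != j).
    """
    n = len(growth)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Net per-capita change of phenotype i: growth minus mutational outflow.
        a[i][i] = growth[i] - sum(mut[i][j] for j in range(n) if j != i)
        for j in range(n):
            if j != i:
                a[j][i] += mut[i][j]  # inflow into phenotype j from phenotype i
    # Shifting the diagonal keeps eigenvectors but makes power iteration converge
    # to the eigenvalue with the largest real part.
    shift = 1.0 + max(0.0, -min(a[i][i] for i in range(n)))
    x = [1.0 / n] * n
    for _ in range(iters):
        y = [sum(a[i][j] * x[j] for j in range(n)) + shift * x[i] for i in range(n)]
        total = sum(y)
        y = [v / total for v in y]  # renormalize to frequencies
        if max(abs(u - v) for u, v in zip(x, y)) < tol:
            return y
        x = y
    return x
```

With growth rates 1.0 and 0.5 and a one-way mutation rate of 0.1 from phenotype 0 to phenotype 1, `equilibrium_phenotypes([1.0, 0.5], [[0.0, 0.1], [0.0, 0.0]])` converges to frequencies of about 0.8 and 0.2.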
- Award ID(s): 1716833
- PAR ID: 10446894
- Publisher / Repository: Springer Science + Business Media
- Journal Name: Journal of Molecular Evolution
- Volume: 91
- Issue: 5
- ISSN: 0022-2844
- Size(s): p. 687-710
- Sponsoring Org: National Science Foundation
More Like this
- The central hypothesis of the genotype–phenotype relationship is that the phenotype of a developing organism (i.e., its set of observable attributes) depends on its genome and the environment. However, even as we learn more about the genetics and biochemistry of living systems, our understanding does not fully extend to the complex, multiscale nature of how cells move, interact, and organize; this gap in understanding is referred to as the genotype-to-phenotype problem. The physics of soft matter sets the background against which living organisms evolved, and the cell environment is a strong determinant of cell phenotype. This inevitably creates challenges, because the full function of many genes and the diversity of cellular behaviors cannot be assessed without broad screens of environmental conditions. Cellular mechanobiology is an emerging field that provides methodologies for understanding how cells integrate chemical and physical stresses and signals from their environment, and how these are transduced to control cell function. Biofilm-forming bacteria are an attractive model because they grow fast, are genetically malleable, and can display sophisticated self-organizing developmental behaviors similar to those found in higher organisms. Here, we propose mechanobiology as a new area of study in prokaryotic systems and describe its potential for unveiling new links between an organism's genome and phenome.
- Multi-omics data offer rich insights into complex traits across organisms, yet integrating and analyzing these datasets for phenotype prediction and marker discovery remains challenging. Researchers need accessible tools that combine deep learning, hyperparameter optimization, visualization, and downstream analysis in a unified web platform. To address this need, we developed G2PDeep-v2, a web-based platform powered by deep learning for phenotype prediction and marker discovery from multi-omics data across a wide range of organisms, including humans and plants. The server provides multiple services that let researchers create deep-learning models through an interactive interface and train them with an automated hyperparameter-tuning algorithm on high-performance computing resources. Users can visualize phenotype and marker predictions and perform Gene Set Enrichment Analysis on the significant markers to gain insight into the molecular mechanisms underlying the complex diseases, conditions, and other biological phenotypes being studied.
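Gene Set Enrichment Analysis, mentioned above as a downstream step, rests on a running-sum enrichment score: walk down the genes ranked by a statistic, step up at each member of the gene set and down at each non-member, and report the largest signed deviation from zero. The sketch below computes only that score; it omits the permutation-based significance and normalization of full GSEA and is not G2PDeep-v2's implementation. The function name and input format are assumptions.

```python
from typing import Dict, Set


def enrichment_score(scores: Dict[str, float], gene_set: Set[str], p: float = 1.0) -> float:
    """Running-sum enrichment score of `gene_set` among genes ranked by `scores`.

    p = 1 weights each hit by |score| (the usual weighted GSEA statistic);
    p = 0 gives the unweighted Kolmogorov-Smirnov-like statistic.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    hit_weights = [abs(s) ** p for g, s in ranked if g in gene_set]
    n_miss = len(ranked) - len(hit_weights)
    norm = sum(hit_weights)
    if not hit_weights or n_miss == 0 or norm == 0:
        raise ValueError("gene set must overlap the ranking without covering all of it")
    running, best = 0.0, 0.0
    for gene, score in ranked:
        if gene in gene_set:
            running += abs(score) ** p / norm  # step up at each gene-set member
        else:
            running -= 1.0 / n_miss  # step down at each non-member
        if abs(running) > abs(best):
            best = running  # keep the signed maximum deviation from zero
    return best
```

A positive score means the set is concentrated near the top of the ranking (e.g., among the strongest markers), and a negative score means it is concentrated near the bottom.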
- Variability in gene expression causes genetically identical cells to exhibit different phenotypes. One probable cause of this variability is transcriptional bursting, in which periods of RNA synthesis alternate randomly with periods of transcriptional silence. Yet the molecular mechanisms behind this variability remain unclear. Experiments indicate that multiple biochemical states might be involved in the production of RNA molecules. Motivated by these observations, we developed a theoretical framework to investigate the mechanisms of transcriptional bursting. It is based on a multi-state stochastic approach that provides a full quantitative description of the system's dynamic properties. We found that the degree of stochastic fluctuations during transcription correlates directly with the number of biochemical states, which explains the experimentally observed variability and fluctuations in the amounts of RNA produced. We outline a procedure for estimating the number of relevant biochemical states participating in transcription and apply it to experimental results. We also developed a general dynamic phase diagram for the transcription process. The presented theoretical method clarifies the physical-chemical aspects of transcriptional bursting and provides a minimal chemical-kinetic description of the process.
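A common way to quantify such fluctuations is the Fano factor (variance divided by mean) of mRNA counts. For a gene that switches among any number of states with first-order kinetics, transcribes at a state-dependent rate, and whose mRNA is degraded at a first-order rate, the stationary first and second moments obey closed linear equations, so they can be solved exactly. The sketch below uses this standard moment approach; it is not necessarily the authors' derivation, and the function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def mrna_moments(k: Matrix, r: List[float], gamma: float) -> Tuple[float, float]:
    """Stationary mRNA mean and Fano factor for a multi-state gene model.

    k[i][j] (i != j): switching rate from gene state i to state j.
    r[i]: transcription rate in state i; gamma: mRNA degradation rate.
    """
    n = len(r)
    gen = [[k[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        gen[i][i] = -sum(gen[i])  # rows of the generator sum to zero
    # Gene-state occupancy p: p @ gen = 0, with the last equation replaced by sum(p) = 1.
    a = [[gen[j][i] for j in range(n)] for i in range(n)]
    a[-1] = [1.0] * n
    p = solve_linear(a, [0.0] * (n - 1) + [1.0])
    # m[i] = E[count * 1{state = i}] satisfies sum_j (gen[j][i] - gamma*delta_ij) m[j] = -r[i] p[i].
    a_m = [[gen[j][i] - (gamma if i == j else 0.0) for j in range(n)] for i in range(n)]
    m = solve_linear(a_m, [-r[i] * p[i] for i in range(n)])
    mean = sum(m)
    # Stationary second moment from d E[n^2]/dt = 0.
    second = (sum(r[i] * (2 * m[i] + p[i]) for i in range(n)) + gamma * mean) / (2 * gamma)
    return mean, (second - mean ** 2) / mean
```

For the two-state (telegraph) model with switching rates of 1 in each direction, transcription rate 10 in the active state, and degradation rate 1, `mrna_moments([[0.0, 1.0], [1.0, 0.0]], [0.0, 10.0], 1.0)` gives a mean of 5 and a Fano factor of 8/3, matching the known closed form 1 + r·k_off/((k_on + k_off)(k_on + k_off + γ)); a single always-active state gives Poisson statistics (Fano factor 1).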
- Abstract Biobanks that collect deep phenotypic and genomic data across many individuals have emerged as a key resource in human genetics. However, phenotypes in biobanks are often missing across many individuals, limiting their utility. We propose AutoComplete, a deep learning-based imputation method to impute or ‘fill-in’ missing phenotypes in population-scale biobank datasets. When applied to collections of phenotypes measured across ~300,000 individuals from the UK Biobank, AutoComplete substantially improved imputation accuracy over existing methods. On three traits with notable amounts of missingness, we show that AutoComplete yields imputed phenotypes that are genetically similar to the originally observed phenotypes while increasing the effective sample size by about twofold on average. Further, genome-wide association analyses on the resulting imputed phenotypes led to a substantial increase in the number of associated loci. Our results demonstrate the utility of deep learning-based phenotype imputation to increase power for genetic discoveries in existing biobank datasets.
