Title: Discovering genotype-phenotype relationships with machine learning and the Visual Physiology Opsin Database (VPOD)
Abstract
Background: Predicting phenotypes from genetic variation is foundational for fields as diverse as bioengineering and global change biology, highlighting the importance of efficient methods to predict gene functions. Linking genetic changes to phenotypic changes has been a goal of decades of experimental work, especially for some model gene families including light-sensitive opsin proteins. Opsins can be expressed in vitro to measure light absorption parameters, including λmax, the wavelength of maximum absorbance, which strongly affects organismal phenotypes like color vision. Despite extensive research on opsins, the data remain dispersed, uncompiled, and often challenging to access, thereby precluding systematic and comprehensive analyses of the intricate relationships between genotype and phenotype.
Results: Here, we report a newly compiled database of all heterologously expressed opsin genes with λmax phenotypes, called the Visual Physiology Opsin Database (VPOD). VPOD_1.0 contains 864 unique opsin genotypes and corresponding λmax phenotypes collected across all animals from 73 separate publications. We use VPOD data and deepBreaks to show that regression-based machine learning (ML) models often reliably predict λmax, account for non-additive effects of mutations on function, and identify functionally critical amino acid sites.
Conclusion: The ability to reliably predict functions from gene sequences alone using ML will allow robust exploration of molecular-evolutionary patterns governing phenotype, will inform functional and evolutionary connections to an organism’s ecological niche, and may be used more broadly for de novo protein design. Together, our database, phenotype predictions, and model comparisons lay the groundwork for future research applicable to families of genes with quantifiable and comparable phenotypes.
Key Points:
We introduce the Visual Physiology Opsin Database (VPOD_1.0), which includes 864 unique animal opsin genotypes and corresponding λmax phenotypes from 73 separate publications.
We demonstrate that regression-based ML models can reliably predict λmax from gene sequence alone, predict non-additive effects of mutations on function, and identify functionally critical amino acid sites.
We provide an approach that lays the groundwork for future robust exploration of molecular-evolutionary patterns governing phenotype, with potential broader applications to any family of genes with quantifiable and comparable phenotypes.
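As a rough illustration of what "regression-based prediction of λmax from sequence" can look like, the sketch below one-hot encodes aligned amino acid sequences and fits a small ridge regression by gradient descent. It is not the deepBreaks pipeline and does not use VPOD data: the four-residue sequences, the λmax values, and the function names (`one_hot_encode`, `fit_ridge`, `predict`) are all invented for this example.

```python
# Toy illustration only: the sequences and lambda-max values are invented,
# not taken from VPOD, and this is not the deepBreaks implementation.

def one_hot_encode(seqs):
    """One-hot encode aligned sequences: one feature per observed (site, residue)."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    features = sorted({(i, aa) for s in seqs for i, aa in enumerate(s)})
    index = {f: k for k, f in enumerate(features)}
    rows = []
    for s in seqs:
        row = [0.0] * len(features)
        for i, aa in enumerate(s):
            row[index[(i, aa)]] = 1.0
        rows.append(row)
    return rows, index


def fit_ridge(X, y, lam=0.001, lr=0.05, steps=2000):
    """Fit weights by batch gradient descent on mean squared error + lam * ||w||^2."""
    n, p = len(X), len(X[0])
    intercept = sum(y) / n  # fixed intercept: mean of the training targets
    w = [0.0] * p
    for _ in range(steps):
        # Residuals are computed once per step, then every weight is updated
        resid = [intercept + sum(wj * xj for wj, xj in zip(w, row)) - yi
                 for row, yi in zip(X, y)]
        for j in range(p):
            grad = 2.0 / n * sum(r * row[j] for r, row in zip(resid, X))
            w[j] -= lr * (grad + 2.0 * lam * w[j])
    return intercept, w


def predict(seq, index, intercept, w):
    """Predict lambda-max; residues never seen at a site contribute nothing."""
    return intercept + sum(w[index[(i, aa)]]
                           for i, aa in enumerate(seq) if (i, aa) in index)


# Hypothetical aligned "opsin" fragments and lambda-max values in nm
train_seqs = ["AFSY", "AFTY", "SFSY", "SFTY", "AYSY"]
train_lmax = [500.0, 485.0, 510.0, 495.0, 520.0]

X, index = one_hot_encode(train_seqs)
intercept, weights = fit_ridge(X, train_lmax)
held_out = predict("SYTY", index, intercept, weights)
print(f"predicted lambda-max for SYTY: {held_out:.1f} nm")
```

The toy values were constructed to be purely additive, so a linear model fits them well; a model of this form cannot, by itself, capture the non-additive effects the abstract mentions, which would need interaction terms or a non-linear learner.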
Award ID(s):
2109688
PAR ID:
10540242
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
bioRxiv
Date Published:
Format(s):
Medium: X
Institution:
bioRxiv
Sponsoring Org:
National Science Foundation
More Like this
  1. For many species, vision is one of the most important sensory modalities for mediating essential tasks that include navigation, predation and foraging, predator avoidance, and numerous social behaviors. The vertebrate visual process begins when photons of light interact with rod and cone photoreceptors that are present in the neural retina. Vertebrate visual photopigments are housed within these photoreceptor cells and are sensitive to a wide range of wavelengths that peak within the light spectrum; the peak is a function of the type of chromophore used and how it interacts with specific amino acid residues found within the opsin protein sequence. Minor differences in the amino acid sequences of the opsins are known to lead to large differences in the spectral peak of absorbance (i.e. the λmax value). In our prior studies, we developed a new approach that combined homology modeling and molecular dynamics simulations to gather structural information associated with chromophore conformation, then used it to generate statistical models for the accurate prediction of λmax values for photopigments derived from Rh1 and Rh2 amino acid sequences. In the present study, we test our novel approach to predict the λmax of phylogenetically distant Sws2 cone opsins. To build a model that can predict the λmax using the approach presented in our prior studies, we selected a spectrally diverse set of 11 teleost Sws2 photopigments for which both amino acid sequence information and experimentally measured λmax values are known. The final first-order regression model, consisting of three terms associated with chromophore conformation, was sufficient to predict the λmax of Sws2 photopigments with high accuracy. This study further highlights the breadth of our approach in reliably predicting λmax values of Sws2 cone photopigments, which are evolutionarily more distant from the template bovine RH1, and provides mechanistic insights into the role of known spectral tuning sites.
  2. Summary: The timing of reproduction is a critical developmental decision in the life cycle of many plant species. Fine mapping of a rapid‐flowering mutant was done using whole‐genome sequence data from bulked DNA from a segregating F2 mapping population. The causative mutation maps to a gene orthologous with the third subunit of DNA polymerase δ (POLD3), a previously uncharacterized gene in plants. Expression analyses of POLD3 were conducted via real-time qPCR to determine when and in what tissues the gene is expressed. To better understand the molecular basis of the rapid‐flowering phenotype, transcriptomic analyses were conducted in the mutant vs wild‐type. Consistent with the rapid‐flowering mutant phenotype, a range of genes involved in floral induction and flower development are upregulated in the mutant. Our results provide the first characterization of the developmental and gene expression phenotypes that result from a lesion in POLD3 in plants.
  3. Summary: Isogenic individuals can display seemingly stochastic phenotypic differences, limiting the accuracy of genotype‐to‐phenotype predictions. The extent of this phenotypic variation depends in part on genetic background, raising questions about the genes involved in controlling stochastic phenotypic variation. Focusing on early seedling traits in Arabidopsis thaliana, we found that hypomorphs of the cuticle‐related gene LIPID TRANSFER PROTEIN 2 (LTP2) greatly increased variation in seedling phenotypes, including hypocotyl length, gravitropism and cuticle permeability. Many ltp2 hypocotyls were significantly shorter than wild‐type hypocotyls while others resembled the wild‐type. Differences in epidermal properties and gene expression between ltp2 seedlings with long and short hypocotyls suggest a loss of cuticle integrity as the primary determinant of the observed phenotypic variation. We identified environmental conditions that reveal or mask the increased variation in ltp2 hypomorphs and found that increased expression of its closest paralog LTP1 is necessary for ltp2 phenotypes. Our results illustrate how decreased expression of a single gene can generate starkly increased phenotypic variation in isogenic individuals in response to an environmental challenge.
  4. Societal Impact Statement: Micronutrient deficiency or “hidden hunger” is estimated to affect two billion people worldwide and increasing the micronutrient concentration of food could play an important role in tackling this global challenge. Using a combination of imaging techniques and atomic absorption spectroscopy, we describe a link between root phenotype and micronutrient concentration in cassava, which could enable new phenotypic selection strategies for breeding. This approach could be used with existing breeding infrastructure to enhance the micronutrient concentration of cassava and hence, benefit the health of people, particularly in low‐income countries where cassava is consumed as a staple crop. Summary: Cassava storage roots are a staple food in low‐income countries of South‐East Asia and sub‐Saharan Africa, where growth stunting is prevalent as a consequence of micronutrient deficiencies. We aim to link phenotypes of field‐grown cassava roots to micronutrient concentration in the edible storage roots as a simple way to improve phenotypic selection for nutritional value in cassava. We used existing and newly developed imaging techniques to quantify root phenotypes of the cassava root architecture over time and used flame atomic absorption spectroscopy to measure micronutrient concentration in storage roots. Together, these allow the association of root phenotypes with micronutrient concentration in mature cassava roots. We show that early and late bulking genotypes in cassava exhibit distinct foraging behaviors that are associated with micronutrient concentration in the edible storage root. Our observations suggest that late bulking cassava is key to providing sufficient micronutrients in the edible storage root. The association between root phenotype and micronutrient concentration with imaging techniques allows phenotypic selection for enhanced micronutrient concentration. Therefore, implementing image‐based phenotyping into cassava breeding programs in sub‐Saharan Africa and South‐East Asia could be an essential element in resolving micronutrient deficiencies that put individuals at a higher risk of growth stunting.
  5. ABSTRACT The relationship between genotype and phenotype is non-trivial because of the often complex molecular pathways that make it difficult to unambiguously relate phenotypes to specific genotypes. Photopigments, comprising an opsin apoprotein bound to a light-absorbing chromophore, present an opportunity to directly relate the amino acid sequence to an absorbance peak phenotype (λmax). We examined this relationship by conducting a series of site-directed mutagenesis experiments of retinochrome, a non-visual opsin, from two closely related species: the common bay scallop, Argopecten irradians, and the king scallop, Pecten maximus. Using protein folding models, we identified three amino acid sites of likely functional importance and expressed mutated retinochrome proteins in vitro. Our results show that the mutation of amino acids lining the opsin binding pocket is responsible for fine spectral tuning, or small changes in the λmax of these light-sensitive proteins. Mutations resulted in a blue or red shift as predicted, but with dissimilar magnitudes. Shifts ranged from a 16 nm blue shift to a 12 nm red shift from the wild-type λmax. These mutations do not show an additive effect, but rather suggest the presence of epistatic interactions. This work highlights the importance of binding pocket shape in the evolution of spectral tuning and builds on our ability to relate genotypic changes to phenotypes in an emerging model for opsin functional analysis. 