Title: Adapting Protein Language Models for Explainable Fine-Grained Evolutionary Pattern Discovery
Organisms that live in different environments face different evolutionary pressures. Organisms with more successful phenotypes reproduce more frequently, and differing selective pressures acting at the organismal level can in turn influence genes, and thus proteins. Understanding how proteins adapt across environments may therefore be useful both for engineering proteins for specific environments and for improving our understanding of basic biology. In this work, we explicitly compare homologous (read: paired) proteins from different environments. While previous studies have explored the relevant evolutionary pressures in one of these environments [11], [17] and genomic responses to those pressures [1], [28], no prior computational study of their proteins has been performed. We apply ESM-2 [20]. As expected, there is no signal in our negative control (two divergent yeast strains), yet we obtain near-perfect prediction accuracy for our selected environmental gradient, the well-established subsurface vs. surface biome. We further show that ESM-2, even in its smallest model, captures relevant fine-grained biological patterns in its embedding space. Significantly, we demonstrate that these embeddings can be interpreted using a novel visualization pipeline built on explainable AI techniques.
Award ID(s):
2145434
PAR ID:
10492130
Publisher / Repository:
IEEE
Date Published:
Journal Name:
2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
ISBN:
979-8-3503-3748-8
Page Range / eLocation ID:
2609 to 2616
Format(s):
Medium: X
Location:
Istanbul, Turkiye
Sponsoring Org:
National Science Foundation
More Like this
  1. ABSTRACT The thermal environment that organisms experience can affect many aspects of their phenotype. As global temperatures become more unpredictable, it is imperative that we understand the molecular mechanisms by which organisms respond to variable, and often transient, thermal environments. Beyond deciphering the mechanisms through which organisms respond to temperature, we must also appreciate the underlying variation in temperature-dependent processes, as this variation is essential for understanding the potential to adapt to changing climates. In this Commentary, we use temperature-dependent sex determination as an example to explore the mechanistic processes underlying the development of temperature-sensitive phenotypes. We synthesize the current literature on how variable thermal conditions affect these processes and address factors that may limit or allow organisms to respond to variable environments. From these examples, we posit a framework for how the field might move forward in a more systematic way to address three key questions: (1) which genes directly respond to temperature-sensitive changes in protein function and which genes are downstream, indirect responders?; (2) how long does it take different proteins and genes to respond to temperature?; and (3) are the experimental temperature manipulations relevant to the climate the organism experiences or to predicted climate change scenarios? This approach combines mechanistic questions (questions 1 and 2) with ecologically relevant conditions (question 3), allowing us to explore how organisms respond to transient thermal environments and, thus, cope with climate change. 
  2. Soares, Cláudio (Ed.)
    Abstract Extremophile organisms are known that can metabolize at temperatures down to −25 °C (psychrophiles) and up to 122 °C (hyperthermophiles). Understanding viability under extreme conditions is relevant for human health, biotechnological applications, and our search for life elsewhere in the universe. Information about the stability and dynamics of proteins under environmental extremes is an important factor in this regard. Here we compare the dynamics of small Fe-S proteins – rubredoxins – from psychrophilic and hyperthermophilic microorganisms, using three different nuclear techniques as well as molecular dynamics calculations to quantify motion at the Fe site. The theory of ‘corresponding states’ posits that homologous proteins from different extremophiles have comparable flexibilities at the optimum growth temperatures of their respective organisms. Although ‘corresponding states’ would predict greater flexibility for rubredoxins that operate at low temperatures, we find that from 4 to 300 K, the dynamics of the Fe sites in these homologous proteins are essentially equivalent.
  3. Marine microbes form the base of ocean food webs and drive ocean biogeochemical cycling. Yet little is known about the ability of microbial populations to adapt as they are advected through changing conditions. Here, we investigated the interplay between physical and biological timescales using a model of adaptation and an eddy-resolving ocean circulation climate model. Two criteria were identified that relate the timing and nature of adaptation to the ratio of physical to biological timescales. Genetic adaptation was impeded in highly variable regimes by nongenetic modifications but was promoted in more stable environments. An evolutionary trade-off emerged where greater short-term nongenetic transgenerational effects (low-γ strategy) enabled rapid responses to environmental fluctuations but delayed genetic adaptation, while fewer short-term transgenerational effects (high-γ strategy) allowed faster genetic adaptation but inhibited short-term responses. Our results demonstrate that the selective pressures for organisms within a single water mass vary based on differences in generation timescales resulting in different evolutionary strategies being favored. Organisms that experience more variable environments should favor a low-γ strategy. Furthermore, faster cell division rates should be a key factor in genetic adaptation in a changing ocean. Understanding and quantifying the relationship between evolutionary and physical timescales is critical for robust predictions of future microbial dynamics. 
  4. Angiotensin-converting enzyme 2 (ACE2) is the cell receptor that the coronavirus SARS-CoV-2 binds to and uses to enter and infect human cells. COVID-19, the pandemic disease caused by the coronavirus, involves diverse pathologies beyond those of a respiratory disease, including micro-thrombosis (micro-clotting), cytokine storms, and inflammatory responses affecting many organ systems. Longer-term chronic illness can persist for many months, often well after the pathogen is no longer detected. A better understanding of the proteins that ACE2 interacts with can reveal information relevant to these disease manifestations and possible avenues for treatment. We have undertaken an approach to predict candidate ACE2 interacting proteins which uses evolutionary inference to identify a set of mammalian proteins that “coevolve” with ACE2. The approach, called evolutionary rate correlation (ERC), detects proteins that show highly correlated evolutionary rates during mammalian evolution. Such proteins are candidates for biological interactions with the ACE2 receptor. The approach has uncovered a number of key ACE2 protein interactions of potential relevance to COVID-19 pathologies. Some proteins have previously been reported to be associated with severe COVID-19, but are not currently known to interact with ACE2, while additional predicted novel ACE2 interactors are of potential relevance to the disease. Using reciprocal rankings of protein ERCs, we have identified strongly interconnected ACE2 associated protein networks relevant to COVID-19 pathologies. ACE2 has clear connections to coagulation pathway proteins, such as Coagulation Factor V and fibrinogen components FGA, FGB, and FGG, the latter possibly mediated through ACE2 connections to Clusterin (which clears misfolded extracellular proteins) and GPR141 (whose functions are relatively unknown). ACE2 also connects to proteins involved in cytokine signaling and immune response (e.g., XCR1, IFNAR2 and TLR8), and to Androgen Receptor (AR). The ERC prescreening approach has elucidated possible functions for relatively uncharacterized proteins and possible new functions for well-characterized ones. Suggestions are made for the validation of ERC-predicted ACE2 protein interactions. We propose that ACE2 has novel protein interactions that are disrupted during SARS-CoV-2 infection, contributing to the spectrum of COVID-19 pathologies.
  5. One of the most fundamental and unresolved questions in evolutionary biology is whether the outcomes of evolution are predictable. Is the diversity of life we see today the expected result of organisms adapting to their environment throughout history (also known as natural selection) or the product of random chance? Or did chance events early in history shape the paths that evolution could take next, determining the biological forms that emerged under natural selection much later? These questions are hard to study because evolution happened only once, long ago. To overcome this barrier, Xie, Pu, Metzger et al. developed an experimental approach that can evolve reconstructed ancestral proteins that existed deep in the past. Using this method, it is possible to replay evolution multiple times, from various historical starting points, under conditions similar to those that existed long ago. The end products of the evolutionary trajectories can then be compared to determine how predictable evolution actually is. Xie, Pu, Metzger et al. studied proteins belonging to the BCL-2 family, which originated some 800 million years ago. These proteins have diversified greatly over time in both their genetic sequences and their ability to bind to specific partner proteins called co-regulators. Xie, Pu, Metzger et al. synthesized BCL-2 proteins that existed at various times in the past. Each ancestral protein was then allowed to evolve repeatedly under natural selection to acquire the same co-regulator binding functions that evolved during history. At the end of each evolutionary trajectory, the genetic sequence of the resulting BCL-2 proteins was recorded. This revealed that the outcomes of evolution were almost completely unpredictable: trajectories initiated from the same ancestral protein produced proteins with very different sequences, and proteins launched from different ancestral starting points were even more dissimilar. Further experiments identified the mutations in each trajectory that caused changes in co-regulator binding. When these mutations were introduced into other ancestral proteins, they did not yield the same change in function. This suggests that early chance events influenced each protein’s evolution in an unpredictable way by opening and closing the paths available to it in the future. This research expands our understanding of evolution on a molecular level whilst providing a new experimental approach for studying evolutionary drivers in more detail. The results suggest that BCL-2 proteins, in all their various forms, are unique products of a particular, unpredictable course of history set in motion by ancient chance events.