Search for: All records

Creators/Authors contains: "Rokas, Antonis"

  1. Free, publicly-accessible full text available June 1, 2026
  2. Synopsis A major goal of research in evolution and genetics is linking genotype to phenotype. This work could be direct, such as determining the genetic basis of a phenotype by leveraging genetic variation or divergence in a developmental, physiological, or behavioral trait. The work could also involve studying the evolutionary phenomena (e.g., reproductive isolation, adaptation, sexual dimorphism, behavior) that reveal an indirect link between genotype and a trait of interest. When the phenotype diverges across evolutionarily distinct lineages, this genotype-to-phenotype problem can be addressed using phylogenetic genotype-to-phenotype (PhyloG2P) mapping, which uses genetic signatures and convergent phenotypes on a phylogeny to infer the genetic bases of traits. The PhyloG2P approach has proven powerful in revealing key genetic changes associated with diverse traits, including the mammalian transition to marine environments and transitions between major mechanisms of photosynthesis. However, there are several intermediate traits layered in between genotype and the phenotype of interest, including but not limited to transcriptional profiles, chromatin states, protein abundances, structures, modifications, metabolites, and physiological parameters. Each intermediate trait is interesting and informative in its own right, but synthesis across data types has great promise for providing a deep, integrated, and predictive understanding of how genotypes drive phenotypic differences and convergence. We argue that an expanded PhyloG2P framework (the PhyloG2P matrix) that explicitly considers intermediate traits, and imputes those that are prohibitive to obtain, will allow a better mechanistic understanding of any trait of interest. This approach provides a proxy for functional validation and mechanistic understanding in organisms where laboratory manipulation is impractical. 
  3. Many remarkable phenotypes have repeatedly occurred across vast evolutionary distances. When convergent traits emerge on the tree of life, they are sometimes driven by the same underlying gene families, while other times, many different gene families are involved. Conversely, a gene family may be repeatedly recruited for a single trait or many different traits. To understand the general rules governing convergence at both genomic and phenotypic levels, we systematically tested associations between 56 binary metabolic traits and gene count in 14,785 gene families from 993 Saccharomycotina yeasts. Using a recently developed phylogenetic approach that reduces spurious correlations, we found that gene family expansion and contraction were significantly linked to trait gain and loss in 45/56 (80%) traits. While 595/739 (81%) significant gene families were associated with only one trait, we also identified several “keystone” gene families that were significantly associated with up to 13/56 (23%) of all traits. Strikingly, most of these families are known to encode metabolic enzymes and transporters, including all members of the industrially relevant MALtose fermentation loci in the baker’s yeast Saccharomyces cerevisiae. These results indicate that convergent evolution on the gene family level may be more widespread across deeper timescales than previously believed. 
    Free, publicly-accessible full text available June 10, 2026
  4. Abstract Functional innovation at the protein level is a key source of evolutionary novelties. The constraints on functional innovations are likely to be highly specific in different proteins, which are shaped by their unique histories and the extent of global epistasis that arises from their structures and biochemistries. These contextual nuances in the sequence–function relationship have implications both for a basic understanding of the evolutionary process and for engineering proteins with desirable properties. Here, we have investigated the molecular basis of novel function in a model member of an ancient, conserved, and biotechnologically relevant protein family. These Major Facilitator Superfamily sugar porters are a functionally diverse group of proteins that are thought to be highly plastic and evolvable. By dissecting a recent evolutionary innovation in an α-glucoside transporter from the yeast Saccharomyces eubayanus, we show that the ability to transport a novel substrate requires high-order interactions between many protein regions and numerous specific residues proximal to the transport channel. To reconcile the functional diversity of this family with the constrained evolution of this model protein, we generated new, state-of-the-art genome annotations for 332 Saccharomycotina yeast species spanning ∼400 My of evolution. By integrating phylogenetic and phenotypic analyses across these species, we show that the model yeast α-glucoside transporters likely evolved from a multifunctional ancestor and became subfunctionalized. The accumulation of additive and epistatic substitutions likely entrenched this subfunction, which made the simultaneous acquisition of multiple interacting substitutions the only reasonably accessible path to novelty. 
  5. Abstract Multiple sequence alignments and phylogenetic trees are rich in biological information and are fundamental to research in biology. PhyKIT is a tool for processing and analyzing the information content of multiple sequence alignments and phylogenetic trees. Here, we describe how to use PhyKIT for diverse analyses, including (i) constructing a phylogenomic supermatrix, (ii) detecting errors in orthology inference, (iii) quantifying biases in phylogenomic data sets, (iv) identifying radiation events or lack of resolution using gene support frequencies, and (v) conducting evolution‐based screens to facilitate gene function prediction. Several PhyKIT functions that streamline multiple sequence alignment and phylogenetic processing—such as renaming FASTA entries or tree tips—are also discussed. These protocols demonstrate how simple command‐line operations in the unified framework of PhyKIT facilitate diverse phylogenomic data analysis and processing, from supermatrix construction and diagnosis to gaining clues about gene function. © 2024 The Author(s). Current Protocols published by Wiley Periodicals LLC.
     Basic Protocol 1: Installing PhyKIT and syntax for usage
     Basic Protocol 2: Constructing a phylogenomic supermatrix
     Basic Protocol 3: Detecting anomalies in orthology relationships
     Basic Protocol 4: Quantifying biases in phylogenomic data matrices and related measures
     Basic Protocol 5: Identifying polytomies
     Basic Protocol 6: Assessing gene‐gene coevolution as a genetic screen
    Free, publicly-accessible full text available November 1, 2025
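
     The supermatrix construction mentioned in the abstract above can be illustrated with a minimal Python sketch. This is not PhyKIT's implementation or its command-line interface; the function name `concatenate_alignments`, the `?` missing-data character, and the toy sequences are assumptions made only for illustration. The sketch shows the general idea: per-gene alignments are joined end to end, and taxa missing from a gene are padded so every row has the same length.

```python
from typing import Dict, List, Tuple


def concatenate_alignments(
    alignments: List[Dict[str, str]], missing: str = "?"
) -> Tuple[Dict[str, str], List[Tuple[int, int]]]:
    """Concatenate per-gene alignments into one supermatrix (illustrative only).

    Each alignment maps taxon name -> aligned sequence; all sequences within
    one alignment must have the same length. Taxa absent from a gene are
    padded with the `missing` character. Returns the supermatrix and the
    1-based, inclusive (start, end) coordinates of each gene partition.
    """
    # Union of all taxa seen in any gene, in a stable order.
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    pieces: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Tuple[int, int]] = []
    start = 1
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within an alignment must share one length")
        length = lengths.pop()
        for taxon in taxa:
            # Pad with missing-data characters when the taxon lacks this gene.
            pieces[taxon].append(aln.get(taxon, missing * length))
        partitions.append((start, start + length - 1))
        start += length
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


# Hypothetical toy data: two genes, three taxa; sp2 lacks the second gene.
gene_a = {"sp1": "ATG-", "sp2": "ATGC", "sp3": "ATGA"}
gene_b = {"sp1": "GGT", "sp3": "GCT"}
matrix, parts = concatenate_alignments([gene_a, gene_b])
for taxon, seq in matrix.items():
    print(f">{taxon}\n{seq}")
```

     Tracking the partition coordinates alongside the matrix is a common design choice, since downstream analyses often need to know where each gene begins and ends in the concatenated alignment.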
  6. ABSTRACT Yeasts in the subphylum Saccharomycotina are found across the globe in disparate ecosystems. A major aim of yeast research is to understand the diversity and evolution of ecological traits, such as carbon metabolic breadth, insect association, and cactophily. This includes studying aspects of ecological traits like genetic architecture or association with other phenotypic traits. Genomic resources in the Saccharomycotina have grown rapidly. Ecological data, however, are still limited for many species, especially those only known from species descriptions where usually only a limited number of strains are studied. Moreover, ecological information is recorded in natural language format limiting high throughput computational analysis. To address these limitations, we developed an ontological framework for the analysis of yeast ecology. A total of 1,088 yeast strains were added to the Ontology of Yeast Environments (OYE) and analyzed in a machine‐learning framework to connect genotype to ecology. This framework is flexible and can be extended to additional isolates, species, or environmental sequencing data. Widespread adoption of OYE would greatly aid the study of macroecology in the Saccharomycotina subphylum. 
  7. Kamoun, Sophien (Ed.)
    Many distantly related organisms have convergently evolved traits and lifestyles that enable them to live in similar ecological environments. However, the extent of phenotypic convergence evolving through the same or distinct genetic trajectories remains an open question. Here, we leverage a comprehensive dataset of genomic and phenotypic data from 1,049 yeast species in the subphylum Saccharomycotina (Kingdom Fungi, Phylum Ascomycota) to explore signatures of convergent evolution in cactophilic yeasts, ecological specialists associated with cacti. We inferred that the ecological association of yeasts with cacti arose independently approximately 17 times. Using a machine learning–based approach, we further found that cactophily can be predicted with 76% accuracy from both functional genomic and phenotypic data. The most informative feature for predicting cactophily was thermotolerance, which we found to be likely associated with altered evolutionary rates of genes impacting the cell envelope in several cactophilic lineages. We also identified horizontal gene transfer and duplication events of plant cell wall–degrading enzymes in distantly related cactophilic clades, suggesting that putatively adaptive traits evolved independently through disparate molecular mechanisms. Notably, we found that multiple cactophilic species and their close relatives have been reported as emerging human opportunistic pathogens, suggesting that the cactophilic lifestyle—and perhaps more generally lifestyles favoring thermotolerance—might preadapt yeasts to cause human disease. This work underscores the potential of a multifaceted approach involving high-throughput genomic and phenotypic data to shed light onto ecological adaptation and highlights how convergent evolution to wild environments could facilitate the transition to human pathogenicity. 
  8. Abstract Gene gains and losses are a major driver of genome evolution; their precise characterization can provide insights into the origin and diversification of major lineages. Here, we examined gene family evolution of 1154 genomes from nearly all known species in the medically and technologically important yeast subphylum Saccharomycotina. We found that yeast gene family evolution differs from that of plants, animals, and filamentous ascomycetes, and is characterized by smaller overall gene numbers yet larger gene family sizes for a given gene number. Faster-evolving lineages (FELs) in yeasts experienced significantly higher rates of gene losses—commensurate with a narrowing of metabolic niche breadth—but higher speciation rates than their slower-evolving sister lineages (SELs). Gene families most often lost are those involved in mRNA splicing, carbohydrate metabolism, and cell division and are likely associated with intron loss, metabolic breadth, and non-canonical cell cycle processes. Our results highlight the significant role of gene family contractions in the evolution of yeast metabolism, genome function, and speciation, and suggest that gene family evolutionary trajectories have differed markedly across major eukaryotic lineages. 
  9. Genome-scale amounts of data and the development of novel statistical phylogenetic approaches have greatly aided the reconstruction of a broad sketch of the tree of life and resolved many of its branches. However, incongruence—the inference of conflicting evolutionary histories—remains pervasive in phylogenomic data. We synthesize the biological and analytical factors that drive incongruence, discuss methodological advances to diagnose and handle incongruence, and identify avenues for future research. The study of incongruence has enabled a deeper understanding of phylogenesis and improved our ability to reconstruct and interpret the tree of life. 