This content will become publicly available on April 9, 2026

Title: Lost in translation: conserved amino acid usage despite extreme codon bias in foraminifera
ABSTRACT Analyses of codon usage in eukaryotes suggest that amino acid usage responds to GC pressure so AT-biased substitutions drive higher usage of amino acids with AT-ending codons. Here, we combine single-cell transcriptomics and phylogenomics to explore codon usage patterns in foraminifera, a diverse and ancient clade of predominantly uncultivable microeukaryotes. We curate data from 1,044 gene families in 49 individuals representing 28 genera, generating perhaps the largest existing dataset of data from a predominantly uncultivable clade of protists, to analyze compositional bias and codon usage. We find extreme variation in composition, with a median GC content at fourfold degenerate silent sites below 3% in some species and above 75% in others. The most AT-biased species are distributed among diverse non-monophyletic lineages. Surprisingly, despite the extreme variation in compositional bias, amino acid usage is highly conserved across all foraminifera. By analyzing nucleotide, codon, and amino acid composition within this diverse clade of amoeboid eukaryotes, we expand our knowledge of patterns of genome evolution across the eukaryotic tree of life.
IMPORTANCE Patterns of molecular evolution in protein-coding genes reflect trade-offs between substitution biases and selection on both codon and amino acid usage. Most analyses of these factors in microbial eukaryotes focus on model species such as Acanthamoeba, Plasmodium, and yeast, where substitution bias is a primary contributor to patterns of amino acid usage. Foraminifera, an ancient clade of single-celled eukaryotes, present a conundrum, as we find highly conserved amino acid usage underlain by divergent nucleotide composition, including extreme AT-bias at silent sites among multiple non-sister lineages. We speculate that these paradoxical patterns are enabled by the dynamic genome structure of foraminifera, whose life cycles can include genome endoreplication and chromatin extrusion.
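As an illustration of the metric named in the abstract, GC content at fourfold degenerate silent sites can be sketched as below. This is a minimal sketch and not the authors' pipeline: the function name `gc4` and the example sequences are hypothetical, and it assumes the standard genetic code and an in-frame coding sequence.

```python
# Codon prefixes (first two bases) whose third position is fourfold
# degenerate under the standard genetic code (an assumption of this sketch):
# Leu (CT), Val (GT), Ser (TC), Pro (CC), Thr (AC), Ala (GC), Arg (CG), Gly (GG).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def gc4(cds: str) -> float:
    """Return the fraction of G or C at fourfold degenerate third positions.

    `cds` is assumed to be an in-frame coding sequence; trailing bases that
    do not form a complete codon are ignored, as are codons whose third
    base is ambiguous (for example N).
    """
    seq = cds.upper().replace("U", "T")
    gc_count = 0
    site_count = 0
    usable_length = len(seq) - len(seq) % 3
    for i in range(0, usable_length, 3):
        codon = seq[i:i + 3]
        if codon[:2] in FOURFOLD_PREFIXES and codon[2] in "ACGT":
            site_count += 1
            if codon[2] in "GC":
                gc_count += 1
    if site_count == 0:
        raise ValueError("no fourfold degenerate sites found")
    return gc_count / site_count


# Two hypothetical sequences that both encode Ala-Ala-Gly but differ
# completely in composition at silent sites.
at_biased = "GCTGCAGGT"
gc_biased = "GCCGCGGGC"
print(gc4(at_biased), gc4(gc_biased))
```

The two example sequences encode the same peptide while sitting at opposite extremes of the metric, which is the kind of contrast the abstract describes between conserved amino acid usage and divergent silent-site composition.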
Award ID(s):
2230391 1924570
PAR ID:
10589729
Editor(s):
Johnson, Patricia J
Publisher / Repository:
American Society of Microbiology
Journal Name:
mBio
Volume:
16
Issue:
4
ISSN:
2150-7511
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Codon usage bias, or the unequal use of synonymous codons, is observed across genes, genomes, and between species. It has been implicated in many cellular functions, such as translation dynamics and transcript stability, but can also be shaped by neutral forces. We characterized codon usage across 1,154 strains from 1,051 species from the fungal subphylum Saccharomycotina to gain insight into the biases, molecular mechanisms, evolution, and genomic features contributing to codon usage patterns. We found a general preference for A/T-ending codons and correlations between codon usage bias, GC content, and tRNA-ome size. Codon usage bias is distinct between the 12 orders to such a degree that yeasts can be classified with an accuracy >90% using a machine learning algorithm. We also characterized the degree to which codon usage bias is impacted by translational selection. We found it was influenced by a combination of features, including the number of coding sequences, BUSCO count, and genome length. Our analysis also revealed an extreme bias in codon usage in the Saccharomycodales associated with a lack of predicted arginine tRNAs that decode CGN codons, leaving only the AGN codons to encode arginine. Analysis of Saccharomycodales gene expression, tRNA sequences, and codon evolution suggests that avoidance of the CGN codons is associated with a decline in arginine tRNA function. Consistent with previous findings, codon usage bias within the Saccharomycotina is shaped by genomic features and GC bias. However, we find cases of extreme codon usage preference and avoidance along yeast lineages, suggesting additional forces may be shaping the evolution of specific codons. 
  2. Fay, Justin C. (Ed.)
    Patterns of non-uniform usage of synonymous codons vary across genes in an organism and between species across all domains of life. This codon usage bias (CUB) is due to a combination of non-adaptive (e.g. mutation biases) and adaptive (e.g. natural selection for translation efficiency/accuracy) evolutionary forces. Most models quantify the effects of mutation bias and selection on CUB assuming uniform mutational and other non-adaptive forces across the genome. However, non-adaptive nucleotide biases can vary within a genome due to processes such as biased gene conversion (BGC), potentially obfuscating signals of selection on codon usage. Moreover, genome-wide estimates of non-adaptive nucleotide biases are lacking for non-model organisms. We combine an unsupervised learning method with a population genetics model of synonymous coding sequence evolution to assess the impact of intragenomic variation in non-adaptive nucleotide bias on quantification of natural selection on synonymous codon usage across 49 Saccharomycotina yeasts. We find that in the absence of a priori information, unsupervised learning can be used to identify genes evolving under different non-adaptive nucleotide biases. We find that the impact of intragenomic variation in non-adaptive nucleotide bias varies widely, even among closely-related species. We show that the overall strength and direction of translational selection can be underestimated by failing to account for intragenomic variation in non-adaptive nucleotide biases. Interestingly, genes falling into clusters identified by machine learning are also physically clustered across chromosomes. Our results indicate the need for more nuanced models of sequence evolution that systematically incorporate the effects of variable non-adaptive nucleotide biases on codon frequencies. 
  3. The evolution and diversity of the supergroup Amoebozoa is complex and poorly understood. The supergroup encompasses predominantly amoeboid lineages characterized by extreme diversity in phenotype, behavior and genetics. The study of natural selection, a driving force of diversification, within and among species of Amoebozoa will play a crucial role in understanding the evolution of the supergroup. In this study, we searched for traces of natural selection based on a set of highly conserved protein-coding genes in a phylogenetic framework from a broad sampling of amoebozoans. Using these genes, we estimated substitution rates and inferred patterns of selective pressure in lineages and sites with various models. We also examined the effect of selective pressure on codon usage bias and potential correlations with observed biological traits and habitat. Results showed large heterogeneity of selection across lineages of Amoebozoa, indicating potential species-specific optimization of adaptation to their diverse ecological environment. Overall, lineages in Tubulinea had undergone stronger purifying selection with higher average substitution rates compared to Discosea and Evosea. Evidence of adaptive evolution was observed in some representative lineages and in a gene (Rpl7a) within Evosea, suggesting potential innovation and beneficial mutations in these lineages. Our results revealed that members of the fast-evolving lineages, Entamoeba and Cutosea, all underwent strong purifying selection but had distinct patterns of codon usage bias. For the first time, this study revealed an overall pattern of natural selection across the phylogeny of Amoebozoa and provided significant implications on their distinctive evolutionary processes. 
  4. Though historically understudied, due in large part to most species being uncultivable, microbial eukaryotes (i.e. protists) are abundant and widespread across diverse habitats. Recent advances in molecular techniques, including metabarcoding, allow for the characterization of poorly known protist lineages. This study surveys the diversity of SAR (Stramenopila, Alveolata, and Rhizaria), a major eukaryotic clade that is estimated to represent about half of all eukaryotic diversity. SAR lineages use varied metabolic strategies like mixotrophy in dinoflagellates (Alveolata), parasitism in apicomplexans (Alveolata) and labyrinthulids (Stramenopila), and life cycle stages that include encystment and attachment (e.g. in ciliates, Alveolata) to survive in highly dynamic habitats. Using metabarcoding primers designed specifically to target a portion of the 18S small subunit ribosomal RNA (SSU-rRNA) gene of SAR lineages, we compare protist community composition from tide pools in Acadia National Park, Maine, USA. We characterize over 500 lineages, here operational taxonomic units (OTUs), many of which are abundant in the tide pool environment. We also find that communities vary by month sampled and exhibit patterns by size (i.e. macro-, micro-, and nano-sized). Taken together, these data allow us to further catalog protist diversity in extreme environments (e.g. those subject to extreme fluctuations in temperature and salinity during tidal cycles). Such data are critical in the exploration of biodiversity patterns among microorganisms on our rapidly changing planet.
  5. Nearly neutral theory predicts that species with higher effective population size (N_e) are better at purging slightly deleterious mutations. We compare evolution in high-N_e vs. low-N_e vertebrates to reveal subtle selective preferences among amino acids. We take three complementary approaches. First, we fit non-stationary substitution models using maximum likelihood, comparing the high-N_e clade of rodents and lagomorphs to its low-N_e sister clade of primates and colugos. Second, we compared evolutionary outcomes across a wider range of vertebrates, via correlations between amino acid frequencies and N_e. Third, we dissected which amino acid substitutions occurred in human, chimpanzee, mouse, and rat, as scored by parsimony; this also enabled comparison to a historical paper. All methods agree on amino acid preference under more effective selection. Preferred amino acids are less costly to synthesize and use GC-rich codons, which are hard to maintain under AT-biased mutation. These factors explain 85% of the variance in amino acid preferences. Parsimony-induced bias in the historical study produces an apparent reduction in structural disorder, perhaps driven by slightly deleterious substitutions in rapidly evolving regions. Within highly exchangeable pairs of amino acids, arginine is strongly preferred over lysine, aspartate over glutamate, and valine over isoleucine, consistent with more effective selection preferring a marginally larger free energy of folding. Two of these preferences (K→R and I→V), but not a third (E→D), match differences between thermophiles and mesophilic relatives. These results reveal the biophysical consequences of mutation-selection-drift balance, and demonstrate the utility of nearly neutral theory for understanding protein evolution.