Title: Protein Structure, Models of Sequence Evolution, and Data Type Effects in Phylogenetic Analyses of Mitochondrial Data: A Case Study in Birds
Phylogenomic analyses have revolutionized the study of biodiversity, but they have revealed that estimated tree topologies can depend, at least in part, on the subset of the genome that is analyzed. For example, estimated relationships among avian orders differ depending on whether protein-coding or non-coding data are analyzed. The bird tree is a good study system for two reasons: the historical signal for relationships among orders is very weak, which should allow subtle non-historical signals to be identified, whereas the monophyly of orders is strongly corroborated, which allows strong non-historical signals to be identified. Hydrophobic amino acids in mitochondrially encoded proteins, which are expected to occur in transmembrane helices, have been hypothesized to be associated with non-historical signals. We tested this hypothesis by comparing the evolution of transmembrane helices and extramembrane segments of mitochondrial proteins from 420 bird species, sampled from most avian orders. We estimated amino acid exchangeabilities for both structural environments and assessed the performance of phylogenetic analysis using each data type. We compared those relative exchangeabilities with values calculated from a substitution matrix for transmembrane helices that was estimated using a variety of nuclear- and mitochondrially encoded proteins, allowing us to compare the bird-specific mitochondrial models with a general model of transmembrane protein evolution. To complement our amino acid analyses, we examined the impact of protein structure on patterns of nucleotide evolution. Models of transmembrane and extramembrane sequence evolution for amino acids and nucleotides exhibited striking differences, but there was no evidence for strong topological data type effects. However, incorporating protein structure into analyses of mitochondrially encoded proteins improved model fit. Thus, we believe that considering protein structure will improve analyses of mitogenomic data, both in birds and in other taxa.
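Background note on "exchangeabilities": in a standard time-reversible substitution model, the off-diagonal rate matrix entries factor into a symmetric exchangeability term and the equilibrium frequency of the target state. This is a sketch of that common parameterization only. The abstract does not state the exact model form the authors used.

$$
q_{ij} = s_{ij}\,\pi_j \quad (i \neq j), \qquad s_{ij} = s_{ji}, \qquad q_{ii} = -\sum_{j \neq i} q_{ij}
$$

Here $q_{ij}$ is the instantaneous rate of change from state $i$ to state $j$ (amino acids or nucleotides), and $\pi_j$ is the equilibrium frequency of $j$. The symmetric $s_{ij}$ is the exchangeability. Because $s_{ij}$ is separated from the state frequencies and is usually reported only up to an overall scaling (hence "relative"), exchangeabilities estimated for different structural environments, such as transmembrane helices and extramembrane segments, can be compared directly.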
Award ID(s): 1655683
PAR ID: 10359424
Journal Name: Diversity
Volume: 13
Issue: 11
ISSN: 1424-2818
Page Range / eLocation ID: 555
Sponsoring Org: National Science Foundation
More Like this
  1. Johnson, Patricia J (Ed.)
    Abstract: Analyses of codon usage in eukaryotes suggest that amino acid usage responds to GC pressure, so that AT-biased substitutions drive higher usage of amino acids with AT-ending codons. Here, we combine single-cell transcriptomics and phylogenomics to explore codon usage patterns in foraminifera, a diverse and ancient clade of predominantly uncultivable microeukaryotes. We curate data from 1,044 gene families in 49 individuals representing 28 genera, generating what may be the largest existing dataset from a predominantly uncultivable clade of protists, to analyze compositional bias and codon usage. We find extreme variation in composition, with a median GC content at fourfold degenerate silent sites below 3% in some species and above 75% in others. The most AT-biased species are distributed among diverse non-monophyletic lineages. Surprisingly, despite this extreme variation in compositional bias, amino acid usage is highly conserved across all foraminifera. By analyzing nucleotide, codon, and amino acid composition within this diverse clade of amoeboid eukaryotes, we expand our knowledge of patterns of genome evolution across the eukaryotic tree of life.
    Importance: Patterns of molecular evolution in protein-coding genes reflect trade-offs between substitution biases and selection on both codon and amino acid usage. Most analyses of these factors in microbial eukaryotes focus on model species such as Acanthamoeba, Plasmodium, and yeast, where substitution bias is a primary contributor to patterns of amino acid usage. Foraminifera, an ancient clade of single-celled eukaryotes, present a conundrum: we find highly conserved amino acid usage underlain by divergent nucleotide composition, including extreme AT bias at silent sites in multiple non-sister lineages. We speculate that these paradoxical patterns are enabled by the dynamic genome structure of foraminifera, whose life cycles can include genome endoreplication and chromatin extrusion.
  2. Chloroviruses are large, plaque-forming dsDNA viruses that infect chlorella-like green algae living in symbiotic relationships with protists. Chloroviruses have genomes of 290 to 370 kb and encode as many as 400 proteins. One interesting feature of chloroviruses is that they encode a potassium ion (K+) channel protein named Kcv. The Kcv protein encoded by the SAG chlorovirus ATCV-1 consists of 82 amino acids and is one of the smallest known functional K+ channel proteins. KcvATCV-1 resembles the family of K+ channel proteins with two transmembrane domains: it consists of two transmembrane α-helices with a pore region between them, which makes it an ideal model for studying K+ channels. To assess their genetic diversity, kcv genes were sequenced from 103 geographically distinct SAG chlorovirus isolates. The 103 kcv genes comprised 42 unique DNA sequences, which translated into 26 new Kcv channels. The new predicted Kcv proteins differed from KcvATCV-1 by 1 to 55 amino acids. The filter was the most conserved region of the Kcv protein; the turret and the pore helix were fairly well conserved; and the outer and inner transmembrane domains were the most variable. Two of the new predicted channels were shown to be functional K+ channels.
  3. Abstract: The amino acid composition of proteins plays an important role in biology, medicine, and nutrition. Here, a protein analysis technique is introduced and validated that quickly estimates amino acid composition and secondary structure across various protein sizes while maintaining the proteins in their natural states. The method combines multivariate statistics with the thermostable Raman interaction profiling (TRIP) technique, eliminating the need for complex sample preparation. To validate the approach, Raman spectra of seven proteins of varying sizes are constructed from their amino acid frequencies and the Raman spectra of the individual amino acids. These constructed spectra closely resemble the actual measured Raman spectra. Under TRIP conditions, specific vibrational modes tied to the free amino and carboxyl termini of the amino acids disappear as signals linked to secondary structures emerge. The technique is also applied inversely to estimate the amino acid compositions and secondary structures of unknown proteins across a range of sizes, with root mean square errors (RMSE) between 1.47% and 5.77%. These results extend the uses of TRIP beyond interaction profiling to probing amino acid composition and structure.
  4. Abstract: Understanding the molecular evolution of the SARS‐CoV‐2 virus as it continues to spread in communities around the globe is important for mitigation and future pandemic preparedness. Three‐dimensional structures of SARS‐CoV‐2 proteins and those of other coronaviruses archived in the Protein Data Bank were used to analyze viral proteome evolution during the first 6 months of the COVID‐19 pandemic. Analyses of the spatial locations, chemical properties, and structural and energetic impacts of the amino acid changes observed in >48 000 viral isolates revealed how each of the 29 viral proteins has undergone amino acid changes. Catalytic residues in active sites and binding residues in protein–protein interfaces showed modest but significant numbers of substitutions, highlighting the mutational robustness of the viral proteome. Energetics calculations showed that the impact of substitutions on the thermodynamic stability of the proteome follows a universal bi‐Gaussian distribution. Detailed results are presented for potential drug discovery targets and for the four structural proteins that make up the virion, highlighting substitutions with the potential to affect protein structure, enzyme activity, and protein–protein and protein–nucleic acid interfaces. Characterizing the evolution of the virus in three dimensions provides testable insights into viral protein function and should aid structure‐based drug discovery efforts, as well as the prospective identification of amino acid substitutions with the potential to confer drug resistance.
  5. Abstract: Background: In the genus Rhinolophus, species in the Rhinolophus philippinensis and R. macrotis groups are unique because the horseshoe bats in these groups have relatively low echolocation frequencies and flight speeds compared with other horseshoe bats of similar body size. These differences among bat species suggest that particular evolutionary processes may have occurred in this genus. To study evidence of adaptation in the mitochondrial genomes (mitogenomes) of rhinolophids, especially those of species with low echolocation frequencies, we sequenced eight mitogenomes and used them for comparative studies of molecular phylogeny and adaptive evolution. Results: Phylogenetic analysis using whole mitogenome sequences produced robust results and provided better phylogenetic signals than those obtained using single genes. The results supported the recent establishment of the separate macrotis group. Signals of adaptive evolution in the Rhinolophus species were tested at some of the codons in two genes (ND2 and ND6) that encode NADH dehydrogenase subunits of oxidative phosphorylation complex I. These genes have a background of widespread purifying selection. Based on codon models and physicochemical profiles of amino acid replacements, signals of relaxed purifying selection were found in ND2 and signals of positive selection in ND6. However, no pronounced overlap was found among non-synonymous sites in the mitogenomes of all the species with low echolocation frequencies. A signal of positive selection in ND5 was found with the branch-site model when R. philippinensis was set as the foreground branch. Conclusions: The mitogenomes provided robust phylogenetic signals that were much more informative than those obtained using single mitochondrial genes.
    Two mitochondrial genes encoding proteins of the oxidative phosphorylation system showed some evidence of adaptive evolution in the genus Rhinolophus, and signals of positive selection were detected in ND5 in R. philippinensis. These results indicate that mitochondrial protein-coding genes were targets of adaptive evolution during the evolution of Rhinolophus species, which might have contributed to the diverse range of acoustic adaptations in this genus.