This content will become publicly available on April 30, 2026

Title: ERCnet: Phylogenomic Prediction of Interaction Networks in the Presence of Gene Duplication
Assigning gene function from genome sequences is a rate-limiting step in molecular biology research. A protein's position within an interaction network can potentially provide insights into its molecular mechanisms. Phylogenetic analysis of evolutionary rate covariation (ERC) in protein sequences has been shown to be effective for large-scale prediction of functional relationships and interactions. However, gene duplication, gene loss, and other sources of phylogenetic incongruence are barriers to analyzing ERC on a genome-wide basis. Here, we developed ERCnet, a bioinformatic program designed to overcome these challenges, facilitating efficient all-versus-all ERC analyses for large protein sequence datasets. Using simulated proteome datasets, we found that ERCnet achieves combined false positive and false negative error rates well below 10% and that our novel “branch-by-branch” length measurements outperform “root-to-tip” approaches in most cases, offering a valuable new strategy for performing ERC. We also compiled a sample set of 35 angiosperm genomes to test the performance of ERCnet on empirical data, including its sensitivity to user-defined analysis parameters such as input dataset size and branch-length measurement strategy. We investigated the overlap between ERCnet runs with different species samples to understand how species number and composition affect predicted interactions and to identify the protein sets that consistently exhibit ERC across angiosperms. Our systematic exploration of the performance of ERCnet provides a roadmap for the design of future ERC analyses to predict functional interactions in a wide array of genomic datasets. ERCnet code is freely available at https://github.com/EvanForsythe/ERCnet.
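To make the two branch-length strategies concrete, here is a minimal Python sketch of an ERC score between two proteins. It is not ERCnet's implementation. It assumes (hypothetically) that both gene trees have already been reconciled to one shared rooted species topology, stores each tree as a mapping from node name to (parent, branch length), and uses a plain Pearson correlation of raw branch lengths; real pipelines may normalize lengths or use other correlation statistics. The toy trees and the names `branch_by_branch` and `root_to_tip` are illustrative only.

```python
from math import sqrt
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: node name -> (parent name, branch length).
# The root's parent is None. Both trees are assumed to share one rooted topology.
Tree = Dict[str, Tuple[Optional[str], float]]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def branch_by_branch(t1: Tree, t2: Tree) -> float:
    """Correlate the length of every non-root branch across the two trees."""
    branches = sorted(node for node, (parent, _) in t1.items() if parent is not None)
    return pearson([t1[b][1] for b in branches], [t2[b][1] for b in branches])


def root_to_tip(t1: Tree, t2: Tree) -> float:
    """Correlate summed root-to-tip path lengths, one value per tip."""
    internal = {parent for parent, _ in t1.values() if parent is not None}
    tips = sorted(node for node in t1 if node not in internal)

    def depth(tree: Tree, node: str) -> float:
        total = 0.0
        while tree[node][0] is not None:  # walk up until the root is reached
            total += tree[node][1]
            node = tree[node][0]
        return total

    return pearson([depth(t1, t) for t in tips], [depth(t2, t) for t in tips])


# Toy trees with topology ((A,B)AB,(C,D)CD)root; branch lengths are made up.
gene1: Tree = {"root": (None, 0.0), "AB": ("root", 1.0), "CD": ("root", 0.0),
               "A": ("AB", 0.1), "B": ("AB", 0.2), "C": ("CD", 0.3), "D": ("CD", 0.4)}
gene2: Tree = {"root": (None, 0.0), "AB": ("root", 1.0), "CD": ("root", 0.0),
               "A": ("AB", 0.4), "B": ("AB", 0.3), "C": ("CD", 0.2), "D": ("CD", 0.1)}

print(f"branch-by-branch r = {branch_by_branch(gene1, gene2):.3f}")  # 0.842
print(f"root-to-tip r      = {root_to_tip(gene1, gene2):.3f}")       # 0.979
```

In this toy case the tip branches of the two genes vary in opposite directions, yet the root-to-tip score is higher because sister tips share the long internal branch AB and therefore contribute non-independent values. This is one plausible reason the two measurements can disagree; it is not a reproduction of the benchmarks summarized in the abstract.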
Award ID(s):
2114641
PAR ID:
10612714
Editor(s):
Hlouchova, Klara
Publisher / Repository:
Oxford University Press
Journal Name:
Molecular Biology and Evolution
Volume:
42
Issue:
5
ISSN:
0737-4038
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Nuclear and plastid (chloroplast) genomes experience different mutation rates, levels of selection, and transmission modes, yet key cellular functions depend on their coordinated interactions. Functionally related proteins often show correlated changes in rates of sequence evolution across a phylogeny (evolutionary rate covariation or ERC), offering a means to detect previously unidentified suites of coevolving and cofunctional genes. We performed phylogenomic analyses across angiosperm diversity, scanning the nuclear genome for genes that exhibit ERC with plastid genes. As expected, the strongest hits were highly enriched for genes encoding plastid-targeted proteins, providing evidence that cytonuclear interactions affect rates of molecular evolution at genome-wide scales. Many identified nuclear genes functioned in post-transcriptional regulation and the maintenance of protein homeostasis (proteostasis), including protein translation (in both the plastid and cytosol), import, quality control, and turnover. We also identified nuclear genes that exhibit strong signatures of coevolution with the plastid genome, but their encoded proteins lack organellar-targeting annotations, making them candidates for having previously undescribed roles in plastids. In sum, our genome-wide analyses reveal that plastid-nuclear coevolution extends beyond the intimate molecular interactions within chloroplast enzyme complexes and may be driven by frequent rewiring of the machinery responsible for maintenance of plastid proteostasis in angiosperms.
  2. Abstract Background Dalbergia odorifera is an economically and culturally important species in the Fabaceae because of the high-quality lumber and traditional Chinese medicines made from this plant; however, overexploitation has increased the scarcity of D. odorifera. Given the rarity and the multiple uses of this species, it is important to expand its genomic resources for applications such as tracking illegal logging, determining the effective population size of wild stands, delineating pedigrees in marker-assisted breeding programs, and resolving gene networks in functional genomics studies. Although the nuclear and chloroplast genomes of D. odorifera have been published, its complete mitochondrial genome had not been assembled or assessed for sequence transfer to other genomic compartments until now. Such work is essential for understanding structural and functional genome evolution in a lineage (Fabaceae) with frequent intergenomic sequence transfers. Results We integrated Illumina short reads and PacBio CLR long reads to assemble and annotate the complete mitochondrial genome of D. odorifera. The mitochondrial genome was organized as a single circular structure 435 kb in length containing 33 protein-coding genes, 4 rRNA genes, and 17 tRNA genes. Nearly 4.0% (17,386 bp) of the genome was annotated as repetitive DNA. The sequence transfer analysis showed that 114 kb of DNA originating from the mitochondrial genome has been transferred to the nuclear genome, with most of the transfer events having taken place relatively recently. The high frequency of sequence transfer from the mitochondrial to the nuclear genome was similar to that from the chloroplast to the nuclear genome. Conclusion This study assembled the complete mitochondrial genome of D. odorifera for the first time, providing a baseline resource for understanding genomic evolution in the highly speciose Fabaceae.
In particular, the assessment of intergenomic sequence transfer suggests that transfers have been common and recent, indicating a possible role in environmental adaptation, as has been found in other lineages. The high turnover rate of genomic collinearity and the large differences in mitochondrial genome size found in the comparative analyses herein provide evidence for the rapid evolution of mitochondrial genome structure compared with chloroplasts in Faboideae. By contrast, phylogenetic analyses using functional genes indicate that mitochondrial genes evolve very slowly compared with chloroplast genes.
  3. Abstract Background The 16S mitochondrial rRNA gene is the most widely sequenced molecular marker in amphibian systematic studies, making it comparable to the universal CO1 barcode that is more commonly used in other animal groups. However, studies employ different primer combinations that target different lengths/regions of the 16S gene, ranging from complete gene sequences (~1500 bp) to short fragments (~500 bp), the latter of which are the most ubiquitously used. Sequences of different lengths are often concatenated, compared, and/or jointly analyzed to infer phylogenetic relationships, estimate genetic divergence (p-distances), and justify the recognition of new species (species delimitation), making the 16S gene region, by far, the most influential molecular marker in amphibian systematics. Despite their ubiquitous and multifarious use, no study has evaluated the congruence and performance among the different fragment lengths. Results Using empirical data derived from both Sanger-based and genomic approaches, we show that full-length 16S sequences recover the most accurate phylogenetic relationships, highest branch support, lowest variation in genetic distances (pairwise p-distances), and best-scoring species delimitation partitions. In contrast, widely used short fragments produce inaccurate phylogenetic reconstructions, lower and more variable branch support, erratic genetic distances, and low-scoring species delimitation partitions, the numbers of which are vastly overestimated. The relatively poor performance of short 16S fragments is likely due to insufficient phylogenetic information content.
Conclusions Taken together, our results demonstrate that short 16S fragments are unable to match the efficacy achieved by full-length sequences in terms of topological accuracy, heuristic branch support, genetic divergences, and species delimitation partitions; thus, phylogenetic and taxonomic inferences that are predicated on short 16S fragments should be interpreted with caution. However, short 16S fragments can still be useful for species identification, rapid assessments, or definitively coupling complex life stages in natural history studies and faunal inventories. While the full 16S sequence performs best, it requires the use of several primer pairs, which increases cost, time, and effort. As a compromise, our results demonstrate that practitioners should use medium-length primers rather than short-fragment primers, because they have the potential to markedly improve phylogenetic inference and species delimitation without additional cost.
  4. Ware, Jessica (Ed.)
    Abstract Recent molecular analyses of transcriptome data from 94 species across 92 genera of North American Plecoptera identified the genus Kathroperla Banks, 1920 as the sister group to Chloroperlidae + Perlodidae. Given that the genus Kathroperla has historically been included as a member of the family Chloroperlidae, this discovery indicated that further investigation of the genus and the subfamily Paraperlinae was needed. Both transcriptome and genome sequencing datasets were generated from 32 species of the infraorder Systellognatha, including all described species of the Paraperlinae, to test the phylogenetic placement of these taxa. From these datasets, a large phylogenomic data matrix of 800 orthologous genes was produced, and multiple analyses were conducted, including both concatenated and coalescent analyses. Morphological comparisons were made among all Paraperlinae using light microscopy. All molecular results support a monophyletic Kathroperla, which is recovered as the sister taxon to the remaining Perloidea by five of six molecular analyses. Postocular head length is determined to be a distinct morphological character of this genus. Combined molecular and morphological evidence supports the designation of Kathroperlidae, fam. n., as the seventeenth family of extant Plecoptera.
  5. Abstract Many lizard species face extinction due to worldwide climate change. The Guatemalan Beaded Lizard, Heloderma charlesbogerti, is a member of the family Helodermatidae that may be particularly imperiled; fewer than 600 mature individuals are believed to persist in the wild. In addition, H. charlesbogerti lizards are phenotypically remarkable: they are large, charismatically patterned, and possess a venomous bite. Here, we report the draft genome of the Guatemalan Beaded Lizard using DNA from a wild-caught individual. The assembled genome totals 2.31 Gb in length, similar in size to the genomes of related species. Single-copy orthologs were used to produce a novel molecular phylogeny, revealing that the Guatemalan Beaded Lizard falls into a clade with the Asian Glass Lizard (Anguidae) and in close association with the Komodo Dragon (Varanidae) and the Chinese Crocodile Lizard (Shinisauridae). In addition, we identified 31,411 protein-coding genes within the genome. Of these genes, we found 504 that evolved under differential constraint on the branch leading to the Guatemalan Beaded Lizard. Lastly, we identified a decline in the effective population size of the Guatemalan Beaded Lizard approximately 400,000 years ago, followed by a period of stability before it began to decline again 60,000 years ago. The results presented here provide important information regarding a highly endangered, venomous reptile that can be used in future conservation, functional genetic, and phylogenetic analyses.
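The 16S study in item 3 above relies on pairwise p-distances, the proportion of aligned sites at which two sequences differ. A minimal Python sketch follows. It assumes the sequences are already aligned and, as one common convention among several, skips any site where either sequence has a gap or missing character (pairwise deletion). The set of missing-data symbols, the specimen names, and the sequences are made up for illustration.

```python
from itertools import combinations

# Characters treated as gaps/missing data; an assumption, conventions vary.
MISSING = set("-?N")


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or missing character are skipped
    (pairwise deletion), so they count neither as compared nor as different.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in MISSING or y in MISSING:
            continue
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Made-up aligned fragments for three hypothetical specimens.
seqs = {"sp1": "ACGTACGTAC", "sp2": "ACGTTCGTAC", "sp3": "ACG-ACGAAN"}

for (n1, s1), (n2, s2) in combinations(sorted(seqs.items()), 2):
    print(f"{n1} vs {n2}: p = {p_distance(s1, s2):.3f}")
# sp1 vs sp2: p = 0.100
# sp1 vs sp3: p = 0.125
# sp2 vs sp3: p = 0.250
```

Because a p-distance computed from n compared sites can only change in steps of 1/n, shorter fragments give coarser estimates. That is consistent with, though not a demonstration of, the more erratic distances the study reports for short 16S fragments.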