Title: Advances in the Xenopus immunome: Diversification, expansion, and contraction
Xenopus is a genus of African clawed frogs including two species, X. tropicalis and X. laevis, that are extensively used in experimental biology, immunology, and biomedical studies. The availability of fully sequenced and annotated Xenopus genomes is strengthening genome-wide analyses of gene families and transgenesis to model human diseases. However, inaccuracies in genome annotation for genes involved in the immune system (i.e., the immunome) hamper immunogenetic studies. Furthermore, advanced genome technologies (e.g., single-cell and RNA-Seq) rely on well-annotated genomes. The annotation problems of the Xenopus immunome include a lack of established orthology across taxa, merged gene models, poor representation in gene pages on Xenbase, misannotated genes, and missing gene IDs. The Xenopus Research Resource for Immunobiology, in collaboration with Xenbase and a group of investigators, is working to resolve these issues in the latest versions of genome browsers. In this review, we summarize the current problems of previously misannotated gene families that we have recently resolved. We also highlight the expansion, contraction, and diversification of previously misannotated gene families.
Award ID(s):
1749427
PAR ID:
10522692
Author(s) / Creator(s):
Publisher / Repository:
Elsevier
Date Published:
Journal Name:
Developmental & Comparative Immunology
Volume:
145
Issue:
C
ISSN:
0145-305X
Page Range / eLocation ID:
104734
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Improvements in DNA sequencing technology and computational methods have led to a substantial increase in the creation of high-quality genome assemblies of many species. To understand the biology of these genomes, annotation of gene features and other functional elements is essential; however, for most species, only the reference genome is well-annotated. One strategy to annotate new or improved genome assemblies is to map or ‘lift over’ the genes from a previously annotated reference genome. Here we describe Liftoff, a new genome annotation lift-over tool capable of mapping genes between two assemblies of the same or closely related species. Liftoff aligns genes from a reference genome to a target genome and finds the mapping that maximizes sequence identity while preserving the structure of each exon, transcript, and gene. We show that Liftoff can accurately map 99.9% of genes between two versions of the human reference genome with an average sequence identity >99.9%. We also show that Liftoff can map genes across species by successfully lifting over 98.4% of human protein-coding genes to a chimpanzee genome assembly with 98.7% sequence identity. Availability: The source code for Liftoff is available at https://github.com/agshumate/Liftoff
  2. Valencia, Alfonso (Ed.)
    Abstract
    Motivation: Improvements in DNA sequencing technology and computational methods have led to a substantial increase in the creation of high-quality genome assemblies of many species. To understand the biology of these genomes, annotation of gene features and other functional elements is essential; however, for most species, only the reference genome is well-annotated.
    Results: One strategy to annotate new or improved genome assemblies is to map or ‘lift over’ the genes from a previously annotated reference genome. Here, we describe Liftoff, a new genome annotation lift-over tool capable of mapping genes between two assemblies of the same or closely related species. Liftoff aligns genes from a reference genome to a target genome and finds the mapping that maximizes sequence identity while preserving the structure of each exon, transcript and gene. We show that Liftoff can accurately map 99.9% of genes between two versions of the human reference genome with an average sequence identity >99.9%. We also show that Liftoff can map genes across species by successfully lifting over 98.3% of human protein-coding genes to a chimpanzee genome assembly with 98.2% sequence identity.
    Availability and implementation: Liftoff can be installed via bioconda and PyPI. In addition, the source code for Liftoff is available at https://github.com/agshumate/Liftoff.
    Supplementary information: Supplementary data are available at Bioinformatics online.
  3. Schmidt, Edward E (Ed.)

    The evolution of lineage-specific gene families remains poorly studied across the eukaryotic tree of life, with most analyses focusing on the recent evolution of de novo genes in model species. Here we explore the origins of lineage-specific genes in ciliates, a ~1 billion year old clade of microeukaryotes that are defined by their division of somatic and germline functions into distinct nuclei. Previous analyses on conserved gene families have shown the effect of ciliates’ unusual genome architecture on gene family evolution: extensive genome processing–the generation of thousands of gene-sized somatic chromosomes from canonical germline chromosomes–is associated with larger and more diverse gene families. To further study the relationship between ciliate genome architecture and gene family evolution, we analyzed lineage-specific gene families from a set of 46 transcriptomes and 12 genomes representing x species from eight ciliate classes. We assess how the evolution of lineage-specific gene families occurs among four groups of ciliates: extensive fragmenters with gene-size somatic chromosomes, non-extensive fragmenters with “large” multi-gene somatic chromosomes, Heterotrichea with highly polyploid somatic genomes, and Karyorelictea with ‘paradiploid’ somatic genomes. Our analyses demonstrate that: 1) most lineage-specific gene families are found at shallow taxonomic scales; 2) extensive genome processing (i.e., gene unscrambling) during development likely influences the size and number of young lineage-specific gene families; and 3) the influence of somatic genome architecture on molecular evolution is increasingly apparent in older gene families. Altogether, these data highlight the influences of genome architecture on the evolution of lineage-specific gene families in eukaryotes.

     
  4. Abstract

    Bacterial genomes exhibit widespread horizontal gene transfer, resulting in highly variable genome content that complicates the inference of genetic interactions. In this study, we develop a method for detecting coevolving genes from large datasets of bacterial genomes based on pairwise comparisons of closely related individuals, analogous to a pedigree study in eukaryotic populations. We apply our method to pairs of genes from the Staphylococcus aureus accessory genome of over 75,000 annotated gene families using a database of over 40,000 whole genomes. We find many pairs of genes that appear to be gained or lost in a coordinated manner, as well as pairs where the gain of one gene is associated with the loss of the other. These pairs form networks of rapidly coevolving genes, primarily consisting of genes involved in virulence, mechanisms of horizontal gene transfer, and antibiotic resistance, particularly the SCCmec complex. While we focus on gene gain and loss, our method can also detect genes that tend to acquire substitutions in tandem, or genotype-phenotype or phenotype-phenotype coevolution. Finally, we present the R package that allows for the computation of our method.

     
  5. Abstract

    We used long-read DNA sequencing to assemble the genome of a Southern Han Chinese male. We organized the sequence into chromosomes and filled in gaps using the recently completed T2T-CHM13 genome as a guide, yielding a gap-free genome, Han1, containing 3,099,707,698 bases. Using the T2T-CHM13 annotation as a reference, we mapped all genes onto the Han1 genome and identified additional gene copies, generating a total of 60,708 putative genes, of which 20,003 are protein-coding. A comprehensive comparison between the genes revealed that 235 protein-coding genes were substantially different between the individuals, with frameshifts or truncations affecting the protein-coding sequence. Most of these were heterozygous variants in which one gene copy was unaffected. This represents the first gene-level comparison between two finished, annotated individual human genomes.

     