Abstract Phased genomes and pangenomes are enhancing our understanding of genetic variation. Accurate phasing and assembly in repetitive regions of the genome remain challenging, however. Addressing this obstacle is crucial for studying structural genomic variation, such as copy number variations (CNVs) common to repetitive regions. Polar fishes, for example, evolved repetitive tandem arrays of antifreeze protein (AFP) genes that facilitate adaptation to freezing and expanded in copy number in colder environments. AFP CNVs remain poorly characterized in polar fishes and may be illuminated by haplotype-aware approaches. We performed long-read sequencing for two polar fishes in the suborder Zoarcoidei and leveraged additional published long-read data to assemble phased genomes. We developed a workflow to measure haplotype diversity in CNV while controlling for misassembly and switch errors (a switch error is a change from one parental haplotype to another in a contiguous assembly). We present gfa_parser, which computes and extracts all possible contiguous sequences for phased or primary assemblies from graphical fragment assembly (GFA) files, and switch_error_screen, which flags potential switch errors. gfa_parser revealed that assembly uncertainty was ubiquitous across AFP array haplotypes and that standard processing of graphical fragment assemblies can bias measurement of haplotype CNVs. We detected no switch errors in AFP arrays. After controlling for misassembly and switch error, we detected haplotype diversity of AFP CNVs in all studied polar Zoarcoidei species and in 60% of AFP arrays. Intraindividual haplotype diversity spanned differences of 3–16 copies. Our workflow revealed intraspecific genomic diversity in zoarcoids that likely fueled the evolution of AFP copy number across temperature.
Pathways to polar adaptation in fishes revealed by long‐read sequencing
Abstract Long-read sequencing is driving a new reality for genome science in which highly contiguous assemblies can be produced efficiently with modest resources. Genome assemblies from long-read sequences are particularly exciting for understanding the evolution of complex genomic regions that are often difficult to assemble. In this study, we utilized long-read sequencing data to generate a high-quality genome assembly for an Antarctic eelpout, Ophthalmolycus amberensis, the first for the globally distributed family Zoarcidae. We used this assembly to understand how O. amberensis has adapted to the harsh Southern Ocean and compared it to another group of Antarctic fishes: the notothenioids. We showed that selection has largely acted on different targets in eelpouts relative to notothenioids. However, we did find some overlap; in both groups, genes involved in membrane structure, thermal tolerance and vision have evidence of positive selection. We found evidence for historical shifts of transposable element activity in O. amberensis and other polar fishes, perhaps reflecting a response to environmental change. We were specifically interested in the evolution of two complex genomic loci known to underlie key adaptations to polar seas: haemoglobin and antifreeze proteins (AFPs). We observed unique evolution of the haemoglobin MN cluster in eelpouts and related fishes in the suborder Zoarcoidei relative to other Perciformes. For AFPs, we identified the first species in the suborder with no evidence of afpIII sequences (Cebidichthys violaceus) in the genomic region where they are found in all other Zoarcoidei, potentially reflecting a lineage-specific loss of this cluster. Beyond polar fishes, our results highlight the power of long-read sequencing to understand genome evolution.
- PAR ID: 10401176
- Publisher / Repository: Wiley-Blackwell
- Date Published:
- Journal Name: Molecular Ecology
- Volume: 32
- Issue: 6
- ISSN: 0962-1083
- Page Range / eLocation ID: p. 1381-1397
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Hodgins, Kathryn (Ed.) Abstract Antifreeze proteins (AFPs) have enabled teleost fishes to repeatedly colonize polar seas. Four AFP types have convergently evolved in several fish lineages. AFPs inhibit ice crystal growth and lower tissue freezing point. In lineages with AFPs, species inhabiting colder environments may possess more AFP copies. Elucidating how differences in AFP copy number evolve is challenging due to the genes’ tandem array structure and consequently poor resolution of these repetitive regions. Here, we explore the evolution of type III AFPs (AFP III) in the globally distributed suborder Zoarcoidei, leveraging six new long-read genome assemblies. Zoarcoidei has fewer genomic resources relative to other polar fish clades while it is one of the few groups of fishes adapted to both the Arctic and Southern Oceans. Combining these new assemblies with additional long-read genomes available for Zoarcoidei, we conducted a comprehensive phylogenetic test of AFP III evolution and modeled the effects of thermal habitat and depth on AFP III gene family evolution. We confirm a single origin of AFP III via neofunctionalization of the enzyme sialic acid synthase B. We also show that AFP copy number increased under low temperature but decreased with depth, potentially because pressure lowers freezing point. Associations between the environment and AFP III copy number were driven by duplications of paralogs that were translocated out of the ancestral locus at which AFP III arose. Our results reveal novel environmental effects on AFP evolution and demonstrate the value of high-quality genomic resources for studying how structural genomic variation shapes convergent adaptation.
- Abstract Numerous novel adaptations characterise the radiation of notothenioids, the dominant fish group in the freezing seas of the Southern Ocean. To improve understanding of the evolution of this iconic fish group, here we generate and analyse new genome assemblies for 24 species covering all major subgroups of the radiation, including five long-read assemblies. We present a new estimate for the onset of the radiation at 10.7 million years ago, based on a time-calibrated phylogeny derived from genome-wide sequence data. We identify a two-fold variation in genome size, driven by expansion of multiple transposable element families, and use the long-read data to reconstruct two evolutionarily important, highly repetitive gene family loci. First, we present the most complete reconstruction to date of the antifreeze glycoprotein gene family, whose emergence enabled survival in sub-zero temperatures, showing the expansion of the antifreeze gene locus from the ancestral to the derived state. Second, we trace the loss of haemoglobin genes in icefishes, the only vertebrates lacking functional haemoglobins, through complete reconstruction of the two haemoglobin gene clusters across notothenioid families. Both the haemoglobin and antifreeze genomic loci are characterised by multiple transposon expansions that may have driven the evolutionary history of these genes.
- The basal South American notothenioid Eleginops maclovinus (Patagonia blennie or róbalo) occupies a uniquely important phylogenetic position in Notothenioidei as the singular closest sister species to the Antarctic cryonotothenioid fishes. Its genome and the traits encoded therein would be the nearest representatives of the temperate ancestor from which the Antarctic clade arose, providing an ancestral reference for deducing polar derived changes. In this study, we generated a gene- and chromosome-complete assembly of the E. maclovinus genome using long read sequencing and HiC scaffolding. We compared its genome architecture with the more basally divergent Cottoperca gobio and the derived genomes of nine cryonotothenioids representing all five Antarctic families. We also reconstructed a notothenioid phylogeny using 2918 proteins of single-copy orthologous genes from these genomes that reaffirmed E. maclovinus’ phylogenetic position. We additionally curated E. maclovinus’ repertoire of circadian rhythm genes, ascertained their functionality by transcriptome sequencing, and compared its pattern of gene retention with C. gobio and the derived cryonotothenioids. Through reconstructing circadian gene trees, we also assessed the potential role of the retained genes in cryonotothenioids by referencing to the functions of the human orthologs. Our results found E. maclovinus to share greater conservation with the Antarctic clade, solidifying its evolutionary status as the direct sister and best suited ancestral proxy of cryonotothenioids. The high-quality genome of E. maclovinus will facilitate inquiries into cold derived traits in temperate to polar evolution, and conversely on the paths of readaptation to non-freezing habitats in various secondarily temperate cryonotothenioids through comparative genomic analyses.
- Abstract Mitochondrial genomes are known for their compact size and conserved gene order; however, recent studies employing long-read sequencing technologies have revealed the presence of atypical mitogenomes in some species. In this study, we assembled and annotated the mitogenomes of five Antarctic notothenioids, including four icefishes (Champsocephalus gunnari, C. esox, Chaenocephalus aceratus, and Pseudochaenichthys georgianus) and the cold-specialized Trematomus borchgrevinki. Antarctic notothenioids are known to harbor some rearrangements in their mt genomes; however, the extensive duplications in icefishes observed in our study have never been reported before. In the icefishes, we observed duplications of the protein coding gene ND6, two transfer RNAs, and the control region, with different copy number variants present within the same individuals and with some ND6 duplications appearing to follow the canonical Duplication-Degeneration-Complementation (DDC) model in C. esox and C. gunnari. In addition, using long-read sequencing and k-mer analysis, we were able to detect extensive heteroplasmy in C. aceratus and C. esox. We also observed a large inversion in the mitogenome of T. borchgrevinki, along with the presence of tandem repeats in its control region. This study is the first to use long-read sequencing to assemble and identify structural variants and heteroplasmy in notothenioid mitogenomes and signifies the importance of long reads in resolving complex mitochondrial architectures. Identification of such wide-ranging structural variants in the mitogenomes of these fishes could provide insight into the genetic basis of the atypical icefish mitochondrial physiology and, more generally, into their potential role in cold adaptation.
