skip to main content

Attention:

The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Friday, April 12 until 2:00 AM ET on Saturday, April 13 due to maintenance. We apologize for the inconvenience.


Search for: All records

Award ID contains: 1645087

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Abstract

    Mitochondrial genomes are known for their compact size and conserved gene order, however, recent studies employing long-read sequencing technologies have revealed the presence of atypical mitogenomes in some species. In this study, we assembled and annotated the mitogenomes of five Antarctic notothenioids, including four icefishes (Champsocephalus gunnari,C. esox,Chaenocephalus aceratus, andPseudochaenichthys georgianus) and the cold-specializedTrematomus borchgrevinki. Antarctic notothenioids are known to harbor some rearrangements in their mt genomes, however the extensive duplications in icefishes observed in our study have never been reported before. In the icefishes, we observed duplications of the protein coding geneND6, two transfer RNAs,and the control region with different copy number variants present within the same individuals and with someND6duplications appearing to follow the canonical Duplication-Degeneration-Complementation (DDC) model inC. esoxandC. gunnari. In addition, using long-read sequencing and k-mer analysis, we were able to detect extensive heteroplasmy inC. aceratusandC. esox. We also observed a large inversion in the mitogenome ofT. borchgrevinki, along with the presence of tandem repeats in its control region. This study is the first in using long-read sequencing to assemble and identify structural variants and heteroplasmy in notothenioid mitogenomes and signifies the importance of long-reads in resolving complex mitochondrial architectures. Identification of such wide-ranging structural variants in the mitogenomes of these fishes could provide insight into the genetic basis of the atypical icefish mitochondrial physiology and more generally may provide insights about their potential role in cold adaptation.

     
    more » « less
    Free, publicly-accessible full text available December 1, 2024
  2. Abstract

    Library preparation protocols for most sequencing technologies involve PCR amplification of the template DNA, which open the possibility that a given template DNA molecule is sequenced multiple times. Reads arising from this phenomenon, known as PCR duplicates, inflate the cost of sequencing and can jeopardize the reliability of affected experiments. Despite the pervasiveness of this artefact, our understanding of its causes and of its impact on downstream statistical analyses remains essentially empirical. Here, we develop a general quantitative model of amplification distortions in sequencing data sets, which we leverage to investigate the factors controlling the occurrence of PCR duplicates. We show that the PCR duplicate rate is determined primarily by the ratio between library complexity and sequencing depth, and that amplification noise (including in its dependence on the number of PCR cycles) only plays a secondary role for this artefact. We confirm our predictions using new and published RAD‐seq libraries and provide a method to estimate library complexity and amplification noise in any data set containing PCR duplicates. We discuss how amplification‐related artefacts impact downstream analyses, and in particular genotyping accuracy. The proposed framework unites the numerous observations made on PCR duplicates and will be useful to experimenters of all sequencing technologies where DNA availability is a concern.

     
    more » « less
    Free, publicly-accessible full text available August 1, 2024
  3. Kelley, Joanna (Ed.)
    Abstract

    White-blooded Antarctic icefishes, a family within the adaptive radiation of Antarctic notothenioid fishes, are an example of extreme biological specialization to both the chronic cold of the Southern Ocean and life without hemoglobin. As a result, icefishes display derived physiology that limits them to the cold and highly oxygenated Antarctic waters. Against these constraints, remarkably one species, the pike icefish Champsocephalus esox, successfully colonized temperate South American waters. To study the genetic mechanisms underlying secondarily temperate adaptation in icefishes, we generated chromosome-level genome assemblies of both C. esox and its Antarctic sister species, Champsocephalus gunnari. The C. esox genome is similar in structure and organization to that of its Antarctic congener; however, we observe evidence of chromosomal rearrangements coinciding with regions of elevated genetic divergence in pike icefish populations. We also find several key biological pathways under selection, including genes related to mitochondria and vision, highlighting candidates behind temperate adaptation in C. esox. Substantial antifreeze glycoprotein (AFGP) pseudogenization has occurred in the pike icefish, likely due to relaxed selection following ancestral escape from Antarctica. The canonical AFGP locus organization is conserved in C. esox and C. gunnari, but both show a translocation of two AFGP copies to a separate locus, previously unobserved in cryonotothenioids. Altogether, the study of this secondarily temperate species provides an insight into the mechanisms underlying adaptation to ecologically disparate environments in this otherwise highly specialized group.

     
    more » « less
  4. Abstract

    Restriction‐site associated DNA sequencing (RADseq) has become a powerful and versatile tool in modern population genomics, enabling large‐scale evolutionary and genomic analyses in otherwise inaccessible biological systems. With its widespread use, different variants on the protocol have been developed to suit specific experimental needs. Researchers face the challenge of choosing the optimal molecular and sequencing protocols for their reduced representation experimental design, an often‐complicated process. Strategic errors can lead to biased data generation that has reduced power to answer biological questions. Here, we present RADinitio, simulation software for the selection and optimization of RADseq experiments via the generation of sequencing data that behave similarly to empirical sources. RADinitio provides an evolutionary simulation of populations, implementation of various RADseq protocols with customizable parameters, and thorough assessment of missing data. We test the efficacy of the software using different RAD protocols across several organisms, highlighting the importance of protocol selection on the magnitude and quality of data acquired. Additionally, we test the effects of RAD library preparation and sequencing on allelic dropout, observing that library preparation and sequencing often contributes more to missing alleles than population‐level variation.

     
    more » « less
  5. Abstract

    For half a century population genetics studies have put type II restriction endonucleases to work. Now, coupled with massively‐parallel, short‐read sequencing, the family of RAD protocols that wields these enzymes has generated vast genetic knowledge from the natural world. Here, we describe the first software natively capable of using paired‐end sequencing to derive short contigs from de novo RAD data. Stacks version 2 employs a de Bruijn graph assembler to build and connect contigs from forward and reverse reads for each de novo RAD locus, which it then uses as a reference for read alignments. The new architecture allows all the individuals in a metapopulation to be considered at the same time as each RAD locus is processed. This enables a Bayesian genotype caller to provide precise SNPs, and a robust algorithm to phase those SNPs into long haplotypes, generating RAD loci that are 400–800 bp in length. To prove its recall and precision, we tested the software with simulated data and compared reference‐aligned and de novo analyses of three empirical data sets. Our study shows that the latest version of Stacks is highly accurate and outperforms other software in assembling and genotyping paired‐end de novo data sets.

     
    more » « less
  6. The basal South American notothenioid Eleginops maclovinus (Patagonia blennie or róbalo) occupies a uniquely important phylogenetic position in Notothenioidei as the singular closest sister species to the Antarctic cryonotothenioid fishes. Its genome and the traits encoded therein would be the nearest representatives of the temperate ancestor from which the Antarctic clade arose, providing an ancestral reference for deducing polar derived changes. In this study, we generated a gene- and chromosome-complete assembly of the E. maclovinus genome using long read sequencing and HiC scaffolding. We compared its genome architecture with the more basally divergent Cottoperca gobio and the derived genomes of nine cryonotothenioids representing all five Antarctic families. We also reconstructed a notothenioid phylogeny using 2918 proteins of single-copy orthologous genes from these genomes that reaffirmed E. maclovinus’ phylogenetic position. We additionally curated E. maclovinus’ repertoire of circadian rhythm genes, ascertained their functionality by transcriptome sequencing, and compared its pattern of gene retention with C. gobio and the derived cryonotothenioids. Through reconstructing circadian gene trees, we also assessed the potential role of the retained genes in cryonotothenioids by referencing to the functions of the human orthologs. Our results found E. maclovinus to share greater conservation with the Antarctic clade, solidifying its evolutionary status as the direct sister and best suited ancestral proxy of cryonotothenioids. The high-quality genome of E. maclovinus will facilitate inquiries into cold derived traits in temperate to polar evolution, and conversely on the paths of readaptation to non-freezing habitats in various secondarily temperate cryonotothenioids through comparative genomic analyses.

     
    more » « less
    Free, publicly-accessible full text available June 1, 2024
  7. Abstract For any genome-based research, a robust genome assembly is required. De novo assembly strategies have evolved with changes in DNA sequencing technologies and have been through at least three phases: i) short-read only, ii) short- and long-read hybrid, and iii) long-read only assemblies. Each of the phases has their own error model. We hypothesized that hidden scaffolding errors in short-read assembly and erroneous long-read contigs degrades the quality of short- and long-read hybrid assemblies. We assembled the genome of T. borchgrevinki from data generated during each of the three phases and assessed the quality problems we encountered. We developed strategies such as k-mer-assembled region replacement, parameter optimization, and long-read sampling to address the error models. We demonstrated that a k-mer based strategy improved short-read assemblies as measured by BUSCO while mate-pair libraries introduced hidden scaffolding errors and perturbed BUSCO scores. Further, we found that although hybrid assemblies can generate higher contiguity they tend to suffer from lower quality. In addition, we found long-read only assemblies can be optimized for contiguity by sub-sampling length-restricted raw reads. Our results indicate that long-read contig assembly is the current best choice and that assemblies from phase I and phase II were of lower quality. 
    more » « less