

Title: angsd‐wrapper: utilities for analysing next‐generation sequencing data
Abstract: High‐throughput sequencing has changed many aspects of population genetics, molecular ecology and related fields, affecting both experimental design and data analysis. The software package angsd allows users to perform a number of population genetic analyses on high‐throughput sequencing data. angsd uses probabilistic approaches that can directly make use of genotype likelihoods; thus, SNP calling is not required for comparative analyses. This takes advantage of all the sequencing data and produces more accurate results for samples with low sequencing depth. Here, we present angsd‐wrapper, a set of wrapper scripts that provides a user‐friendly interface for running angsd and visualizing results. angsd‐wrapper supports multiple types of analyses, including estimates of nucleotide sequence diversity, neutrality tests, principal component analysis, estimation of admixture proportions for individual samples and calculation of statistics that quantify recent introgression. angsd‐wrapper also provides interactive graphing of angsd results to enhance data exploration. We demonstrate the usefulness of angsd‐wrapper by analysing resequencing data from populations of wild and domesticated Zea. angsd‐wrapper is freely available from https://github.com/mojaveazure/angsd-wrapper.
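The abstract's central point is that angsd works from genotype likelihoods rather than from called genotypes. As a rough illustration of what a genotype likelihood is, the Python sketch below scores the ten unordered diploid genotypes at one site under a simple per-base error model (each read is assumed to come from either chromosome with probability 1/2, and sequencing errors are assumed to be spread evenly over the three other bases), then turns the likelihoods at a biallelic site into posterior genotype probabilities and an expected allele dosage. The error model, the flat prior and all function names are assumptions made for illustration; angsd implements its own likelihood models and is not called here.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"


def phred_to_error(q: float) -> float:
    """Convert a Phred base quality to an error probability."""
    return 10 ** (-q / 10)


def base_prob(read_base: str, allele: str, err: float) -> float:
    """P(observed base | true allele); errors spread evenly over the 3 other bases."""
    return 1.0 - err if read_base == allele else err / 3.0


def genotype_log10_likelihoods(reads: str, quals: list[float]) -> dict[str, float]:
    """log10 P(reads | G) for each of the 10 unordered diploid genotypes at one site."""
    gls = {}
    for a1, a2 in combinations_with_replacement(BASES, 2):
        total = 0.0
        for b, q in zip(reads, quals):
            e = phred_to_error(q)
            # Each read is assumed equally likely to come from either chromosome.
            total += math.log10(0.5 * base_prob(b, a1, e) + 0.5 * base_prob(b, a2, e))
        gls[a1 + a2] = total
    return gls


def biallelic_posteriors(gls, ref, alt, prior=(1 / 3, 1 / 3, 1 / 3)):
    """Posterior P(G | reads) for ref/ref, ref/alt, alt/alt, plus expected alt dosage."""
    # Genotype keys are stored in ACGT order, so sort the heterozygote's letters.
    keys = [ref + ref, "".join(sorted(ref + alt)), alt + alt]
    weights = [p * 10 ** gls[k] for p, k in zip(prior, keys)]
    norm = sum(weights)
    post = [w / norm for w in weights]
    dosage = sum(i * p for i, p in enumerate(post))  # expected number of alt copies
    return post, dosage


# One Q30 read showing "A" at a hypothetical A/G site.
gls = genotype_log10_likelihoods("A", [30])
post, dosage = biallelic_posteriors(gls, "A", "G")
print([round(p, 3) for p in post], round(dosage, 3))
```

With a single high-quality read showing A, the heterozygote A/G keeps a posterior of one third under the flat prior, so a hard genotype call would discard real uncertainty. Retaining that uncertainty is what lets likelihood-based methods make use of low-depth samples.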
Award ID(s):
1339393
PAR ID:
10201379
Author(s) / Creator(s):
 ;  ;  ;  ;  ;  ;  
Publisher / Repository:
Wiley-Blackwell
Date Published:
Journal Name:
Molecular Ecology Resources
Volume:
16
Issue:
6
ISSN:
1755-098X
Size(s):
p. 1449-1454
Sponsoring Org:
National Science Foundation
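The abstract lists estimates of nucleotide sequence diversity and neutrality tests among the analyses angsd‐wrapper supports. As a rough illustration of what these statistics summarize, the Python sketch below computes Watterson's θ, pairwise nucleotide diversity (θπ) and Tajima's D from an unfolded site frequency spectrum (SFS), using the textbook formulas. This is not angsd's or angsd‐wrapper's code: angsd estimates an SFS from genotype likelihoods and derives per-site thetas from it, and its windowed statistics may be normalized differently. The input layout (a list indexed by derived-allele count, possibly holding expected rather than integer counts) is an assumption.

```python
import math


def theta_estimators(sfs):
    """Segregating sites, Watterson's theta and theta_pi from an unfolded SFS.

    sfs[k] is the (possibly expected, non-integer) number of sites with k derived
    alleles among n sampled chromosomes, k = 0..n. Classes 0 and n are monomorphic.
    """
    n = len(sfs) - 1
    a1 = sum(1 / i for i in range(1, n))
    S = sum(sfs[1:n])  # segregating (polymorphic) sites
    theta_w = S / a1
    pairs = n * (n - 1) / 2
    theta_pi = sum(k * (n - k) * sfs[k] for k in range(1, n)) / pairs
    return S, theta_w, theta_pi


def tajimas_d(sfs):
    """Tajima's D with the standard variance constants; NaN if no sites segregate."""
    n = len(sfs) - 1
    S, theta_w, theta_pi = theta_estimators(sfs)
    if S == 0:
        return float("nan")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (theta_pi - theta_w) / math.sqrt(e1 * S + e2 * S * (S - 1))


n = 10  # e.g. five diploid individuals
neutral = [0] + [100 / k for k in range(1, n)] + [0]  # SFS proportional to 1/k
excess_rare = [0, 300] + [10] * (n - 2) + [0]         # many singletons
print(tajimas_d(neutral), tajimas_d(excess_rare))
```

An SFS proportional to 1/k, the neutral expectation, gives θπ = θW and hence D close to zero; an excess of rare variants pulls θπ below θW and makes D negative.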
More like this
  1. Abstract: Background. Low-depth sequencing allows researchers to increase sample size at the expense of lower accuracy. To incorporate uncertainties while maintaining statistical power, we introduce a method to analyze the population structure of low-depth sequencing data. Results. The method optimizes the choice of nonlinear transformations of dosages to maximize the Ky Fan norm of the covariance matrix. The transformation incorporates the uncertainty in calling between heterozygotes and the common homozygotes for loci having a rare allele, and is more linear when both variants are common. Conclusions. We apply the method to samples from two indigenous Siberian populations and reveal hidden population structure accurately using only a single chromosome. The package is available at https://github.com/yiwenstat/MCPCA_PopGen.
  2. Abstract: The development of high‐throughput sequencing technologies is dramatically increasing the use of single nucleotide polymorphisms (SNPs) across the field of genetics, but most parentage studies of wild populations still rely on microsatellites. We developed a bioinformatic pipeline for identifying SNP panels that are informative for parentage analysis from restriction site‐associated DNA sequencing (RADseq) data. This pipeline includes options for analysis with or without a reference genome, and provides methods to maximize genotyping accuracy and select sets of unlinked loci that have high statistical power. We test this pipeline on small populations of Mexican gray wolf and bighorn sheep, for which parentage analyses are expected to be challenging due to low genetic diversity and the presence of many closely related individuals. We compare the results of parentage analysis across SNP panels generated with or without the use of a reference genome, and between SNPs and microsatellites. For Mexican gray wolf, we conducted parentage analyses for 30 pups from a single cohort where samples were available from 64% of possible mothers and 53% of possible fathers, and the accuracy of parentage assignments could be estimated because the true identities of parents were known a priori from field data. For bighorn sheep, we conducted maternity analyses for 39 lambs from five cohorts where 77% of possible mothers were sampled, but the true identities of parents were unknown. Analyses with and without a reference genome produced SNP panels with ≥95% parentage assignment accuracy for Mexican gray wolf, outperforming microsatellites at 78% accuracy. Maternity assignments were completely consistent across all SNP panels for the bighorn sheep, and were 74.4% consistent with assignments from microsatellites. Accuracy and consistency of parentage analysis were not reduced when using as few as 284 SNPs for Mexican gray wolf and 142 SNPs for bighorn sheep, indicating that our pipeline can be used to develop SNP genotyping assays for parentage analysis with relatively small numbers of loci.
  3. Background. The advancement of sequencing technology has led to a rapid increase in the amount of DNA and protein sequence data; consequently, the size of genomic and proteomic databases is constantly growing. As a result, database searches need to be continually updated to account for the new data being added. However, continually re-searching the entire existing dataset wastes resources. Incremental database search can address this problem. Methods. One recently introduced incremental search method is iBlast, which wraps the BLAST sequence search method with an algorithm that reuses previously processed data and thereby increases search efficiency. The iBlast wrapper, however, must be generalized to support better-performing DNA/protein sequence search methods that have been developed, namely MMseqs2 and Diamond. To address this need, we propose iSeqsSearch, which extends iBlast by incorporating support for MMseqs2 (iMMseqs2) and Diamond (iDiamond), thereby providing a more general and broadly effective incremental search framework. Moreover, the previously published iBlast wrapper needed to be revised to be more robust and usable by the general community. Results. iMMseqs2 and iDiamond, which apply the incremental approach, perform nearly identically to MMseqs2 and Diamond. Notably, ranking comparison methods such as Pearson correlation show a high concordance of over 0.9, indicating similar results. Moreover, in some cases, our incremental approach, iSeqsSearch, which extends the iBlast merge function to iMMseqs2 and iDiamond, provides more hits than the conventional MMseqs2 and Diamond methods. Conclusion. The incremental approach using iMMseqs2 and iDiamond demonstrates efficiency in reusing previously processed data while maintaining high accuracy and concordance in search results. This method can reduce resource waste in searches of continually growing genomic and proteomic databases. The sample code and data are available at GitHub and Zenodo (https://github.com/EESI/Incremental-Protein-Search; DOI: 10.5281/zenodo.14675319).
  4. Summary: This work revisits a publication by Bean et al. (2018) reporting that seven amino acid substitutions are essential for the evolution of L‐DOPA 4,5‐dioxygenase (DODA) activity in Caryophyllales. In this study, we explore several concerns that led us to replicate the analyses of Bean et al. (2018). Our comparative analyses, with structural modelling, implicate numerous residues in addition to those identified by Bean et al. (2018), with many of these additional residues occurring around the active site of BvDODAα1. We therefore replicated the analyses of Bean et al. (2018) to re‐observe the effect of their original seven residue substitutions in a BvDODAα2 background, that is, the BvDODAα2‐mut3 variant. Multiple in vivo assays, in both Saccharomyces cerevisiae and Nicotiana benthamiana, did not result in visible DODA activity in BvDODAα2‐mut3, with betalain production always 10‐fold below BvDODAα1. In vitro assays also revealed substantial differences in both catalytic activity and pH optima between the BvDODAα1, BvDODAα2 and BvDODAα2‐mut3 proteins, explaining their differing performance in vivo. In summary, we were unable to replicate the in vivo analyses of Bean et al. (2018), and our quantitative in vivo and in vitro analyses suggest a minimal effect of these seven residues in altering the catalytic activity of BvDODAα2. We conclude that the evolutionary pathway to high DODA activity is substantially more complex than implied by Bean et al. (2018).
  5. Abstract: We present νDoBe, a Python tool for the computation of neutrinoless double beta decay (0νββ) rates in terms of lepton-number-violating operators in the Standard Model Effective Field Theory (SMEFT). The tool can be used for automated calculations of 0νββ rates, electron spectra and angular correlations for all isotopes of experimental interest, for lepton-number-violating operators up to and including dimension 9. The tool takes care of renormalization-group running to lower energies and provides the matching to the low-energy effective field theory and, at lower scales, to a chiral effective field theory description of 0νββ rates. The user can specify different sets of nuclear matrix elements from various many-body methods and hadronic low-energy constants. The tool can be used to quickly generate analytical and numerical expressions for 0νββ rates and to generate a large variety of plots. In this work, we provide examples of possible use along with detailed code documentation. The code can be accessed through GitHub (https://github.com/OScholer/nudobe) and an online user interface (https://oscholer-nudobe-streamlit-4foz22.streamlit.app/).
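Item 2 above mentions selecting sets of unlinked SNP loci with high statistical power for parentage analysis. One common way to gauge the power of such a panel is its exclusion probability. The Python sketch below uses the standard expression for a two-allele locus when one parent is already known, pq(1 − pq), and combines loci as 1 − ∏(1 − Pᵢ). It assumes Hardy-Weinberg proportions, unlinked (independent) loci, known allele frequencies and no genotyping error. It is an illustration of the general idea, not the selection criterion the pipeline in item 2 actually uses, and the helper names are hypothetical.

```python
import math


def snp_exclusion_prob(p: float) -> float:
    """Chance one biallelic SNP excludes a random non-parent when the other parent is known.

    Assumes Hardy-Weinberg proportions; p is the frequency of either allele.
    """
    h = p * (1 - p)
    return h * (1 - h)


def panel_exclusion_prob(freqs) -> float:
    """Combined exclusion probability, assuming loci are unlinked (independent)."""
    return 1 - math.prod(1 - snp_exclusion_prob(p) for p in freqs)


def rank_loci(freqs):
    """Hypothetical helper: locus indices ordered from most to least informative."""
    return sorted(range(len(freqs)), key=lambda i: snp_exclusion_prob(freqs[i]), reverse=True)


def loci_needed(p: float, target: float) -> int:
    """Smallest number of unlinked SNPs, all at allele frequency p, reaching target."""
    per_locus = snp_exclusion_prob(p)
    return math.ceil(math.log(1 - target) / math.log(1 - per_locus))


print(loci_needed(0.5, 0.99))  # → 23
print(rank_loci([0.05, 0.5, 0.2]))
```

A single SNP is at most 0.1875 informative (at p = 0.5), far below a typical multi-allelic microsatellite, which is why SNP panels trade per-locus power for many loci; rarer alleles lower the per-locus value and raise the number of loci required.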