Title: Pangenomic genotyping with the marker array
Abstract
We present a new method and software tool that applies a pangenome index to the problem of inferring genotypes from short-read sequencing data. The method uses a novel indexing structure called the marker array. Using the marker array, we can genotype variants with respect to large panels like the 1000 Genomes Project while reducing the reference bias that results when aligning to a single linear reference. The method can infer accurate genotypes in less time and memory than existing graph-based methods. It is implemented in the open source software tool available at https://github.com/alshai/rowbowt.
The pan-genome of a species is the union of the genes and non-coding sequences present in all individuals (cultivars, accessions, or strains) within that species.
Results
Here we introduce PGV, a reference-agnostic representation of the pan-genome of a species based on the notion of consensus ordering. Our experimental results demonstrate that PGV enables an intuitive, effective and interactive visualization of a pan-genome by providing a genome browser that can elucidate complex structural genomic variations.
Conclusions
The PGV software can be installed via conda or downloaded from https://github.com/ucrbioinfo/PGV. The companion PGV browser at http://pgv.cs.ucr.edu can be tested using example bed tracks available from the GitHub page.
Watowich, Marina M.; Chiou, Kenneth L.; Graves, Brian; Montague, Michael J.; Brent, Lauren J. N.; Higham, James P.; Horvath, Julie E.; Lu, Amy; Martinez, Melween I.; Platt, Michael L.; et al. (Molecular Ecology Resources)
Abstract
Monitoring genetic diversity in wild populations is a central goal of ecological and evolutionary genetics and is critical for conservation biology. However, genetic studies of nonmodel organisms generally lack access to species‐specific genotyping methods (e.g. array‐based genotyping) and must instead use sequencing‐based approaches. Although costs are decreasing, high‐coverage whole‐genome sequencing (WGS), which produces the highest confidence genotypes, remains expensive. More economical reduced representation sequencing approaches fail to capture much of the genome, which can hinder downstream inference. Low‐coverage WGS combined with imputation using a high‐confidence reference panel is a cost‐effective alternative, but the accuracy of genotyping using low‐coverage WGS and imputation in nonmodel populations is still largely uncharacterized. Here, we empirically tested the accuracy of low‐coverage sequencing (0.1–10×) and imputation in two natural populations, one with a large (n = 741) reference panel, rhesus macaques (Macaca mulatta), and one with a smaller (n = 68) reference panel, gelada monkeys (Theropithecus gelada). Using samples sequenced to coverage as low as 0.5×, we could impute genotypes at >95% of the sites in the reference panel with high accuracy (median r² ≥ 0.92). We show that low‐coverage imputed genotypes can reliably calculate genetic relatedness and population structure. Based on these data, we also provide best practices and recommendations for researchers who wish to deploy this approach in other populations, with all code available on GitHub (https://github.com/mwatowich/LoCSI-for-non-model-species). Our results endorse accurate and effective genotype imputation from low‐coverage sequencing, enabling the cost‐effective generation of population‐scale genetic datasets necessary for tackling many pressing challenges of wildlife conservation.
Ashtari Esfahani, A.; Böser, S.; Buzinsky, N.; Cervantes, R.; Claessens, C.; Viveiros, L. de; Fertl, M.; Formaggio, J. A.; Gladstone, L.; Guigue, M.; et al. (New Journal of Physics)
Abstract
The Locust simulation package is a new C++ software tool developed to simulate the measurement of time-varying electromagnetic fields using RF detection techniques. Modularity and flexibility allow for arbitrary input signals, while concurrently supporting tight integration with physics-based simulations as input. External signals driven by the Kassiopeia particle tracking package are discussed, demonstrating conditional feedback between Locust and Kassiopeia during software execution. An application of the simulation to the Project 8 experiment is described. Locust is publicly available at https://github.com/project8/locust_mc.
Differential correlation networks are increasingly used to delineate changes in interactions among biomolecules. They characterize differences between omics networks under two different conditions, and can be used to delineate mechanisms of disease initiation and progression.
Results
We present a new R package that facilitates the estimation and visualization of differential correlation networks using multiple correlation measures and inference methods. The software is available at https://github.com/sqyu/CorDiffViz. Visualization has been tested for the Chrome and Firefox web browsers. A demo is available at https://diffcornet.github.io/CorDiffViz/demo.html.
Conclusions
Our software offers considerable flexibility by allowing the user to interact with the visualization and choose from different estimation methods and visualizations. It also allows the user to easily toggle between correlation networks for samples under one condition and differential correlations between samples under two conditions. Moreover, the software facilitates integrative analysis of cross-correlation networks between two omics data sets.
Huang, Xin; Struck, Travis J; Davey, Sean W; Gutenkunst, Ryan N (bioRxiv)
Abstract
Summary
dadi is a popular software package for inferring models of demographic history and natural selection from population genomic data. However, using dadi requires Python scripting and manual parallelization of optimization jobs. We developed dadi-cli to simplify dadi usage and to enable straightforward distributed computing.
Availability and Implementation
dadi-cli is implemented in Python and released under the Apache License 2.0. The source code is available at https://github.com/xin-huang/dadi-cli. dadi-cli can be installed via PyPI and conda, and is also available through Cacao on Jetstream2 (https://cacao.jetstream-cloud.org/).
Mun, Taher, Vaddadi, Naga Sai Kavya, and Langmead, Ben. Pangenomic genotyping with the marker array. Algorithms for Molecular Biology 18.1 Web. doi:10.1186/s13015-023-00225-3.
@article{osti_10411750,
place = {Country unknown/Code not available},
title = {Pangenomic genotyping with the marker array},
url = {https://par.nsf.gov/biblio/10411750},
DOI = {10.1186/s13015-023-00225-3},
abstractNote = {We present a new method and software tool that applies a pangenome index to the problem of inferring genotypes from short-read sequencing data. The method uses a novel indexing structure called the marker array. Using the marker array, we can genotype variants with respect to large panels like the 1000 Genomes Project while reducing the reference bias that results when aligning to a single linear reference. The method can infer accurate genotypes in less time and memory than existing graph-based methods. It is implemented in the open source software tool available at https://github.com/alshai/rowbowt.},
journal = {Algorithms for Molecular Biology},
volume = {18},
number = {1},
publisher = {Springer Science + Business Media},
author = {Mun, Taher and Vaddadi, Naga Sai Kavya and Langmead, Ben},
}