Search for: All records

Creators/Authors contains: "Schatz, Michael C."




Fast and accurate genome-wide predictions and structural modeling of protein–protein interactions using Galaxy

https://doi.org/10.1186/s12859-023-05389-8

Guerler, Aysam ; Baker, Dannon ; van den Beek, Marius ; Gruening, Bjoern ; Bouvier, Dave ; Coraor, Nate ; Shank, Stephen D. ; Zehr, Jordan D. ; Schatz, Michael C. ; Nekrutenko, Anton ( December 2023 , BMC Bioinformatics)

Abstract Background
Protein–protein interactions play a crucial role in almost all cellular processes. Identifying interacting proteins reveals insight into living organisms and yields novel drug targets for disease treatment. Here, we present a publicly available, automated pipeline to predict genome-wide protein–protein interactions and produce high-quality multimeric structural models.
Results
Application of our method to the Human and Yeast genomes yields protein–protein interaction networks similar in quality to common experimental methods. We identified and modeled Human proteins likely to interact with the papain-like protease of SARS-CoV2’s non-structural protein 3. We also produced models of SARS-CoV2’s spike protein (S) interacting with myelin-oligodendrocyte glycoprotein receptor and dipeptidyl peptidase-4.
Conclusions
The presented method is capable of confidently identifying interactions while providing high-quality multimeric structural models for experimental validation. The interactome modeling pipeline is available at usegalaxy.org and usegalaxy.eu.

Free, publicly-accessible full text available December 1, 2024
Sketching and sampling approaches for fast and accurate long read classification

https://doi.org/10.1186/s12859-022-05014-0

Das, Arun ; Schatz, Michael C. ( December 2022 , BMC Bioinformatics)

Abstract Background
In modern sequencing experiments, quickly and accurately identifying the sources of the reads is a crucial need. In metagenomics, where each read comes from one of potentially many members of a community, it can be important to identify the exact species the read is from. In other settings, it is important to distinguish which reads are from the targeted sample and which are from potential contaminants. In both cases, identification of the correct source of a read enables further investigation of relevant reads, while minimizing wasted work. This task is particularly challenging for long reads, which can have a substantial error rate that obscures the origins of each read.
Results
Existing tools for the read classification problem are often alignment or index-based, but such methods can have large time and/or space overheads. In this work, we investigate the effectiveness of several sampling and sketching-based approaches for read classification. In these approaches, a chosen sampling or sketching algorithm is used to generate a reduced representation (a “screen”) of potential source genomes for a query readset before reads are streamed in and compared against this screen. Using a query read’s similarity to the elements of the screen, the methods predict the source of the read. Such an approach requires limited pre-processing, stores and works with only a subset of the input data, and is able to perform classification with a high degree of accuracy.
Conclusions
The sampling and sketching approaches investigated include uniform sampling, methods based on MinHash and its weighted and order variants, a minimizer-based technique, and a novel clustering-based sketching approach. We demonstrate the effectiveness of these techniques both in identifying the source microbial genomes for reads from a metagenomic long read sequencing experiment, and in distinguishing between long reads from organisms of interest and potential contaminant reads. We then compare these approaches to existing alignment, index and sketching-based tools for read classification, and demonstrate how such a method is a viable alternative for determining the source of query reads. Finally, we present a reference implementation of these approaches at https://github.com/arun96/sketching.
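The screen-then-classify idea described in this abstract can be illustrated with a minimal sketch. This is not the authors' reference implementation: it assumes a plain bottom-s MinHash sketch per genome and a simple shared-hash count as the similarity score, and every function name and parameter (`kmers`, `kmer_hash`, `bottom_sketch`, `build_screen`, `classify`, `k`, `size`) is hypothetical and chosen only for illustration.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def kmer_hash(kmer):
    """Stable 64-bit hash of a k-mer (the built-in hash() is salted per process)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_sketch(seq, k, size):
    """Bottom-s MinHash sketch: the `size` smallest distinct k-mer hashes of seq."""
    hashes = sorted({kmer_hash(km) for km in kmers(seq, k)})
    return set(hashes[:size])


def build_screen(genomes, k, size):
    """Build the reduced representation (the 'screen'): one sketch per source genome."""
    return {name: bottom_sketch(seq, k, size) for name, seq in genomes.items()}


def classify(read, screen, k):
    """Score a read against each sketch by shared hashes and return (best, scores).

    `best` is None when the read shares no hash with any sketch.
    This scoring rule is an assumption made for the example.
    """
    read_hashes = {kmer_hash(km) for km in kmers(read, k)}
    scores = {name: len(read_hashes & sketch) for name, sketch in screen.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores
```

In use, `build_screen` would be called once on the candidate genomes, and each incoming read would then be passed to `classify` as it is streamed in. A smaller `size` stores less of each genome at the cost of fewer expected matches per read.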
Full Text Available
Scalable, accessible and reproducible reference genome assembly and evaluation in Galaxy

https://doi.org/10.1038/s41587-023-02100-3

Larivière, Delphine ; Abueg, Linelle ; Brajuka, Nadolina ; Gallardo-Alba, Cristóbal ; Grüning, Bjorn ; Ko, Byung June ; Ostrovsky, Alex ; Palmada-Flores, Marc ; Pickett, Brandon D. ; Rabbani, Keon ; et al ( March 2024 , Nature Biotechnology)

Free, publicly-accessible full text available March 1, 2025
Approaching complete genomes, transcriptomes and epi-omes with accurate long-read sequencing

https://doi.org/10.1038/s41592-022-01716-8

Kovaka, Sam ; Ou, Shujun ; Jenike, Katharine M. ; Schatz, Michael C. ( January 2023 , Nature Methods)

Full Text Available
Jasmine and Iris: population-scale structural variant comparison and analysis

https://doi.org/10.1038/s41592-022-01753-3

Kirsche, Melanie ; Prabhu, Gautam ; Sherman, Rachel ; Ni, Bohan ; Battle, Alexis ; Aganezov, Sergey ; Schatz, Michael C. ( January 2023 , Nature Methods)

Full Text Available
Automated assembly scaffolding using RagTag elevates a new tomato system for high-throughput genome editing

https://doi.org/10.1186/s13059-022-02823-7

Alonge, Michael ; Lebeigle, Ludivine ; Kirsche, Melanie ; Jenike, Katie ; Ou, Shujun ; Aganezov, Sergey ; Wang, Xingang ; Lippman, Zachary B. ; Schatz, Michael C. ; Soyk, Sebastian ( December 2022 , Genome Biology)

Abstract
Advancing crop genomics requires efficient genetic systems enabled by high-quality personalized genome assemblies. Here, we introduce RagTag, a toolset for automating assembly scaffolding and patching, and we establish chromosome-scale reference genomes for the widely used tomato genotype M82 along with Sweet-100, a new rapid-cycling genotype that we developed to accelerate functional genomics and genome editing in tomato. This work outlines strategies to rapidly expand genetic systems and genomic resources in other plant species.

Establishing Physalis as a Solanaceae model system enables genetic reevaluation of the inflated calyx syndrome

https://doi.org/10.1093/plcell/koac305

He, Jia ; Alonge, Michael ; Ramakrishnan, Srividya ; Benoit, Matthias ; Soyk, Sebastian ; Reem, Nathan T ; Hendelman, Anat ; Van Eck, Joyce ; Schatz, Michael C ; Lippman, Zachary B ( October 2022 , The Plant Cell)

Abstract The highly diverse Solanaceae family contains several widely studied models and crop species. Fully exploring, appreciating, and exploiting this diversity requires additional model systems. Particularly promising are orphan fruit crops in the genus Physalis, which occupy a key evolutionary position in the Solanaceae and capture understudied variation in traits such as inflorescence complexity, fruit ripening and metabolites, disease and insect resistance, self-compatibility, and most notably, the striking inflated calyx syndrome (ICS), an evolutionary novelty found across angiosperms where sepals grow exceptionally large to encapsulate fruits in a protective husk. We recently developed transformation and genome editing in Physalis grisea (groundcherry). However, to systematically explore and unlock the potential of this and related Physalis as genetic systems, high-quality genome assemblies are needed. Here, we present chromosome-scale references for P. grisea and its close relative Physalis pruinosa and use these resources to study natural and engineered variations in floral traits. We first rapidly identified a natural structural variant in a bHLH gene that causes petal color variation. Further, and against expectations, we found that CRISPR–Cas9-targeted mutagenesis of 11 MADS-box genes, including purported essential regulators of ICS, had no effect on inflation. In a forward genetics screen, we identified huskless, which lacks ICS due to mutation of an AP2-like gene that causes sepals and petals to merge into a single whorl of mixed identity. These resources and findings elevate Physalis to a new Solanaceae model system and establish a paradigm in the search for factors driving ICS.
Full Text Available
Correction to ‘The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2022 update’

https://doi.org/10.1093/nar/gkac610

Afgan, Enis ; Nekrutenko, Anton ; Grüning, Björn A. ; Blankenberg, Daniel ; Goecks, Jeremy ; Schatz, Michael C. ; Ostrovsky, Alexander E. ; et al. ( July 2022 , Nucleic Acids Research)

Full Text Available
Complete Sequence of a 641-kb Insertion of Mitochondrial DNA in the Arabidopsis thaliana Nuclear Genome

https://doi.org/10.1093/gbe/evac059

Fields, Peter D. ; Waneka, Gus ; Naish, Matthew ; Schatz, Michael C. ; Henderson, Ian R. ; Sloan, Daniel B. ( May 2022 , Genome Biology and Evolution)
Slotte, Tanja (Ed.)
Abstract Intracellular transfers of mitochondrial DNA continue to shape nuclear genomes. Chromosome 2 of the model plant Arabidopsis thaliana contains one of the largest known nuclear insertions of mitochondrial DNA (numts). Estimated at over 600 kb in size, this numt is larger than the entire Arabidopsis mitochondrial genome. The primary Arabidopsis nuclear reference genome contains less than half of the numt because of its structural complexity and repetitiveness. Recent data sets generated with improved long-read sequencing technologies (PacBio HiFi) provide an opportunity to finally determine the accurate sequence and structure of this numt. We performed a de novo assembly using sequencing data from recent initiatives to span the Arabidopsis centromeres, producing a gap-free sequence of the Chromosome 2 numt, which is 641 kb in length and has 99.933% nucleotide sequence identity with the actual mitochondrial genome. The numt assembly is consistent with the repetitive structure previously predicted from fiber-based fluorescent in situ hybridization. Nanopore sequencing data indicate that the numt has high levels of cytosine methylation, helping to explain its biased spectrum of nucleotide sequence divergence and supporting previous inferences that it is transcriptionally inactive. The original numt insertion appears to have involved multiple mitochondrial DNA copies with alternative structures that subsequently underwent an additional duplication event within the nuclear genome. This work provides insights into numt evolution, addresses one of the last unresolved regions of the Arabidopsis reference genome, and represents a resource for distinguishing between highly similar numt and mitochondrial sequences in studies of transcription, epigenetic modifications, and de novo mutations.
Full Text Available
Minos: variant adjudication and joint genotyping of cohorts of bacterial genomes

https://doi.org/10.1186/s13059-022-02714-x

Hunt, Martin ; Letcher, Brice ; Malone, Kerri M. ; Nguyen, Giang ; Hall, Michael B. ; Colquhoun, Rachel M. ; Lima, Leandro ; Schatz, Michael C. ; Ramakrishnan, Srividya ; CRyPTIC consortium ; et al ( July 2022 , Genome Biology)

Abstract
There are many short-read variant-calling tools, with different strengths and weaknesses. We present a tool, Minos, which combines outputs from arbitrary variant callers, increasing recall without loss of precision. We benchmark on 62 samples from three bacterial species and an outbreak of 385 Mycobacterium tuberculosis samples. Minos also enables joint genotyping; we demonstrate on a large (N=13k) M. tuberculosis cohort, building a map of non-synonymous SNPs and indels in a region where all such variants are assumed to cause rifampicin resistance. We quantify the correlation with phenotypic resistance and then replicate in a second cohort (N=10k).

