Title: The third international hackathon for applying insights into large-scale genomic composition to use cases in a wide range of organisms
In October 2021, 59 scientists from 14 countries and 13 U.S. states collaborated virtually in the Third Annual Baylor College of Medicine & DNAnexus Structural Variation hackathon. The goal of the hackathon was to advance research on structural variants (SVs) by prototyping and iterating on open-source software. This led to nine hackathon projects focused on diverse genomics research interests, including various SV discovery and genotyping methods, SV sequence reconstruction, and clinically relevant structural variation such as SARS-CoV-2 variants. Repositories for the projects that participated in the hackathon are available at https://github.com/collaborativebioinformatics.
Award ID(s):
2045343
NSF-PAR ID:
10390249
Author(s) / Creator(s):
Date Published:
Journal Name:
F1000Research
Volume:
11
ISSN:
2046-1402
Page Range / eLocation ID:
530
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Motivation

    We propose Meltos, a novel computational framework to address the challenging problem of building tumor phylogeny trees using somatic structural variants (SVs) among multiple samples. Meltos leverages the tumor phylogeny tree built on somatic single nucleotide variants (SNVs) to identify high confidence SVs and produce a comprehensive tumor lineage tree, using a novel optimization formulation. While we do not assume the evolutionary progression of SVs is necessarily the same as SNVs, we show that a tumor phylogeny tree using high-quality somatic SNVs can act as a guide for calling and assigning somatic SVs on a tree. Meltos utilizes multiple genomic read signals for potential SV breakpoints in whole genome sequencing data and proposes a probabilistic formulation for estimating variant allele fractions (VAFs) of SV events.

    Results

    In order to assess the ability of Meltos to correctly refine SNV trees with SV information, we tested Meltos on two simulated datasets with five genomes each. We also assessed Meltos on two real cancer datasets: multiple samples from a liposarcoma tumor and a multi-sample breast cancer dataset (Yates et al., 2015), where the authors provide validated structural variation events together with deep, targeted sequencing for a collection of somatic SNVs. We show Meltos has the ability to place high confidence validated SV calls on a refined tumor phylogeny tree. We also show the flexibility of Meltos to either estimate VAFs directly from genomic data or to use copy number corrected estimates.

    Availability and implementation

    Meltos is available at https://github.com/ih-lab/Meltos.

    Contact

    imh2003@med.cornell.edu

    Supplementary information

    Supplementary data are available at Bioinformatics online.
  2. Undergraduate research experiences are increasingly important in biology education, with efforts underway to provide more projects by embedding them in a course. The shift to online learning at the beginning of the pandemic presented a challenge: how could biology instructors provide research experiences to students who were unable to attend in-person labs? During the 2021 ISMB (Intelligent Systems for Molecular Biology) iCn3D Hackathon (Collaborative Tools for Protein Analysis), we learned about new capabilities in iCn3D for analyzing the interactions between amino acids in the paratopes of antibodies and amino acids in the epitopes of antigens, and for predicting the effects of mutations on binding. Additionally, new sequence alignment tools in iCn3D support aligning protein sequences with sequences in structure models. We used these methods to create a new undergraduate research project that students could perform online as part of a course, by combining the use of new features in iCn3D with analysis tools in NextStrain and a data set of anti-SARS-CoV-2 antibodies. We present results from an example project to illustrate how students would investigate the likelihood of SARS-CoV-2 variants escaping from commercial antibodies and use chemical interaction data to support their hypotheses. We also demonstrate that online tools (iCn3D, NextStrain, and the NCBI databases) can be used to carry out the necessary steps and that this work satisfies the requirements for course-based undergraduate research. This project reinforces major concepts in undergraduate biology: evolution and the relationship between the sequence of a protein, its three-dimensional structure, and its function.
  3. Abstract

    Structural variants (SVs), including duplications, deletions, and inversions of DNA, can have significant genomic and functional impacts but are technically difficult to identify and assay compared with single-nucleotide variants. With the aid of new genomic technologies, it has become clear that SVs account for significant differences across and within species. This phenomenon is particularly well-documented for humans and other primates due to the wealth of sequence data available. In great apes, SVs affect a larger number of nucleotides than single-nucleotide variants, with many identified SVs exhibiting population and species specificity. In this review, we highlight the importance of SVs in human evolution by examining (1) how they have shaped great ape genomes, resulting in sensitized regions associated with traits and diseases, (2) their impact on gene functions and regulation, which has subsequently played a role in natural selection, and (3) the role of gene duplications in human brain evolution. We further discuss how to incorporate SVs in research, including the strengths and limitations of various genomic approaches. Finally, we propose future considerations in integrating existing data and biospecimens with the ever-expanding SV compendium propelled by biotechnology advancements.
  4. Abstract Background

    De novo phased (haplo)genome assembly using long-read DNA sequencing data has improved the detection and characterization of structural variants (SVs) in plant and animal genomes. Able to span across haplotypes, long reads allow phased, haplogenome assembly in highly outbred organisms such as forest trees. Eucalyptus tree species and interspecific hybrids are the most widely planted hardwood trees with F1 hybrids of Eucalyptus grandis and E. urophylla forming the bulk of fast-growing pulpwood plantations in subtropical regions. The extent of structural variation and its effect on interspecific hybridization is unknown in these trees. As a first step towards elucidating the extent of structural variation between the genomes of E. grandis and E. urophylla, we sequenced and assembled the haplogenomes contained in an F1 hybrid of the two species.

    Findings

    Using Nanopore sequencing and a trio-binning approach, we assembled the separate haplogenomes (566.7 Mb and 544.5 Mb) to 98.0% BUSCO completion. High-density SNP genetic linkage maps of both parents allowed scaffolding of 88.0% of the haplogenome contigs into 11 pseudo-chromosomes (scaffold N50 of 43.8 Mb and 42.5 Mb for the E. grandis and E. urophylla haplogenomes, respectively). We identify 48,729 SVs between the two haplogenomes providing the first detailed insight into genome structural rearrangement in these species. The two haplogenomes have similar gene content, 35,572 and 33,915 functionally annotated genes, of which 34.7% are contained in genome rearrangements.

    Conclusions

    Knowledge of SV and haplotype diversity in the two species will form the basis for understanding the genetic basis of hybrid superiority in these trees.

     
  5. Abstract

    Structural variants (SVs) can promote speciation by directly causing reproductive isolation or by suppressing recombination across large genomic regions. Whereas examples of each mechanism have been documented, systematic tests of the role of SVs in speciation are lacking. Here, we take advantage of long-read (Oxford nanopore) whole-genome sequencing and a hybrid zone between two Lycaeides butterfly taxa (L. melissa and Jackson Hole Lycaeides) to comprehensively evaluate genome-wide patterns of introgression for SVs and relate these patterns to hypotheses about speciation. We found >100,000 SVs segregating within or between the two hybridizing species. SVs and SNPs exhibited similar levels of genetic differentiation between species, with the exception of inversions, which were more differentiated. We detected credible variation in patterns of introgression among SV loci in the hybrid zone, with 562 of 1419 ancestry-informative SVs exhibiting genomic clines that deviated from null expectations based on genome-average ancestry. Overall, hybrids exhibited a directional shift towards Jackson Hole Lycaeides ancestry at SV loci, consistent with the hypothesis that these loci experienced more selection on average than SNP loci. Surprisingly, we found that deletions, rather than inversions, showed the highest skew towards excess ancestry from Jackson Hole Lycaeides. Excess Jackson Hole Lycaeides ancestry in hybrids was also especially pronounced for Z-linked SVs and inversions containing many genes. In conclusion, our results show that SVs are ubiquitous and suggest that SVs in general, but especially deletions, might disproportionately affect hybrid fitness and thus contribute to reproductive isolation.