

Title: A guided‐inquiry investigation of genetic variants using Oxford nanopore sequencing for an undergraduate molecular biology laboratory course
Abstract

Next Generation Sequencing (NGS) has become an important tool in the biological sciences and has a growing number of applications across medical fields. Currently, few undergraduate programs provide training in the design and implementation of NGS applications. Here, we describe an inquiry‐based laboratory exercise for a college‐level molecular biology laboratory course that uses real‐time MinION deep sequencing and bioinformatics to investigate characteristic genetic variants found in cancer cell‐lines. The overall goal for students was to identify non‐small cell lung cancer (NSCLC) cell‐lines based on their unique genomic profiles. The units described in this laboratory highlight core principles in multiplex PCR primer design, real‐time deep sequencing, and bioinformatics analysis for genetic variants. We found that the MinION device is an appropriate, feasible tool that provides a comprehensive, hands‐on NGS experience for undergraduates. Student evaluations demonstrated increased confidence in using molecular techniques and enhanced understanding of NGS concepts. Overall, this exercise provides a pedagogical tool for incorporating NGS approaches in the teaching laboratory as a way of enhancing students' comprehension of genomic sequence analysis. Further, this NGS lab module can easily be added to a variety of lab‐based courses to help undergraduate students learn current DNA sequencing methods with limited effort and cost.

 
NSF-PAR ID:
10387684
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Journal Name:
Biochemistry and Molecular Biology Education
Volume:
49
Issue:
4
ISSN:
1470-8175
Page Range / eLocation ID:
p. 588-597
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    The development of next-generation sequencing (NGS) enabled a shift from array-based genotyping to directly sequencing genomic libraries for high-throughput genotyping. Even though whole-genome sequencing was initially too costly for routine analysis in large populations such as breeding or genetic studies, continued advancements in genome sequencing and bioinformatics have provided the opportunity to capitalize on whole-genome information. As new sequencing platforms can routinely provide high-quality sequencing data for sufficient genome coverage to genotype various breeding populations, a limitation comes in the time and cost of library construction when multiplexing a large number of samples. Here we describe a high-throughput whole-genome skim-sequencing (skim-seq) approach that can be utilized for a broad range of genotyping and genomic characterization. Using optimized low-volume Illumina Nextera chemistry, we developed a skim-seq method and combined up to 960 samples in one multiplex library using dual-index barcoding. With the dual-index barcoding, the number of samples for multiplexing can be adjusted depending on the amount of data required, and could be extended to 3,072 samples or more. Panels of doubled haploid wheat lines (Triticum aestivum, CDC Stanley × CDC Landmark), wheat-barley (T. aestivum × Hordeum vulgare) and wheat-wheatgrass (Triticum durum × Thinopyrum intermedium) introgression lines as well as known monosomic wheat stocks were genotyped using the skim-seq approach. Bioinformatics pipelines were developed for various applications where sequencing coverage ranged from 1× down to 0.01× per sample. Using reference genomes, we detected chromosome dosage, identified aneuploidy, and karyotyped introgression lines from the skim-seq data.

    Leveraging the recent advancements in genome sequencing, skim-seq provides an effective and low-cost tool for routine genotyping and genetic analysis, which can track and identify introgressions and genomic regions of interest in genetics research and applied breeding programs.
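
As an illustration of how chromosome dosage might be read from low-coverage data, the following is a minimal sketch and not the pipeline described in the abstract. It assumes that most chromosomes sit at the baseline copy number, so the median read density can serve as the reference. The function names, chromosome names, lengths, and read counts are all hypothetical.

```python
from statistics import median


def chromosome_dosage(read_counts, chrom_lengths, ploidy=2):
    """Estimate per-chromosome copy number from low-coverage read counts.

    Assumes most chromosomes are at the baseline ploidy, so the median
    read density (reads per base) is taken to represent `ploidy` copies.
    """
    density = {c: read_counts[c] / chrom_lengths[c] for c in read_counts}
    baseline = median(density.values())
    return {c: ploidy * d / baseline for c, d in density.items()}


def flag_aneuploid(dosage, ploidy=2):
    """Return chromosomes whose rounded dosage differs from the baseline."""
    return sorted(c for c, d in dosage.items() if round(d) != ploidy)


# Hypothetical toy data: lengths in bases, counts of mapped reads.
lengths = {"chrA": 100_000, "chrB": 200_000, "chrC": 300_000}
counts = {"chrA": 1_000, "chrB": 2_000, "chrC": 1_500}

dosage = chromosome_dosage(counts, lengths)
aneuploid = flag_aneuploid(dosage)
```

In this toy input, chrC has half the read density of the other two chromosomes, so it is reported at about one copy, which is the pattern a monosomic stock would be expected to show. Real skim-seq data would also need binning and correction for mappability and GC content, which this sketch omits.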

     
  2. Abstract

    Base‐editing technologies enable the introduction of point mutations at targeted genomic sites in mammalian cells, with higher efficiency and precision than traditional genome‐editing methods that use DNA double‐strand breaks, such as zinc finger nucleases (ZFNs), transcription‐activator‐like effector nucleases (TALENs), and the clustered regularly interspaced short palindromic repeats (CRISPR)–CRISPR‐associated protein 9 (CRISPR‐Cas9) system. This allows the generation of single‐nucleotide‐variant isogenic cell lines (i.e., cell lines whose genomic sequences differ from each other only at a single, edited nucleotide) in a more time‐ and resource‐effective manner. These single‐nucleotide‐variant clonal cell lines represent a powerful tool with which to assess the functional role of genetic variants in a native cellular context. Base editing can therefore facilitate genotype‐to‐phenotype studies in a controlled laboratory setting, with applications in both basic research and clinical settings. Here, we provide optimized protocols (including experimental design, methods, and analyses) to design base‐editing constructs, transfect adherent cells, quantify base‐editing efficiencies in bulk, and generate single‐nucleotide‐variant clonal cell lines. © 2020 Wiley Periodicals LLC.

    Basic Protocol 1: Design and production of plasmids for base‐editing experiments

    Basic Protocol 2: Transfection of adherent cells and harvesting of genomic DNA

    Basic Protocol 3: Genotyping of harvested cells using Sanger sequencing

    Alternate Protocol 1: Next‐generation sequencing to quantify base editing

    Basic Protocol 4: Single‐cell isolation of base‐edited cells using FACS

    Alternate Protocol 2: Single‐cell isolation of base‐edited cells using dilution plating

    Basic Protocol 5: Clonal expansion to generate isogenic cell lines and genotyping of clones

     
  3. Genomics has grown exponentially over the last decade. Common variants are associated with physiological changes through statistical strategies such as Genome-Wide Association Studies (GWAS) and quantitative trait loci (QTL). Rare variants are associated with diseases through extensive filtering tools, including population genomics and trio-based sequencing (parents and probands). However, the genomic associations require follow-up analyses to narrow causal variants, identify genes that are influenced, and determine the physiological changes. Large quantities of data exist that can be used to connect variants to gene changes, cell types, protein pathways, clinical phenotypes, and animal models that establish physiological genomics. This data combined with bioinformatics including evolutionary analysis, structural insights, and gene regulation can yield testable hypotheses for mechanisms of genomic variants. Molecular biology, biochemistry, cell culture, CRISPR editing, and animal models can test the hypotheses to give molecular variant mechanisms. Variant characterizations can be a significant component of educating future professionals in undergraduate, graduate, or medical training programs through teaching the basic concepts and terminology of genetics while learning independent research hypothesis design. This article goes through the computational and experimental analysis strategies of variant characterization and provides examples of these tools applied in publications. © 2022 American Physiological Society. Compr Physiol 12:3303-3336, 2022.
  4. Abstract

    Motivation

    We propose Meltos, a novel computational framework to address the challenging problem of building tumor phylogeny trees using somatic structural variants (SVs) among multiple samples. Meltos leverages the tumor phylogeny tree built on somatic single nucleotide variants (SNVs) to identify high confidence SVs and produce a comprehensive tumor lineage tree, using a novel optimization formulation. While we do not assume the evolutionary progression of SVs is necessarily the same as SNVs, we show that a tumor phylogeny tree using high-quality somatic SNVs can act as a guide for calling and assigning somatic SVs on a tree. Meltos utilizes multiple genomic read signals for potential SV breakpoints in whole genome sequencing data and proposes a probabilistic formulation for estimating variant allele fractions (VAFs) of SV events.

    Results

    In order to assess the ability of Meltos to correctly refine SNV trees with SV information, we tested Meltos on two simulated datasets with five genomes each. We also assessed Meltos on two real cancer datasets: multiple samples from a liposarcoma tumor, and a multi-sample breast cancer dataset (Yates et al., 2015), where the authors provide validated structural variation events together with deep, targeted sequencing for a collection of somatic SNVs. We show Meltos has the ability to place high confidence validated SV calls on a refined tumor phylogeny tree. We also show the flexibility of Meltos to either estimate VAFs directly from genomic data or to use copy number corrected estimates.

    Availability and implementation

    Meltos is available at https://github.com/ih-lab/Meltos.

    Contact

    imh2003@med.cornell.edu

    Supplementary information

    Supplementary data are available at Bioinformatics online.
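
For readers unfamiliar with variant allele fractions, the sketch below shows the basic quantity being estimated. It is not the probabilistic formulation used by Meltos, which the abstract does not specify. It assumes only that reads at a site can be split into variant-supporting and reference-supporting counts; the function names and the uniform Beta(1, 1) prior are illustrative assumptions.

```python
def vaf_point_estimate(variant_reads, reference_reads):
    """Fraction of reads at a site that support the variant allele."""
    total = variant_reads + reference_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return variant_reads / total


def vaf_posterior_mean(variant_reads, reference_reads,
                       prior_a=1.0, prior_b=1.0):
    """Posterior mean VAF under a Beta(prior_a, prior_b) prior.

    With a binomial read-sampling model the posterior is also a Beta
    distribution, so the mean has a closed form. The default uniform
    prior is an assumption made for illustration.
    """
    return (variant_reads + prior_a) / (
        variant_reads + reference_reads + prior_a + prior_b
    )


# Hypothetical site: 12 reads support the variant, 28 the reference.
point = vaf_point_estimate(12, 28)
smoothed = vaf_posterior_mean(12, 28)
```

The posterior mean pulls the estimate slightly toward 0.5 at low depth, which avoids reporting a VAF of exactly 0 or 1 from a handful of reads. Neither function corrects for copy number or tumor purity.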
  5. Abstract

    Motivation

    Accurate estimation of transcript isoform abundance is critical for downstream transcriptome analyses and can lead to precise molecular mechanisms for understanding complex human diseases, like cancer. Simplex mRNA Sequencing (RNA-Seq) based isoform quantification approaches are facing the challenges of inherent sampling bias and unidentifiable read origins. A large-scale experiment shows that the consistency between RNA-Seq and other mRNA quantification platforms is relatively low at the isoform level compared to the gene level. In this project, we developed a platform-integrated model for transcript quantification (IntMTQ) to improve the performance of RNA-Seq on isoform expression estimation. IntMTQ, which benefits from the mRNA expressions reported by the other platforms, provides more precise RNA-Seq-based isoform quantification and leads to more accurate molecular signatures for disease phenotype prediction.

    Results

    In the experiments to assess the quality of isoform expression estimated by IntMTQ, we designed three tasks for clustering and classification of 46 cancer cell lines with four different mRNA quantification platforms, including the newly developed NanoString nCounter technology. The results demonstrate that the isoform expressions learned by IntMTQ consistently provide more and better molecular features for downstream analyses compared with five baseline algorithms that consider RNA-Seq data only. An independent RT-qPCR experiment on seven genes in twelve cancer cell lines showed that IntMTQ improved overall transcript quantification. The platform-integrated algorithms could be applied to large-scale cancer studies, such as The Cancer Genome Atlas (TCGA), with both RNA-Seq and array-based platforms available.

    Availability and implementation

    Source code is available at: https://github.com/CompbioLabUcf/IntMTQ.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     