Title: Population Genomics Training for the Next Generation of Conservation Geneticists: ConGen 2018 Workshop
Abstract The increasing availability and complexity of next-generation sequencing (NGS) data sets make ongoing training an essential component of conservation and population genetics research. A workshop entitled “ConGen 2018” was recently held to train researchers in conceptual and practical aspects of NGS data production and analysis for conservation and ecological applications. Sixteen instructors provided helpful lectures, discussions, and hands-on exercises regarding how to plan, produce, and analyze data for many important research questions. Lecture topics ranged from understanding probabilistic (e.g., Bayesian) genotype calling to the detection of local adaptation signatures from genomic, transcriptomic, and epigenomic data. We report on progress in addressing central questions of conservation genomics, advances in NGS data analysis, the potential for genomic tools to assess adaptive capacity, and strategies for training the next generation of conservation genomicists.
Award ID(s):
1655809 1639014
PAR ID:
10185993
Journal Name:
Journal of Heredity
Volume:
111
Issue:
2
ISSN:
0022-1503
Page Range / eLocation ID:
227 to 236
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    New computational methods and next‐generation sequencing (NGS) approaches have enabled the use of thousands or hundreds of thousands of genetic markers to address previously intractable questions. The methods and massive marker sets present both new data analysis challenges and opportunities to visualize, understand, and apply population and conservation genomic data in novel ways. The large scale and complexity of NGS data also increase the expertise and effort required to thoroughly and thoughtfully analyze and interpret data. To aid in this endeavor, a recent workshop entitled “Population Genomic Data Analysis,” also known as “ConGen 2017,” was held at the University of Montana. The ConGen workshop brought together 15 instructors with knowledge of a wide range of topics including NGS data filtering, genome assembly, genomic monitoring of effective population size, migration modeling, detecting adaptive genomic variation, genomewide association analysis, inbreeding depression, and landscape genomics. Here, we summarize the major themes of the workshop and the important take‐home points that were offered to students throughout. We emphasize increasing participation by women in population and conservation genomics as a vital step for the advancement of science. Some important themes that emerged during the workshop included the need for data visualization and its importance in finding problematic data, the effects of data filtering choices on downstream population genomic analyses, the increasing availability of whole‐genome sequencing, and the new challenges it presents. Our goal here is to help motivate and educate a worldwide audience to improve population genomic data analysis and interpretation, and thereby advance the contribution of genomics to molecular ecology, evolutionary biology, and especially to the conservation of biodiversity.

     
  2. Abstract

    Next Generation Sequencing (NGS) has become an important tool in the biological sciences and has a growing number of applications across medical fields. Currently, few undergraduate programs provide training in the design and implementation of NGS applications. Here, we describe an inquiry‐based laboratory exercise for a college‐level molecular biology laboratory course that uses real‐time MinION deep sequencing and bioinformatics to investigate characteristic genetic variants found in cancer cell‐lines. The overall goal for students was to identify non‐small cell lung cancer (NSCLC) cell‐lines based on their unique genomic profiles. The units described in this laboratory highlight core principles in multiplex PCR primer design, real‐time deep sequencing, and bioinformatics analysis for genetic variants. We found that the MinION device is an appropriate, feasible tool that provides a comprehensive, hands‐on NGS experience for undergraduates. Student evaluations demonstrated increased confidence in using molecular techniques and enhanced understanding of NGS concepts. Overall, this exercise provides a pedagogical tool for incorporating NGS approaches in the teaching laboratory as a way of enhancing students' comprehension of genomic sequence analysis. Further, this NGS lab module can easily be added to a variety of lab‐based courses to help undergraduate students learn current DNA sequencing methods with limited effort and cost.

     
  3. Abstract

    Background

    Next-generation sequencing (NGS) is widely used for genome-wide identification and quantification of DNA elements involved in the regulation of gene transcription. Studies that generate multiple high-throughput NGS datasets require data integration methods for two general tasks: 1) generation of genome-wide data tracks representing an aggregate of multiple replicates of the same experiment; and 2) combination of tracks from different experimental types that provide complementary information regarding the location of genomic features such as enhancers.

    Results

    NGS-Integrator is a Java-based command line application, facilitating efficient integration of multiple genome-wide NGS datasets. NGS-Integrator first transforms all input data tracks using the complement of the minimum Bayes’ factor so that all values are expressed in the range [0,1] representing the probability of a true signal given the background noise. Then, NGS-Integrator calculates the joint probability for every genomic position to create an integrated track. We provide examples using real NGS data generated in our laboratory and from the mouse ENCODE database.

    Conclusions

    Our results show that NGS-Integrator is both time- and memory-efficient. Our examples show that NGS-Integrator can integrate information to facilitate downstream analyses that identify functional regulatory domains along the genome.
  4. Abstract Precision medicine aims for personalized prognosis and therapeutics by utilizing recent genome-scale high-throughput profiling techniques, including next-generation sequencing (NGS). However, translating NGS data faces several challenges. First, NGS count data are often overdispersed, requiring appropriate modeling. Second, compared to the number of involved molecules and system complexity, the number of available samples for studying complex disease, such as cancer, is often limited, especially considering disease heterogeneity. The key question is whether we may integrate available data from all different sources or domains to achieve reproducible disease prognosis based on NGS count data. In this paper, we develop a Bayesian Multi-Domain Learning (BMDL) model that derives domain-dependent latent representations of overdispersed count data based on hierarchical negative binomial factorization for accurate cancer subtyping even if the number of samples for a specific cancer type is small. Experimental results from both our simulated and NGS datasets from The Cancer Genome Atlas (TCGA) demonstrate the promising potential of BMDL for effective multi-domain learning without negative transfer effects often seen in existing multi-task learning and transfer learning methods. 
  5. Background

    Markov chains (MC) have been widely used to model molecular sequences. The estimation of the MC transition matrix and of confidence intervals for the transition probabilities from long sequence data has been intensively studied in the past decades. In next generation sequencing (NGS), a large number of short reads is generated. These short reads can overlap, and some regions of the genome may not be sequenced, resulting in a new type of data. Based on NGS data, the transition probabilities of MC can be estimated by moment estimators. However, the classical asymptotic distribution theory for MC transition probability estimators based on long sequences is no longer valid.

    Methods

    In this study, we present the asymptotic distributions of several statistics related to MC based on NGS data. We show that, after scaling by the effective coverage d defined in a previous study by the authors, these statistics based on NGS data approximately follow the same distributions as the corresponding statistics for long sequences.

    Results

    We apply the asymptotic properties of these statistics to find the theoretical confidence regions for MC transition probabilities based on NGS short-read data. We validate our theoretical confidence intervals using both simulated data and real data sets, and compare the results with those obtained by the parametric bootstrap method.

    Conclusions

    We find that the asymptotic distributions of these statistics and the theoretical confidence intervals of transition probabilities based on NGS data given in this study are highly accurate, providing a powerful tool for NGS data analysis.

     