Title: Big Data in Conservation Genomics: Boosting Skills, Hedging Bets, and Staying Current in the Field
Abstract: A current challenge in the fields of evolutionary, ecological, and conservation genomics is balancing production of large-scale datasets with the additional training often required to handle such datasets. Thus, there is an increasing need for conservation geneticists to continually learn and train to stay up-to-date through avenues such as symposia, meetings, and workshops. The ConGen meeting is a near-annual workshop that strives to guide participants in understanding population genetics principles, study design, data processing, analysis, interpretation, and applications to real-world conservation issues. Each year, ConGen brings together a diverse set of instructors and students, resulting in lectures, hands-on sessions, and discussions. Here, we summarize key lessons learned from the 2019 meeting and more recent updates to the field with a focus on big data in conservation genomics. First, we highlight classical and contemporary issues in study design that are especially relevant to working with big datasets, including the intricacies of data filtering. We next emphasize the importance of building analytical skills and simulating data, and how these skills have applications within and outside of conservation genetics careers. We also highlight recent technological advances and novel applications to conservation of wild populations. Finally, we provide data and recommendations to support ongoing efforts by ConGen organizers and instructors (and beyond) to increase participation of underrepresented minorities in conservation and eco-evolutionary sciences. The future success of conservation genetics requires both continual training in handling big data and a diverse group of people and approaches to tackle key issues, including the global biodiversity-loss crisis.
Award ID(s): 1639014
NSF-PAR ID: 10301257
Editor(s): Koepfli, Klaus-Peter
Journal Name: Journal of Heredity
Volume: 112
Issue: 4
ISSN: 0022-1503
Page Range / eLocation ID: 313 to 327
Sponsoring Org: National Science Foundation
More like this
  1. Abstract

    With the rapid growth of modern technology, many biomedical studies are being conducted to collect massive datasets with volumes of multi‐modality imaging, genetic, neurocognitive and clinical information from increasingly large cohorts. Simultaneously extracting and integrating rich and diverse heterogeneous information in neuroimaging and/or genomics from these big datasets could transform our understanding of how genetic variants impact brain structure and function, cognitive function and brain‐related disease risk across the lifespan. Such understanding is critical for diagnosis, prevention and treatment of numerous complex brain‐related disorders (e.g., schizophrenia and Alzheimer's disease). However, the development of analytical methods for the joint analysis of both high‐dimensional imaging phenotypes and high‐dimensional genetic data, a big data squared (BD2) problem, presents major computational and theoretical challenges for existing analytical methods. Besides the high‐dimensional nature of BD2, various neuroimaging measures often exhibit strong spatial smoothness and dependence and genetic markers may have a natural dependence structure arising from linkage disequilibrium. We review some recent developments of various statistical techniques for imaging genetics, including massive univariate and voxel‐wise approaches, reduced rank regression, mixture models and group sparse multi‐task regression. By doing so, we hope that this review may encourage others in the statistical community to enter into this new and exciting field of research.

    The Canadian Journal of Statistics 47: 108–131; 2019 © 2019 Statistical Society of Canada

  2. Abstract

    The increasing availability and complexity of next-generation sequencing (NGS) data sets make ongoing training an essential component of conservation and population genetics research. A workshop entitled “ConGen 2018” was recently held to train researchers in conceptual and practical aspects of NGS data production and analysis for conservation and ecological applications. Sixteen instructors provided helpful lectures, discussions, and hands-on exercises regarding how to plan, produce, and analyze data for many important research questions. Lecture topics ranged from understanding probabilistic (e.g., Bayesian) genotype calling to the detection of local adaptation signatures from genomic, transcriptomic, and epigenomic data. We report on progress in addressing central questions of conservation genomics, advances in NGS data analysis, the potential for genomic tools to assess adaptive capacity, and strategies for training the next generation of conservation genomicists.
  3. Advances in genomics and transcriptomics accompanying the rapid accumulation of omics data have provided new tools that have transformed and expanded the traditional concepts of model fungi. Evolutionary genomics and transcriptomics have flourished with the use of classical and newer fungal models that facilitate the study of diverse topics encompassing fungal biology and development. Technological advances have also created the opportunity to obtain and mine large datasets. One such continuously growing dataset is that of the Sordariomycetes, which exhibit a richness of species, ecological diversity, economic importance, and a profound research history on amenable models. Currently, 3,574 species of this class have been sequenced, comprising nearly one-third of the available ascomycete genomes. Among these genomes, multiple representatives of the model genera Fusarium, Neurospora, and Trichoderma are present. In this review, we examine recently published studies and data on the Sordariomycetes that have contributed novel insights to the field of fungal evolution via integrative analyses of the genetic, pathogenic, and other biological characteristics of the fungi. Some of these studies applied ancestral state analysis of gene expression among divergent lineages to infer regulatory network models, identify key genetic elements in fungal sexual development, and investigate the regulation of conidial germination and secondary metabolism. Such multispecies investigations address challenges in the study of fungal evolutionary genomics derived from studies that are often based on limited model genomes and that primarily focus on the aspects of biology driven by knowledge drawn from a few model species.
Rapidly accumulating information and expanding capabilities for systems biological analysis of Big Data are setting the stage for the expansion of the concept of model systems from unitary taxonomic species/genera to inclusive clusters of well-studied models that can facilitate both the in-depth study of specific lineages and also investigation of trait diversity across lineages. The Sordariomycetes class, in particular, offers abundant omics data and a large and active global research community. As such, the Sordariomycetes can form a core omics clade, providing a blueprint for the expansion of our knowledge of evolution at the genomic scale in the exciting era of Big Data and artificial intelligence, and serving as a reference for the future analysis of different taxonomic levels within the fungal kingdom. 
  4. Abstract

    Pressing environmental research questions demand the integration of increasingly diverse and large‐scale ecological datasets as well as complex analytical methods, which require specialized tools and resources.

    Computational training for ecological and evolutionary sciences has become more abundant and accessible over the past decade, but tool development has outpaced the availability of specialized training. Most training for scripted analyses focuses on individual analysis steps in one script rather than creating a scripted pipeline, where modular functions comprise an ecosystem of interdependent steps. Although current computational training creates an excellent starting place, linear styles of scripting can risk becoming labor‐ and time‐intensive and less reproducible by often requiring manual execution. Pipelines, however, can be easily automated or tracked by software to increase efficiency and reduce potential errors. Ecology and evolution would benefit from techniques that reduce these risks by managing analytical pipelines in a modular, readily parallelizable format with clear documentation of dependencies.

    Workflow management software (WMS) can aid in the reproducibility, intelligibility and computational efficiency of complex pipelines. To date, WMS adoption in ecology and evolutionary research has been slow. We discuss the benefits and challenges of implementing WMS and illustrate its use through a case study with the targets R package to further highlight WMS benefits through workflow automation, dependency tracking and improved clarity for reviewers.

    Although WMS requires familiarity with function‐oriented programming and careful planning for more advanced applications and pipeline sharing, investment in training will enable access to the benefits of WMS and impart transferable computing skills that can facilitate ecological and evolutionary data science at large scales.
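The dependency tracking described in this abstract can be illustrated with a toy example. The case study itself uses the targets R package; the following is not that package's API but a minimal Python sketch, assuming an in-memory cache and a manually bumped version tag, of the core idea: each step declares its upstream steps, and a step is rerun only when its version or an upstream value has changed. The names Pipeline, step, build, and the allele-frequency values are hypothetical.

```python
import hashlib
import json
from typing import Any, Callable, Sequence


class Pipeline:
    """Toy workflow manager (hypothetical; not the targets API).

    Each step declares its upstream dependencies. A step is rerun only when
    its version tag or the value of an upstream step has changed; otherwise
    its cached result is reused.
    """

    def __init__(self) -> None:
        # name -> (function, dependency names, version tag)
        self.steps: dict[str, tuple[Callable[..., Any], list[str], str]] = {}
        # name -> (fingerprint of inputs, cached value)
        self.cache: dict[str, tuple[str, Any]] = {}
        self.log: list[str] = []

    def step(self, name: str, func: Callable[..., Any],
             deps: Sequence[str] = (), version: str = "1") -> None:
        """Register (or replace) a step; bump `version` when its code changes."""
        self.steps[name] = (func, list(deps), version)

    def _fingerprint(self, name: str, dep_values: list[Any]) -> str:
        # Hash the step's name, version tag, and upstream values together.
        _, _, version = self.steps[name]
        payload = json.dumps([name, version, dep_values], default=repr)
        return hashlib.sha256(payload.encode()).hexdigest()

    def build(self, name: str) -> Any:
        func, deps, _ = self.steps[name]
        # Build upstream steps first (depth-first over the dependency graph).
        dep_values = [self.build(dep) for dep in deps]
        fingerprint = self._fingerprint(name, dep_values)
        cached = self.cache.get(name)
        if cached is not None and cached[0] == fingerprint:
            self.log.append(f"skip {name}")
            return cached[1]
        value = func(*dep_values)
        self.cache[name] = (fingerprint, value)
        self.log.append(f"run {name}")
        return value


# Hypothetical pipeline: minor allele frequencies -> MAF filter -> count kept.
pipe = Pipeline()
pipe.step("raw", lambda: [0.02, 0.10, 0.35, 0.48, 0.01])
pipe.step("filtered", lambda freqs: [f for f in freqs if f >= 0.05], deps=["raw"])
pipe.step("n_kept", len, deps=["filtered"])

print(pipe.build("n_kept"))  # 3
print(pipe.log)              # ['run raw', 'run filtered', 'run n_kept']

pipe.log.clear()
pipe.build("n_kept")         # nothing changed, so every step is skipped
print(pipe.log)              # ['skip raw', 'skip filtered', 'skip n_kept']

# Editing the filter (and bumping its version) invalidates it and its downstream step.
pipe.step("filtered", lambda freqs: [f for f in freqs if f >= 0.01],
          deps=["raw"], version="2")
pipe.log.clear()
print(pipe.build("n_kept"))  # 5
print(pipe.log)              # ['skip raw', 'run filtered', 'run n_kept']
```

Because the fingerprint hashes upstream values rather than timestamps, a downstream step is also skipped when an upstream rerun happens to produce identical output. Full WMS tools additionally detect code changes on their own and can run independent steps in parallel; the manual version tag and sequential recursion here are simplifications.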

  5. This design-focused practice paper presents a case study describing how a training program developed for academic contexts was adapted for use with engineers working in industry. The underlying curriculum is from the NSF-funded CyberAmbassadors program, which developed training in communication, teamwork and leadership skills for participants from academic and research settings. For the case study described here, one module from the CyberAmbassadors project was adapted for engineers working in private industry: “Teaming Up: Effective Group and Meeting Management.” The key objectives were to increase knowledge and practical skills within the company’s engineering organization, focusing specifically on time management as it relates to project and product delivery. We were also interested in examining the results of translating curricula designed for an academic setting into a corporate setting. Training participants were all from the dedicated engineering department of a US-based location of an international company that provides financial services. The original curriculum was designed for live, in-person training, but was adapted for virtual delivery after the company adopted a 100% remote workforce in response to the COVID-19 pandemic. The training was conducted in four phases: (1) train-the-trainer to create internal evangelists; (2) train management to build buy-in and provide sponsorship; (3) phased rollout of training to individual members of the engineering department, contemporaneous with (4) specific and intentional opportunities to apply the skills in normal business activities including Joint Architecture Design (JAD) sessions. Effectiveness was measured through surveys at the engineering management level (before, during, and after training), and through direct discussions with engineering teams who were tracked for four weeks after the training.
A number of cultural shifts within the company were observed as direct and indirect outcomes of this training. These include the creation and standardization of a template for meeting agendas; a “grassroots” effort to spread the knowledge and best practices from trained individuals to untrained individuals through informal, peer-to-peer interactions; individuals at varying levels of company hierarchy publicly expressing that they would not attend meetings unless an appropriate agenda was provided in advance; and requests for additional training by management who wanted to increase performance in their employees. As a result of this adaptation from academic to industry training contexts, several key curricular innovations were added back to the original CyberAmbassadors corpus. Examples include a reinterpretation of the separate-but-equal leadership roles within meetings, and the elevation of timekeeper to a controlling leadership role within a meeting. This case study offers valuable lessons on translating training from academic/research settings to industry, including a description of how the “business case” was developed in order to gain approval for the training and sponsorship from management. Future work includes adapting additional material from the CyberAmbassadors program for applications in a business context, and the continued formal and informal propagation of the current material within the company. 