Title: Teaching computational genomics and bioinformatics on a high performance computing cluster—a primer
Abstract The burgeoning field of genomics as applied to personalized medicine, epidemiology, conservation, agriculture, forensics, drug development, and other fields comes with large computational and bioinformatics costs, which often put it out of reach of student trainees in classroom settings at universities. However, with increased availability of resources such as NSF XSEDE, Google Cloud, Amazon AWS, and other high-performance computing (HPC) clouds and clusters for educational purposes, a growing community of academicians is working on teaching the utility of HPC resources in genomics and big data analyses. Here, I describe the successful implementation of a semester-long (16-week) upper-division undergraduate/graduate-level course in Computational Genomics and Bioinformatics taught at San Diego State University in Spring 2022. Students were trained in the theory, algorithms, and hands-on applications of genomic data quality control, assembly, annotation, multiple sequence alignment, variant calling, phylogenomic analyses, population genomics, genome-wide association studies, and differential gene expression analyses using RNAseq data on their own dedicated 6-CPU NSF XSEDE Jetstream virtual machines. All lesson plans, activities, examinations, tutorials, code, lectures, and notes are publicly available at https://github.com/arunsethuraman/biomi609spring2022.
Award ID(s):
2147812
PAR ID:
10502092
Author(s) / Creator(s):
Publisher / Repository:
OUP
Date Published:
Journal Name:
Biology Methods and Protocols
Volume:
7
Issue:
1
ISSN:
2396-8923
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Yoshizawa, Go (Ed.)
    Purpose: The purpose of this article is to investigate particular aspects of the STEM job market in the US. In particular, we ask: could the possession of high performance computing (HPC) skills enhance the chances of a person getting a job and/or increase starting salaries for people receiving an undergraduate or graduate degree and entering the technical workforce (rather than academia)? We also estimate the value to the US economy of practical experience offered to US students through training about HPC and the opportunity to use HPC systems funded by the National Science Foundation (NSF) and accessible nationally. Methods: Interviews and surveys of employers of graduates in STEM fields were used to gauge demand for STEM graduates with practical HPC experience and the salary increase that can be associated with the possession of such skills. We used data from the XSEDE project to determine how many undergraduate and graduate students it enabled to acquire practical proficiency with HPC. Results: People with such skills who had completed an undergraduate or graduate degree received an initial median hiring salary of approximately 7%–15% more than those with the same degrees who did not possess such skills. XSEDE added approximately $10 million or more per year to the US economy through the practical educational opportunities it offered. Discussion: Practical hands-on experience provided by the US federal government, as well as many universities and colleges in the US, holds value for students as they enter the workforce. Conclusion: Practical training in HPC during the course of undergraduate and graduate programs has the potential to produce positive individual labor market outcomes (i.e., salary boosts, signing bonuses) as well as to help address the shortage of STEM workers in the private sector of the US.
  2. Abstract Background: Scientists have amassed a wealth of microbiome datasets, making it possible to study microbes in biotic and abiotic systems on a population or planetary scale; however, this potential has not been fully realized given that the tools, datasets, and computation are available in diverse repositories and locations. To address this challenge, we developed iMicrobe.us, a community-driven microbiome data marketplace and tool exchange for users to integrate their own data and tools with those from the broader community. Findings: The iMicrobe platform brings together analysis tools and microbiome datasets by leveraging National Science Foundation–supported cyberinfrastructure and computing resources from CyVerse, Agave, and XSEDE. The primary purpose of iMicrobe is to provide users with a freely available, web-based platform to (1) maintain and share project data, metadata, and analysis products, (2) search for related public datasets, and (3) use and publish bioinformatics tools that run on highly scalable computing resources. Analysis tools are implemented in containers that encapsulate complex software dependencies and run on freely available XSEDE resources via the Agave API, which can retrieve datasets from the CyVerse Data Store or any web-accessible location (e.g., FTP, HTTP). Conclusions: iMicrobe promotes data integration, sharing, and community-driven tool development by making open source data and tools accessible to the research community in a web-based platform.
  3. Abstract With the advent of modern day high-throughput technologies, the bottleneck in biological discovery has shifted from the cost of doing experiments to that of analyzing results. clubber is our automated cluster-load balancing system developed for optimizing these “big data” analyses. Its plug-and-play framework encourages re-use of existing solutions for bioinformatics problems. clubber’s goals are to reduce computation times and to facilitate use of cluster computing. The first goal is achieved by automating the balance of parallel submissions across available high performance computing (HPC) resources. Notably, the latter can be added on demand, including cloud-based resources, and/or featuring heterogeneous environments. The second goal of making HPCs user-friendly is facilitated by an interactive web interface and a RESTful API, allowing for job monitoring and result retrieval. We used clubber to speed up our pipeline for annotating molecular functionality of metagenomes. Here, we analyzed the Deepwater Horizon oil-spill study data to quantitatively show that the beach sands have not yet entirely recovered. Further, our analysis of the CAMI-challenge data revealed that microbiome taxonomic shifts do not necessarily correlate with functional shifts. These examples (21 metagenomes processed in 172 min) clearly illustrate the importance of clubber in the everyday computational biology environment.
  4. Abstract Motivation: Across biology, we are seeing rapid developments in scale of data production without a corresponding increase in data analysis capabilities. Results: Here, we present Aether (http://aether.kosticlab.org), an intuitive, easy-to-use, cost-effective and scalable framework that uses linear programming to optimally bid on and deploy combinations of underutilized cloud computing resources. Our approach simultaneously minimizes the cost of data analysis and provides an easy transition from users’ existing HPC pipelines. Availability and implementation: Data utilized are available at https://pubs.broadinstitute.org/diabimmune and with EBI SRA accession ERP005989. Source code is available at https://github.com/kosticlab/aether. Examples, documentation and a tutorial are available at http://aether.kosticlab.org. Supplementary information: Supplementary data are available at Bioinformatics online.
  5. Liao, Min-Ken (Ed.)
    ABSTRACT The Genomics Education Partnership (GEP; thegep.org) is a collaboration of more than 260 faculty from over 200 colleges and universities across the continental United States and Puerto Rico, all of whom are engaged in bringing Course-based Undergraduate Research Experiences (CUREs) centered in genomics and bioinformatics to their students. The purpose of the GEP-CURE is to ensure all undergraduate students have access to research experiences in genomics, regardless of the funding and resources available at their institutions. The GEP community provides many resources to facilitate implementation of the genomics curriculum at collaborating institutions, including extensive support for both faculty and undergraduate students. Faculty receive training to implement the curriculum, ongoing professional development, access to updated curriculum, and a community of practitioners. During the COVID-19 pandemic, the GEP developed a virtual learning assistant (LA) program to provide real-time support in GEP activities and research to all students, regardless of their institution, while they were participating in the GEP-CURE. A mixed-methods descriptive study was conducted about this program and draws from quantitative data gathered about the scope and use of the program, as well as the value of the program, as indicated by the undergraduates themselves from their post-course survey responses. Additionally, seven LAs who served in this role between 2021 and 2023 participated in interviews to help the GEP better understand how this resource was used by GEP students, the needs of the students, and to identify the conditions in which this resource could be replicated in other courses.