Title: Teaching computational genomics and bioinformatics on a high performance computing cluster—a primer
Abstract The burgeoning field of genomics as applied to personalized medicine, epidemiology, conservation, agriculture, forensics, drug development, and other fields comes with large computational and bioinformatics costs, which are often inaccessible to student trainees in classroom settings at universities. However, with increased availability of resources such as NSF XSEDE, Google Cloud, Amazon AWS, and other high-performance computing (HPC) clouds and clusters for educational purposes, a growing community of academicians is working on teaching the utility of HPC resources in genomics and big data analyses. Here, I describe the successful implementation of a semester-long (16-week) upper-division undergraduate/graduate-level course in Computational Genomics and Bioinformatics taught at San Diego State University in Spring 2022. Students were trained in the theory, algorithms, and hands-on applications of genomic data quality control, assembly, annotation, multiple sequence alignment, variant calling, phylogenomic analyses, population genomics, genome-wide association studies, and differential gene expression analyses using RNAseq data on their own dedicated 6-CPU NSF XSEDE Jetstream virtual machines. All lesson plans, activities, examinations, tutorials, code, lectures, and notes are publicly available at https://github.com/arunsethuraman/biomi609spring2022.
Award ID(s):
2147812
PAR ID:
10502092
Author(s) / Creator(s):
Publisher / Repository:
OUP
Date Published:
Journal Name:
Biology Methods and Protocols
Volume:
7
Issue:
1
ISSN:
2396-8923
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Yoshizawa, Go (Ed.)
    Purpose: The purpose of this article is to investigate particular aspects of the STEM job market in the US. In particular, we ask: could the possession of high performance computing (HPC) skills enhance the chances of a person getting a job and/or increase starting salaries for people receiving an undergraduate or graduate degree and entering the technical workforce (rather than academia)? We also estimate the value to the US economy of practical experience offered to US students through training about HPC and the opportunity to use HPC systems funded by the National Science Foundation (NSF) and accessible nationally. Methods: Interviews and surveys of employers of graduates in STEM fields were used to gauge demand for STEM graduates with practical HPC experience and the salary increase that can be associated with the possession of such skills. We used data from the XSEDE project to determine how many undergraduate and graduate students it enabled to acquire practical proficiency with HPC. Results: People with such skills who had completed an undergraduate or graduate degree received an initial median hiring salary of approximately 7%–15% more than those with the same degrees who did not possess such skills. XSEDE added approximately $10 million or more per year to the US economy through the practical educational opportunities it offered. Discussion: Practical hands-on experience provided by the US federal government, as well as many universities and colleges in the US, holds value for students as they enter the workforce. Conclusion: Practical training in HPC during the course of undergraduate and graduate programs has the potential to produce positive individual labor market outcomes (i.e., salary boosts, signing bonuses) as well as to help address the shortage of STEM workers in the private sector of the US.
  2. Abstract Background: Scientists have amassed a wealth of microbiome datasets, making it possible to study microbes in biotic and abiotic systems on a population or planetary scale; however, this potential has not been fully realized given that the tools, datasets, and computation are available in diverse repositories and locations. To address this challenge, we developed iMicrobe.us, a community-driven microbiome data marketplace and tool exchange for users to integrate their own data and tools with those from the broader community. Findings: The iMicrobe platform brings together analysis tools and microbiome datasets by leveraging National Science Foundation–supported cyberinfrastructure and computing resources from CyVerse, Agave, and XSEDE. The primary purpose of iMicrobe is to provide users with a freely available, web-based platform to (1) maintain and share project data, metadata, and analysis products, (2) search for related public datasets, and (3) use and publish bioinformatics tools that run on highly scalable computing resources. Analysis tools are implemented in containers that encapsulate complex software dependencies and run on freely available XSEDE resources via the Agave API, which can retrieve datasets from the CyVerse Data Store or any web-accessible location (e.g., FTP, HTTP). Conclusions: iMicrobe promotes data integration, sharing, and community-driven tool development by making open source data and tools accessible to the research community in a web-based platform.
  3. Abstract With the advent of modern-day high-throughput technologies, the bottleneck in biological discovery has shifted from the cost of doing experiments to that of analyzing results. clubber is our automated cluster-load balancing system developed for optimizing these “big data” analyses. Its plug-and-play framework encourages re-use of existing solutions for bioinformatics problems. clubber’s goals are to reduce computation times and to facilitate use of cluster computing. The first goal is achieved by automating the balance of parallel submissions across available high performance computing (HPC) resources. Notably, the latter can be added on demand, including cloud-based resources, and/or featuring heterogeneous environments. The second goal of making HPCs user-friendly is facilitated by an interactive web interface and a RESTful API, allowing for job monitoring and result retrieval. We used clubber to speed up our pipeline for annotating molecular functionality of metagenomes. Here, we analyzed the Deepwater Horizon oil-spill study data to quantitatively show that the beach sands have not yet entirely recovered. Further, our analysis of the CAMI-challenge data revealed that microbiome taxonomic shifts do not necessarily correlate with functional shifts. These examples (21 metagenomes processed in 172 min) clearly illustrate the importance of clubber in the everyday computational biology environment.
  4. Abstract Motivation: Across biology, we are seeing rapid developments in scale of data production without a corresponding increase in data analysis capabilities. Results: Here, we present Aether (http://aether.kosticlab.org), an intuitive, easy-to-use, cost-effective and scalable framework that uses linear programming to optimally bid on and deploy combinations of underutilized cloud computing resources. Our approach simultaneously minimizes the cost of data analysis and provides an easy transition from users’ existing HPC pipelines. Availability and implementation: Data utilized are available at https://pubs.broadinstitute.org/diabimmune and with EBI SRA accession ERP005989. Source code is available at https://github.com/kosticlab/aether. Examples, documentation and a tutorial are available at http://aether.kosticlab.org. Supplementary information: Supplementary data are available at Bioinformatics online.
  5. Open OnDemand (openondemand.org) is an NSF-funded open-source HPC platform currently in use at over 200 HPC centers around the world. It is an intuitive, innovative, and interactive interface to remote computing resources. Open OnDemand (OOD) helps computational researchers and students efficiently utilize remote computing resources by making them easy to access from any device. It helps computer center staff support a wide range of clients by simplifying the user interface and experience. 