Using ChatGPT as a tool for training nonprogrammers to generate genomic sequence analysis code

Delcher, Haley A; Alsatari, Enas S; Haastrup, Adeyeye I; Naaz, Sayema; Hayes‐Guastella, Lydia A; McDaniel, Autumn M; Clark, Olivia G; Katerski, Devin M; Prinsloo, Francois O; Roberts, Olivia R; Shaddix, Meredith A; Sullivan, Bridgette N; Swan, Isabella M; Hartsell, Emily M; DeMeis, Jeffrey D; Paudel, Sunita S; Borchert, Glen M

doi:10.1002/bmb.21899

Abstract: Today, due to the size of many genomes and the increasingly large sizes of sequencing files, independently analyzing sequencing data is largely impossible for a biologist with little to no programming expertise. As such, biologists typically face a dilemma: either spend significant time and effort learning to program themselves, or identify (and rely on) an available computer scientist to analyze large sequence data sets. That said, the advent of AI‐powered programs like ChatGPT may offer a means of closing the gap between biologists and the analysis of genomic data that is critically important to their field. The work detailed herein demonstrates how integrating ChatGPT into an existing Course‐based Undergraduate Research Experience curriculum can equip biology students who have no programming expertise to generate their own programs and carry out a publishable, comprehensive analysis of real‐world Next Generation Sequencing (NGS) datasets. Using only the students' biology background as the basis for prompting ChatGPT to generate Python code, we found that students could readily generate programs able to handle and analyze NGS datasets larger than 10 gigabytes. In summary, we believe that integrating ChatGPT into education can help bridge a critical gap between biology and computer science and may prove similarly beneficial in other disciplines. Additionally, ChatGPT can provide biological researchers with powerful new tools for NGS dataset analysis that may help accelerate major new advances in the field.

Award ID(s): 2219900, 2243532

PAR ID: 10613933

Publisher / Repository: Wiley Periodicals LLC

Date Published: 2025-05-05

Journal Name: Biochemistry and Molecular Biology Education

ISSN: 1470-8175

Sponsoring Org: National Science Foundation

Journal Article: https://doi.org/10.1002/bmb.21899
