Search for: All records

Creators/Authors contains: "Gupta, Vibhuti"


  1. Cancer is one of the leading causes of death worldwide. Pathogenic viruses are estimated to be responsible for 15% of all human cancers globally and pose significant threats to public health. Viruses integrate their genetic material into the host genome, increasing the risk of cancer-promoting changes in it. To understand the molecular mechanisms of virus-mediated cancers, it is crucial to identify viral insertion sites in cancer genomes. However, this effort is hindered by the rapidly increasing volume of tumor sequencing data, along with the challenges of accurate data analysis caused by high viral mutation rates and the difficulty of aligning short reads to the reference genome. Thus, it is crucial to develop an efficient method for virus integration site detection in tumor genomes. This paper proposes a novel pipeline to identify viral integration sites leveraging deep Convolutional Neural Networks (CNNs). Our contributions are twofold: (i) we propose and integrate three novel matrix generation methods into the pipeline, developed after aligning the host and viral genomes with their respective reference genomes; (ii) we employ one-hot encoded images with reduced computational complexity to represent viral integration sites and harness the capabilities of deep CNNs for detection. The paper illustrates our proposed approach and presents experiments conducted using both synthetic and real sequencing data. Our preliminary experimental results are promising, showcasing the effectiveness of the proposed methods in detecting viral integration sites.
    Free, publicly-accessible full text available January 16, 2026
  2. In the United States, heart disease is the leading cause of death, killing about 695,000 people each year. Myocardial infarction (MI) is a cardiac complication that occurs when blood flow to a portion of the heart decreases or halts, leading to damage in the heart muscle. Heart failure and atrial fibrillation (AF) are closely associated with MI. Heart failure is a common complication of MI and a risk factor for AF. Machine learning (ML) and deep learning techniques have shown potential in predicting cardiovascular conditions. However, developing a simplified predictive model for cardiac complications, along with a thorough feature analysis, is challenging because many factors are involved, including lifestyle, age, family history, medical conditions, and clinical variables. This paper aims to develop simplified models with comprehensive feature analysis and data preprocessing for predicting cardiac complications, such as heart failure and atrial fibrillation linked with MI, using a publicly available dataset of myocardial infarction patients. This will help students and health care professionals understand the various factors responsible for cardiac complications through a simplified workflow. By prioritizing interpretability, this paper illustrates how simpler models, like decision trees and logistic regression, can provide transparent decision-making processes while still maintaining a balance with accuracy. Additionally, this paper examines how age-specific factors affect heart failure and atrial fibrillation. Overall, this research focuses on making machine learning accessible and interpretable. Its goal is to equip students and non-experts with practical tools to understand how ML can be applied in healthcare, particularly for predicting cardiac complications in patients with MI.
    Free, publicly-accessible full text available January 16, 2026
  3. RNA sequencing (RNA-seq) has emerged as a prominent resource for transcriptomic analysis due to its ability to measure gene expression in a highly sensitive and accurate manner. With the increasing availability of RNA-seq data analysis from clinical studies and patient samples, the development of effective visualization tools for RNA-seq analysis has become increasingly important to help clinicians and biomedical researchers better understand the complex patterns of gene expression associated with health and disease. This review aims to outline the current state-of-the-art data visualization techniques and tools commonly used to frame clinical inferences from RNA-seq data and point out their benefits, applications, and limitations. A systematic review of English articles using the PubMed, Scopus, Web of Science, and IEEE Xplore databases was performed. Search terms included “RNA-seq”, “visualization”, “plots”, and “clinical”. Only full-text studies reported between 2017 and 2024 were included for analysis. Following PRISMA guidelines, a total of 126 studies were identified, of which 33 studies met the inclusion criteria. We found that 18% of studies present visualization techniques and tools for circular RNA-seq data, 56% for single-cell RNA-seq data, 23% for bulk RNA-seq data, and 3% for long non-coding RNA-seq data. Overall, this review provides a comprehensive overview of the common visualization tools and their potential applications, which is a useful resource for researchers and clinicians interested in using RNA-seq data for various clinical purposes (e.g., diagnosis or prognosis).
    Free, publicly-accessible full text available January 12, 2026
  4. The global COVID-19 pandemic has strained healthcare systems and highlighted the need for accessible and efficient diagnostic methods. Traditional diagnostic tools, such as nasal swabs and biosensors, while accurate, pose significant logistical challenges and high costs, limiting their scalability. This paper explores an alternative, non-invasive approach to COVID-19 detection using machine learning algorithms to analyze vocal patterns, particularly cough and breathing sounds. Leveraging a publicly available dataset, we developed machine learning models capable of classifying audio samples as COVID-19 positive or negative. Our models achieve an AUC of up to 85% and an F1-score of 81%, demonstrating the potential of machine learning in enabling rapid, cost-effective COVID-19 diagnosis. These findings suggest that audio-based diagnostics could be a practical and scalable solution, particularly in resource-limited settings where traditional methods are less feasible.
    Free, publicly-accessible full text available January 16, 2026
  5. As data grows exponentially across diverse fields, the ability to effectively leverage big data has become increasingly crucial. In the field of data science, however, minority groups, including African Americans, are significantly underrepresented. Recognizing the strategic role of minority-serving institutions in enhancing diversity in the data science workforce and applying data science to health disparities, the National Institute on Minority Health and Health Disparities (NIMHD) provided funding in September 2021 to six Research Centers in Minority Institutions (RCMI) to improve their data science capacity and foster collaborations with data scientists. Meharry Medical College (MMC), a historically Black College/University (HBCU), was among the six awardees. This paper summarizes the NIMHD-funded efforts at MMC, which include offering mini-grants to collaborative research groups, conducting surveys to understand the needs of the community and guide project implementation, and providing data science training to enhance the data analytics skills of the RCMI investigators, staff, medical residents, and graduate students. This study is innovative in that it addresses the urgent need to enhance the data science capacity of the RCMI program at MMC, build a diverse data science workforce, and develop collaborations between the RCMI and MMC’s newly established School of Applied Computational Science. This paper presents the progress of this NIMHD-funded project, which shows its positive impact on the local community.
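Entry 1 above mentions representing sequence data as one-hot encoded images. The abstract does not specify the authors' matrix generation methods, so the following is only a minimal Python sketch of what one-hot encoding a DNA sequence generally means. The function name `one_hot_encode`, the A/C/G/T column order, and the all-zero row for ambiguous bases are assumptions made for illustration.

```python
# Illustrative sketch only; this is NOT the pipeline described in the paper.
# Assumed column order for the four nucleotides.
BASES = "ACGT"


def one_hot_encode(sequence):
    """Return a len(sequence) x 4 matrix (list of rows) of 0/1 values.

    Each row has a single 1 in the column of the base at that position.
    Bases outside A/C/G/T (for example N) become an all-zero row; this
    handling is an assumption, not something taken from the source.
    """
    rows = []
    for base in sequence.upper():
        rows.append([1 if base == b else 0 for b in BASES])
    return rows


if __name__ == "__main__":
    # Print the matrix for a short example read, one row per position.
    for row in one_hot_encode("ACGTN"):
        print(row)
```

A matrix of this shape can be treated as a single-channel image, which is one common way such data is fed to a convolutional network.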