
Award ID contains: 2109688

  1. Abstract The rapid spread of antibiotic resistance necessitates advanced tools to detect and analyze antibiotic resistance genes (ARGs). We present resLens, a family of genomic language models (gLMs) that leverage latent genomic representations for ARG detection and analysis. Unlike alignment-based methods, which are constrained by reference databases, resLens fine-tunes pre-trained gLMs on curated ARG datasets and achieves superior performance across several evaluation scenarios, including cases where ARGs differ in sequence and mechanism from those in reference databases.
    Free, publicly-accessible full text available July 11, 2026
  2. SUMMARY Single-cell analysis has transformed our understanding of cellular diversity, offering insights into complex biological systems. Yet manual data processing in single-cell studies poses challenges, including inefficiency, human error, and limited scalability. To address these issues, we propose cellSight, an automated workflow that integrates high-throughput sequencing data analysis into a user-friendly platform. By automating tasks such as cell type clustering, feature extraction, and data normalization, cellSight reduces researcher workload, freeing researchers to focus on data interpretation and hypothesis generation. Its standardized analysis pipelines and quality control metrics enhance reproducibility and enable collaboration across studies. Moreover, cellSight’s adaptability supports integration with emerging technologies, keeping pace with advances in single-cell genomics. cellSight accelerates discoveries in single-cell biology, driving impactful insights and clinical translation. It is available, with documentation and tutorials, at https://github.com/omicsEye/cellSight.
    Free, publicly-accessible full text available May 22, 2026
  3. Abstract Immune checkpoint inhibitors (ICIs) have revolutionized melanoma treatment, yet patient responses remain highly variable, underscoring the need for predictive biomarkers. Emerging evidence suggests that gut microbiome composition influences ICI efficacy, though findings remain inconsistent across studies. Here, we present a meta-analysis of seven melanoma-associated microbiome cohorts (N = 678) that uses a standardized computational pipeline to integrate microbial species, biosynthetic gene clusters (BGCs), and functional pathways. We identify Faecalibacterium SGB15346 as a key species enriched in responders, alongside the RiPP biosynthetic class and pathways involved in short-chain fatty acid fermentation. Conversely, dTDP-sugar biosynthesis correlates with non-response. Our results highlight microbial signatures and metabolic pathways associated with ICI outcomes, offering potential targets for microbiome-based interventions in personalized immunotherapy.
    Free, publicly-accessible full text available March 21, 2026
  4. Abstract Background: Predicting phenotypes from genetic variation is foundational for fields as diverse as bioengineering and global change biology, highlighting the importance of efficient methods to predict gene functions. Linking genetic changes to phenotypic changes has been a goal of decades of experimental work, especially for some model gene families, including light-sensitive opsin proteins. Opsins can be expressed in vitro to measure light absorption parameters, including λmax (the wavelength of maximum absorbance), which strongly affects organismal phenotypes such as color vision. Despite extensive research on opsins, the data remain dispersed, uncompiled, and often difficult to access, precluding systematic and comprehensive analyses of the intricate relationships between genotype and phenotype. Results: Here, we report a newly compiled database of all heterologously expressed opsin genes with λmax phenotypes, which we call the Visual Physiology Opsin Database (VPOD). VPOD_1.0 contains 864 unique opsin genotypes and corresponding λmax phenotypes collected across all animals from 73 separate publications. We use VPOD data and deepBreaks to show that regression-based machine learning (ML) models often reliably predict λmax, account for nonadditive effects of mutations on function, and identify functionally critical amino acid sites. Conclusion: The ability to reliably predict functions from gene sequences alone using ML will allow robust exploration of molecular-evolutionary patterns governing phenotype, will inform functional and evolutionary connections to an organism’s ecological niche, and may be used more broadly for de novo protein design. Together, our database, phenotype predictions, and model comparisons lay the groundwork for future research applicable to families of genes with quantifiable and comparable phenotypes.
  5. Introduction: During the COVID-19 Delta variant surge, the CLAIRE cross-sectional study sampled saliva from 120 hospitalized patients, 116 of whom had a positive COVID-19 PCR test. Patients received antibiotics upon admission because of possible secondary bacterial infections, with patients at risk of sepsis receiving broad-spectrum antibiotics (BSA). Methods: The saliva samples were analyzed with shotgun DNA metagenomics and respiratory RNA virome sequencing. Medical records for the period of hospitalization were obtained for all patients. Once hospitalization outcomes were known, patients were classified by their COVID-19 disease severity and the antibiotics they received. Results: Our study reveals that BSA regimens differentially impacted the human salivary microbiome and disease progression. Twelve patients died, and all of them had received BSA. Significant associations were found between the composition of the COVID-19 saliva microbiome and BSA use, and between SARS-CoV-2 genome coverage and disease severity. We also found significant associations between the non-bacterial microbiome and disease severity, with Candida albicans detected most frequently in critical patients. For patients who did not receive BSA before saliva sampling, our study suggests Staphylococcus aureus as a potential risk factor for sepsis. Discussion: Our results indicate that the course of the infection may be explained by monitoring both antibiotic treatment and the patient’s salivary microbiome profile, establishing a compelling link between the microbiome and the specific antibiotic type and timing of treatment. This approach can aid emergency room triage and inpatient management, but it also requires a better understanding of, and access to, narrow-spectrum agents that target pathogenic bacteria.
  6. Abstract Background: Predicting phenotypes from genetic variation is foundational for fields as diverse as bioengineering and global change biology, highlighting the importance of efficient methods to predict gene functions. Linking genetic changes to phenotypic changes has been a goal of decades of experimental work, especially for some model gene families, including light-sensitive opsin proteins. Opsins can be expressed in vitro to measure light absorption parameters, including λmax (the wavelength of maximum absorbance), which strongly affects organismal phenotypes such as color vision. Despite extensive research on opsins, the data remain dispersed, uncompiled, and often difficult to access, precluding systematic and comprehensive analyses of the intricate relationships between genotype and phenotype. Results: Here, we report a newly compiled database of all heterologously expressed opsin genes with λmax phenotypes, called the Visual Physiology Opsin Database (VPOD). VPOD_1.0 contains 864 unique opsin genotypes and corresponding λmax phenotypes collected across all animals from 73 separate publications. We use VPOD data and deepBreaks to show that regression-based machine learning (ML) models often reliably predict λmax, account for non-additive effects of mutations on function, and identify functionally critical amino acid sites. Conclusion: The ability to reliably predict functions from gene sequences alone using ML will allow robust exploration of molecular-evolutionary patterns governing phenotype, will inform functional and evolutionary connections to an organism’s ecological niche, and may be used more broadly for de novo protein design. Together, our database, phenotype predictions, and model comparisons lay the groundwork for future research applicable to families of genes with quantifiable and comparable phenotypes.
    Key Points: We introduce the Visual Physiology Opsin Database (VPOD_1.0), which includes 864 unique animal opsin genotypes and corresponding λmax phenotypes from 73 separate publications. We demonstrate that regression-based ML models can reliably predict λmax from gene sequence alone, predict non-additive effects of mutations on function, and identify functionally critical amino acid sites. We provide an approach that lays the groundwork for future robust exploration of molecular-evolutionary patterns governing phenotype, with potential broader applications to any family of genes with quantifiable and comparable phenotypes.
  7. Abstract Multi-omics approaches have been successfully applied in several studies to investigate pregnancy and health outcomes at the molecular and genetic level. As omics technologies advance, new research areas are opening up for further study. Here, we discuss overall trends and examples of the successful use of omics technologies and techniques (e.g., genomics, proteomics, metabolomics, and metagenomics) to investigate the molecular epidemiology of pregnancy. In addition, we outline omics applications and study characteristics in pregnancy research aimed at understanding fundamental biology, causal health and physiological relationships, risk and prediction modeling, diagnostics, and correlations.
  8. Abstract Proteins are direct products of the genome, and metabolites are functional products of interactions between the host and other factors such as environment, disease state, and clinical information. Omics data, including proteins and metabolites, are useful for characterizing the biological processes underlying COVID-19 when combined with patient data and clinical information, yet few methods are available to effectively analyze such diverse and unstructured data. Using an integrated approach that combines proteomics and metabolomics data, we investigated changes in metabolites and proteins in relation to patient characteristics (e.g., age, gender, and health outcome) and clinical information (e.g., metabolic panel and complete blood count test results). Using publicly available metabolite and protein profiles, we found significant enrichment of biological indicators of lung, liver, and gastrointestinal dysfunction associated with disease severity. Our analyses specifically identified enriched proteins that play a critical role in responses to injury or infection within these anatomical sites but may contribute to excessive systemic inflammation in the context of COVID-19. Furthermore, we used this information in conjunction with machine learning algorithms to predict the health status of patients presenting with symptoms of COVID-19. This work provides a roadmap for understanding the biochemical pathways and molecular mechanisms that drive disease severity, progression, and treatment of COVID-19.
  9. Abstract SARS-CoV-2 (CoV) is the etiological agent of the COVID-19 pandemic and evolves to evade both host immune systems and intervention strategies. We divided the CoV genome into 29 constituent regions and applied novel analytical approaches to identify associations between CoV genomic features and epidemiological metadata. Our results show that nonstructural protein 3 (nsp3) and the Spike protein (S) have the highest variation and the greatest correlation with whole-genome viral variation. S protein variation is correlated with variation in nsp3, nsp6, and the 3′-to-5′ exonuclease. Country of origin and time since the start of the pandemic were the metadata most strongly associated with genomic variation, while host sex and age were the least influential. We define a novel statistic, coherence, and show its utility in identifying geographic regions (populations) with unusually high (many new variants) or low (isolated) viral phylogenetic diversity. Interestingly, at both global and regional scales, we identify geographic locations with high coherence neighboring regions of low coherence; this emphasizes the metric’s utility for informing public health measures against disease spread. Our results suggest how to prioritize genes associated with outcome predictors (e.g., health, therapeutic, and vaccine outcomes) and how to improve DNA tests for predicting disease status.
  10. Abstract The performance of computational methods and software for identifying differentially expressed features in single‐cell RNA‐sequencing (scRNA‐seq) data has been shown to depend on several factors, including the choice of normalization method and of experimental platform (or library preparation protocol) used to profile gene expression in individual cells. Currently, it is up to the practitioner to choose the most appropriate differential expression (DE) method from the more than 100 DE tools available to date, each relying on its own assumptions to model scRNA‐seq expression features. To model the technological variability in cross‐platform scRNA‐seq data, we propose using Tweedie generalized linear models, which can flexibly capture the large dynamic range of scRNA‐seq expression profiles observed across experimental platforms, induced by platform‐ and gene‐specific statistical properties such as heavy tails, sparsity, and gene expression distributions. We also propose a zero‐inflated Tweedie model that allows the probability mass at zero to exceed that of a traditional Tweedie distribution, in order to model zero‐inflated scRNA‐seq data with excess zero counts. Using both synthetic and published plate‐ and droplet‐based scRNA‐seq datasets, we perform a systematic benchmark evaluation of more than 10 representative DE methods and demonstrate that our method (Tweedieverse) outperforms state‐of‐the‐art DE approaches across experimental platforms in terms of statistical power and false discovery rate control. Our open‐source software (R/Bioconductor package) is available at https://github.com/himelmallick/Tweedieverse.
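Item 2 lists data normalization among the steps cellSight automates, but the abstract does not say which scheme it uses. The sketch below shows one common scRNA-seq convention, offered only as an assumed illustration and not as cellSight's implementation: scale each cell's counts to a fixed total (10,000 here), then apply log(1 + x). The function name `log_normalize` and the default scale factor are hypothetical choices.

```python
import math
from typing import List


def log_normalize(counts: List[List[float]], scale: float = 10_000.0) -> List[List[float]]:
    """Library-size normalize a cells x genes count matrix, then log1p-transform it.

    Assumed convention (not taken from cellSight): each cell's counts are divided
    by that cell's total, multiplied by `scale`, and passed through log(1 + x).
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            # An empty cell has no library size to scale by, so keep it as all zeros.
            normalized.append([0.0] * len(cell))
            continue
        normalized.append([math.log1p(c / total * scale) for c in cell])
    return normalized


if __name__ == "__main__":
    toy = [[1, 3, 0], [10, 0, 10]]  # two hypothetical cells, three genes
    for row in log_normalize(toy):
        print([round(x, 3) for x in row])
```

Dividing by each cell's total removes differences in sequencing depth between cells. The log transform then compresses the large dynamic range of the counts before clustering or feature extraction.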
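Items 4 and 6 report that regression-based ML models built with deepBreaks identify functionally critical amino acid sites for λmax. The sketch below is not deepBreaks. It is a deliberately simplified stand-in that scores each alignment column by the fraction of λmax variance explained when sequences are grouped by the residue at that column (a one-way ANOVA R², or eta squared). The sequences and λmax values are invented toy inputs, not VPOD data, and the function name `site_variance_explained` is hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def site_variance_explained(alignment: List[str], lmax: List[float]) -> Dict[int, float]:
    """For each alignment column, return the share of phenotype variance explained
    by the residue at that column (between-group sum of squares / total sum of squares)."""
    if len(alignment) != len(lmax):
        raise ValueError("need exactly one phenotype value per aligned sequence")
    grand = mean(lmax)
    total_ss = sum((y - grand) ** 2 for y in lmax)
    scores: Dict[int, float] = {}
    for col in range(len(alignment[0])):
        # Group phenotype values by the residue observed at this column.
        groups: Dict[str, List[float]] = defaultdict(list)
        for seq, y in zip(alignment, lmax):
            groups[seq[col]].append(y)
        between_ss = sum(len(ys) * (mean(ys) - grand) ** 2 for ys in groups.values())
        scores[col] = between_ss / total_ss if total_ss > 0 else 0.0
    return scores


if __name__ == "__main__":
    # Toy aligned fragments and lambda-max values (nm), invented for illustration only.
    seqs = ["AEF", "AEY", "GEF", "GEY"]
    lmax = [500.0, 530.0, 500.0, 530.0]
    for col, score in sorted(site_variance_explained(seqs, lmax).items()):
        print(col, round(score, 3))
```

In this toy input, only column 2 (F versus Y) tracks the 30 nm shift, so it scores 1.0 and the other columns score 0.0. A per-site score like this captures only additive, single-site effects. The abstracts stress that the ML models also account for non-additive effects of mutations, which this sketch does not.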
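Item 9 divides the SARS-CoV-2 genome into 29 regions and compares their variation. The abstract does not state the variation measure, and it does not define its coherence statistic, so neither is reproduced here. As an assumed stand-in, the sketch below summarizes each region by the mean per-column Shannon entropy of a multiple sequence alignment. The region names, coordinates, and sequences are hypothetical toy values.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def column_entropy(column: str) -> float:
    """Shannon entropy, in bits, of the characters observed at one alignment column."""
    n = len(column)
    return sum((c / n) * math.log2(n / c) for c in Counter(column).values())


def region_variation(alignment: List[str], regions: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Mean per-column entropy within each half-open [start, end) region of the alignment."""
    result: Dict[str, float] = {}
    for name, (start, end) in regions.items():
        columns = ["".join(seq[i] for seq in alignment) for i in range(start, end)]
        result[name] = sum(column_entropy(col) for col in columns) / len(columns)
    return result


if __name__ == "__main__":
    # Hypothetical toy alignment and region coordinates, for illustration only.
    aln = ["ACGTAC", "ACGTTC", "ACGAAC", "ACGTAG"]
    regions = {"regionA": (0, 3), "regionB": (3, 6)}
    for name, h in region_variation(aln, regions).items():
        print(name, round(h, 3))
```

Entropy is 0 for a fully conserved column and grows as more distinct residues appear at more even frequencies. Averaging over a region's columns lets regions of different lengths be compared on the same scale.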
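Item 10 proposes a zero-inflated Tweedie model whose probability mass at zero can exceed that of a standard Tweedie distribution. The sketch below uses the standard fact that, for a power parameter 1 < p < 2, a Tweedie variable with mean μ and dispersion φ (variance φμ^p) is a compound Poisson-gamma sum with Poisson rate λ = μ^(2-p) / (φ(2-p)), so P(Y = 0) = exp(-λ). The zero inflation is written as a generic mixture with an extra zero weight π. That mixture form, the function names, and the parameter values are assumptions for illustration and may not match Tweedieverse's actual parameterization.

```python
import math


def tweedie_zero_prob(mu: float, phi: float, p: float) -> float:
    """P(Y = 0) for a Tweedie (compound Poisson-gamma) variable with 1 < p < 2.

    Y is zero exactly when the underlying Poisson count is zero, and that count
    has rate mu**(2 - p) / (phi * (2 - p)).
    """
    if not 1 < p < 2:
        raise ValueError("a point mass at zero exists only for 1 < p < 2")
    if mu <= 0 or phi <= 0:
        raise ValueError("mu and phi must be positive")
    lam = mu ** (2 - p) / (phi * (2 - p))
    return math.exp(-lam)


def zi_tweedie_zero_prob(mu: float, phi: float, p: float, pi: float) -> float:
    """Zero mass under an assumed zero-inflation mixture: pi + (1 - pi) * P_Tweedie(Y = 0)."""
    if not 0 <= pi < 1:
        raise ValueError("zero-inflation weight pi must lie in [0, 1)")
    return pi + (1 - pi) * tweedie_zero_prob(mu, phi, p)


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    base = tweedie_zero_prob(mu=1.0, phi=2.0, p=1.5)
    inflated = zi_tweedie_zero_prob(mu=1.0, phi=2.0, p=1.5, pi=0.3)
    print(round(base, 4), round(inflated, 4))
```

With these values λ = 1, so the Tweedie zero mass is exp(-1), about 0.368. The mixture raises it to about 0.558. Any π > 0 yields more zeros than the plain Tweedie model with the same μ, φ, and p, which is the property the abstract describes.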