Search for: All records

Creators/Authors contains: "Rahnavard, Ali"


resLens: genomic language models to enhance antibiotic resistance gene detection

https://doi.org/10.1101/2025.07.08.663767

Mollerus, Matthew; Dittmar, Katharina; Crandall, Keith A; Rahnavard, Ali (July 2025, bioRxiv)

Abstract The pace of antibiotic resistance necessitates advanced tools to detect and analyze antibiotic resistance genes (ARGs). We present resLens, a family of genomic language models (gLM) leveraging latent genomic representations for ARG detection and analysis. Unlike alignment-based methods constrained by reference databases, resLens fine-tunes pre-trained gLMs on curated ARG datasets, achieving superior performance across several evaluation scenarios, including when ARGs exhibit dissimilar sequences and mechanisms to those in reference databases.
Free, publicly-accessible full text available July 11, 2026
Longitudinal Omics Data Analysis: A Review on Models, Algorithms, and Tools

Taheriyoun, Ali R; Ross, Allen; Safikhani, Abolfazl; Soudbakhsh, Damoon; Rahnavard, Ali (June 2025, arXiv)

Longitudinal omics data (LOD) analysis is essential for understanding the dynamics of biological processes and disease progression over time. This review explores various statistical and computational approaches for analyzing such data, emphasizing their applications and limitations. The main characteristics of longitudinal data, such as imbalancedness, high-dimensionality, and non-Gaussianity are discussed for modeling and hypothesis testing. We discuss the properties of linear mixed models (LMM) and generalized linear mixed models (GLMM) as foundation stones in LOD analyses and highlight their extensions to handle the obstacles in the frequentist and Bayesian frameworks. We differentiate in dynamic data analysis between time-course and longitudinal analyses, covering functional data analysis (FDA) and replication constraints. We explore classification techniques, single-cell as exemplary omics longitudinal studies, survival modeling, and multivariate methods for clinical/biomarker-based applications. Emerging topics, including data integration, clustering, and network-based modeling, are also discussed. We categorized the state-of-the-art approaches applicable to omics data, highlighting how they address the data features. This review serves as a guideline for researchers seeking robust strategies to analyze longitudinal omics data effectively, which is usually complex.
Free, publicly-accessible full text available June 11, 2026
cellSight: Characterizing dynamics of cells using single-cell RNA-sequencing

https://doi.org/10.1101/2025.05.16.654572

Chatterjee, Ranojoy; Gohel, Chiraag; Shook, Brett A; Rahnavard, Ali (May 2025, bioRxiv)

SUMMARY Single-cell analysis has transformed our understanding of cellular diversity, offering insights into complex biological systems. Yet, manual data processing in single-cell studies poses challenges, including inefficiency, human error, and limited scalability. To address these issues, we propose the automated workflow cellSight, which integrates high-throughput sequencing in a user-friendly platform. By automating tasks like cell type clustering, feature extraction, and data normalization, cellSight reduces researcher workload, promoting focus on data interpretation and hypothesis generation. Its standardized analysis pipelines and quality control metrics enhance reproducibility, enabling collaboration across studies. Moreover, cellSight’s adaptability supports integration with emerging technologies, keeping pace with advancements in single-cell genomics. cellSight accelerates discoveries in single-cell biology, driving impactful insights and clinical translation. It is available with documentation and tutorials at https://github.com/omicsEye/cellSight.
Free, publicly-accessible full text available May 22, 2026
Meta-analytic microbiome target discovery for immune checkpoint inhibitor response in advanced melanoma

https://doi.org/10.1101/2025.03.21.644637

Zhang, Xinyang; Mallick, Himel; Rahnavard, Ali (March 2025, bioRxiv)

Abstract Immune checkpoint inhibitors (ICIs) have revolutionized melanoma treatment, yet patient responses remain highly variable, underscoring the need for predictive biomarkers. Emerging evidence suggests that gut microbiome composition influences ICI efficacy, though findings remain inconsistent across studies. Here, we present a meta-analysis of seven melanoma-associated microbiome cohorts (N=678) using a standardized computational pipeline to integrate microbial species, biosynthetic gene clusters (BGCs), and functional pathways. We identify Faecalibacterium SGB15346 as a key species enriched in responders, alongside RiPP biosynthetic class and pathways involved in short-chain fatty acid fermentation. Conversely, dTDP-sugar biosynthesis correlates with non-response. Our results highlight microbial signatures and metabolic pathways associated with ICI outcomes, offering potential targets for microbiome-based interventions in personalized immunotherapy.
Free, publicly-accessible full text available March 21, 2026
seqLens: optimizing language models for genomic predictions

https://doi.org/10.1101/2025.03.12.642848

Baghbanzadeh, Mahdi; Mann, Brendan; Crandall, Keith A; Rahnavard, Ali (March 2025, bioRxiv)

Understanding genomic sequences through the lens of language modeling has the potential to revolutionize biological research, yet challenges in tokenization, model architecture, and adaptation to diverse genomic contexts remain. In this study, we investigated key innovations in DNA sequence modeling, treating DNA as a language and applying language models to genomic data. We gathered two diverse pretraining datasets: one consisting of 19,551 reference genomes, including over 18,000 prokaryotic genomes (115B nucleotides), and another more balanced dataset with 1,354 genomes, including 1,166 prokaryotic and 188 eukaryotic reference genomes (180B nucleotides). We trained five byte-pair encoding tokenizers and pretrained 52 DNA language models, systematically comparing different architectures, hyperparameters, and classification heads. We introduce seqLens, a family of models based on disentangled attention with relative positional encoding, which outperforms state-of-the-art models in 13 of 19 benchmarking phenotypic predictions. We further explore continual pretraining, domain adaptation, and parameter-efficient fine-tuning methods to assess trade-offs between computational efficiency and accuracy. Our findings demonstrate that relevant pretraining data significantly boosts performance, alternative pooling techniques enhance classification, and larger tokenizers negatively impact generalization. These insights provide a foundation for optimizing DNA language models and improving genome annotations.
Free, publicly-accessible full text available March 14, 2026
Analysis of metagenomic data

https://doi.org/10.1038/s43586-024-00376-6

Liu, Shaopeng; Rodriguez, Judith S; Munteanu, Viorel; Ronkowski, Cynthia; Sharma, Nitesh Kumar; Alser, Mohammed; Andreace, Francesco; Blekhman, Ran; Błaszczyk, Dagmara; Chikhi, Rayan; et al (December 2025, Nature Reviews Methods Primers)

Metagenomics has revolutionized our understanding of microbial communities, offering unprecedented insights into their genetic and functional diversity across Earth’s diverse ecosystems. Beyond their roles as environmental constituents, microbiomes act as symbionts, profoundly influencing the health and function of their host organisms. Given the inherent complexity of these communities and the diverse environments where they reside, the components of a metagenomics study must be carefully tailored to yield accurate results that are representative of the populations of interest. This Primer examines the methodological advancements and current practices that have shaped the field, from initial stages of sample collection and DNA extraction to the advanced bioinformatics tools employed for data analysis, with a particular focus on the profound impact of next-generation sequencing on the scale and accuracy of metagenomics studies. We critically assess the challenges and limitations inherent in metagenomics experimentation, available technologies and computational analysis methods. Beyond technical methodologies, we explore the application of metagenomics across various domains, including human health, agriculture and environmental monitoring. Looking ahead, we advocate for the development of more robust computational frameworks and enhanced interdisciplinary collaborations. This Primer serves as a comprehensive guide for advancing the precision and applicability of metagenomic studies, positioning them to address the complexities of microbial ecology and their broader implications for human health and environmental sustainability.
Free, publicly-accessible full text available December 1, 2026
Discovering genotype–phenotype relationships with machine learning and the Visual Physiology Opsin Database (VPOD)

https://doi.org/10.1093/gigascience/giae073

Frazer, Seth A; Baghbanzadeh, Mahdi; Rahnavard, Ali; Crandall, Keith A; Oakley, Todd H (October 2024, GigaScience)

Abstract Background: Predicting phenotypes from genetic variation is foundational for fields as diverse as bioengineering and global change biology, highlighting the importance of efficient methods to predict gene functions. Linking genetic changes to phenotypic changes has been a goal of decades of experimental work, especially for some model gene families, including light-sensitive opsin proteins. Opsins can be expressed in vitro to measure light absorption parameters, including λmax—the wavelength of maximum absorbance—which strongly affects organismal phenotypes like color vision. Despite extensive research on opsins, the data remain dispersed, uncompiled, and often challenging to access, thereby precluding systematic and comprehensive analyses of the intricate relationships between genotype and phenotype. Results: Here, we report a newly compiled database of all heterologously expressed opsin genes with λmax phenotypes that we call the Visual Physiology Opsin Database (VPOD). VPOD_1.0 contains 864 unique opsin genotypes and corresponding λmax phenotypes collected across all animals from 73 separate publications. We use VPOD data and deepBreaks to show regression-based machine learning (ML) models often reliably predict λmax, account for nonadditive effects of mutations on function, and identify functionally critical amino acid sites. Conclusion: The ability to reliably predict functions from gene sequences alone using ML will allow robust exploration of molecular-evolutionary patterns governing phenotype, will inform functional and evolutionary connections to an organism’s ecological niche, and may be used more broadly for de novo protein design. Together, our database, phenotype predictions, and model comparisons lay the groundwork for future research applicable to families of genes with quantifiable and comparable phenotypes.
Discovering genotype-phenotype relationships with machine learning and the Visual Physiology Opsin Database (VPOD)

https://doi.org/10.1101/2024.02.12.579993

Frazer, Seth A; Baghbanzadeh, Mahdi; Rahnavard, Ali; Crandall, Keith A; Oakley, Todd H (February 2024, bioRxiv)

Abstract Background: Predicting phenotypes from genetic variation is foundational for fields as diverse as bioengineering and global change biology, highlighting the importance of efficient methods to predict gene functions. Linking genetic changes to phenotypic changes has been a goal of decades of experimental work, especially for some model gene families including light-sensitive opsin proteins. Opsins can be expressed in vitro to measure light absorption parameters, including λmax - the wavelength of maximum absorbance - which strongly affects organismal phenotypes like color vision. Despite extensive research on opsins, the data remain dispersed, uncompiled, and often challenging to access, thereby precluding systematic and comprehensive analyses of the intricate relationships between genotype and phenotype. Results: Here, we report a newly compiled database of all heterologously expressed opsin genes with λmax phenotypes called the Visual Physiology Opsin Database (VPOD). VPOD_1.0 contains 864 unique opsin genotypes and corresponding λmax phenotypes collected across all animals from 73 separate publications. We use VPOD data and deepBreaks to show regression-based machine learning (ML) models often reliably predict λmax, account for non-additive effects of mutations on function, and identify functionally critical amino acid sites. Conclusion: The ability to reliably predict functions from gene sequences alone using ML will allow robust exploration of molecular-evolutionary patterns governing phenotype, will inform functional and evolutionary connections to an organism’s ecological niche, and may be used more broadly for de novo protein design. Together, our database, phenotype predictions, and model comparisons lay the groundwork for future research applicable to families of genes with quantifiable and comparable phenotypes.
Key Points: We introduce the Visual Physiology Opsin Database (VPOD_1.0), which includes 864 unique animal opsin genotypes and corresponding λmax phenotypes from 73 separate publications. We demonstrate that regression-based ML models can reliably predict λmax from gene sequence alone, predict non-additive effects of mutations on function, and identify functionally critical amino acid sites. We provide an approach that lays the groundwork for future robust exploration of molecular-evolutionary patterns governing phenotype, with potential broader applications to any family of genes with quantifiable and comparable phenotypes.
Full Text Available
Hospital antimicrobial stewardship: profiling the oral microbiome after exposure to COVID-19 and antibiotics

https://doi.org/10.3389/fmicb.2024.1346762

Buendia, Patricia; Fernandez, Krystal; Raley, Castle; Rahnavard, Ali; Crandall, Keith A; Castro, Jose Guillermo (February 2024, Frontiers in Microbiology)

Introduction: During the COVID-19 Delta variant surge, the CLAIRE cross-sectional study sampled saliva from 120 hospitalized patients, 116 of whom had a positive COVID-19 PCR test. Patients received antibiotics upon admission due to possible secondary bacterial infections, with patients at risk of sepsis receiving broad-spectrum antibiotics (BSA). Methods: The saliva samples were analyzed with shotgun DNA metagenomics and respiratory RNA virome sequencing. Medical records for the period of hospitalization were obtained for all patients. Once hospitalization outcomes were known, patients were classified based on their COVID-19 disease severity and the antibiotics they received. Results: Our study reveals that BSA regimens differentially impacted the human salivary microbiome and disease progression. 12 patients died and all of them received BSA. Significant associations were found between the composition of the COVID-19 saliva microbiome and BSA use, between SARS-CoV-2 genome coverage and severity of disease. We also found significant associations between the non-bacterial microbiome and severity of disease, with Candida albicans detected most frequently in critical patients. For patients who did not receive BSA before saliva sampling, our study suggests Staphylococcus aureus as a potential risk factor for sepsis. Discussion: Our results indicate that the course of the infection may be explained by both monitoring antibiotic treatment and profiling a patient’s salivary microbiome, establishing a compelling link between microbiome and the specific antibiotic type and timing of treatment. This approach can aid with emergency room triage and inpatient management but also requires a better understanding of and access to narrow-spectrum agents that target pathogenic bacteria.
Full Text Available
Molecular epidemiology of pregnancy using omics data: advances, success stories, and challenges

https://doi.org/10.1186/s12967-024-04876-7

Rahnavard, Ali; Chatterjee, Ranojoy; Wen, Hui; Gaylord, Clark; Mugusi, Sabina; Klatt, Kevin C.; Smith, Emily R. (January 2024, Journal of Translational Medicine)

Abstract Multi-omics approaches have been successfully applied to investigate pregnancy and health outcomes at a molecular and genetic level in several studies. As omics technologies advance, research areas are open to study further. Here we discuss overall trends and examples of successfully using omics technologies and techniques (e.g., genomics, proteomics, metabolomics, and metagenomics) to investigate the molecular epidemiology of pregnancy. In addition, we outline omics applications and study characteristics of pregnancy for understanding fundamental biology, causal health, and physiological relationships, risk and prediction modeling, diagnostics, and correlations.
