Search for: All records

Creators/Authors contains: "Crandall, Keith A"


  1. Background Human endogenous retroviruses (HERVs) harbor accessory proteins that influence cellular processes and have been linked to a wide variety of diseases, including cancer. This study investigates locus-specific HERV expression and its association with gene dysregulation in hepatocellular carcinoma (HCC), a highly prevalent and deadly form of liver cancer worldwide. Methods We analyzed RNASeq data from 424 HCC samples from The Cancer Genome Atlas (TCGA), which comprised 371 tumor and 50 matched normal tissues from a total of 371 hepatocellular carcinoma participants. We employed Telescope to identify and quantify HERV expression across the total RNA sequencing data. Results The majority of differentially expressed HERVs exhibited reduced expression in tumor tissue (166 downregulated vs. 50 upregulated), suggesting a potential functional role of HERV expression patterns in shaping the pathophysiological landscape of HCC. Specifically, the suppression of HERV-H family members, which are known to regulate cellular differentiation, may contribute to tumor dedifferentiation, increased plasticity, and enhanced metastatic potential. This loss of differentiation control and increased adaptability may play a critical role in driving the progression of liver cancer. Discussion Our study reveals a significant association of HERV expression with HCC and highlights the differential regulation of specific HERV families in tumor tissue. For example, HERVH and ERVLE families showed consistent downregulation in tumor samples, while HERVE and HERV9 were more commonly upregulated. These shifts may reflect underlying changes in transcriptional regulation or chromatin structure between normal and malignant tissues. Rather than indicating a singular functional role, the observed expression patterns likely reflect a multifaceted relationship between HERVs and tumor biology. 
Further studies will be needed to determine whether these expression differences contribute to, or result from, tumor progression and to explore their potential as biomarkers or therapeutic targets. 
  2. Abstract The pace of antibiotic resistance necessitates advanced tools to detect and analyze antibiotic resistance genes (ARGs). We present resLens, a family of genomic language models (gLM) leveraging latent genomic representations for ARG detection and analysis. Unlike alignment-based methods constrained by reference databases, resLens fine-tunes pre-trained gLMs on curated ARG datasets, achieving superior performance across several evaluation scenarios, including when ARGs exhibit dissimilar sequences and mechanisms to those in reference databases. 
  3. Understanding genomic sequences through the lens of language modeling has the potential to revolutionize biological research, yet challenges in tokenization, model architecture, and adaptation to diverse genomic contexts remain. In this study, we investigated key innovations in DNA sequence modeling, treating DNA as a language and applying language models to genomic data. We gathered two diverse pretraining datasets: one consisting of 19,551 reference genomes, including over 18,000 prokaryotic genomes (115B nucleotides), and another more balanced dataset with 1,354 genomes, including 1,166 prokaryotic and 188 eukaryotic reference genomes (180B nucleotides). We trained five byte-pair encoding tokenizers and pretrained 52 DNA language models, systematically comparing different architectures, hyperparameters, and classification heads. We introduce seqLens, a family of models based on disentangled attention with relative positional encoding, which outperforms state-of-the-art models in 13 of 19 benchmarking phenotypic predictions. We further explore continual pretraining, domain adaptation, and parameter-efficient fine-tuning methods to assess trade-offs between computational efficiency and accuracy. Our findings demonstrate that relevant pretraining data significantly boosts performance, alternative pooling techniques enhance classification, and larger tokenizers negatively impact generalization. These insights provide a foundation for optimizing DNA language models and improving genome annotations. 
  4. Abstract Adapting language models to genomic and metagenomic sequences presents unique challenges, particularly in tokenization and task-specific generalization. Standard methods, such as fixed-length k-mers or byte pair encoding, often fail to preserve biologically meaningful patterns essential for downstream tasks. We introduce Guided Tokenization (GT), a strategy that prioritizes biologically and statistically important subsequences based on importance scores, model attention, and class distributions. Combined with domain adaptation, which incorporates prior domain-specific biological knowledge, this approach improves both representation quality and classification accuracy in compact genomic language models (gLMs). GT enhances biological awareness in genomic language models, particularly for effective small and mid-sized models across key tasks, including DNA sequence read classification, promoter detection, antimicrobial resistance classification, and targeted amplicon taxonomic profiling. Our results highlight the promise of guided tokenization and domain-aware modeling for building efficient, biologically grounded language models for scalable genomic applications. 
  5. Metagenomics has revolutionized our understanding of microbial communities, offering unprecedented insights into their genetic and functional diversity across Earth’s diverse ecosystems. Beyond their roles as environmental constituents, microbiomes act as symbionts, profoundly influencing the health and function of their host organisms. Given the inherent complexity of these communities and the diverse environments where they reside, the components of a metagenomics study must be carefully tailored to yield accurate results that are representative of the populations of interest. This Primer examines the methodological advancements and current practices that have shaped the field, from initial stages of sample collection and DNA extraction to the advanced bioinformatics tools employed for data analysis, with a particular focus on the profound impact of next-generation sequencing on the scale and accuracy of metagenomics studies. We critically assess the challenges and limitations inherent in metagenomics experimentation, available technologies and computational analysis methods. Beyond technical methodologies, we explore the application of metagenomics across various domains, including human health, agriculture and environmental monitoring. Looking ahead, we advocate for the development of more robust computational frameworks and enhanced interdisciplinary collaborations. This Primer serves as a comprehensive guide for advancing the precision and applicability of metagenomic studies, positioning them to address the complexities of microbial ecology and their broader implications for human health and environmental sustainability. 
  6. Abstract Background Predicting phenotypes from genetic variation is foundational for fields as diverse as bioengineering and global change biology, highlighting the importance of efficient methods to predict gene functions. Linking genetic changes to phenotypic changes has been a goal of decades of experimental work, especially for some model gene families including light-sensitive opsin proteins. Opsins can be expressed in vitro to measure light absorption parameters, including λmax - the wavelength of maximum absorbance - which strongly affects organismal phenotypes like color vision. Despite extensive research on opsins, the data remain dispersed, uncompiled, and often challenging to access, thereby precluding systematic and comprehensive analyses of the intricate relationships between genotype and phenotype. Results Here, we report a newly compiled database of all heterologously expressed opsin genes with λmax phenotypes called the Visual Physiology Opsin Database (VPOD). VPOD_1.0 contains 864 unique opsin genotypes and corresponding λmax phenotypes collected across all animals from 73 separate publications. We use VPOD data and deepBreaks to show regression-based machine learning (ML) models often reliably predict λmax, account for non-additive effects of mutations on function, and identify functionally critical amino acid sites. Conclusion The ability to reliably predict functions from gene sequences alone using ML will allow robust exploration of molecular-evolutionary patterns governing phenotype, will inform functional and evolutionary connections to an organism’s ecological niche, and may be used more broadly for de-novo protein design. Together, our database, phenotype predictions, and model comparisons lay the groundwork for future research applicable to families of genes with quantifiable and comparable phenotypes. 
Key Points We introduce the Visual Physiology Opsin Database (VPOD_1.0), which includes 864 unique animal opsin genotypes and corresponding λmax phenotypes from 73 separate publications. We demonstrate that regression-based ML models can reliably predict λmax from gene sequence alone, predict non-additive effects of mutations on function, and identify functionally critical amino acid sites. We provide an approach that lays the groundwork for future robust exploration of molecular-evolutionary patterns governing phenotype, with potential broader applications to any family of genes with quantifiable and comparable phenotypes. 
  7. Introduction During the COVID-19 Delta variant surge, the CLAIRE cross-sectional study sampled saliva from 120 hospitalized patients, 116 of whom had a positive COVID-19 PCR test. Patients received antibiotics upon admission due to possible secondary bacterial infections, with patients at risk of sepsis receiving broad-spectrum antibiotics (BSA). Methods The saliva samples were analyzed with shotgun DNA metagenomics and respiratory RNA virome sequencing. Medical records for the period of hospitalization were obtained for all patients. Once hospitalization outcomes were known, patients were classified based on their COVID-19 disease severity and the antibiotics they received. Results Our study reveals that BSA regimens differentially impacted the human salivary microbiome and disease progression. Twelve patients died, all of whom had received BSA. Significant associations were found between the composition of the COVID-19 saliva microbiome and BSA use, and between SARS-CoV-2 genome coverage and severity of disease. We also found significant associations between the non-bacterial microbiome and severity of disease, with Candida albicans detected most frequently in critical patients. For patients who did not receive BSA before saliva sampling, our study suggests Staphylococcus aureus as a potential risk factor for sepsis. Discussion Our results indicate that the course of the infection may be explained by both monitoring antibiotic treatment and profiling a patient’s salivary microbiome, establishing a compelling link between microbiome and the specific antibiotic type and timing of treatment. This approach can aid with emergency room triage and inpatient management but also requires a better understanding of and access to narrow-spectrum agents that target pathogenic bacteria. 
  8. Abstract Background Predicting phenotypes from genetic variation is foundational for fields as diverse as bioengineering and global change biology, highlighting the importance of efficient methods to predict gene functions. Linking genetic changes to phenotypic changes has been a goal of decades of experimental work, especially for some model gene families, including light-sensitive opsin proteins. Opsins can be expressed in vitro to measure light absorption parameters, including λmax - the wavelength of maximum absorbance - which strongly affects organismal phenotypes like color vision. Despite extensive research on opsins, the data remain dispersed, uncompiled, and often challenging to access, thereby precluding systematic and comprehensive analyses of the intricate relationships between genotype and phenotype. Results Here, we report a newly compiled database of all heterologously expressed opsin genes with λmax phenotypes that we call the Visual Physiology Opsin Database (VPOD). VPOD_1.0 contains 864 unique opsin genotypes and corresponding λmax phenotypes collected across all animals from 73 separate publications. We use VPOD data and deepBreaks to show regression-based machine learning (ML) models often reliably predict λmax, account for nonadditive effects of mutations on function, and identify functionally critical amino acid sites. Conclusion The ability to reliably predict functions from gene sequences alone using ML will allow robust exploration of molecular-evolutionary patterns governing phenotype, will inform functional and evolutionary connections to an organism’s ecological niche, and may be used more broadly for de novo protein design. Together, our database, phenotype predictions, and model comparisons lay the groundwork for future research applicable to families of genes with quantifiable and comparable phenotypes. 
  9. Macrophage-lineage cells are indispensable to immunity and physiology of all vertebrates. Amongst these, amphibians represent a key stage in vertebrate evolution and are facing decimating population declines and extinctions, in large part due to emerging infectious agents. While recent studies indicate that macrophages and related innate immune cells are critically involved during these infections, much remains unknown regarding the ontogeny and functional differentiation of these cell types in amphibians. Accordingly, in this review we synthesize what has been established to date about amphibian blood cell development (hematopoiesis), the development of key amphibian innate immune cells (myelopoiesis) and the differentiation of amphibian macrophage subsets (monopoiesis). We explore the current understanding of designated sites of larval and adult hematopoiesis across distinct amphibian species and consider what mechanisms may contribute to these species-specific adaptations. We discern the identified molecular mechanisms governing the functional differentiation of disparate amphibian (chiefly Xenopus laevis) macrophage subsets and describe what is known about the roles of these subsets during amphibian infections with intracellular pathogens. Macrophage-lineage cells are at the heart of many vertebrate physiological processes. Thus, garnering greater understanding of the mechanisms responsible for the ontogeny and functionality of these cells in amphibians will contribute to a more comprehensive view of vertebrate evolution. 