

Search for: All records

Award ID contains: 2212508


  1. Metagenomics has revolutionized our understanding of microbial communities, offering unprecedented insights into their genetic and functional diversity across Earth’s diverse ecosystems. Beyond their roles as environmental constituents, microbiomes act as symbionts, profoundly influencing the health and function of their host organisms. Given the inherent complexity of these communities and the diverse environments where they reside, the components of a metagenomics study must be carefully tailored to yield accurate results that are representative of the populations of interest. This Primer examines the methodological advancements and current practices that have shaped the field, from initial stages of sample collection and DNA extraction to the advanced bioinformatics tools employed for data analysis, with a particular focus on the profound impact of next-generation sequencing on the scale and accuracy of metagenomics studies. We critically assess the challenges and limitations inherent in metagenomics experimentation, available technologies and computational analysis methods. Beyond technical methodologies, we explore the application of metagenomics across various domains, including human health, agriculture and environmental monitoring. Looking ahead, we advocate for the development of more robust computational frameworks and enhanced interdisciplinary collaborations. This Primer serves as a comprehensive guide for advancing the precision and applicability of metagenomic studies, positioning them to address the complexities of microbial ecology and their broader implications for human health and environmental sustainability. 
    Free, publicly-accessible full text available December 1, 2026
  2. The rapid spread of the COVID-19 pandemic has resulted in an unprecedented amount of sequence data of the SARS-CoV-2 genome (millions of sequences and counting). This amount of data, while orders of magnitude beyond the capacity of traditional approaches to understanding the diversity, dynamics, and evolution of viruses, is nonetheless a rich resource for machine learning (ML) approaches as alternatives for extracting such important information from these data. It is hence of utmost importance to design a framework for testing and benchmarking the robustness of these ML models. This paper makes the first effort (to our knowledge) to benchmark the robustness of ML models by simulating biological sequences with errors. In this paper, we introduce several ways to perturb SARS-CoV-2 genome sequences to mimic the error profiles of common sequencing platforms such as Illumina and PacBio. We show from experiments on a wide array of ML models that some simulation-based approaches with different perturbation budgets are more robust (and accurate) than others for specific embedding methods to certain noise simulations on the input sequences. Our benchmarking framework may assist researchers in properly assessing different ML models and help them understand the behavior of the SARS-CoV-2 virus or avoid possible future pandemics.
  3. The emergence of third-generation single-molecule sequencing (TGS) technology has revolutionized the generation of long reads, which are essential for genome assembly and have been widely employed in sequencing the SARS-CoV-2 virus during the COVID-19 pandemic. Although long-read sequencing has been crucial in understanding the evolution and transmission of the virus, the high error rate associated with these reads can lead to inadequate genome assembly and downstream biological interpretation. In this study, we evaluate the accuracy and robustness of machine learning (ML) models using six different embedding techniques on SARS-CoV-2 error-incorporated genome sequences. Our analysis includes two types of error-incorporated genome sequences: those generated using simulation tools to emulate error profiles of long-read sequencing platforms and those generated by introducing random errors. We show that the spaced k-mers embedding method achieves high accuracy in classifying error-free SARS-CoV-2 genome sequences, and the spaced k-mers and weighted k-mers embedding methods are highly accurate in predicting error-incorporated sequences. The fixed-length vectors generated by these methods contribute to the high accuracy achieved. Our study provides valuable insights for researchers to effectively evaluate ML models and gain a better understanding of the approach for accurate identification of critical SARS-CoV-2 genome sequences. 
  4. Lyme disease (LD), the most prevalent tick-borne disease of humans in the Northern Hemisphere, is caused by spirochetal bacteria of the Borreliella burgdorferi (Bb) sensu lato complex. In nature, Bb spirochetes are continuously transmitted between Ixodes ticks and mammalian or avian reservoir hosts. Peromyscus leucopus mice are considered the primary mammalian reservoir of Bb in the United States. Earlier studies demonstrated that experimentally infected P. leucopus mice do not develop disease. In contrast, C3H mice, a widely used laboratory strain of Mus musculus in the LD field, develop severe Lyme arthritis. To date, the exact mechanism by which P. leucopus mice tolerate Bb infection remains unknown. To address this knowledge gap, the present study compared spleen transcriptomes of P. leucopus and C3H/HeJ mice infected with Bb strain 297 with those of their respective uninfected controls. Overall, the data showed that the spleen transcriptome of Bb-infected P. leucopus mice was much more quiescent than that of the infected C3H mice. The current investigation is one of the few to date that have examined the transcriptome response of natural reservoir hosts to Borreliella infection. Although the experimental design of this study differed significantly from those of two previous investigations, the collective results of the current and published studies have consistently demonstrated very limited transcriptomic responses of different reservoir hosts to persistent infection with LD pathogens.
     Importance: The bacterium Borreliella burgdorferi (Bb) causes Lyme disease, which is one of the emerging and highly debilitating human diseases in countries of the Northern Hemisphere. In nature, Bb spirochetes are maintained between hard ticks of Ixodes spp. and mammals or birds. In the United States, the white-footed mouse, Peromyscus leucopus, is one of the main Bb reservoirs. In contrast to humans and laboratory mice (e.g., C3H mice), white-footed mice rarely develop clinical signs (disease) despite being (persistently) infected with Bb. How the white-footed mouse tolerates Bb infection is the question that the present study has attempted to address. Comparisons of genetic responses between Bb-infected and uninfected mice demonstrated that, during a long-term Bb infection, C3H mice reacted much more strongly, whereas P. leucopus mice were relatively unresponsive.
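
Records 2 and 3 above describe introducing random errors into genome sequences and embedding the results as fixed-length k-mer vectors. The following is a minimal illustrative sketch of that general idea, not the authors' method: the function names `perturb` and `kmer_vector`, the uniform choice among substitution, insertion, and deletion, and the use of plain contiguous k-mer counts (rather than the spaced or weighted k-mers named in record 3) are all assumptions made for illustration.

```python
import random
from itertools import product

ALPHABET = "ACGT"


def perturb(seq, error_rate, rng):
    """Hypothetical helper: apply random per-base errors to a sequence.

    Each base is altered with probability `error_rate`; the error type
    (substitution, insertion, deletion) is chosen uniformly, which is an
    assumption and not a model of any real sequencing platform.
    """
    out = []
    for base in seq:
        if rng.random() < error_rate:
            kind = rng.choice(("sub", "ins", "del"))
            if kind == "sub":
                # Replace the base with a different nucleotide.
                out.append(rng.choice([b for b in ALPHABET if b != base]))
            elif kind == "ins":
                # Keep the base and insert a random nucleotide after it.
                out.append(base)
                out.append(rng.choice(ALPHABET))
            # "del": drop the base by appending nothing.
        else:
            out.append(base)
    return "".join(out)


def kmer_vector(seq, k=3):
    """Hypothetical helper: count contiguous k-mers into a vector of length 4**k."""
    index = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}
    vec = [0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing non-ACGT symbols
            vec[j] += 1
    return vec


if __name__ == "__main__":
    rng = random.Random(0)
    original = "ACGTACGTTAGCCGATAGCT"
    noisy = perturb(original, error_rate=0.1, rng=rng)
    print(noisy)
    print(len(kmer_vector(original)), len(kmer_vector(noisy)))
```

Because the vector length depends only on k, the original and perturbed sequences map to vectors of the same size even when insertions or deletions change the sequence length, which is what allows a downstream classifier to compare them directly.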