NSF PAR Search | NSF Public Access Repository

iSeqSearch: incremental protein search for iBlast/iMMSeqs2/iDiamond

https://doi.org/10.7717/peerj.19171

Yoo, Hyunwoo; Refahi, Mohammadsaleh; Polikar, Robi; Sokhansanj, Bahrad A; Brown, James R; Rosen, Gail L (April 2025, PeerJ)

Background: The advancement of sequencing technology has led to a rapid increase in the amount of DNA and protein sequence data; consequently, genomic and proteomic databases are constantly growing. As a result, database searches must be continually updated to account for newly added data. However, repeatedly re-searching the entire existing dataset wastes resources. Incremental database search can address this problem.
Methods: One recently introduced incremental search method is iBlast, which wraps the BLAST sequence search method with an algorithm that reuses previously processed data and thereby increases search efficiency. The iBlast wrapper, however, must be generalized to support better-performing DNA/protein sequence search methods that have since been developed, namely MMseqs2 and Diamond. To address this need, we propose iSeqsSearch, which extends iBlast by incorporating support for MMseqs2 (iMMseqs2) and Diamond (iDiamond), thereby providing a more generalized and broadly effective incremental search framework. Moreover, the previously published iBlast wrapper has to be revised to be more robust and usable by the general community.
Results: iMMseqs2 and iDiamond, which apply the incremental approach, perform nearly identically to MMseqs2 and Diamond. Notably, ranking comparison measures such as the Pearson correlation show a high concordance of over 0.9, indicating similar results. Moreover, in some cases, our incremental approach, iSeqsSearch, which extends the iBlast merge function to iMMseqs2 and iDiamond, provides more hits than the conventional MMseqs2 and Diamond methods.
Conclusion: The incremental approach using iMMseqs2 and iDiamond reuses previously processed data efficiently while maintaining high accuracy and concordance in search results. This method can reduce resource waste in searches of continually growing genomic and proteomic databases.
The sample code and data are available on GitHub and Zenodo (https://github.com/EESI/Incremental-Protein-Search; DOI:10.5281/zenodo.14675319).
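The incremental idea described in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' code or the iBlast merge function: the `Hit` record, the function names, and the linear E-value rescaling by database size are all assumptions made for illustration. The sketch searches nothing itself; it only shows how hits already computed against an old database might be combined with hits from a newly added portion, so that the old portion does not have to be searched again.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical hit record (not a format used by any of the tools)."""
    query: str
    subject: str
    evalue: float  # E-value relative to the database part that was searched


def rescale(hits, part_size, total_size):
    """Scale E-values from a partial database up to the combined size.

    Assumption: E-values grow linearly with database size, which is a
    simplification of the statistics real tools apply.
    """
    factor = total_size / part_size
    return [Hit(h.query, h.subject, h.evalue * factor) for h in hits]


def merge_incremental(old_hits, old_size, new_hits, new_size, top_k=5):
    """Merge cached hits with hits from the newly added sequences only."""
    total = old_size + new_size
    combined = rescale(old_hits, old_size, total) + rescale(new_hits, new_size, total)
    by_query = {}
    for hit in combined:
        by_query.setdefault(hit.query, []).append(hit)
    # Keep the top_k lowest E-values (best hits) for each query.
    return {q: sorted(hs, key=lambda h: h.evalue)[:top_k] for q, hs in by_query.items()}
```

In this sketch only `new_hits` would require a fresh search; `old_hits` would come from a stored result of an earlier run.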
Free, publicly-accessible full text available April 28, 2026
The Naïve Bayes classifier++ for metagenomic taxonomic classification—query evaluation

https://doi.org/10.1093/bioinformatics/btae743

Duan, Haozhe Neil; Hearne, Gavin; Polikar, Robi; Rosen, Gail L; Kendziorski, Christina (ed.) (December 2024, Bioinformatics)

Abstract
Motivation: This study examines the query performance of the NBC++ (Incremental Naive Bayes Classifier) program for variations in canonicality, k-mer size, databases, and input sample data size. We demonstrate that both NBC++ and Kraken2 are influenced by database depth, with macro measures improving as depth increases. However, fully capturing the diversity of life, especially viruses, remains a challenge.
Results: NBC++ can competitively profile the superkingdom content of metagenomic samples using a small training database. NBC++ spends less time training and can use a fraction of the memory that Kraken2 requires, but at the cost of long query times. Major NBC++ enhancements include accommodating canonical k-mer storage (leading to significant storage savings) and adaptable, optimized memory allocation that accelerates query analysis and enables the software to run on nearly any system. Additionally, the output now includes log-likelihood values for each training genome, providing users with valuable confidence information.
Availability and implementation: Source code and Dockerfile are available at http://github.com/EESI/Naive_Bayes.
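The storage savings from canonical k-mers mentioned above come from treating a k-mer and its reverse complement as the same key. The following Python sketch shows the general technique under the common convention of keeping the lexicographically smaller of the two strings. It is an illustration only, not NBC++ code, and the function names are hypothetical.

```python
from collections import Counter

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def count_canonical_kmers(seq, k):
    """Count canonical k-mers, skipping windows with non-ACGT characters."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    return counts
```

Because each k-mer shares a key with its reverse complement, the table holds at most about half as many distinct keys as a strand-specific count would.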
MetaMutationalSigs: comparison of mutational signature refitting results made easy

https://doi.org/10.1093/bioinformatics/btac091

Pandey, Palash; Arora, Sanjeevani; Rosen, Gail L.; Marschall, Tobias (ed.) (February 2022, Bioinformatics)

Abstract
Motivation: The analysis of mutational signatures is becoming increasingly common in cancer genetics, with emerging implications in cancer evolution, classification, treatment decision and prognosis. Recently, several packages have been developed for mutational signature analysis, each using a different methodology and yielding significantly different results. Because of the non-trivial differences in the tools’ refitting results, researchers may wish to survey and compare the available tools in order to objectively evaluate the results for their specific research question, such as which mutational signatures are prevalent in different cancer types.
Results: To meet the need for effective comparison of mutational signature refitting, we introduce user-friendly software that can aggregate and visually present results from different refitting packages.
Availability and implementation: MetaMutationalSigs is implemented in R and Python, can be installed using Docker, and is available at: https://github.com/EESI/MetaMutationalSigs.
Physiological and evolutionary contexts of a new symbiotic species from the nitrogen-recycling gut community of turtle ants

https://doi.org/10.1038/s41396-023-01490-1

Béchade, Benoît; Cabuslay, Christian S; Hu, Yi; Mendonca, Caroll M; Hassanpour, Bahareh; Lin, Jonathan Y; Su, Yangzhou; Fiers, Valerie J; Anandarajan, Dharman; Lu, Richard; et al. (August 2023, The ISME Journal)

Abstract: While genome sequencing has expanded our knowledge of symbiosis, role assignment within multi-species microbiomes remains challenging due to genomic redundancy and the uncertainties of in vivo impacts. Here, we address such questions for a specialized nitrogen (N) recycling microbiome of turtle ants, describing a new genus and species of gut symbiont—Ischyrobacter davidsoniae (Betaproteobacteria: Burkholderiales: Alcaligenaceae)—and its in vivo physiological context. A re-analysis of amplicon sequencing data, with precisely assigned Ischyrobacter reads, revealed a seemingly ubiquitous distribution across the turtle ant genus Cephalotes, suggesting ≥50 million years since domestication. Through new genome sequencing, we also show that divergent I. davidsoniae lineages are conserved in their uricolytic and urea-generating capacities. With phylogenetically refined definitions of Ischyrobacter and separately domesticated Burkholderiales symbionts, our FISH microscopy revealed a distinct niche for I. davidsoniae, with dense populations at the anterior ileum. Being positioned at the site of host N-waste delivery, in vivo metatranscriptomics and metabolomics further implicate I. davidsoniae within a symbiont-autonomous N-recycling pathway. While encoding much of this pathway, I. davidsoniae expressed only a subset of the requisite steps in mature adult workers, including the penultimate step deriving urea from allantoate. The remaining steps were expressed by other specialized gut symbionts. Collectively, this assemblage converts inosine, made from midgut symbionts, into urea and ammonia in the hindgut. With urea supporting host amino acid budgets and cuticle synthesis, and with the ancient nature of other active N-recyclers discovered here, I. davidsoniae emerges as a central player in a conserved and impactful, multipartite symbiosis.
Enhancing nucleotide sequence representations in genomic analysis with contrastive optimization

https://doi.org/10.1038/s42003-025-07902-6

Refahi, Mohammadsaleh; Sokhansanj, Bahrad A; Mell, Joshua C; Brown, James R; Yoo, Hyunwoo; Hearne, Gavin; Rosen, Gail L (March 2025, Communications Biology)
Can Large Language Models Classify and Generate Antimicrobial Resistance Genes?

https://doi.org/10.18653/v1/2025.bionlp-1.21

Yoo, Hyunwoo; Shin, Haebin; Rosen, Gail (January 2025, Association for Computational Linguistics)

Free, publicly-accessible full text available January 1, 2026
Normalized Compression Distance for DNA Classification

https://doi.org/10.1145/3698587.3701490

Hearne, Gavin LA; Refahi, Mohammad S; Duan, Haozhe Neil; Brown, James R; Rosen, Gail L (November 2024, ACM)

Free, publicly-accessible full text available November 22, 2025
The Role and Applications of Artificial Intelligence in the Treatment of Chronic Pain

https://doi.org/10.1007/s11916-024-01264-0

Meier, Tiffany A; Refahi, Mohammad S; Hearne, Gavin; Restifo, Daniele S; Munoz-Acuna, Ricardo; Rosen, Gail L; Woloszynek, Stephen (August 2024, Current Pain and Headache Reports)

Full Text Available
Streamlining Computational Fragment-Based Drug Discovery through Evolutionary Optimization Informed by Ligand-Based Virtual Prescreening

https://doi.org/10.1021/acs.jcim.4c00234

Chandraghatgi, Rohan; Ji, Hai-Feng; Rosen, Gail L; Sokhansanj, Bahrad A (May 2024, Journal of Chemical Information and Modeling)
Fragment databases from screened ligands for drug discovery (FDSL-DD)

https://doi.org/10.1016/j.jmgm.2023.108669

Wilson, Jerica; Sokhansanj, Bahrad A.; Chong, Wei Chuen; Chandraghatgi, Rohan; Rosen, Gail L.; Ji, Hai-Feng (March 2024, Journal of Molecular Graphics and Modelling)

Full Text Available
