NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Accelerating large scale de novo metagenome assembly using GPUs

https://doi.org/10.1145/3458817.3476212

Awan, Muaaz Gul; Hofmeyr, Steven; Egan, Rob; Ding, Nan; Buluc, Aydin; Deslippe, Jack; Oliker, Leonid; Yelick, Katherine (November 2021, The International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’21))

Full Text Available
The parallelism motifs of genomic data analysis

https://doi.org/10.1098/rsta.2019.0394

Yelick, Katherine; Buluç, Aydın; Awan, Muaaz; Azad, Ariful; Brock, Benjamin; Egan, Rob; Ekanayake, Saliya; Ellis, Marquita; Georganas, Evangelos; Guidi, Giulia; et al (March 2020, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences)

Genomic datasets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share these data with the research community, but some of these genomic data analysis problems require large-scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high-end parallel systems today and place different requirements on programming support, software libraries and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high-performance genomics analysis, including alignment, profiling, clustering and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or ‘motifs’ that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.
more » « less
Full Text Available
Critical Assessment of Metagenome Interpretation: the second round of challenges

https://doi.org/10.1038/s41592-022-01431-4

Meyer, Fernando; Fritz, Adrian; Deng, Zhi-Luo; Koslicki, David; Lesker, Till Robin; Gurevich, Alexey; Robertson, Gary; Alser, Mohammed; Antipov, Dmitry; Beghini, Francesco; et al (April 2022, Nature Methods)

Abstract Evaluating metagenomic software is key for optimizing metagenome interpretation and focus of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI). The CAMI II challenge engaged the community to assess methods on realistic and complex datasets with long- and short-read sequences, created computationally from around 1,700 new and known genomes, as well as 600 new plasmids and viruses. Here we analyze 5,002 results by 76 program versions. Substantial improvements were seen in assembly, some due to long-read data. Related strains still were challenging for assembly and genome recovery through binning, as was assembly quality for the latter. Profilers markedly matured, with taxon profilers and binners excelling at higher bacterial ranks, but underperforming for viruses and Archaea. Clinical pathogen detection results revealed a need to improve reproducibility. Runtime and memory usage analyses identified efficient programs, including top performers with other metrics. The results identify challenges and guide researchers in selecting methods for analyses.
more » « less
Full Text Available

Search for: All records