NSF PAR Search | NSF Public Access Repository

Compiling Recurrences over Dense and Sparse Arrays

https://doi.org/10.1145/3649820

Sundram, Shiv; Tariq, Muhammad Usman; Kjolstad, Fredrik (April 2024, Proceedings of the ACM on Programming Languages)

We present a framework for compiling recurrence equations into native code. In our framework, users specify a system of recurrences, the types of data structures that store inputs and outputs, and scheduling commands for optimization. Our compiler then lowers these specifications into native code that respects the dependencies in the recurrence equations. Our compiler can generate code over both sparse and dense data structures, and determines if the recurrence system is solvable with the provided scheduling primitives. We evaluate the performance and correctness of the generated code on several recurrences, from domains as diverse as dense and sparse matrix solvers, dynamic programming, graph problems, and sparse tensor algebra. We demonstrate that the generated code has performance competitive with hand-optimized implementations in libraries. However, these handwritten libraries target specific recurrences, specific data structures, and specific optimizations. Our system, on the other hand, automatically generates implementations from recurrences, data formats, and schedules, giving our system more generality than library approaches.
Full Text Available
Molecular level characterization of DOM along a freshwater-to-estuarine coastal gradient in the Florida Everglades

https://doi.org/10.1007/s00027-022-00887-y

Leyva, Dennys; Jaffé, Rudolf; Courson, Jessica; Kominoski, John S.; Tariq, Muhammad Usman; Saeed, Fahad; Fernandez-Lima, Francisco (October 2022, Aquatic Sciences)

Full Text Available
SpeCollate: Deep cross-modal similarity network for mass spectrometry data based peptide deductions

https://doi.org/10.1371/journal.pone.0259349

Tariq, Muhammad Usman; Saeed, Fahad (October 2021, PLOS ONE)
Lisacek, Frederique (Ed.)
Historically, database search algorithms have been the de facto standard for inferring peptides from mass spectrometry (MS) data. Database search algorithms deduce peptides by transforming theoretical peptides into theoretical spectra and matching them to the experimental spectra. Heuristic similarity-scoring functions are used to match an experimental spectrum to a theoretical spectrum. However, the heuristic nature of the scoring functions and the simple transformation of the peptides into theoretical spectra, along with noisy mass spectra for the less abundant peptides, can introduce a cascade of inaccuracies. In this paper, we design and implement a Deep Cross-Modal Similarity Network called SpeCollate, which overcomes these inaccuracies by learning the similarity function between experimental spectra and peptides directly from the labeled MS data. SpeCollate transforms spectra and peptides into a shared Euclidean subspace by learning fixed-size embeddings for both. Our proposed deep-learning network trains on sextuplets of positive and negative examples coupled with our custom-designed SNAP-loss function. Online hardest negative mining is used to select the appropriate negative examples for optimal training performance. We use 4.8 million sextuplets obtained from the NIST and MassIVE peptide libraries to train the network and demonstrate that for closed search, SpeCollate is able to perform better than Crux and MSFragger in terms of the number of peptide-spectrum matches (PSMs) and unique peptides identified under 1% FDR for real-world data. SpeCollate also identifies a large number of peptides not reported by either Crux or MSFragger. To the best of our knowledge, our proposed SpeCollate is the first deep-learning network that can determine the cross-modal similarity between peptides and mass spectra for MS-based proteomics. We believe SpeCollate represents significant progress towards developing machine-learning solutions for MS-based omics data analysis.
SpeCollate is available at https://deepspecs.github.io/.
Full Text Available
Graph Theoretic Approach for the Analysis of Comprehensive Mass-Spectrometry (MS/MS) Data of Dissolved Organic Matter

https://doi.org/10.1109/BIBM52615.2021.9669289

Tariq, Muhammad Usman; Leyva, Dennys; Lima, Francisco Alberto; Saeed, Fahad (December 2021, IEEE International Conference on Bioinformatics and Biomedicine (BIBM))

Full Text Available
Unsupervised Structural Classification of Dissolved Organic Matter Based on Fragmentation Pathways

https://doi.org/10.1021/acs.est.1c04726

Leyva, Dennys; Tariq, Muhammad Usman; Jaffé, Rudolf; Saeed, Fahad; Lima, Francisco Fernandez (January 2022, Environmental Science & Technology)

Full Text Available
An Optimized IoT-enabled Big Data Analytics Architecture for Edge-Cloud Computing

https://doi.org/10.1109/JIOT.2022.3157552

Babar, Muhammad; Jan, Mian Ahmad; He, Xiangjian; Tariq, Muhammad Usman; Mastorakis, Spyridon; Alturki, Ryan (January 2022, IEEE Internet of Things Journal)

Full Text Available
Methods for Proteogenomics Data Analysis, Challenges, and Scalability Bottlenecks: A Survey

https://doi.org/10.1109/ACCESS.2020.3047588

Tariq, Muhammad Usman; Haseeb, Muhammad; Aledhari, Mohammed; Razzak, Rehma; Parizi, Reza M.; Saeed, Fahad (January 2021, IEEE Access)

Parallel Sampling-Pipeline for Indefinite Stream of Heterogeneous Graphs using OpenCL for FPGAs

https://doi.org/10.1109/BigData.2018.8621979

Tariq, Muhammad Usman; Saeed, Fahad (December 2018, IEEE International Conference on Big Data (Big Data))

In the field of data science, a huge amount of data, generally represented as graphs, needs to be processed and analyzed. It is of utmost importance that this data be processed swiftly and efficiently to save time and energy. The volume and velocity of data, along with irregular access patterns in graph data structures, pose challenges in terms of analysis and processing. Further, a large share of time and energy is spent on analyzing these graphs on large compute clusters and/or data centers. Filtering and refining data using graph sampling techniques is one of the most effective ways to speed up the analysis. Efficient accelerators, such as FPGAs, have proven to significantly lower the energy cost of running an algorithm. To this end, we present the design and implementation of a parallel graph sampling technique for a large number of input graphs streaming into an FPGA. A parallel approach using OpenCL for FPGAs was adopted to arrive at a solution that is both time- and energy-efficient. We introduce a novel graph data structure, suitable for streaming graphs on FPGAs, that allows time- and memory-efficient representation of graphs. Our experiments show that our proposed technique is 3x faster and 2x more energy efficient than a serial CPU version of the algorithm.
Full Text Available