NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Black-Box Multi-Robustness Testing for Neural Networks

https://doi.org/10.1109/ICSTW64639.2025.10962491

Downing, Mara; Bultan, Tevfik (March 2025, IEEE)

Free, publicly-accessible full text available March 31, 2026
Hybrid Equivalence/Non-Equivalence Testing

https://doi.org/10.1109/ICST62969.2025.10988990

Sarker, Laboni; Bultan, Tevfik (March 2025, IEEE)

Free, publicly-accessible full text available March 31, 2026
OPER: Optimality-Guided Embedding Table Parallelization for Large-scale Recommendation Model

Wang, Zheng; Wang, Yuke; Feng, Boyuan; Huang, Guyue; Mudigere, Dheevatsa; Muthiah, Bharath; Li, Ang; Ding, Yufei (July 2024, USENIX Association)

The deployment of Deep Learning Recommendation Models (DLRMs) involves the parallelization of extra-large embedding tables (EMTs) on multiple GPUs. Existing works overlook the input-dependent behavior of EMTs and parallelize them in a coarse-grained manner, resulting in unbalanced workload distribution and inter-GPU communication. To this end, we propose OPER, an algorithm-system co-design with OPtimality-guided Embedding table parallelization for large-scale Recommendation model training and inference. The core idea of OPER is to explore the connection between DLRM inputs and the efficiency of distributed EMTs, aiming to provide a near-optimal parallelization strategy for EMTs. Specifically, we conduct an in-depth analysis of various types of EMTs parallelism and propose a heuristic search algorithm to efficiently approximate an empirically near-optimal EMT parallelization. Furthermore, we implement a distributed shared memory-based system, which supports the lightweight but complex computation and communication pattern of fine-grained EMT parallelization, effectively converting theoretical improvements into real speedups. Extensive evaluation shows that OPER achieves 2.3× and 4.0× speedup on average in training and inference, respectively, over state-of-the-art DLRM frameworks.
more » « less
Full Text Available
RAP: Resource-aware Automated GPU Sharing for Multi-GPU Recommendation Model Training and Input Preprocessing

https://doi.org/10.1145/3620665.3640406

Wang, Zheng; Wang, Yuke; Deng, Jiaqi; Zheng, Da; Li, Ang; Ding, Yufei (April 2024, ACM)

Full Text Available
The Case for Scalable Quantitative Neural Network Analysis

https://doi.org/10.1145/3617574.3617862

Downing, Mara; Bultan, Tevfik (December 2023, ACM)

Neural networks are an increasingly common tool for solving problems that require complex analysis and pattern matching, such as identifying stop signs in a self driving car or processing medical imagery during diagnosis. Accordingly, verification of neural networks for safety and correctness is of great importance, as mispredictions can have catastrophic results in safety critical domains. As neural networks are known to be sensitive to small changes in input, leading to vulnerabilities and adversarial attacks, analyzing the robustness of networks to small changes in input is a key piece of evaluating their safety and correctness. However, there are many real-world scenarios where the requirements of robustness are not clear cut, and it is crucial to develop measures that assess the level of robustness of a given neural network model and compare levels of robustness across different models, rather than using a binary characterization such as robust vs. not robust. We believe there is great need for developing scalable quantitative robustness verification techniques for neural networks. Formal verification techniques can provide guarantees of correctness, but most existing approaches do not provide quantitative robustness measures and are not effective in analyzing real-world network sizes. On the other hand, sampling-based quantitative robustness is not hindered much by the size of networks but cannot provide sound guarantees of quantitative results. We believe more research is needed to address the limitations of both symbolic and sampling-based verification approaches and create sound, scalable techniques for quantitative robustness verification of neural networks.
more » « less
Full Text Available
ZENO: A Type-based Optimization Framework for Zero Knowledge Neural Network Inference

Feng, Boyuan; Wang, Zheng; Wang, Yuke; Yang, Shu; Ding, Yufei. (October 2023, ACM)

Zero knowledge Neural Networks draw increasing attention for guaranteeing computation integrity and privacy of neural networks (NNs) based on zero-knowledge Succinct Non-interactive ARgument of Knowledge (zkSNARK) security scheme. However, the performance of zkSNARK NNs is far from optimal due to the million-scale circuit computation with heavy scalar-level dependency. In this paper, we propose a type-based optimizing framework for efficient zero-knowledge NN inference, namely ZENO (ZEro knowledge Neural network Optimizer). We first introduce ZENO language construct to maintain high-level semantics and the type information (e.g., privacy and tensor) for allowing more aggressive optimizations. We then propose privacytype driven and tensor-type driven optimizations to further optimize the generated zkSNARK circuit. Finally, we design a set of NN-centric system optimizations to further accelerate zkSNARK NNs. Experimental results show that ZENO achieves up to 8.5× end-to-end speedup than state-of-the-art zkSNARK NNs. We reduce proof time for VGG16 from 6 minutes to 48 seconds, which makes zkSNARK NNs practical.
more » « less
Full Text Available
Quantitative Robustness Analysis of Neural Networks

https://doi.org/10.1145/3597926.3605231

Downing, Mara (July 2023, ACM)

Neural networks are an increasingly common tool for solving problems that require complex analysis and pattern matching, such as identifying stop signs or processing medical imagery. Accordingly, verification of neural networks for safety and correctness is of great importance, as mispredictions can have catastrophic results in safety critical domains. One metric for verification is robustness, which answers whether or not a misclassified input exists in a given input neighborhood. I am focusing my research at quantitative robustness—finding not only if there exist misclassified inputs within a given neighborhood but also how many exist as a proportion of the neighborhood size. My overall goal is to expand the research on quantitative neural network robustness verification and create a variety of quantitative verification tools geared towards expanding our understanding of neural network robustness.
more » « less
Full Text Available
TC-GNN: Bridging Sparse GNN Computation and Dense Tensor Cores on GPUs

Wang, Yuke; Feng, Boyuan; Wang, Zheng; Huang, Guyue; Ding, Yufei. (July 2023, USENIX Association)

Recently, graph neural networks (GNNs), as the backbone of graph-based machine learning, demonstrate great success in various domains (e.g., e-commerce). However, the performance of GNNs is usually unsatisfactory due to the highly sparse and irregular graph-based operations. To this end, we propose TC-GNN, the first GNN acceleration framework based on GPU Tensor Core Units (TCUs). The core idea is to reconcile the "Sparse" GNN computation with the high-performance "Dense" TCUs. Specifically, we conduct an in-depth analysis of the sparse operations in mainstream GNN computing frameworks. We introduce a novel sparse graph translation technique to facilitate TCU processing of the sparse GNN workload. We implement an effective CUDA core and TCU collaboration design to fully utilize GPU resources. We integrate MGG with the PyTorch framework for high programmability. Rigorous experiments show an average of 1.70× speedup over the state-of-the-art DGL framework across various models and datasets.
more » « less
Full Text Available
{MGG}: Accelerating Graph Neural Networks with {Fine-Grained} {Intra-Kernel} {Communication-Computation} Pipelining on {Multi-GPU} Platforms

Wang, Yuke; Feng, Boyuan; Wang, Zheng; Barker, Kevin; Li, Ang; Ding, Yufei. (July 2023, USENIX Association)

The increasing size of input graphs for graph neural networks (GNNs) highlights the demand for using multi-GPU platforms. However, existing multi-GPU GNN systems optimize the computation and communication individually based on the conventional practice of scaling dense DNNs. For irregularly sparse and fine-grained GNN workloads, such solutions miss the opportunity to jointly schedule/optimize the computation and communication operations for high-performance delivery. To this end, we propose MGG , a novel system design to accelerate full-graph GNNs on multi-GPU platforms. The core of MGG is its novel dynamic software pipeline to facilitate fine-grained computation-communication overlapping within a GPU kernel. Specifically, MGG introduces GNN-tailored pipeline construction and GPU-aware pipeline mapping to facilitate workload balancing and operation overlapping. MGG also incorporates an intelligent runtime design with analytical modeling and optimization heuristics to dynamically improve the execution performance. Extensive evaluation reveals that MGG outperforms state-of-the-art full-graph GNN systems across various settings: on average 4.41×, 4.81×, and 10.83× faster than DGL, MGG-UVM, and ROC, respectively.
more » « less
Full Text Available
Faith: An Efficient Framework for Transformer Verification on GPUs

Feng, Boyuan; Tang, Tianqi; Wang, Yuke; Chen, Zhaodong; Wang, Zheng; Yang, Shu; Xie, Yuan; Ding, Yufei (July 2022, Proceedings of the 2022 USENIX Annual Technical Conference)

Full Text Available

« Prev Next »

Search for: All records