Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
The deployment of Deep Learning Recommendation Models (DLRMs) involves the parallelization of extra-large embedding tables (EMTs) on multiple GPUs. Existing works overlook the input-dependent behavior of EMTs and parallelize them in a coarse-grained manner, resulting in unbalanced workload distribution and inter-GPU communication. To this end, we propose OPER, an algorithm-system co-design with OPtimality-guided Embedding table parallelization for large-scale Recommendation model training and inference. The core idea of OPER is to explore the connection between DLRM inputs and the efficiency of distributed EMTs, aiming to provide a near-optimal parallelization strategy for EMTs. Specifically, we conduct an in-depth analysis of various types of EMTs parallelism and propose a heuristic search algorithm to efficiently approximate an empirically near-optimal EMT parallelization. Furthermore, we implement a distributed shared memory-based system, which supports the lightweight but complex computation and communication pattern of fine-grained EMT parallelization, effectively converting theoretical improvements into real speedups. Extensive evaluation shows that OPER achieves 2.3× and 4.0× speedup on average in training and inference, respectively, over state-of-the-art DLRM frameworks.more » « lessFree, publicly-accessible full text available July 10, 2025
-
Free, publicly-accessible full text available April 27, 2025
-
Neural networks are an increasingly common tool for solving problems that require complex analysis and pattern matching, such as identifying stop signs in a self driving car or processing medical imagery during diagnosis. Accordingly, verification of neural networks for safety and correctness is of great importance, as mispredictions can have catastrophic results in safety critical domains. As neural networks are known to be sensitive to small changes in input, leading to vulnerabilities and adversarial attacks, analyzing the robustness of networks to small changes in input is a key piece of evaluating their safety and correctness. However, there are many real-world scenarios where the requirements of robustness are not clear cut, and it is crucial to develop measures that assess the level of robustness of a given neural network model and compare levels of robustness across different models, rather than using a binary characterization such as robust vs. not robust. We believe there is great need for developing scalable quantitative robustness verification techniques for neural networks. Formal verification techniques can provide guarantees of correctness, but most existing approaches do not provide quantitative robustness measures and are not effective in analyzing real-world network sizes. On the other hand, sampling-based quantitative robustness is not hindered much by the size of networks but cannot provide sound guarantees of quantitative results. We believe more research is needed to address the limitations of both symbolic and sampling-based verification approaches and create sound, scalable techniques for quantitative robustness verification of neural networks.more » « less
-
Zero knowledge Neural Networks draw increasing attention for guaranteeing computation integrity and privacy of neural networks (NNs) based on zero-knowledge Succinct Non-interactive ARgument of Knowledge (zkSNARK) security scheme. However, the performance of zkSNARK NNs is far from optimal due to the million-scale circuit computation with heavy scalar-level dependency. In this paper, we propose a type-based optimizing framework for efficient zero-knowledge NN inference, namely ZENO (ZEro knowledge Neural network Optimizer). We first introduce ZENO language construct to maintain high-level semantics and the type information (e.g., privacy and tensor) for allowing more aggressive optimizations. We then propose privacytype driven and tensor-type driven optimizations to further optimize the generated zkSNARK circuit. Finally, we design a set of NN-centric system optimizations to further accelerate zkSNARK NNs. Experimental results show that ZENO achieves up to 8.5× end-to-end speedup than state-of-the-art zkSNARK NNs. We reduce proof time for VGG16 from 6 minutes to 48 seconds, which makes zkSNARK NNs practical.more » « less
-
Neural networks are an increasingly common tool for solving problems that require complex analysis and pattern matching, such as identifying stop signs or processing medical imagery. Accordingly, verification of neural networks for safety and correctness is of great importance, as mispredictions can have catastrophic results in safety critical domains. One metric for verification is robustness, which answers whether or not a misclassified input exists in a given input neighborhood. I am focusing my research at quantitative robustness—finding not only if there exist misclassified inputs within a given neighborhood but also how many exist as a proportion of the neighborhood size. My overall goal is to expand the research on quantitative neural network robustness verification and create a variety of quantitative verification tools geared towards expanding our understanding of neural network robustness.more » « less
-
The increasing size of input graphs for graph neural networks (GNNs) highlights the demand for using multi-GPU platforms. However, existing multi-GPU GNN systems optimize the computation and communication individually based on the conventional practice of scaling dense DNNs. For irregularly sparse and fine-grained GNN workloads, such solutions miss the opportunity to jointly schedule/optimize the computation and communication operations for high-performance delivery. To this end, we propose MGG , a novel system design to accelerate full-graph GNNs on multi-GPU platforms. The core of MGG is its novel dynamic software pipeline to facilitate fine-grained computation-communication overlapping within a GPU kernel. Specifically, MGG introduces GNN-tailored pipeline construction and GPU-aware pipeline mapping to facilitate workload balancing and operation overlapping. MGG also incorporates an intelligent runtime design with analytical modeling and optimization heuristics to dynamically improve the execution performance. Extensive evaluation reveals that MGG outperforms state-of-the-art full-graph GNN systems across various settings: on average 4.41×, 4.81×, and 10.83× faster than DGL, MGG-UVM, and ROC, respectively.more » « less
-
Recently, graph neural networks (GNNs), as the backbone of graph-based machine learning, demonstrate great success in various domains (e.g., e-commerce). However, the performance of GNNs is usually unsatisfactory due to the highly sparse and irregular graph-based operations. To this end, we propose TC-GNN, the first GNN acceleration framework based on GPU Tensor Core Units (TCUs). The core idea is to reconcile the "Sparse" GNN computation with the high-performance "Dense" TCUs. Specifically, we conduct an in-depth analysis of the sparse operations in mainstream GNN computing frameworks. We introduce a novel sparse graph translation technique to facilitate TCU processing of the sparse GNN workload. We implement an effective CUDA core and TCU collaboration design to fully utilize GPU resources. We integrate MGG with the PyTorch framework for high programmability. Rigorous experiments show an average of 1.70× speedup over the state-of-the-art DGL framework across various models and datasets.more » « less