Search for: All records

Creators/Authors contains: "Wang, Zheng"

Total Resources: 76
Resource Type: Conference Paper (20); Conference Proceeding (8); Dataset (0); Journal Article (48); Workshop Report (0)
Availability: Full Text / Resource Available (69); Citation Only (7)

EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Unified Compression and Adaptive Layer Voting

Yu, Zhongzhi ; Wang, Zheng ; Li, Yuhan ; Gao, Ruijie ; Zhou, Xiaoya ; Bommu, Sreenidhi Reddy ; Zhao, Yang Katie ; Lin, Yingyan Celine ( June 2024 , ACM)
Functional Bayesian Tucker Decomposition for Continuous-indexed Tensor

Fang, Shikai ; Yu, Xin ; Wang, Zheng ; Li, Shibo ; Kirby, Robert ; Zhe, Shandian ( March 2024 , Proceedings of The International Conference on Learning Representations (ICLR))
Functional Bayesian Tucker Decomposition for Continuous-indexed Tensor

Fang, Shikai ; Yu, Xin ; Wang, Zheng ; Li, Shibo ; Kirby, Robert M. ; Zhe, Shandian ( March 2024 , Proceedings of The International Conference on Learning Representations (ICLR))

Free, publicly-accessible full text available March 15, 2025
Dynamic Tensor Decomposition via Neural Diffusion-Reaction Processes

Wang, Zheng ; Fang, Shikai ; Li, Shibo ; Zhe, Shandian ( December 2023 , Proceedings of The 37th Conference on Neural Information Processing Systems (NeurIPS))

Free, publicly-accessible full text available December 20, 2024
Dynamic Tensor Decomposition via Neural Diffusion-Reaction Processes

Wang, Zheng ; Fang, Shikai ; Li, Shibo ; Zhe, Shandian ( December 2023 , Proceedings of The 37th Conference on Neural Information Processing Systems (NeurIPS))
Streaming Factor Trajectory Learning for Temporal Tensor Decomposition

Fang, Shikai ; Yu, Xin ; Li, Shibo ; Wang, Zheng ; Kirby, Robert M. ; Zhe, Shandian ( December 2023 , Proceedings of The 37th Conference on Neural Information Processing Systems (NeurIPS))

Free, publicly-accessible full text available December 20, 2024
Streaming Factor Trajectory Learning for Temporal Tensor Decomposition

Fang, Shikai ; Yu, Xin ; Li, Shibo ; Wang, Zheng ; Kirby, Robert ; Zhe, Shandian ( December 2023 , The 37th Conference on Neural Information Processing Systems (NeurIPS))
ZENO: A Type-based Optimization Framework for Zero Knowledge Neural Network Inference

Feng, Boyuan ; Wang, Zheng ; Wang, Yuke ; Yang, Shu ; Ding, Yufei ( October 2023 , ACM)

Zero-knowledge neural networks draw increasing attention for guaranteeing the computation integrity and privacy of neural networks (NNs) based on the zero-knowledge Succinct Non-interactive ARgument of Knowledge (zkSNARK) security scheme. However, the performance of zkSNARK NNs is far from optimal due to the million-scale circuit computation with heavy scalar-level dependency. In this paper, we propose a type-based optimizing framework for efficient zero-knowledge NN inference, namely ZENO (ZEro knowledge Neural network Optimizer). We first introduce the ZENO language construct to maintain high-level semantics and type information (e.g., privacy and tensor), allowing more aggressive optimizations. We then propose privacy-type driven and tensor-type driven optimizations to further optimize the generated zkSNARK circuit. Finally, we design a set of NN-centric system optimizations to further accelerate zkSNARK NNs. Experimental results show that ZENO achieves up to 8.5× end-to-end speedup over state-of-the-art zkSNARK NNs. We reduce proof time for VGG16 from 6 minutes to 48 seconds, which makes zkSNARK NNs practical.
Free, publicly-accessible full text available October 1, 2024
TC-GNN: Bridging Sparse GNN Computation and Dense Tensor Cores on GPUs

Wang, Yuke ; Feng, Boyuan ; Wang, Zheng ; Huang, Guyue ; Ding, Yufei ( July 2023 , USENIX Association)

Recently, graph neural networks (GNNs), as the backbone of graph-based machine learning, have demonstrated great success in various domains (e.g., e-commerce). However, the performance of GNNs is usually unsatisfactory due to the highly sparse and irregular graph-based operations. To this end, we propose TC-GNN, the first GNN acceleration framework based on GPU Tensor Core Units (TCUs). The core idea is to reconcile the "Sparse" GNN computation with the high-performance "Dense" TCUs. Specifically, we conduct an in-depth analysis of the sparse operations in mainstream GNN computing frameworks. We introduce a novel sparse graph translation technique to facilitate TCU processing of the sparse GNN workload. We implement an effective CUDA core and TCU collaboration design to fully utilize GPU resources. We integrate TC-GNN with the PyTorch framework for high programmability. Rigorous experiments show an average of 1.70× speedup over the state-of-the-art DGL framework across various models and datasets.
Free, publicly-accessible full text available July 1, 2024
MGG: Accelerating Graph Neural Networks with Fine-Grained Intra-Kernel Communication-Computation Pipelining on Multi-GPU Platforms

Wang, Yuke ; Feng, Boyuan ; Wang, Zheng ; Barker, Kevin ; Li, Ang ; Ding, Yufei ( July 2023 , USENIX Association)

The increasing size of input graphs for graph neural networks (GNNs) highlights the demand for using multi-GPU platforms. However, existing multi-GPU GNN systems optimize the computation and communication individually based on the conventional practice of scaling dense DNNs. For irregularly sparse and fine-grained GNN workloads, such solutions miss the opportunity to jointly schedule/optimize the computation and communication operations for high-performance delivery. To this end, we propose MGG, a novel system design to accelerate full-graph GNNs on multi-GPU platforms. The core of MGG is its novel dynamic software pipeline to facilitate fine-grained computation-communication overlapping within a GPU kernel. Specifically, MGG introduces GNN-tailored pipeline construction and GPU-aware pipeline mapping to facilitate workload balancing and operation overlapping. MGG also incorporates an intelligent runtime design with analytical modeling and optimization heuristics to dynamically improve the execution performance. Extensive evaluation reveals that MGG outperforms state-of-the-art full-graph GNN systems across various settings: on average 4.41×, 4.81×, and 10.83× faster than DGL, MGG-UVM, and ROC, respectively.
Free, publicly-accessible full text available July 1, 2024
