NSF PAR Search | NSF Public Access Repository


Demystifying Arch-hints for Model Extraction: An Attack in Unified Memory System

Zhendong Wang, Xiaoming Zeng (August 2022, arXiv.org)

Full Text Available
Enabling efficient deep convolutional neural network-based sensor fusion for autonomous driving

https://doi.org/10.1145/3489517.3530444

Zeng, Xiaoming; Wang, Zhendong; Hu, Yang (July 2022, Proceedings of the 59th ACM/IEEE Design Automation Conference)

Full Text Available
A synergistic reinforcement learning-based framework design in driving automation

https://doi.org/10.1016/j.compeleceng.2022.107989

Qi, Yuqiong; Hu, Yang; Wu, Haibin; Li, Shen; Ye, Xiaochun; Fan, Dongrui (July 2022, Computers and Electrical Engineering)

Full Text Available
Brief Industry Paper: The Necessity of Adaptive Data Fusion in Infrastructure-Augmented Autonomous Driving System

https://doi.org/10.1109/RTAS54340.2022.00031

Liu, Shaoshan; Wang, Jianda; Wang, Zhendong; Yu, Bo; Hu, Wei; Liu, Yahui; Tang, Jie; Song, Shuaiwen Leon; Liu, Cong; Hu, Yang (May 2022, RTAS 2022)

Full Text Available
Characterization and Implication of Edge WebAssembly Runtimes

https://doi.org/10.1109/HPCC-DSS-SmartCity-DependSys53884.2021.00037

Wang, Zhen; Wang, Jianda; Wang, Zhendong; Hu, Yang (December 2021, 2021 IEEE 23rd Int Conf on High Performance Computing & Communications)

WebAssembly, an emerging bytecode format that was initially developed to partially replace JavaScript and speed up browser applications, has been extended to the server side because of its promise of speed and security. It has been considered a promising alternative to the widely deployed container technique for isolating lightweight applications. To run WebAssembly on the server side, several WebAssembly native runtimes have been proposed in addition to the NodeJS runtime. We characterize major WebAssembly runtimes through extensive applications and metrics. Our results show that different runtimes fit different application scenarios. Based on that, a framework for reducing the startup latency of WebAssembly services while keeping maximum performance is provided. To identify the root causes of the performance gap, a detailed analysis of the emerging Cranelift compiler against LLVM is reported. In addition, this paper gives suggestions and architectural proposals for designing an efficient WebAssembly runtime. Our work provides insights on both WebAssembly runtime enhancement and WebAssembly-based cloud service exploitation.
Full Text Available
Towards a Secure Integrated Heterogeneous Platform via Cooperative CPU/GPU Encryption

https://doi.org/10.1109/ATS52891.2021.00032

Wang, Zhendong; Wang, Rujia; Jiang, Zihang; Tang, Xulong; Yin, Shouyi; Hu, Yang (November 2021, ATS 2022)

Full Text Available
Enabling Efficient SIMD Acceleration for Virtual Radio Access Network

https://doi.org/10.1145/3472456.3472477

Wang, Jianda; Hu, Yang (August 2021, ICPP ’21)

Nowadays, the Radio Access Network (RAN) is resorting to the Network Function Virtualization (NFV) paradigm to enhance its architectural viability. However, our characterization of virtual RAN (vRAN) on modern processors depicts a frustrating picture of Single-Instruction Multi-Data (SIMD) acceleration. The data arrangement processes in the vRAN software pipeline do not align data for efficient SIMD processing across the pipeline. Specifically, existing data arrangement processes cannot fully utilize the ALU ports in modern processors, which leads to a high backend bound and fails to saturate the memory bandwidth between registers and the L1 cache. To overcome this bottleneck, we thoroughly examine the state-of-the-art CPU architecture and find that there are idle ports that could be utilized by the process. Motivated by this observation, we propose the "Arithmetic Ports Consciousness Mechanism" (APCM), which utilizes these idle ports to eliminate the backend bound and saturate the memory bandwidth. The APCM decreases the data arrangement’s backend bound from 45% to 3% and improves its memory bandwidth utilization by 4X-16X. The CPU time of the data arrangement process can be reduced by 67%-92%, and the overall latency of vRAN packet transmission is decreased by 12%-20%.
Full Text Available
Q-VR: system-level design for future mobile collaborative virtual reality

https://doi.org/10.1145/3445814.3446715

Xie, Chenhao; Li, Xie; Hu, Yang; Peng, Huwan; Taylor, Michael; Song, Shuaiwen Leon (January 2021, Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems)
Full Text Available
ANT-Man: Towards Agile Power Management in the Microservice Era

https://doi.org/10.1109/SC41405.2020.00082

Hou, Xiaofeng; Li, Chao; Liu, Jiacheng; Zhang, Lu; Hu, Yang; Guo, Minyi (November 2020, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis (SC))
Full Text Available
A Hardware-Based Architecture-Neutral Framework for Real-Time IoT Workload Forensics

https://doi.org/10.1109/TC.2020.3000237

Zhou, Liwei; Hu, Yang; Makris, Yiorgos (November 2020, IEEE Transactions on Computers)
Full Text Available
