skip to main content


Search for: All records

Award ID contains: 1943490

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. WebAssembly, an emerging bytecode format, which is initially developed for partially replacing JavaScript and speeding up browser applications, has been extended to the server-side due to its speed and security promise. It has been considered as a promising alternative to the widely deployed container technique for isolating lightweight applications. To run WebAssmebly from the server-side, aside from the NodeJS runtime, several WebAssembly native runtimes have been proposed. We characterize majorWebAssembly runtimes through extensive applications and metrics. Our results show that different runtimes fit different application scenarios. Based on that, a framework for reducing the startup latency of WebAssembly service while keeping maximum performance is provided. To identify the root causes of the performance gap, the analysis of emerging Cranelift compiler against LLVM in detail is reported. In addition, this paper gives revealing suggestions and architectural proposals for designing an efficient WebAssembly runtime. Our work provides insights on both WebAssembly runtime enhancement and WebAssemblybased cloud service exploitation. 
    more » « less
  2. Nowadays, the Radio Access Network (RAN) is resorting to Function Virtualization (NFV) paradigm to enhance its architectural viability. However, our characterization of virtual RAN (vRAN) on modern processors depicts a frustrating picture of Single-Instruction Multi-Data (SIMD) acceleration. The data arrangement processes in vRAN software pipeline do not align data for efficient SIMD processing across the pipeline. Specifically, existing data arrangement processes cannot fully utilize the ALU ports in modern processors, which leads to high backend bound and fails to saturate the memory bandwidth between registers and L1 cache. To overcome the overburden, we thoroughly examine the stateof- the-art CPU architecture and find there are idle ports which could be utilized by the process. Motivated by this observation, we propose "Arithmetic Ports Consciousness Mechanism" (APCM) utilizing these idle ports to eliminate the backend bound and saturate the memory bandwidth. The APCM decreases the data arrangement’s backend bound from 45% to 3% and promotes its memory bandwidth utilization by 4X-16X. The CPU time of the data arrangement process can be reduced by 67% - 92% and the overall latency of the vRAN packet transmission is decreased by 12% - 20%. 
    more » « less
  3. null (Ed.)
  4. null (Ed.)
  5. null (Ed.)