NSF PAR Search | NSF Public Access Repository

SFS: Smart OS Scheduling for Serverless Functions

Fu, Yuqi; Liu, Li; Wang, Haoliang; Cheng, Yue; Chen, Songqing (November 2022, International Conference for High Performance Computing Networking Storage and Analysis)

Serverless computing enables a new way of building and scaling cloud applications by allowing developers to write fine-grained serverless or cloud functions. The execution duration of a cloud function is typically short, ranging from a few milliseconds to hundreds of seconds. However, due to resource contention caused by public clouds' deep consolidation, the function execution duration may be significantly prolonged and fail to accurately account for the function's true resource usage. We observe that the function duration can be highly unpredictable, with amplification of more than 50× for an open-source FaaS platform (OpenLambda). Our experiments show that the OS scheduling policy of cloud functions' host server can have a crucial impact on performance. The default Linux scheduler, CFS (Completely Fair Scheduler), being oblivious to workloads, frequently context-switches short functions, causing a turnaround time that is much longer than their service time. We propose SFS (Smart Function Scheduler), which works entirely in the user space and carefully orchestrates existing Linux FIFO and CFS schedulers to approximate Shortest Remaining Time First (SRTF). SFS uses two-level scheduling that seamlessly combines a new FILTER policy with Linux CFS, to trade off increased duration of long functions for significant performance improvement for short functions. We implement SFS in the Linux user space and port it to OpenLambda. Evaluation results show that SFS significantly improves short functions' duration with a small impact on relatively longer functions, compared to CFS.
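The two-level idea in the abstract above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the SFS implementation: it models a single core, assumes all functions arrive at time zero, uses plain round-robin as a crude stand-in for CFS, and the names `round_robin`, `two_level`, `filter_slice`, and `quantum` are hypothetical. The abstract does not say how the FILTER policy picks its time slice, so a fixed value is assumed here.

```python
from collections import deque
from typing import List


def round_robin(service_times: List[int], quantum: int) -> List[int]:
    """Fair time-sharing baseline (a stand-in for CFS); returns finish time per job."""
    remaining = list(service_times)
    finish = [0] * len(remaining)
    queue = deque(range(len(remaining)))
    clock = 0
    while queue:
        job = queue.popleft()
        run = min(quantum, remaining[job])
        clock += run
        remaining[job] -= run
        if remaining[job] > 0:
            queue.append(job)  # preempted: go to the back of the queue
        else:
            finish[job] = clock
    return finish


def two_level(service_times: List[int], filter_slice: int, quantum: int) -> List[int]:
    """Level 1: FIFO, each job runs uninterrupted for up to filter_slice.
    Level 2: jobs that did not finish are demoted to round-robin sharing."""
    finish = [0] * len(service_times)
    demoted = []  # (job index, remaining service time)
    clock = 0
    for job, need in enumerate(service_times):
        run = min(filter_slice, need)
        clock += run
        if need > run:
            demoted.append((job, need - run))
        else:
            finish[job] = clock
    level2 = round_robin([rest for _, rest in demoted], quantum)
    for (job, _), offset in zip(demoted, level2):
        finish[job] = clock + offset
    return finish


if __name__ == "__main__":
    jobs = [5, 5, 5, 5, 200]  # four short functions and one long one (ms)
    print(round_robin(jobs, quantum=1))               # [21, 22, 23, 24, 220]
    print(two_level(jobs, filter_slice=10, quantum=1))  # [5, 10, 15, 20, 220]
```

In this toy workload the short functions finish far earlier under the two-level policy, while the long function finishes at the same time in both runs. With other arrival orders or slice values the long function can be delayed, which is the trade-off the abstract describes.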
Full Text Available
FedAT: A High-Performance and Communication-Efficient Federated Learning System with Asynchronous Tiers

https://doi.org/10.1145/3458817.3476211

Chai, Zheng; Chen, Yujing; Anwar, Ali; Zhao, Liang; Cheng, Yue; Rangwala, Huzefa (November 2021, The International Conference for High Performance Computing, Networking, Storage, and Analysis)

Federated learning (FL) involves training a model over massive distributed devices, while keeping the training data localized and private. This form of collaborative learning exposes new tradeoffs among model convergence speed, model accuracy, balance across clients, and communication cost, with new challenges including: (1) the straggler problem, where clients lag due to data or (computing and network) resource heterogeneity, and (2) the communication bottleneck, where a large number of clients communicate their local updates to a central server and bottleneck the server. Many existing FL methods focus on optimizing along only a single dimension of the tradeoff space. Existing solutions use asynchronous model updating or tiering-based, synchronous mechanisms to tackle the straggler problem. However, asynchronous methods can easily create a communication bottleneck, while tiering may introduce biases that favor faster tiers with shorter response latencies. To address these issues, we present FedAT, a novel Federated learning system with Asynchronous Tiers under Non-i.i.d. training data. FedAT synergistically combines synchronous, intra-tier training and asynchronous, cross-tier training. By bridging the synchronous and asynchronous training through tiering, FedAT minimizes the straggler effect with improved convergence speed and test accuracy. FedAT uses a straggler-aware, weighted aggregation heuristic to steer and balance the training across clients for further accuracy improvement. FedAT compresses uplink and downlink communications using an efficient, polyline-encoding-based compression algorithm, which minimizes the communication cost. Results show that FedAT improves the prediction performance by up to 21.09% and reduces the communication cost by up to 8.5×, compared to state-of-the-art FL methods.
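For readers unfamiliar with polyline encoding, the sketch below applies the well-known encoded polyline format (round to a fixed precision, delta-encode, then emit 5-bit chunks as printable ASCII) to a flat list of floats. It is an illustration only: the abstract does not describe how FedAT maps model updates onto this encoding, so the 1-D layout, the default precision of 5 decimal places, and the function names `polyline_encode` and `polyline_decode` are all assumptions. The scheme is lossy, since values are rounded to the chosen precision.

```python
from typing import List


def _encode_int(value: int) -> str:
    """Encode one signed integer as a run of printable ASCII characters."""
    # Fold the sign into the lowest bit so small magnitudes stay short.
    value = ~(value << 1) if value < 0 else value << 1
    out = []
    while value >= 0x20:
        # Low 5 bits, with the 0x20 flag meaning "more chunks follow".
        out.append(chr((0x20 | (value & 0x1F)) + 63))
        value >>= 5
    out.append(chr(value + 63))
    return "".join(out)


def polyline_encode(values: List[float], precision: int = 5) -> str:
    """Round each value, then encode the difference from the previous one."""
    scale = 10 ** precision
    out = []
    prev = 0
    for v in values:
        q = round(v * scale)
        out.append(_encode_int(q - prev))
        prev = q
    return "".join(out)


def polyline_decode(text: str, precision: int = 5) -> List[float]:
    """Inverse of polyline_encode (up to the rounding applied on encode)."""
    scale = 10 ** precision
    values = []
    prev = 0
    result = 0
    shift = 0
    for ch in text:
        b = ord(ch) - 63
        result |= (b & 0x1F) << shift
        shift += 5
        if b < 0x20:  # last chunk of this integer
            delta = ~(result >> 1) if result & 1 else result >> 1
            prev += delta
            values.append(prev / scale)
            result = 0
            shift = 0
    return values


if __name__ == "__main__":
    weights = [0.12345, -0.5, 0.0, 1.25]  # hypothetical model parameters
    packed = polyline_encode(weights)
    print(packed, polyline_decode(packed))
```

Because consecutive values are delta-encoded, the output is shortest when neighbouring values are close to each other; any gain on real model tensors would depend on their distribution.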
Full Text Available
FaaSNet: Scalable and Fast Provisioning of Custom Serverless Container Runtimes at Alibaba Cloud Function Compute

Wang, Ao; Chang, Shuai; Tian, Huangshi; Wang, Hongqi; Yang, Haoran; Li, Huiba; Du, Rui; Cheng, Yue (July 2021, 2021 USENIX Annual Technical Conference (USENIX ATC 21))

Serverless computing, or Function-as-a-Service (FaaS), enables a new way of building and scaling applications by allowing users to deploy fine-grained functions while providing fully managed resource provisioning and auto-scaling. Custom FaaS container support is gaining traction as it enables better control over OSes, versioning, and tooling for modernizing FaaS applications. However, providing rapid container provisioning introduces non-trivial challenges for FaaS providers, since container provisioning is costly, and real-world FaaS workloads exhibit highly dynamic patterns. In this paper, we design FaaSNet, a highly scalable middleware system for accelerating FaaS container provisioning. FaaSNet is driven by the workload and infrastructure requirements of the FaaS platform at one of the world's largest cloud providers, Alibaba Cloud Function Compute. FaaSNet enables scalable container provisioning via a lightweight, adaptive function tree (FT) structure. FaaSNet uses an I/O-efficient, on-demand fetching mechanism to further reduce provisioning costs at scale. We implement and integrate FaaSNet in Alibaba Cloud Function Compute. Evaluation results show that FaaSNet: (1) finishes provisioning 2,500 function containers on 1,000 virtual machines in 8.3 seconds, (2) scales 13.4× and 16.3× faster than Alibaba Cloud's current FaaS platform and a state-of-the-art P2P container registry (Kraken), respectively, and (3) sustains a bursty workload using 75.2% less time than an optimized baseline.
Full Text Available
Wukong: A Scalable and Locality-Enhanced Framework for Serverless Parallel Computing

https://doi.org/10.1145/3419111.3421286

Carver, Benjamin; Zhang, Jingyuan; Wang, Ao; Anwar, Ali; Wu, Panruo; Cheng, Yue (October 2020, ACM Symposium on Cloud Computing 2020 (SoCC '20))
Executing complex, burst-parallel, directed acyclic graph (DAG) jobs poses a major challenge for serverless execution frameworks, which must rapidly scale and schedule tasks at high throughput while minimizing data movement across tasks. We demonstrate that, for serverless parallel computations, decentralized scheduling enables scheduling to be distributed across Lambda executors that can schedule tasks in parallel, and brings multiple benefits, including enhanced data locality, reduced network I/Os, automatic resource elasticity, and improved cost effectiveness. We describe the implementation and deployment of our new serverless parallel framework, called Wukong, on AWS Lambda. We show that Wukong achieves near-ideal scalability, executes parallel computation jobs up to 68.17× faster, reduces network I/O by multiple orders of magnitude, and achieves 92.96% tenant-side cost savings compared to numpywren.
Full Text Available
Customizable Scale-Out Key-Value Stores

https://doi.org/10.1109/TPDS.2020.2982640

Anwar, Ali; Cheng, Yue; Huang, Hai; Han, Jingoo; Sim, Hyogi; Lee, Dongyoon; Douglis, Fred; Butt, Ali R. (September 2020, IEEE Transactions on Parallel and Distributed Systems)

Full Text Available
TiFL: A Tier-based Federated Learning System

https://doi.org/10.1145/3369583.3392686

Chai, Zheng; Ali, Ahsan; Zawad, Syed; Truex, Stacey; Anwar, Ali; Baracaldo, Nathalie; Zhou, Yi; Ludwig, Heiko; Yan, Feng; Cheng, Yue (June 2020, Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing (HPDC 20))

Full Text Available
InfiniCache: Exploiting Ephemeral Serverless Functions to Build a Cost-Effective Memory Cache

Wang, Ao; Zhang, Jingyuan; Ma, Xiaolong; Anwar, Ali; Rupprecht, Lukas; Skourtis, Dimitrios; Tarasov, Vasily; Yan, Feng; Cheng, Yue (February 2020, 18th USENIX Conference on File and Storage Technologies)

Internet-scale web applications are becoming increasingly storage-intensive and rely heavily on in-memory object caching to attain required I/O performance. We argue that the emerging serverless computing paradigm provides a well-suited, cost-effective platform for object caching. We present InfiniCache, a first-of-its-kind in-memory object caching system that is completely built and deployed atop ephemeral serverless functions. InfiniCache exploits and orchestrates serverless functions' memory resources to enable elastic pay-per-use caching. InfiniCache's design combines erasure coding, intelligent billed duration control, and an efficient data backup mechanism to maximize data availability and cost-effectiveness while balancing the risk of losing cached state and performance. We implement InfiniCache on AWS Lambda and show that it: (1) achieves 31–96× tenant-side cost savings compared to AWS ElastiCache for a large-object-only production workload, (2) can effectively provide 95.4% data availability for each one-hour window, and (3) achieves performance comparable to that of a typical in-memory cache.
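To illustrate why erasure coding helps when cached state lives in functions that may disappear, the sketch below splits an object into k data shards plus one XOR parity shard, so any single lost shard can be rebuilt. This is a deliberately simplified stand-in: the abstract does not name the code InfiniCache uses, single-parity XOR tolerates only one loss, and the names `encode_shards` and `decode_shards` are hypothetical.

```python
from typing import List, Optional


def _xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_shards(obj: bytes, k: int) -> List[bytes]:
    """Split obj into k equal data shards and append one XOR parity shard."""
    shard_len = -(-len(obj) // k)  # ceiling division
    padded = obj.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = bytes(shard_len)
    for shard in shards:
        parity = _xor(parity, shard)
    return shards + [parity]


def decode_shards(shards: List[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the object; at most one shard (data or parity) may be None."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single-parity XOR can repair only one lost shard")
    repaired = list(shards)
    if missing:
        present = [s for s in shards if s is not None]
        rebuilt = bytes(len(present[0]))
        for s in present:
            rebuilt = _xor(rebuilt, s)
        repaired[missing[0]] = rebuilt
    # Drop the parity shard and the zero padding added by encode_shards.
    return b"".join(repaired[:-1])[:original_len]


if __name__ == "__main__":
    obj = b"hello serverless cache"
    shards = encode_shards(obj, k=4)
    shards[2] = None  # pretend the function holding shard 2 was reclaimed
    print(decode_shards(shards, len(obj)))
```

In a setting like the one the abstract describes, each shard would presumably be held by a different function instance, so the object survives the loss of one instance at the cost of one extra shard of memory.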
Full Text Available
In Search of a Fast and Efficient Serverless DAG Engine

https://doi.org/10.1109/PDSW49588.2019.00005

Carver, Benjamin; Zhang, Jingyuan; Wang, Ao; Cheng, Yue (November 2019, 2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW))

Full Text Available
