NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Ursa: Lightweight Resource Management for Cloud-Native Microservices

https://doi.org/10.1109/HPCA57654.2024.00077

Zhang, Yanqi; Zhou, Zhuangzhuang; Elnikety, Sameh; Delimitrou, Christina (March 2024, International Symposium on High Performance Computer Architecture)

Full Text Available
Making Kernel Bypass Practical for the Cloud with Junction

Fried, Joshua; Chaudhry, Gohar Irfan; Saurez, Enrique; Choukse, Esha; Goiri, Íñigo; Elnikety, Sameh; Fonseca, Rodrigo; Belay, Adam (April 2024, 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI'24))

Kernel bypass systems have demonstrated order of magnitude improvements in throughput and tail latency for network-intensive applications relative to traditional operating systems (OSes). To achieve such excellent performance, however, they rely on dedicated resources (e.g., spinning cores, pinned memory) and require application rewriting. This is unattractive to cloud operators because they aim to densely pack applications, and rewriting cloud software requires a massive investment of valuable developer time. For both reasons, kernel bypass, as it exists, is impractical for the cloud. In this paper, we show these compromises are not necessary to unlock the full benefits of kernel bypass. We present Junction, the first kernel bypass system that can pack thousands of instances on a machine while providing compatibility with unmodified Linux applications. Junction achieves high density through several advanced NIC features that reduce pinned memory and the overhead of monitoring large numbers of queues. It maintains compatibility with minimal overhead through optimizations that exploit a shared address space with the application. Junction scales to 19–62× more instances than existing kernel bypass systems and can achieve similar or better performance without code changes. Furthermore, Junction delivers significant performance benefits to applications previously unsupported by kernel bypass, including those that depend on runtime systems like Go, Java, Node, and Python. In a comparison to native Linux, Junction increases throughput by 1.6–7.0× while using 1.2–3.8× less cores across seven applications.
more » « less
Full Text Available
ORION and the Three Rights: Sizing, Bundling, and Prewarming for Serverless DAGs

Mahgoub, Ashraf; Yi, Edgardo Barsallo; Shankar, Karthick; Elnikety, Sameh; Chaterji, Somali; Bagchi, Saurabh (July 2022, 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22))

Serverless applications represented as DAGs have been growing in popularity. For many of these applications, it would be useful to estimate the end-to-end (E2E) latency and to allocate resources to individual functions so as to meet probabilistic guarantees for the E2E latency. This goal has not been met till now due to three fundamental challenges. The ﬁrst is the high variability and correlation in the execution time of individual functions, the second is the skew in execution times of the parallel invocations, and the third is the incidence of cold starts. In this paper, we introduce ORION to achieve these goals. We ﬁrst analyze traces from a production FaaS infrastructure to identify three characteristics of serverless DAGs. We use these to motivate and design three features. The ﬁrst is a performance model that accounts for runtime variabilities and dependencies among functions in a DAG. The second is a method for co-locating multiple parallel invocations within a single VM thus mitigating content-based skew among these invocations. The third is a method for pre-warming VMs for subsequent functions in a DAG with the right look-ahead time. We integrate these three innovations and evaluate ORION on AWS Lambda with three serverless DAG applications. Our evaluation shows that compared to three competing approaches, ORION achieves up to 90% lower P95 latency without increasing $ cost, or up to 53% lower $ cost without increasing tail latency.
more » « less
Full Text Available
ORION and the Three Rights: Sizing, Bundling, and Prewarming for Serverless DAGs

Mahgoub, Ashraf; Barsallo, Edgardo; Shankar, Karthick; Minocha, Eshaan; Elnikety, Sameh; Bagchi, Saurabh; Chaterji, Somali. (July 2022, USENIX OSDI proceedings)

Serverless applications represented as DAGs have been growing in popularity. For many of these applications, it would be useful to estimate the end-to-end (E2E) latency and to allocate resources to individual functions so as to meet probabilistic guarantees for the E2E latency. This goal has not been met till now due to three fundamental challenges. The first is the high variability and correlation in the execution time of individual functions, the second is the skew in execution times of the parallel invocations, and the third is the incidence of cold starts. In this paper, we introduce ORION to achieve this goal. We first analyze traces from a production FaaS infrastructure to identify three characteristics of serverless DAGs. We use these to motivate and design three features. The first is a performance model that accounts for runtime variabilities and dependencies among functions in a DAG. The second is a method for co-locating multiple parallel invocations within a single VM thus mitigating content-based skew among these invocations. The third is a method for pre-warming VMs for subsequent functions in a DAG with the right look-ahead time. We integrate these three innovations and evaluate ORION on AWS Lambda with three serverless DAG applications. Our evaluation shows that compared to three competing approaches, \name achieves up to 90\% lower P95 latency without increasing \$$ cost, or up to 53\% lower \$$ cost without increasing P95 latency.
more » « less
Full Text Available
WISEFUSE: Workload Characterization and DAG Transformation for Serverless Workflows

https://doi.org/10.1145/3489048.3530959

Mahgoub, Ashraf; Yi, Edgardo Barsallo; Shankar, Karthick; Minocha, Eshaan; Elnikety, Sameh; Bagchi, Saurabh; Chaterji, Somali (June 2022, ACM SIGMETRICS)

Full Text Available
Faster and Cheaper Serverless Computing on Harvested Resources

https://doi.org/10.1145/3477132.3483580

Zhang, Yanqi; Goiri, Íñigo; Chaudhry, Gohar Irfan; Fonseca, Rodrigo; Elnikety, Sameh; Delimitrou, Christina; Bianchini, Ricardo (October 2021, SOSP '21: Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles)

Full Text Available
WISEFUSE: Workload Characterization and DAG Transformation for Serverless Workflows

https://doi.org/10.1145/3530892

Mahgoub, Ashraf; Yi, Edgardo_Barsallo; Shankar, Karthick; Minocha, Eshaan; Elnikety, Sameh; Bagchi, Saurabh; Chaterji, Somali (June 2022, Proceedings of the ACM on Measurement and Analysis of Computing Systems)

We characterize production workloads of serverless DAGs at a major cloud provider. Our analysis highlights two major factors that limit performance: (a) lack of efficient communication methods between the serverless functions in the DAG, and (b) stragglers when a DAG stage invokes a set of parallel functions that must complete before starting the next DAG stage. To address these limitations, we propose WISEFUSE, an automated approach to generate an optimized execution plan for serverless DAGs for a user-specified latency objective or budget. We introduce three optimizations: (1) Fusion combines in-series functions together in a single VM to reduce the communication overhead between cascaded functions. (2) Bundling executes a group of parallel invocations of a function in one VM to improve resource sharing among the parallel workers to reduce skew. (3) Resource Allocation assigns the right VM size to each function or function bundle in the DAG to reduce the E2E latency and cost. We implement WISEFUSE to evaluate it experimentally using three popular serverless applications with different DAG structures, memory footprints, and intermediate data sizes. Compared to competing approaches and other alternatives, WISEFUSE shows significant improvements in E2E latency and cost. Specifically, for a machine learning pipeline, WISEFUSE achieves P95 latency that is 67% lower than Photons, 39% lower than Faastlane, and 90% lower than SONIC without increasing the cost.
more » « less
Local trend discovery on real-time microblogs with uncertain locations in tight memory environments

https://doi.org/10.1007/s10707-019-00380-z

Almaslukh, Abdulaziz; Magdy, Amr; Aly, Ahmed M.; Mokbel, Mohamed F.; Elnikety, Sameh; He, Yuxiong; Nath, Suman; Aref, Walid G. (April 2020, GeoInformatica)

Full Text Available
Obtaining and Managing Answer Quality for Online Data-Intensive Services

Kelley, Jaimie; Stewart, Christopher; Morris, Nathaniel; Tiwari, Devesh; He, Yuxiong; Elnikety, Sameh (May 2017, ACM Transactions on Modeling and Performance Evaluation of Computing Systems)

Full Text Available
Work stealing for interactive services to meet target latency

https://doi.org/10.1145/2851141.2851151

Li, Jing; Agrawal, Kunal; Elnikety, Sameh; He, Yuxiong; Lee, I-Ting Angelina; Lu, Chenyang; McKinley, Kathryn S. (March 2016, Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming)

Interactive web services increasingly drive critical business workloads such as search, advertising, games, shopping, and finance. Whereas optimizing parallel programs and distributed server systems have historically focused on average latency and throughput, the primary metric for interactive applications is instead consistent responsiveness, i.e., minimizing the number of requests that miss a target latency. This paper is the first to show how to generalize work-stealing, which is traditionally used to minimize the makespan of a single parallel job, to optimize for a target latency in interactive services with multiple parallel requests. We design a new adaptive work stealing policy, called tail-control, that reduces the number of requests that miss a target latency. It uses instantaneous request progress, system load, and a target latency to choose when to parallelize requests with stealing, when to admit new requests, and when to limit parallelism of large requests. We implement this approach in the Intel Thread Building Block (TBB) library and evaluate it on real-world workloads and synthetic workloads. The tail-control policy substantially reduces the number of requests exceeding the desired target latency and delivers up to 58% relative improvement over various baseline policies. This generalization of work stealing for multiple requests effectively optimizes the number of requests that complete within a target latency, a key metric for interactive services.
more » « less
Full Text Available

Search for: All records