NSF PAR Search | NSF Public Access Repository

Towards Zero Spawn Overhead: Work Stealing Without Deques

https://doi.org/10.1145/3694906.3743349

Handleman, Aaron; Singer, Kyle; Schardl, Tao B.; Lee, I-Ting Angelina (July 2025, ACM)

Full Text Available
The Tale of Errors in Microservices: Extended Abstract

https://doi.org/10.1145/3744970.3727320

Lee, I-Ting Angelina; Zhang, Zhizhou; Parwal, Abhishek; Chabbi, Milind (June 2025, ACM SIGMETRICS Performance Evaluation Review)

Microservice architectures have become the de facto paradigm for building scalable, service-oriented systems. Although their decentralized design promotes resilience and rapid development, the inherent complexity leads to subtle performance challenges. In particular, non-fatal errors (internal failures of remote procedure calls that do not cause top-level request failures) can accumulate along the critical path, inflating latency and wasting resources. In this work, we analyze over 11 billion RPCs across more than 6,000 microservices at Uber. Our study shows that nearly 29% of successful requests experience non-fatal errors that remain hidden in traditional monitoring. We propose a novel latency-reduction estimator (LR estimator) to quantify the potential benefit of eliminating these errors. Our contributions include a systematic study of RPC error patterns, a methodology to estimate latency reductions, and case studies demonstrating up to a 30% reduction in tail latency.
Full Text Available
Dynamic Partial Deadlock Detection and Recovery via Garbage Collection

https://doi.org/10.1145/3676641.3715990

Saioc, Georgian-Vlad; Lee, I-Ting Angelina; Møller, Anders; Chabbi, Milind (March 2025, ACM)

Full Text Available
The Tale of Errors in Microservices

https://doi.org/10.1145/3700436

Lee, I-Ting Angelina; Zhang, Zhizhou; Parwal, Abhishek; Chabbi, Milind (December 2024, Proceedings of the ACM on Measurement and Analysis of Computing Systems)

Microservice architecture is the computing paradigm of choice for large, service-oriented software catering to real-time requests. Individual programs in such a system perform Remote Procedure Calls (RPCs) to other microservices to accomplish sub-tasks. Microservices are designed to be robust; top-level requests can succeed despite errors returned from RPC sub-tasks, referred to as non-fatal errors. Because of this design, the top-level microservices tend to "live with" non-fatal errors. Hence, a natural question to ask is "how prevalent are non-fatal errors and what impact do they have on the exposed latency of top-level requests?" In this paper, we present a large-scale study of errors in microservices. We answer the aforementioned question by analyzing 11 billion RPCs covering 1,900 user-facing endpoints at Uber, serving requests from hundreds of millions of active users. To assess the latency impact of non-fatal errors, we develop a methodology that projects potential latency savings for a given request as if the time spent on failing APIs were eliminated. This estimator allows ranking and bubbling up those APIs that are worthy of further investigation, where the non-fatal errors likely resulted in operational inefficiencies. Finally, we employ our error detection and impact estimation techniques to pinpoint operational inefficiencies, which a) result in a tail latency reduction of a critical endpoint by 30% and b) offer insights into common inefficiency-introducing patterns.
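The abstract does not spell out how the estimator is computed. As a rough illustration of projecting savings "as if the time spent on failing APIs were eliminated", the sketch below uses a deliberately simplified model, not the paper's LR estimator: a request is a hypothetical tree of `Span` records whose children run either one after another or concurrently, and the projected saving is the original latency minus the latency recomputed with every failed span (and its subtree) removed. The `Span` type, its fields, and this sequential/parallel composition rule are assumptions made only for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """Hypothetical trace node: one RPC and the sub-RPCs it issues (illustrative only)."""
    name: str
    self_time_ms: float      # time spent in this call outside its children
    failed: bool = False     # True if this RPC returned a (non-fatal) error
    parallel: bool = False   # True if children run concurrently, else sequentially
    children: List["Span"] = field(default_factory=list)


def latency(span: Span, drop_failed: bool = False) -> float:
    """End-to-end latency of `span` under a simple sequential/parallel model.

    With drop_failed=True, every failed RPC (and its whole subtree) is
    treated as if it had never been issued, i.e. it contributes 0 ms.
    """
    if drop_failed and span.failed:
        return 0.0
    child_times = [latency(c, drop_failed) for c in span.children]
    if not child_times:
        return span.self_time_ms
    # Concurrent children cost as much as the slowest one; sequential ones add up.
    combined = max(child_times) if span.parallel else sum(child_times)
    return span.self_time_ms + combined


def estimated_savings(root: Span) -> float:
    """Projected latency reduction if all failing RPCs were eliminated."""
    return latency(root) - latency(root, drop_failed=True)


# Hypothetical request: "auth", then "fetch", which queries "cache" and "db" concurrently.
root = Span("root", 5.0, children=[
    Span("auth", 10.0),
    Span("fetch", 2.0, parallel=True, children=[
        Span("cache", 30.0, failed=True),
        Span("db", 20.0),
    ]),
])
print(latency(root), latency(root, drop_failed=True), estimated_savings(root))  # 47.0 37.0 10.0
```

In this example the failed `cache` call takes 30 ms, yet removing it saves only 10 ms, because the concurrent `db` call (20 ms) then becomes the critical path. A failing call's own duration can therefore overstate its effect on the latency a user actually sees.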
Full Text Available
An Efficient Scheduler for Task-Parallel Interactive Applications

https://doi.org/10.1145/3558481.3591092

Singer, Kyle; Agrawal, Kunal; Lee, I-Ting Angelina (June 2023, Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures)

Responsive Parallelism with Synchronization

https://doi.org/10.1145/3591249

Muller, Stefan K.; Singer, Kyle; Keeney, Devyn Terra; Neth, Andrew; Agrawal, Kunal; Lee, I-Ting Angelina; Acar, Umut A. (June 2023, Proceedings of the ACM on Programming Languages)

Many concurrent programs assign priorities to threads to improve responsiveness. When used in conjunction with synchronization mechanisms such as mutexes and condition variables, however, priorities can lead to priority inversions, in which high-priority threads are delayed by low-priority ones. Priority inversions in the use of mutexes are easily handled using dynamic techniques such as priority inheritance, but priority inversions in the use of condition variables are not well-studied and dynamic techniques are not suitable. In this work, we use a combination of static and dynamic techniques to prevent priority inversion in code that uses mutexes and condition variables. A type system ensures that condition variables are used safely, even while dynamic techniques change thread priorities at runtime to eliminate priority inversions in the use of mutexes. We prove the soundness of our system, using a model of priority inversions based on cost models for parallel programs. To show that the type system is practical to implement, we encode it within the type systems of Rust and C++, and show that the restrictions are not overly burdensome by writing sizeable case studies using these encodings, including porting the Memcached object server to use our C++ implementation.
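The abstract cites priority inheritance as the standard dynamic remedy for priority inversion on mutexes. A minimal single-threaded model of that technique (not the paper's type system for condition variables) could look like the following. `Thread`, `Mutex`, `acquire`, `release`, and `effective_priority` are hypothetical names for illustration, not a real threading API; the model assumes higher numbers mean higher priority and that the wait-for graph is acyclic (no deadlock).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(eq=False)  # eq=False keeps identity hashing so instances can live in sets
class Thread:
    name: str
    base_priority: int                        # higher number = more urgent
    held: Set["Mutex"] = field(default_factory=set)
    waiting_on: Optional["Mutex"] = None


@dataclass(eq=False)
class Mutex:
    name: str
    owner: Optional[Thread] = None
    waiters: List[Thread] = field(default_factory=list)


def effective_priority(t: Thread) -> int:
    """Priority inheritance: a thread runs at the max of its own priority and
    the effective priority of every thread blocked on a mutex it holds
    (applied transitively; assumes no wait-for cycles)."""
    best = t.base_priority
    for m in t.held:
        for w in m.waiters:
            best = max(best, effective_priority(w))
    return best


def acquire(t: Thread, m: Mutex) -> bool:
    """Take m if it is free; otherwise block t on m. Returns True if acquired."""
    if m.owner is None:
        m.owner = t
        t.held.add(m)
        return True
    m.waiters.append(t)
    t.waiting_on = m
    return False


def release(t: Thread, m: Mutex) -> Optional[Thread]:
    """Release m and hand it to the most urgent waiter, if any."""
    assert m.owner is t, "only the owner may release"
    t.held.discard(m)
    m.owner = None
    if not m.waiters:
        return None
    nxt = max(m.waiters, key=effective_priority)
    m.waiters.remove(nxt)
    nxt.waiting_on = None
    m.owner = nxt
    nxt.held.add(m)
    return nxt


low, mid, high = Thread("low", 1), Thread("mid", 5), Thread("high", 10)
lock = Mutex("lock")
acquire(low, lock)    # low takes the lock
acquire(high, lock)   # high blocks on it
print(effective_priority(low), effective_priority(mid))  # 10 5
release(low, lock)
print(effective_priority(low), lock.owner.name)          # 1 high
```

Without inheritance, `mid` (priority 5) could preempt `low` while `high` waits on the lock, which is exactly the inversion described above. Computing the owner's priority on demand from its waiters, rather than storing a boosted value, means the boost disappears automatically once the lock is released.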
Full Text Available
An Efficient Task-Parallel Platform for Interactive Applications

https://doi.org/10.7936/tp06-8p89

Singer, Kyle (May 2023, Washington University in St. Louis)

Full Text Available
OpenCilk: A Modular and Extensible Software Infrastructure for Fast Task-Parallel Code

https://doi.org/10.1145/3572848.3577509

Schardl, Tao B.; Lee, I-Ting Angelina (February 2023, Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming)

PINT: Parallel INTerval-Based Race Detector

https://doi.org/10.1109/IPDPS53621.2022.00087

Xu, Yifan; Zhou, Anchengcheng; Agrawal, Kunal; Lee, I-Ting Angelina (May 2022, 2022 IEEE International Parallel and Distributed Processing Symposium)

Full Text Available
Efficient Access History for Race Detection

https://doi.org/10.1145/3409964.3461825

Xu, Yifan; Zhou, Anchengcheng; Yin, Grace Q.; Agrawal, Kunal; Lee, I-Ting Angelina; Schardl, Tao B. (January 2022, 2022 Proceedings of the Symposium on Algorithm Engineering and Experiments (ALENEX))

Full Text Available
