Deep reinforcement learning (DRL) has gained immense success in many applications, including gaming AI, robotics, and system scheduling. Distributed algorithms and architectures have been vastly proposed (e.g., actor-learner architecture) to accelerate DRL training with large-scale server-based clusters. However, training on-policy algorithms with the actor-learner architecture unavoidably induces resource wasting due to synchronization between learners and actors, thus resulting in significantly extra billing. As a promising alternative, serverless computing naturally fits on-policy synchronization and alleviates resource wasting in distributed DRL training with pay-as-you-go pricing. Yet, none has leveraged serverless computing to facilitate DRL training. This paper proposes MinionsRL, the first serverless distributed DRL training framework that aims to accelerate DRL training- and cost-efficiency with dynamic actor scaling. We prototype MinionsRL on top of Microsoft Azure Container Instances and evaluate it with popular DRL tasks from OpenAI Gym. Extensive experiments show that MinionsRL reduces total training time by up to 52% and training cost by 86% compared to latest solutions. 
                        more » 
                        « less   
                    This content will become publicly available on September 1, 2026
                            
                            Nitro: Boosting Distributed Reinforcement Learning with Serverless Computing
                        
                    
    
            Deep reinforcement learning (DRL) has demonstrated significant potential in various applications, including gaming AI, robotics, and system scheduling. DRL algorithms produce, sample, and learn from training data online through a trial-and-error process, demanding considerable time and computational resources. To address this, distributed DRL algorithms and paradigms have been developed to expedite training using extensive resources. Through carefully designed experiments, we are the first to observe that strategically increasing the actor-environment interactions by spawning more concurrent actors at certain training rounds within ephemeral time frames can significantly enhance training efficiency. Yet, current distributed DRL solutions, which are predominantly server-based (or serverful), fail to capitalize on these opportunities due to their long startup times, limited adaptability, and cumbersome scalability. This paper proposesNitro, a generic training engine for distributed DRL algorithms that enforces timely and effective boosting with concurrent actors instantaneously spawned by serverless computing. With serverless functions,Nitroadjusts data sampling strategies dynamically according to the DRL training demands.Nitroseizes the opportunity of real-time boosting by accurately and swiftly detecting an empirical metric. To achieve cost efficiency, we design a heuristic actor scaling algorithm to guideNitrofor cost-aware boosting budget allocation. We integrateNitrowith state-of-the-art DRL algorithms and frameworks and evaluate them on AWS EC2 and Lambda. Experiments with Mujoco and Atari benchmarks show thatNitroimproves the final rewards (i.e., training quality) by up to 6× and reduces training costs by up to 42%. 
        more » 
        « less   
        
    
    
                            - PAR ID:
- 10613616
- Publisher / Repository:
- Proceedings of the VLDB Endowment
- Date Published:
- Journal Name:
- Proceedings of the VLDB Endowment
- Volume:
- 18
- Issue:
- 1
- ISSN:
- 2150-8097
- Page Range / eLocation ID:
- 66 to 79
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            
- 
            Abstract Recent demand for distributed software had led to a surge in popularity in actor‐based frameworks. However, even with the stylized message passing model of actors, writing correct distributed software is still difficult. We present our work on linearizability checking in DS2, an integrated framework for specifying, synthesizing, and testing distributed actor systems. The key insight of our approach is that often subcomponents of distributed actor systems represent common algorithms or data structures (e.g., a distributed hash table or tree) that can be validated against a simple sequential model of the system. This makes it easy for developers to validate their concurrent actor systems without complex specifications. DS2 automatically explores the concurrent schedules that system could arrive at, and it compares observed output of the system to ensure it is equivalent to what the sequential implementation could have produced. We describe DS2's linearizability checking and test it on several concurrent replication algorithms from the literature. We explore in detail how different algorithms for enumerating the model schedule space fare in finding bugs in actor systems, and we present our own refinements on algorithms for exploring actor system schedules that we show are effective in finding bugs.more » « less
- 
            The actor model is a well-established way to approach to modularly designing and implementing concurrent and/or distributed systems, seeing increasing adoption in industry. But deductive verification tailored to actor programs remains underexplored; general concurrent logics could be used, but the logics are complex and full of features to reason about behaviors the actor model strives to avoid. We explore a relatively lightweight approach of extending a system for proving sequential program correctness with means to prove safety properties of actor programs (currently, assuming no faults). We borrow ideas from hybrid logic, a modal logic for stating assertions are true at a particular point in a model (in this case, a particular actor’s local state). To make such assertions useful, we stabilize them using rely-guarantee-style reasoning over local actor states, and only permit sending stable versions of these assertions to other actors. By carefully restricting the formation of assertions that a proposition is true at a certain actor, we avoid the need for actors to handle each others’ rely-guarantee relations explicitly. Finally, we argue that the approach requires only modest adjustments beyond applying traditional sequential techniques to actors with immutable messages, by implementing most of the logic as a Dafny library.more » « less
- 
            Serverless Function-as-a-Service (FaaS) offers improved programmability for customers, yet it is not server-“less” and comes at the cost of more complex infrastructure management (e.g., resource provisioning and scheduling) for cloud providers. To maintain function service-level objectives (SLOs) and improve resource utilization efficiency, recent research has been focused on applying online learning algorithms such as reinforcement learning (RL) to manage resources. Compared to rule-based solutions with heuristics, RL-based approaches eliminate humans in the loop and avoid the painstaking generation of heuristics. Despite the initial success of applying RL, we first show in this paper that the state-of-the-art single-agent RL algorithm (S-RL) suffers up to 4.8x higher p99 function latency degradation on multi-tenant serverless FaaS platforms compared to isolated environments and is unable to converge during training. We then design and implement a scalable and incremental multi-agent RL framework based on Proximal Policy Optimization (SIMPPO). Our experiments on widely used serverless benchmarks demonstrate that in multi-tenant environments, SIMPPO enables each RL agent to efficiently converge during training and provides online function latency performance comparable to that of S-RL trained in isolation (which we refer to as the baseline for assessing RL performance) with minor degradation (<9.2%). In addition, SIMPPO reduces the p99 function latency by 4.5x compared to S-RL in multi-tenant cases.more » « less
- 
            Not AvailableActor systems are a flexible model of concurrent and distributed programming, which are efficiently implementable, and avoid many classic concurrency bugs by construction. However they must still deal with the challenge of messages arriving in unexpected orders. We describe an approach to restricting the orders in which actors send messages to each other, by equipping actor references—the handle used to address another actor—with a protocol restricting which message types can be sent to another actor and in which order using that particular actor reference. This endows the actor references with the properties of (flow-sensitive) static capabilities, which we call actor capabilities. By sending other actors only restricted actor references, actors may control which messages are sent in which orders by other actors. Rules for duplicating (splitting) actor references ensure that these restrictions apply even in the presence of delegation. The capabilities themselves restrict message send ordering, which may form the foundation for stronger forms of reasoning. We demonstrate this by layering an effect system over the base type system, where the relationships enforced between the actor capabilities and the effects of an actor’s behaviour ensure that an actor’s behaviour is always prepared to handle any message that may arrive.more » « less
 An official website of the United States government
An official website of the United States government 
				
			 
					 
					
