In many reinforcement learning (RL) applications, we want policies that reach desired states and then keep the controlled system within an acceptable region around those states indefinitely. This latter objective is called stability and is especially important when the state space is unbounded, so that states can be arbitrarily far apart and the agent can drift far from the desired states. For example, in stochastic queuing networks, where queues of waiting jobs can grow without bound, the desired state is all-zero queue lengths. Here, a stable policy ensures queue lengths remain finite, while an optimal policy minimizes them. Since an optimal policy is also stable, one would expect RL algorithms to implicitly yield stable policies. However, in this work, we find that deep RL algorithms that directly minimize the distance to the desired state during online training often produce unstable policies, i.e., policies that drift far from the desired state. We attribute this instability to poor credit assignment for destabilizing actions. We then introduce an approach based on two ideas: 1) a Lyapunov-based cost-shaping technique and 2) state transformations applied to the unbounded state space. We conduct an empirical study on various queuing networks and traffic signal control problems and find that our approach performs competitively against strong baselines that have knowledge of the transition dynamics. Our code is available here: https://github.com/Badger-RL/STOP.
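The abstract's two ideas can be illustrated with a minimal sketch. The quadratic Lyapunov function, the shaping weight `beta`, and the `symlog` state transformation below are all illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def lyapunov_value(q):
    # Quadratic Lyapunov function over queue lengths (a common choice;
    # assumed here, not necessarily the paper's).
    return 0.5 * np.sum(np.asarray(q, dtype=float) ** 2)

def shaped_cost(q, q_next, beta=0.1):
    # Base cost: distance of the current state from the all-zero desired state.
    base = np.sum(q)
    # Lyapunov shaping term: penalize transitions that increase V, giving
    # immediate credit to destabilizing actions instead of delayed blame.
    drift = lyapunov_value(q_next) - lyapunov_value(q)
    return base + beta * drift

def symlog(q):
    # One possible bounded-growth transformation for unbounded queue
    # lengths, compressing large states before they reach the network.
    q = np.asarray(q, dtype=float)
    return np.sign(q) * np.log1p(np.abs(q))
```

With `q = [2, 1]` and `q_next = [3, 1]`, the shaped cost adds a positive drift penalty on top of the base cost because the transition moved away from the all-zero state.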
Efficient Policy-Rich Rate Enforcement with Phantom Queues
Rate enforcement is routinely employed in modern networks (e.g., ISPs rate-limiting users' traffic to the subscribed rates). In addition to correctly enforcing the desired rates, rate-limiting mechanisms must support rich rate-sharing policies within each traffic aggregate (e.g., per-flow fairness, weighted fairness, and prioritization). And all of this must be done at scale to efficiently support a vast number of users. There are two primary rate-limiting mechanisms -- traffic shaping (which buffers packets in queues to enforce the desired rates and policies) and traffic policing (which filters packets according to the desired rates without buffering them). Policers are lightweight and scalable, but do not support rich policy enforcement and often provide poor rate enforcement (being notoriously hard to configure). Shapers, on the other hand, achieve the desired rates and policies, but at the cost of high system resource (memory and CPU) utilization, which impacts scalability. In this paper, we explore whether we can get the best of both worlds -- the scalability of a policer with the rate and policy enforcement properties of a shaper. We answer this question in the affirmative with our system BC-PQP. BC-PQP augments a policer with (i) multiple phantom queues that simulate buffer occupancy using counters and enable rich policy enforcement, and (ii) a novel burst control mechanism that enables auto-configuration of the queues for correct rate enforcement. We implement BC-PQP as a middlebox over DPDK. Our evaluation shows that it achieves rate and policy enforcement properties close to those of a shaper with 7x higher efficiency.
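The core phantom-queue idea can be sketched as a counter drained at the configured rate: the policer tracks virtual buffer occupancy without ever storing a packet. This is a minimal single-queue illustration; BC-PQP's burst control auto-configures the queue limit, which is fixed by hand here:

```python
import time

class PhantomQueue:
    """Counter-based 'queue': simulates buffer occupancy in bytes
    without storing packets; the policer forwards or drops each
    packet immediately based on the counter."""

    def __init__(self, rate_bps, limit_bytes, now=None):
        self.rate = rate_bps / 8.0          # drain rate, bytes/sec
        self.limit = limit_bytes            # virtual buffer size (hand-set here)
        self.occupancy = 0.0                # simulated backlog: just a counter
        self.last = time.monotonic() if now is None else now

    def offer(self, pkt_len, now=None):
        now = time.monotonic() if now is None else now
        # Drain the virtual backlog at the configured rate.
        self.occupancy = max(0.0, self.occupancy - self.rate * (now - self.last))
        self.last = now
        if self.occupancy + pkt_len <= self.limit:
            self.occupancy += pkt_len       # "enqueue" by counting only
            return True                     # forward the packet
        return False                        # drop the packet
```

Rich policy enforcement would layer several such counters (one per flow or class, possibly weighted) behind one physical policer; that scheduling layer is omitted here.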
- Award ID(s): 2217144
- PAR ID: 10532011
- Publisher / Repository: ACM
- Date Published:
- ISBN: 9798400706141
- Page Range / eLocation ID: 1000 to 1013
- Format(s): Medium: X
- Location: Sydney NSW Australia
- Sponsoring Org: National Science Foundation
More Like this
-
Previous studies have observed that TCP pacing (evenly spacing out packets) minimizes traffic burstiness, reduces packet losses, and increases throughput. However, the main drawback of pacing is that the number of flows and the bottleneck link capacity must be known in advance. With this information, pacing is achieved by manually tuning sender nodes to send at rates that aggregate to the bottleneck capacity. This paper proposes a scheme based on programmable switches by which rates are dynamically adjusted. These switches store the network's state in the data plane and notify sender nodes to update their pacing rates when the network's state changes, e.g., when a new flow joins or leaves the network. The scheme uses a custom protocol that is encapsulated inside the IP Options header field and thus is compatible with legacy switches (i.e., the scheme does not require all switches to be programmable). Furthermore, the processing overhead at programmable switches is minimal, as custom packets are only generated when a flow joins or leaves the network. Simulation results conducted in Mininet demonstrate that the proposed scheme is capable of dynamically notifying hosts to adapt the pacing rate with minimal delay, increasing throughput, mitigating the TCP sawtooth behavior, and achieving better fairness among concurrent flows. The proposed scheme and preliminary results are particularly attractive to applications such as Science DMZ, where typically a small number of large flows must share the bandwidth capacity.
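The join/leave update logic described above can be sketched as switch-side state that recomputes a per-flow share of the bottleneck capacity on each membership change. The equal split and the class below are illustrative assumptions, not the paper's exact protocol (which rides in the IP Options field):

```python
class PacingController:
    """Sketch of the state a programmable switch keeps: on a flow
    join/leave it recomputes per-flow pacing rates and returns the
    notifications to send to the affected senders."""

    def __init__(self, bottleneck_bps):
        self.capacity = bottleneck_bps
        self.flows = set()

    def _notifications(self):
        # Equal share assumed here; rates aggregate to the bottleneck capacity.
        rate = self.capacity / len(self.flows) if self.flows else 0
        return {flow: rate for flow in self.flows}

    def flow_joined(self, flow_id):
        self.flows.add(flow_id)
        return self._notifications()

    def flow_left(self, flow_id):
        self.flows.discard(flow_id)
        return self._notifications()
```

Because notifications are generated only on join/leave events, steady-state traffic adds no switch processing overhead, matching the abstract's claim.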
-
Dispatching systems, where arriving jobs are immediately assigned to one of multiple queues, are ubiquitous in computer systems and service systems. A natural and practically relevant model is one in which each queue serves jobs in FCFS (First-Come First-Served) order. We consider the case where the dispatcher is size-aware, meaning it learns the size (i.e. service time) of each job as it arrives; and state-aware, meaning it always knows the amount of work (i.e. total remaining service time) at each queue. While size- and state-aware dispatching to FCFS queues has been extensively studied, little is known about optimal dispatching for the objective of minimizing mean delay. A major obstacle is that no nontrivial lower bound on mean delay is known, even in heavy traffic (i.e. the limit as load approaches capacity). This makes it difficult to prove that any given policy is optimal, or even heavy-traffic optimal. In this work, we propose the first size- and state-aware dispatching policy that provably minimizes mean delay in heavy traffic. Our policy, called CARD (Controlled Asymmetry Reduces Delay), keeps all but one of the queues short, then routes as few jobs as possible to the one long queue. We prove an upper bound on CARD's mean delay, and we prove the first nontrivial lower bound on the mean delay of any size- and state-aware dispatching policy. Both results apply to any number of servers. Our bounds match in heavy traffic, implying CARD's heavy-traffic optimality. In particular, CARD's heavy-traffic performance improves upon that of LWL (Least Work Left), SITA (Size Interval Task Assignment), and other policies from the literature whose heavy-traffic performance is known.
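CARD's routing rule can be sketched as follows, with queue 0 as the designated long queue. The fixed work threshold is an illustrative stand-in for the paper's carefully chosen parameter:

```python
def card_dispatch(job_size, work, threshold):
    """Sketch of CARD-style routing (queue 0 is the long queue):
    keep the short queues below a work threshold, and overflow a job
    to the long queue only when no short queue can absorb it.

    work[i] is the current total remaining service time at queue i
    (the dispatcher is state-aware); job_size is known on arrival
    (the dispatcher is size-aware)."""
    short = range(1, len(work))
    # Short queues that would stay under the threshold after this job.
    ok = [i for i in short if work[i] + job_size <= threshold]
    if ok:
        # Among admissible short queues, pick the least loaded.
        return min(ok, key=lambda i: work[i])
    return 0  # route as few jobs as possible here
```

This captures the controlled asymmetry: most queues stay short and predictable, and only overflow work accumulates at the single long queue.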
-
Enterprises, including military, law enforcement, medical, financial, and commercial organizations, must often share large quantities of data, some potentially sensitive, with many other enterprises. A key issue, the mechanics of data sharing, involves how to precisely and unambiguously specify which data to share with which partner or group of partners. This issue can be addressed through a system of formal data sharing policy definitions and automated enforcement. Several challenges arise when specifying enterprise-level data sharing policies. A first challenge involves the scale and complexity of data types to be shared. An easily understood method is required to represent and visualize an enterprise's data types and their relationships so that users can quickly, easily, and precisely specify which data types and relationships to share. A second challenge involves the scale and complexity of data sharing partners. Enterprises typically have many partners involved in different projects, and there are often complex hierarchies among groups of partners that must be considered and navigated to specify which partners or groups of partners to include in a data sharing policy. A third challenge is that defining policies formally, given the first two challenges of scale and complexity, requires complex, precise language, but such languages are difficult for non-specialists to use. More usable methods of policy specification are needed. Our approach was to develop a software wizard that walks users through a series of steps for defining a data sharing policy. A combination of innovative and well-known methods is used to address these challenges of scale, complexity, and usability.
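As a minimal sketch of what a formal, machine-enforceable data-sharing policy might look like once the wizard has produced it (the structure and all names here are hypothetical, not the system described above):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SharingPolicy:
    """Hypothetical shape of one formal policy: which data types may
    go to which partners or partner groups."""
    name: str
    data_types: frozenset   # e.g. {"incident_report", "location"}
    partners: frozenset     # partner or partner-group identifiers

def permits(policies, data_type, partner):
    # Automated enforcement: a share is allowed iff some policy covers
    # both the data type and the partner.
    return any(data_type in p.data_types and partner in p.partners
               for p in policies)
```

A real system would add partner-group hierarchies and relationships between data types; this sketch only shows how formal definitions make enforcement a mechanical check.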
-
Unlimited access to a motorway network can, in overloaded conditions, cause a loss of throughput. Ramp metering, by controlling access to the motorway at onramps, can help avoid this loss of throughput. The queues that form at onramps are dependent on the metering rates chosen at the onramps, and these choices affect how the capacities of different motorway sections are shared amongst competing flows. In this paper we perform an analytical study of a fluid, or differential equation, model of a linear network topology with onramp queues. The model allows for adaptive arrivals, in the sense that the rate at which external traffic enters the queue at an onramp can depend on the current perceived delay in that queue. The model also includes a ramp metering policy which uses global onramp queue length information to determine the rate at which traffic enters the motorway from each onramp. This ramp metering policy minimizes the maximum delay over all onramps and produces equal delay times over many onramps. The paper characterizes both the dynamics and the equilibrium behavior of the system under this policy. While we consider an idealized model that leaves out many practical details, an aim of the paper is to develop analytical methods that yield interesting qualitative insights and might be adapted to more general contexts. The paper can be considered as a step in developing an analytical approach towards studying more complex network topologies and incorporating other model features.
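The equal-delay property can be illustrated with a simple fluid approximation: if the delay at ramp i is roughly q_i / m_i (queue length over metering rate), then allocating metering rates proportional to queue lengths makes every ramp's delay equal to sum(q) / capacity, which also minimizes the maximum delay. This is a simplification for intuition, not the paper's exact policy:

```python
def metering_rates(queues, capacity):
    """Equal-delay fluid sketch: split the available motorway inflow
    `capacity` across onramps in proportion to their queue lengths,
    so q_i / m_i is the same at every backlogged ramp."""
    total = sum(queues)
    if total == 0:
        # No backlog anywhere: any split works; use an even one.
        return [capacity / len(queues)] * len(queues)
    return [capacity * q / total for q in queues]
```

For queues [4, 1] and capacity 10, the rates are [8, 2], giving an identical delay of 0.5 at both ramps; any other split would raise the larger of the two delays.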