Policy information in computer networking today, such as the reachability objectives of a controller program running on a software-defined network (SDN) or the Border Gateway Protocol (BGP) configurations independently set by autonomous networks, is hard to manage. This is in sharp contrast to the relational data structured in a database, which allows easy access. This paper asks whether (and how) we can turn network policies into relational data. One difficulty with such an approach is that a policy does not always translate to a definite network snapshot; rather, it is fully described only when we include all the possible network states it admits. We propose relational policies, which represent and manipulate sets of network states in exactly the same way as a single one, form a strong representation system, and accurately capture the information in a policy through the usual Structured Query Language (SQL) interface. We demonstrate how, just as relational databases improve application productivity and enable rapid innovation, relational policies let us extend the elegant solutions the database community developed for mediating multiple data sources to long-standing challenges and new opportunities in autonomous policy making for distributed networks. We also show the feasibility of relational policies by evaluation on synthetic policies and realistic network topologies.
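A minimal sketch of the idea (illustrative, not the paper's actual system): a policy's admissible network states are encoded as rows of an ordinary relation, and a "certain" reachability question, one that must hold in every admissible state, is answered with plain SQL. The schema and state encoding below are invented for the example.

```python
# Encode a policy as a relation over its admissible network states and
# query it with standard SQL. Schema and data are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE policy (state_id INTEGER, src TEXT, dst TEXT, next_hop TEXT)")

# Two admissible network states for one policy: traffic from A to C
# may be routed via B (state 0) or via D (state 1).
rows = [
    (0, "A", "C", "B"), (0, "B", "C", "C"),
    (1, "A", "C", "D"), (1, "D", "C", "C"),
]
conn.executemany("INSERT INTO policy VALUES (?, ?, ?, ?)", rows)

# A "certain" fact holds in every admissible state; here: is A-to-C
# traffic routed in all states the policy admits?
certain = conn.execute("""
    SELECT COUNT(DISTINCT state_id) = (SELECT COUNT(DISTINCT state_id) FROM policy)
    FROM policy WHERE src = 'A' AND dst = 'C'
""").fetchone()[0]
print(bool(certain))  # True: both admissible states route A -> C
```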
Data-Driven Structured Policy Iteration for Homogeneous Distributed Systems
    
Control of networked systems comprised of interacting agents is often achieved by modeling the underlying interactions. Constructing accurate models of these interactions, however, can become prohibitive in applications. Data-driven control methods avoid such complications by synthesizing a controller directly from observed data. In this paper, we propose Data-driven Structured Policy Iteration (D2SPI), an algorithm for synthesizing an efficient feedback mechanism that respects the sparsity pattern induced by the underlying interaction network. In particular, our algorithm uses temporary "auxiliary" communication links to enable the required information exchange on a (smaller) sub-network during the learning phase; these links are removed for the final distributed feedback synthesis. We then show that the learned policy is a stabilizing structured policy for the entire network. We further establish the stability and convergence of the proposed distributed policies throughout the learning phase, exploiting a construct referred to as the "patterned monoid." The performance of D2SPI is demonstrated on representative simulation scenarios.
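To make the policy-iteration-with-structure idea concrete, here is a simplified, model-based sketch: each iteration evaluates the current gain through a Lyapunov equation, performs the usual improvement step, and projects the gain onto the interaction network's sparsity pattern. D2SPI itself is data-driven and uses auxiliary links during learning; the dynamics, costs, and mask below are illustrative assumptions, not the paper's setup.

```python
# Structured policy iteration sketch: evaluate, improve, then project the
# gain onto the sparsity mask of the interaction network (model-based toy).
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[0.9, 0.1, 0.0], [0.1, 0.9, 0.1], [0.0, 0.1, 0.9]])  # stable chain
B = np.eye(3)
Q, R = np.eye(3), np.eye(3)
mask = (np.abs(A) > 0).astype(float)   # feedback allowed only along existing links

K = np.zeros((3, 3))                   # initial stabilizing gain (A itself is stable)
for _ in range(50):
    Acl = A - B @ K
    # Policy evaluation: cost-to-go P for the current structured gain
    P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
    # Policy improvement, followed by projection onto the sparsity pattern
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A) * mask

print(np.max(np.abs(np.linalg.eigvals(A - B @ K))))  # expected < 1 (stabilizing)
```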
- Award ID(s): 2149470
- PAR ID: 10516482
- Publisher / Repository: IEEE
- Date Published:
- Journal Name: IEEE Transactions on Automatic Control
- ISSN: 0018-9286
- Page Range / eLocation ID: 1 to 15
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- 
In this paper, we study kernelized bandits with distributed biased feedback. This problem is motivated by several real-world applications (such as dynamic pricing, cellular network configuration, and policy making) where users from a large population contribute to the reward of the action chosen by a central entity, but it is difficult to collect feedback from all users; instead, only biased feedback (due to user heterogeneity) from a subset of users may be available. In addition to such partial biased feedback, we also face two practical challenges: communication cost and computation complexity. To tackle these challenges, we carefully design a new distributed phase-then-batch-based elimination (DPBE) algorithm, which samples users in phases to collect feedback and reduce bias, and employs maximum variance reduction to select actions in batches within each phase. By properly choosing the phase length, the batch size, and the confidence width used for eliminating suboptimal actions, we show that DPBE achieves a sublinear regret of Õ(T^{1−α/2} + √(γ_T T)), where α ∈ (0,1) is a tunable user-sampling parameter. Moreover, DPBE can significantly reduce both communication cost and computation complexity in distributed kernelized bandits, compared to variants of state-of-the-art algorithms originally developed for standard kernelized bandits. Furthermore, by incorporating various differential privacy models (including the central, local, and shuffle models), we generalize DPBE to provide privacy guarantees for users participating in the distributed learning process. Finally, we conduct extensive simulations to validate our theoretical results and evaluate the empirical performance.
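A toy sketch of the phase-then-batch elimination pattern, shown on finitely many arms with user-heterogeneous (biased) feedback; the real algorithm is kernelized and uses maximum variance reduction, so the reward model and schedule below are illustrative assumptions only.

```python
# Phased elimination toy: each phase samples more users (shrinking bias),
# averages their feedback, and eliminates arms outside a confidence width.
import numpy as np

rng = np.random.default_rng(0)
true_mean = np.array([0.3, 0.5, 0.7, 0.4])        # population-average rewards

def sample_feedback(arm, n_users):
    # Each sampled user reports the arm's reward plus a personal bias.
    bias = rng.normal(0.0, 0.2, size=n_users)
    return (true_mean[arm] + bias).mean()

active = list(range(4))
for phase in range(1, 8):
    n_users = 2 ** phase                          # more users -> less bias
    width = 1.0 / np.sqrt(n_users)                # shrinking confidence width
    est = {a: sample_feedback(a, n_users) for a in active}
    best = max(est.values())
    active = [a for a in active if est[a] >= best - 2 * width]

print(active)  # narrows toward arm 2, the best action
```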
- 
Deep reinforcement learning (RL) has led to encouraging successes in many challenging control tasks. However, a deep RL model lacks interpretability because it is difficult to identify how the model's control logic relates to its network structure. Programmatic policies, structured in more interpretable representations, emerge as a promising solution. Yet two shortcomings remain. First, synthesizing programmatic policies requires optimizing over the discrete and non-differentiable search space of program architectures; previous works are suboptimal because they only enumerate program architectures greedily, guided by a pretrained RL oracle. Second, these works do not exploit compositionality, an important programming concept, to reuse and compose primitive functions into complex functions for new tasks. Our first contribution is a programmatically interpretable RL framework that conducts program architecture search on top of a continuous relaxation of the architecture space defined by programming-language grammar rules. Our algorithm learns policy architectures jointly with policy parameters via bilevel optimization using efficient policy-gradient methods, and thus does not require a pretrained oracle. Our second contribution is extending programmatic policies to support compositionality by integrating primitive functions that capture task-agnostic skills into a composite program for solving novel RL problems. Experimental results demonstrate that our algorithm excels at discovering optimal programmatic policies that are highly interpretable.
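A minimal sketch of the continuous-relaxation idea: a discrete choice among candidate primitives (standing in for grammar productions) is softened into a softmax-weighted mixture, so architecture weights and per-primitive parameters can both be trained by gradient descent. For brevity this uses a joint objective on a toy 1-D imitation task rather than the paper's full bilevel RL scheme; the primitives and target are invented for the example.

```python
# Relax a discrete program-architecture choice into a differentiable mixture.
import torch

alpha = torch.zeros(3, requires_grad=True)        # architecture logits
theta = torch.zeros(3, requires_grad=True)        # per-primitive parameters

def policy(x):
    # Candidate primitives a grammar might offer: linear, saturating, constant.
    candidates = torch.stack([theta[0] * x,
                              torch.tanh(theta[1] * x),
                              theta[2].expand_as(x)])
    return (torch.softmax(alpha, dim=0)[:, None] * candidates).sum(0)

opt = torch.optim.Adam([alpha, theta], lr=0.1)
x = torch.linspace(-1, 1, 64)
for _ in range(200):
    loss = ((policy(x) - (-0.8 * x)) ** 2).mean()  # surrogate: imitate a target law
    opt.zero_grad()
    loss.backward()
    opt.step()

print(torch.softmax(alpha, 0).detach())  # weight on the linear primitive tends to grow
```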
- 
This paper presents a reinforcement learning approach to synthesizing task-driven control policies for robotic systems equipped with rich sensory modalities (e.g., vision or depth). Standard reinforcement learning algorithms typically produce policies that tightly couple control actions to the entirety of the system's state and rich sensor observations. As a consequence, the resulting policies can be sensitive to changes in task-irrelevant portions of the state or observations (e.g., changing background colors). In contrast, the approach presented here learns a task-driven representation that is used to compute control actions. Formally, this is achieved by deriving a policy-gradient-style algorithm that creates an information bottleneck between the states and the task-driven representation; this constrains actions to depend only on task-relevant information. We demonstrate our approach in a thorough set of simulation results on multiple examples, including a grasping task that uses depth images and a ball-catching task that uses RGB images. Comparisons with a standard policy-gradient approach demonstrate that the task-driven policies produced by our algorithm are often significantly more robust to sensor noise and task-irrelevant changes in the environment.
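A compact sketch of the information-bottleneck construction: observations pass through a stochastic latent z with a KL penalty toward a fixed prior, and actions depend only on z, so task-irrelevant observation detail is squeezed out. The network sizes, KL weight, and stand-in loss below are assumptions, not the paper's architecture.

```python
# Bottleneck policy sketch: actions depend on a compressed stochastic latent.
import torch
import torch.nn as nn

class BottleneckPolicy(nn.Module):
    def __init__(self, obs_dim=16, z_dim=4, act_dim=2):
        super().__init__()
        self.enc = nn.Linear(obs_dim, 2 * z_dim)   # outputs mean and log-variance
        self.head = nn.Linear(z_dim, act_dim)

    def forward(self, obs):
        mu, logvar = self.enc(obs).chunk(2, dim=-1)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterize
        # KL(q(z|obs) || N(0, I)) penalizes information flowing through z
        kl = 0.5 * (mu ** 2 + logvar.exp() - logvar - 1).sum(-1).mean()
        return self.head(z), kl

policy = BottleneckPolicy()
obs = torch.randn(32, 16)
action_logits, kl = policy(obs)
loss = action_logits.pow(2).mean() + 1e-2 * kl     # stand-in for a policy-gradient loss
loss.backward()
```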
- 
This paper presents a data-driven methodology for linear embedding of nonlinear systems. Utilizing structural knowledge of general nonlinear dynamics, we exploit the Koopman operator to develop a systematic, data-driven approach for constructing a linear representation in terms of higher-order derivatives of the underlying nonlinear dynamics. With the linear representation, the nonlinear system is then controlled with an LQR feedback policy whose gains need to be calculated only once. As a result, the approach enables fast control synthesis. We demonstrate the efficacy of the approach with simulations and experimental results on the modeling and control of a tail-actuated robotic fish, and show that the proposed policy is comparable to backstepping control. To the best of our knowledge, this is the first experimental validation of Koopman-based LQR control.
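A minimal sketch of the Koopman-plus-LQR recipe: lift states with chosen observables, fit a linear model (A, B) in the lifted space by least squares from trajectory data, then compute a single LQR gain. The toy pendulum-like plant and the observable dictionary are illustrative assumptions, not the paper's robotic-fish model.

```python
# EDMD-with-control sketch: lift, fit (A, B) by least squares, LQR once.
import numpy as np
from scipy.linalg import solve_discrete_are

def step(x, u, dt=0.05):                 # "unknown" nonlinear plant, data only
    th, om = x
    return np.array([th + dt * om, om + dt * (-np.sin(th) + u)])

def lift(x):                             # observable dictionary: [th, om, sin th]
    return np.array([x[0], x[1], np.sin(x[0])])

rng = np.random.default_rng(1)
X, U, Y = [], [], []
for _ in range(2000):
    x = rng.uniform(-1, 1, 2)
    u = rng.uniform(-1, 1)
    X.append(lift(x)); U.append([u]); Y.append(lift(step(x, u)))

# Least squares: Y ≈ A Z + B U  (lifted linear model with control)
ZU = np.hstack([np.array(X), np.array(U)])
G = np.linalg.lstsq(ZU, np.array(Y), rcond=None)[0].T
A, B = G[:, :3], G[:, 3:]

# One-time LQR synthesis on the lifted linear model
Q, R = np.diag([10.0, 1.0, 0.1]), np.array([[0.1]])
P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
print(K)                                 # feedback law: u = -K @ lift(x)
```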