We present Q-functionals, an alternative architecture for continuous control deep reinforcement learning. Instead of returning a single value for a state-action pair, our network transforms a state into a function that can be rapidly evaluated in parallel for many actions, allowing us to efficiently choose high-value actions through sampling. This contrasts with the typical architecture of off-policy continuous control, where a policy network is trained for the sole purpose of selecting actions from the Q-function. We represent our action-dependent Q-function as a weighted sum of basis functions (Fourier, polynomial, etc.) over the action space, where the weights are state-dependent and output by the Q-functional network. Fast sampling makes practical a variety of techniques that require Monte Carlo integration over Q-functions, and enables action-selection strategies besides simple value-maximization. We characterize our framework, describe various implementations of Q-functionals, and demonstrate strong performance on a suite of continuous control tasks.
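The idea of a state-dependent weighted sum of basis functions can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a one-dimensional action space on [-1, 1], a polynomial basis, and a single linear layer standing in for the Q-functional network. The names `polynomial_basis`, `coefficients_for_state`, and `select_action` are hypothetical.

```python
import numpy as np


def polynomial_basis(actions, degree):
    """Evaluate 1, a, a^2, ..., a^degree for a batch of scalar actions.

    actions has shape (n_actions,); the result has shape (n_actions, degree + 1).
    """
    return np.vander(actions, N=degree + 1, increasing=True)


def coefficients_for_state(state, weights):
    """Stand-in for the Q-functional network (assumption: one linear layer).

    Maps a state vector to one coefficient per basis function.
    """
    return weights @ state


def select_action(state, weights, n_samples=1000, rng=None):
    """Sample actions uniformly and return the one with the highest Q-value."""
    rng = np.random.default_rng(0) if rng is None else rng
    degree = weights.shape[0] - 1
    actions = rng.uniform(-1.0, 1.0, size=n_samples)
    # The state is processed once, yielding the basis coefficients.
    coeffs = coefficients_for_state(state, weights)
    # All sampled actions are then scored with a single matrix-vector product.
    q_values = polynomial_basis(actions, degree) @ coeffs
    best = int(np.argmax(q_values))
    return actions[best], q_values[best]


# Toy example: for state [1.0] the coefficients encode Q(s, a) = -(a - 0.5)^2,
# so the sampled maximizer should land close to a = 0.5.
toy_weights = np.array([[-0.25], [1.0], [-1.0]])
toy_state = np.array([1.0])
best_action, best_q = select_action(toy_state, toy_weights)
```

Because the coefficients are computed once per state, scoring additional actions costs only one extra row in the basis matrix, which is what makes sampling-based action selection cheap in this sketch.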
                            
                            SPRC19: A Database of State Policy Responses to COVID-19 in the United States
                        
                    
    
            Abstract SPRC19 is a new database that seeks to capture a wide range of state policy actions in response to COVID-19 in the United States. Since March 2020 we have monitored state governments’ and multi-state associations’ websites for executive orders, agency rules, new legislation, and court decisions. We categorize each policy action into one of 206 distinct policies, then document the branch of government, source document, announcement date, implementation date, and expiration date (if applicable). We also record whether the action represents the introduction of a new policy or the expansion or contraction of an existing policy. The current release of SPRC19, v3.0, captures over 13,000 distinct policy actions through April 2020, which constitutes thousands more actions than similar resources over the same time period. 
    
    
- PAR ID: 10439304
- Publisher / Repository: Nature Publishing Group
- Date Published:
- Journal Name: Scientific Data
- Volume: 10
- Issue: 1
- ISSN: 2052-4463
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Abstract The challenge of adapting water resources systems to uncertain hydroclimatic and socioeconomic conditions warrants a dynamic planning approach. Recent studies have designed policies with structures linking infrastructure and management actions to threshold values of indicator variables observed over time. Typically, one or more of these components are held fixed while the others are optimized, constraining the flexibility of policy generation. Here we develop a framework to address this challenge by designing and testing dynamic adaptation policies that combine indicators, actions, and thresholds in a flexible structure. The approach is demonstrated for a case study of northern California, where a mix of infrastructure, management, and operational adaptations are considered over time in response to an ensemble of nonstationary hydrology and water demands. We first identify a subset of non‐dominated policies that are robust to held‐out scenarios, and then analyze their most common actions and indicators compared to non‐robust policies. Results show that the robust policies are not differentiated by the actions they select, but show substantial differences in their indicator variables, which can be interpreted in the context of physical hydrologic trends. In particular, the most frequent statistical transformations of indicator variables highlight the balance between adapting quickly versus correctly. Additionally, we determine the indicators most frequently associated with each action, as well as the distribution of action timing across scenarios. This study presents a new and transferable problem framing for adaptation under uncertainty in which indicator variables, actions, and policy structure are identified simultaneously during the optimization.
- We present ChainedDiffuser, a policy architecture that unifies action keypose prediction and trajectory diffusion generation for learning robot manipulation from demonstrations. Our main innovation is to use a global transformer-based action predictor to predict actions at keyframes, a task that requires multimodal semantic scene understanding, and to use a local trajectory diffuser to predict trajectory segments that connect predicted macro-actions. ChainedDiffuser sets a new record on established manipulation benchmarks, and outperforms both state-of-the-art keypose (macro-action) prediction models that use motion planners for trajectory prediction, and trajectory diffusion policies that do not predict keyframe macro-actions. We conduct experiments in both simulated and real-world environments and demonstrate ChainedDiffuser’s ability to solve a wide range of manipulation tasks involving interactions with diverse objects.
- Sequential decision-making under uncertainty is present in many important problems. Two popular approaches for tackling such problems are reinforcement learning and online search (e.g., Monte Carlo tree search). While the former learns a policy by interacting with the environment (typically done before execution), the latter uses a generative model of the environment to sample promising action trajectories at decision time. Decision-making is particularly challenging in non-stationary environments, where the environment in which an agent operates can change over time. Both approaches have shortcomings in such settings: on the one hand, policies learned before execution become stale when the environment changes and relearning takes both time and computational effort. Online search, on the other hand, can return sub-optimal actions when there are limitations on allowed runtime. In this paper, we introduce Policy-Augmented Monte Carlo tree search (PA-MCTS), which combines action-value estimates from an out-of-date policy with an online search using an up-to-date model of the environment. We prove theoretical results showing conditions under which PA-MCTS selects the one-step optimal action and also bound the error accrued while following PA-MCTS as a policy. We compare and contrast our approach with AlphaZero, another hybrid planning approach, and Deep Q Learning on several OpenAI Gym environments. Through extensive experiments, we show that under non-stationary settings with limited time constraints, PA-MCTS outperforms these baselines.
- Abstract Managing social‐ecological systems (SES) requires balancing the need to tailor actions to local heterogeneity and the need to work over large areas to accommodate the extent of SES. This balance is particularly challenging for policy since the level of government where the policy is being developed determines the extent and resolution of action. We make the case for a new research agenda focused on ecological federalism that seeks to address this challenge by capitalizing on the flexibility afforded by a federalist system of governance. Ecological federalism synthesizes the environmental federalism literature from law and economics with relevant ecological and biological literature to address a fundamental question: What aspects of SES should be managed by federal governments and which should be allocated to decentralized state governments? This new research agenda considers the bio‐geo‐physical processes that characterize state‐federal management tradeoffs for biodiversity conservation, resource management, infectious disease prevention, and invasive species control. Read the free Plain Language Summary for this article on the Journal blog.
