We present Q-functionals, an alternative architecture for continuous control deep reinforcement learning. Instead of returning a single value for a state-action pair, our network transforms a state into a function that can be rapidly evaluated in parallel for many actions, allowing us to efficiently choose high-value actions through sampling. This contrasts with the typical architecture of off-policy continuous control, where a policy network is trained for the sole purpose of selecting actions from the Q-function. We represent our action-dependent Q-function as a weighted sum of basis functions (Fourier, polynomial, etc.) over the action space, where the weights are state-dependent and output by the Q-functional network. Fast sampling makes practical a variety of techniques that require Monte Carlo integration over Q-functions, and enables action-selection strategies besides simple value-maximization. We characterize our framework, describe various implementations of Q-functionals, and demonstrate strong performance on a suite of continuous control tasks.
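The idea of a state-dependent weighted sum of basis functions can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the one-dimensional action range [-1, 1], the polynomial basis, and the linear map standing in for the Q-functional network are all assumptions made for illustration. A practical version would evaluate the sampled actions as one batched matrix product rather than in a Python loop.

```python
import random

def polynomial_basis(action, degree):
    """Basis values [a^0, a^1, ..., a^degree] for a scalar action."""
    return [action ** k for k in range(degree + 1)]

def state_to_weights(state, weight_matrix):
    """Hypothetical stand-in for the Q-functional network.

    Maps a state vector to one coefficient per basis function
    using a plain linear map (an assumption, not the paper's model).
    """
    return [sum(w * s for w, s in zip(row, state)) for row in weight_matrix]

def q_value(weights, action):
    """Q(s, a) as a weighted sum of basis functions of the action."""
    basis = polynomial_basis(action, len(weights) - 1)
    return sum(w * phi for w, phi in zip(weights, basis))

def best_sampled_action(weights, num_samples, rng):
    """Sample actions uniformly in [-1, 1] and keep the highest-valued one."""
    actions = [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]
    return max(actions, key=lambda a: q_value(weights, a))

# Example: coefficients [0, 0, -1] give Q(s, a) = -a^2, which peaks at a = 0.
weights = state_to_weights([1.0], [[0.0], [0.0], [-1.0]])
action = best_sampled_action(weights, 1000, random.Random(0))
```

Because the state is processed once to produce the coefficients, each extra sampled action costs only a basis evaluation and a dot product, which is what makes sampling-based action selection cheap in this kind of design.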
$q$-Racah Ensemble and $q$-P$\left(E_7^{(1)}/A_{1}^{(1)}\right)$ Discrete Painlevé Equation
Abstract The goal of this paper is to investigate the missing part of the story about the relationship between the orthogonal polynomial ensembles and Painlevé equations. Namely, we consider the $q$-Racah polynomial ensemble and show that the one-interval gap probabilities in this case can be expressed through a solution of the discrete $q$-P$\left(E_7^{(1)}/A_{1}^{(1)}\right)$ equation. Our approach also gives a new Lax pair for this equation. This Lax pair has an interesting additional involutive symmetry structure.
- Award ID(s):
- 1704186
- PAR ID:
- 10123228
- Publisher / Repository:
- Oxford University Press
- Date Published:
- Journal Name:
- International Mathematics Research Notices
- ISSN:
- 1073-7928
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Abstract Many integrable stochastic particle systems in one space dimension (such as TASEP, the Totally Asymmetric Simple Exclusion Process, and its q-deformation, the q-TASEP) remain integrable if we equip each particle with its own speed parameter. In this work, we present intertwining relations between Markov transition operators of particle systems which differ by a permutation of the speed parameters. These relations generalize our previous works (Petrov and Saenz in Probab Theory Relat Fields 182:481–530, 2022), (Petrov in SIGMA 17(021):34, 2021), but here we employ a novel approach based on the Yang-Baxter equation for the higher spin stochastic six vertex model. Our intertwiners are Markov transition operators, which leads to interesting probabilistic consequences. First, we obtain a new Lax-type differential equation for the Markov transition semigroups of homogeneous, continuous-time versions of our particle systems. Our Lax equation encodes the time evolution of multipoint observables of the q-TASEP and TASEP in a unified way, which may be of interest for the asymptotic analysis of multipoint observables of these systems. Second, we show that our intertwining relations lead to couplings between probability measures on trajectories of particle systems which differ by a permutation of the speed parameters. The conditional distribution for such a coupling is realized as a “rewriting history” random walk which randomly resamples the trajectory of a particle in a chamber determined by the trajectories of the neighboring particles. As a byproduct, we construct a new coupling for standard Poisson processes on the positive real half-line with different rates.
-
In this work, we investigate the two-component modified Korteweg-de Vries (mKdV) equation, which is a completely integrable system and admits a generalized 4 × 4 matrix Ablowitz–Kaup–Newell–Segur (AKNS)-type Lax pair. Using the unified transform approach, we analyze the initial-boundary value (IBV) problem of the two-component mKdV equation associated with a 4 × 4 matrix Lax pair on the half-line. Supposing that the solution {u1(x, t), u2(x, t)} of the two-component mKdV equation exists, we show that it can be expressed in terms of the unique solution of a 4 × 4 matrix Riemann–Hilbert problem formulated in the complex λ-plane. Moreover, we prove that the spectral functions s(λ) and S(λ) are not independent of each other but satisfy a global relation.
-
Abstract Long time dynamics of the smoothed step initial value problem or dispersive Riemann problem for the Benjamin‐Bona‐Mahony (BBM) equation are studied using asymptotic methods and numerical simulations. The catalog of solutions of the dispersive Riemann problem for the BBM equation is much richer than for the related, integrable, Korteweg‐de Vries equation. The transition width of the initial smoothed step is found to significantly impact the dynamics. Narrow width gives rise to rarefaction and dispersive shock wave (DSW) solutions that are accompanied by the generation of two‐phase linear wavetrains, solitary wave shedding, and expansion shocks. Both narrow and broad initial widths give rise to two‐phase nonlinear wavetrains or DSW implosion and a new kind of dispersive Lax shock for symmetric data. The dispersive Lax shock is described by an approximate self‐similar solution of the BBM equation whose limit as t → ∞ is a stationary, discontinuous weak solution. By introducing a slight asymmetry in the data for the dispersive Lax shock, the generation of an incoherent solitary wavetrain is observed. Further asymmetry leads to the DSW implosion regime that is effectively described by a pair of coupled nonlinear Schrödinger equations. The complex interplay between nonlocality, nonlinearity, and dispersion in the BBM equation underlies the rich variety of nonclassical dispersive hydrodynamic solutions to the dispersive Riemann problem.
-
Abstract Complex scalars in U(1)-symmetric potentials can form stable Q-balls, non-topological solitons that correspond to spherical bound-state solutions. If the U(1) charge of the Q-ball is large enough, it can support a tower of unstable radial excitations with increasing energy. Previous analyses of these radial excitations were confined to fixed parameters, leading to excited states with different charges Q. In this work, we provide the first characterization of the radial excitations of solitons for fixed charge, providing the physical spectrum for such objects. We also show how to approximately describe these excited states analytically and predict their global properties such as radius, energy, and charge. This enables a complete characterization of the radial spectrum. We also comment on the decay channels of these excited states.
