The shear viscosity
Award ID(s): 2012947
NSFPAR ID: 10375933
Publisher / Repository: Springer Science + Business Media
Date Published:
Journal Name: The European Physical Journal C
Volume: 82
Issue: 10
ISSN: 1434-6052
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this

Abstract The softmax policy gradient (PG) method, which performs gradient ascent under softmax policy parameterization, is arguably one of the de facto implementations of policy optimization in modern reinforcement learning. For $\gamma$-discounted infinite-horizon tabular Markov decision processes (MDPs), remarkable progress has recently been achieved towards establishing global convergence of softmax PG methods in finding a near-optimal policy. However, prior results fall short of delineating clear dependencies of convergence rates on salient parameters such as the cardinality of the state space $\mathcal{S}$ and the effective horizon $\frac{1}{1-\gamma}$, both of which could be excessively large. In this paper, we deliver a pessimistic message regarding the iteration complexity of softmax PG methods, despite assuming access to exact gradient computation. Specifically, we demonstrate that the softmax PG method with stepsize $\eta$ can take $\frac{1}{\eta}|\mathcal{S}|^{2^{\Omega\left(\frac{1}{1-\gamma}\right)}}$ iterations to converge, even in the presence of a benign policy initialization and an initial state distribution amenable to exploration (so that the distribution mismatch coefficient is not exceedingly large). This is accomplished by characterizing the algorithmic dynamics over a carefully-constructed MDP containing only three actions. Our exponential lower bound hints at the necessity of carefully adjusting update rules or enforcing proper regularization in accelerating PG methods.
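As a rough illustration of the iteration discussed in the abstract above, the following sketch runs gradient ascent on a tabular softmax policy with exact gradients. It assumes the standard policy gradient expression for this parameterization, $\partial V(\rho)/\partial\theta(s,a) = \frac{1}{1-\gamma}\, d_\rho^{\pi}(s)\, \pi(a|s)\, A^{\pi}(s,a)$. The two-state MDP and all function names (`toy_mdp`, `evaluate`, `softmax_pg`) are hypothetical and chosen only for illustration; this is not the hard instance constructed in the paper.

```python
import numpy as np

def softmax_policy(theta):
    """Row-wise softmax: pi[s, a] is proportional to exp(theta[s, a])."""
    z = theta - theta.max(axis=1, keepdims=True)  # shift for numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

def evaluate(P, r, gamma, pi, rho):
    """Exact V, Q, and discounted state visitation d_rho for policy pi.

    P has shape (S, A, S), r has shape (S, A), rho has shape (S,).
    """
    S = P.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)          # state transitions under pi
    r_pi = (pi * r).sum(axis=1)                    # expected one-step reward
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    Q = r + gamma * (P @ V)                        # shape (S, A)
    d = (1 - gamma) * np.linalg.solve(np.eye(S) - gamma * P_pi.T, rho)
    return V, Q, d

def softmax_pg(P, r, gamma, rho, eta, iters):
    """Gradient ascent on V(rho) with stepsize eta from a uniform policy."""
    S, A = r.shape
    theta = np.zeros((S, A))
    values = []
    for _ in range(iters):
        pi = softmax_policy(theta)
        V, Q, d = evaluate(P, r, gamma, pi, rho)
        values.append(rho @ V)                     # value before this update
        adv = Q - V[:, None]                       # advantage A(s, a)
        grad = d[:, None] * pi * adv / (1 - gamma)
        theta = theta + eta * grad
    return theta, np.array(values)

def toy_mdp():
    """Hypothetical 2-state, 2-action MDP: action a moves to state a."""
    P = np.zeros((2, 2, 2))
    P[:, 0, 0] = 1.0
    P[:, 1, 1] = 1.0
    r = np.array([[1.0, 0.0], [0.0, 0.5]])
    rho = np.array([0.5, 0.5])
    return P, r, rho
```

Calling `softmax_pg(*toy_mdp()[:2], gamma=0.5, rho=toy_mdp()[2], eta=0.01, iters=200)` returns the final logits and the sequence of values $V(\rho)$. On this small instance the values increase steadily, which says nothing about the worst-case behaviour the abstract describes.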
Abstract Recent spectacular advances by AI programs in 3D structure predictions from protein sequences have revolutionized the field in terms of accuracy and speed. The resulting “folding frenzy” has already produced predicted protein structure databases for the entire human and other organisms’ proteomes. However, rapidly ascertaining a predicted structure’s reliability based on measured properties in solution should be considered. Shape-sensitive hydrodynamic parameters such as the diffusion and sedimentation coefficients ($D_{t(20,w)}^{0}$, $s_{(20,w)}^{0}$) and the intrinsic viscosity ([η]) can provide a rapid assessment of the overall structure likeliness, and SAXS would yield the structure-related pairwise distance distribution function p(r) vs. r. Using the extensively validated UltraScan SOlution MOdeler (US-SOMO) suite, a database was implemented calculating from AlphaFold structures the corresponding $D_{t(20,w)}^{0}$, $s_{(20,w)}^{0}$, [η], p(r) vs. r, and other parameters. Circular dichroism spectra were computed using the SESCA program. Some of AlphaFold’s drawbacks were mitigated, such as generating whenever possible a protein’s mature form. Others, like the AlphaFold direct applicability to single-chain structures only, the absence of prosthetic groups, or flexibility issues, are discussed. Overall, this implementation of the US-SOMO-AF database should already aid in rapidly evaluating the consistency in solution of a relevant portion of AlphaFold predicted protein structures.
Abstract We introduce tools from discrete convexity theory and polyhedral geometry into the theory of West’s stack-sorting map s. Associated to each permutation $\pi$ is a particular set $\mathcal{V}(\pi)$ of integer compositions that appears in a formula for the fertility of $\pi$, which is defined to be $|s^{-1}(\pi)|$. These compositions also feature prominently in more general formulas involving families of colored binary plane trees called troupes and in a formula that converts from free to classical cumulants in noncommutative probability theory. We show that $\mathcal{V}(\pi)$ is a transversal discrete polymatroid when it is nonempty. We define the fertilitope of $\pi$ to be the convex hull of $\mathcal{V}(\pi)$, and we prove a surprisingly simple characterization of fertilitopes as nestohedra arising from full binary plane trees. Using known facts about nestohedra, we provide a procedure for describing the structure of the fertilitope of $\pi$ directly from $\pi$ using Bousquet-Mélou’s notion of the canonical tree of $\pi$. As a byproduct, we obtain a new combinatorial cumulant conversion formula in terms of generalizations of canonical trees that we call quasicanonical trees. We also apply our results on fertilitopes to study combinatorial properties of the stack-sorting map. In particular, we show that the set of fertility numbers has density 1, and we determine all infertility numbers of size at most 126. Finally, we reformulate the conjecture that $\sum_{\sigma \in s^{-1}(\pi)} x^{\mathrm{des}(\sigma)+1}$ is always real-rooted in terms of nestohedra, and we propose natural ways in which this new version of the conjecture could be extended.
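To make the definition of fertility in the abstract above concrete, here is a minimal brute-force sketch. It assumes the usual one-pass description of West's stack-sorting map (pop the stack while its top entry is smaller than the incoming entry) and counts preimages by enumerating all permutations, so it is only practical for very small sizes. The function names are hypothetical, and the approach is unrelated to the polyhedral methods of the paper.

```python
from collections import Counter
from itertools import permutations

def stack_sort(perm):
    """Apply West's stack-sorting map s to a permutation given as a tuple."""
    stack, out = [], []
    for x in perm:
        # Pop every entry smaller than x before pushing x.
        while stack and stack[-1] < x:
            out.append(stack.pop())
        stack.append(x)
    while stack:
        out.append(stack.pop())
    return tuple(out)

def preimages(perm):
    """All sigma with s(sigma) == perm, found by exhaustive search."""
    n = len(perm)
    target = tuple(perm)
    return [sigma for sigma in permutations(range(1, n + 1))
            if stack_sort(sigma) == target]

def fertility(perm):
    """Fertility of perm, i.e. the number of preimages under s."""
    return len(preimages(perm))

def descent_counts(perm):
    """Map each exponent des(sigma) + 1 to its coefficient in the
    polynomial summed over the preimages sigma of perm."""
    counts = Counter()
    for sigma in preimages(perm):
        des = sum(1 for a, b in zip(sigma, sigma[1:]) if a > b)
        counts[des + 1] += 1
    return counts
```

For example, `fertility((1, 2, 3))` is 5, and `descent_counts((1, 2, 3))` corresponds to the polynomial $x + 3x^2 + x^3$.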
Abstract The production of the $X(3872)$ particle in heavy-ion collisions has been contemplated as an alternative probe of its internal structure. To investigate this conjecture, we perform transport calculations of the $X(3872)$ through the fireball formed in nuclear collisions at the LHC. Within a kinetic-rate equation approach as previously used for charmonia, the formation and dissociation of the $X(3872)$ is controlled by two transport parameters, i.e., its inelastic reaction rate and thermal-equilibrium limit in the evolving hot QCD medium. While the equilibrium limit is controlled by the charm production cross section in primordial nucleon-nucleon collisions (together with the spectra of charm states in the medium), the structure information is encoded in the reaction rate. We study how different scenarios for the rate affect the centrality dependence and transverse-momentum ($p_T$) spectra of the $X(3872)$. Larger reaction rates associated with the loosely bound molecule structure imply that it is formed later in the fireball evolution than the tetraquark and thus its final yields are generally smaller by around a factor of two, which is qualitatively different from most coalescence model calculations to date. The $p_T$ spectra provide further information as the later decoupling time within the molecular scenario leads to harder spectra caused by the blueshift from the expanding fireball.
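For orientation, a kinetic-rate equation with the two transport parameters named in the abstract above is commonly written in the following generic form. The symbols are assumptions introduced here, not taken from the abstract: $N_X$ is the number of $X(3872)$ in the fireball, $\tau$ the proper time, $\Gamma$ the inelastic reaction rate, and $N_X^{\mathrm{eq}}$ the thermal-equilibrium limit, both evaluated at the local temperature $T(\tau)$.

$$\frac{\mathrm{d}N_X(\tau)}{\mathrm{d}\tau} = -\Gamma\big(T(\tau)\big)\,\Big[N_X(\tau) - N_X^{\mathrm{eq}}\big(T(\tau)\big)\Big]$$

In this form the rate sets how quickly the yield relaxes toward the equilibrium limit, while the limit itself sets the target of that relaxation.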
Abstract Ultra-pure NaI(Tl) crystals are the key element for a model-independent verification of the long-standing DAMA result and a powerful means to search for the annual modulation signature of dark matter interactions. The SABRE collaboration has been developing cutting-edge techniques for the reduction of intrinsic backgrounds over several years. In this paper we report the first characterization of a 3.4 kg crystal, named NaI-33, performed in an underground passive shielding setup at LNGS. NaI-33 has a record low $^{39}$K contamination of 4.3 ± 0.2 ppb as determined by mass spectrometry. We measured a light yield of 11.1 ± 0.2 photoelectrons/keV and an energy resolution of 13.2% (FWHM/E) at 59.5 keV. We evaluated the activities of $^{226}$Ra and $^{228}$Th inside the crystal to be 5.9 ± 0.6 μBq/kg and 1.6 ± 0.3 μBq/kg, respectively, which would indicate a contamination from $^{238}$U and $^{232}$Th at part-per-trillion level. We measured an activity of 0.51 ± 0.02 mBq/kg due to $^{210}$Pb out of equilibrium and an $\alpha$ quenching factor of 0.63 ± 0.01 at 5304 keV. We illustrate the analysis techniques developed to reject electronic noise in the lower part of the energy spectrum. A cut-based strategy and a multivariate approach indicated a rate, attributed to the intrinsic radioactivity of the crystal, of $\sim$1 count/day/kg/keV in the [5–20] keV region.