Search for: All records

Creators/Authors contains: "Mahmood, A"


  1. Thompson sampling (TS) is one of the most popular exploration techniques in reinforcement learning (RL). However, most TS algorithms with theoretical guarantees are difficult to implement and do not generalize to deep RL. While emerging approximate sampling-based exploration schemes are promising, most existing algorithms are either specific to linear Markov decision processes (MDPs) with suboptimal regret bounds or use only the most basic samplers, such as Langevin Monte Carlo. In this work, we propose an algorithmic framework that incorporates different approximate sampling methods with the recently proposed Feel-Good Thompson Sampling (FGTS) approach (Zhang, 2022; Dann et al., 2021), which was previously known to be computationally intractable in general. When applied to linear MDPs, our regret analysis yields the best known dependency of regret on dimensionality, surpassing existing randomized algorithms. Additionally, we provide explicit sampling complexity for each employed sampler. Empirically, we show that in tasks where deep exploration is necessary, our proposed algorithms that combine FGTS and approximate sampling perform significantly better than other strong baselines. On several challenging games from the Atari 57 suite, our algorithms achieve performance that is either better than or on par with other strong baselines from the deep RL literature.
  2. We present a scalable and effective exploration strategy based on Thompson sampling for reinforcement learning (RL). One of the key shortcomings of existing Thompson sampling algorithms is the need to perform a Gaussian approximation of the posterior distribution, which is not a good surrogate in most practical settings. We instead sample the Q function directly from its posterior distribution using Langevin Monte Carlo, an efficient type of Markov chain Monte Carlo (MCMC) method. Our method only needs to perform noisy gradient descent updates to learn the exact posterior distribution of the Q function, which makes our approach easy to deploy in deep RL. We provide a rigorous theoretical analysis for the proposed method and demonstrate that, in the linear Markov decision process (linear MDP) setting, it has a regret bound of $$\tilde{O}(d^{3/2}H^{3/2}\sqrt{T})$$, where $$d$$ is the dimension of the feature mapping, $$H$$ is the planning horizon, and $$T$$ is the total number of steps. We apply this approach to deep RL by using the Adam optimizer to perform gradient updates. Our approach achieves better or similar results compared with state-of-the-art deep RL algorithms on several challenging exploration tasks from the Atari57 suite. Our code is available at https://github.com/hmishfaq/LMC-LSVI.
  4. Free, publicly-accessible full text available April 1, 2026
  5. We study temperature-dependent (200–400 K) dielectric current leakage in high-quality, epitaxial chromia films synthesized on various conductive substrates (Pd, Pt, and V2O3). We find that trap-assisted space-charge-limited conduction is the dominant source of electrical leakage in the films, and that the density and distribution of charge traps within them depend strongly on the choice of the underlying substrate. Pd-based chromia is found to exhibit leakage consistent with the presence of deep, discrete traps, a characteristic that is related to the known properties of twinning defects in the material. The Pt- and V2O3-based films, in contrast, show behavior typical of insulators with shallow, exponentially distributed traps. The highest resistivity is obtained for chromia fabricated on V2O3 substrates, consistent with a lower total trap density in these films. Our studies suggest that chromia thin films formed on V2O3 substrates are a promising candidate for next-generation spintronics.
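
The Langevin Monte Carlo abstract above describes posterior sampling through noisy gradient descent updates. As a rough illustration of that update rule only, and not the authors' LMC-LSVI implementation, the sketch below applies it to a toy ridge-regression loss, where the target distribution is proportional to exp(-beta * loss). The function name `lmc_sample`, the choice of loss, and all parameter values are assumptions made for this example.

```python
import numpy as np


def lmc_sample(X, y, n_steps=500, step_size=1e-3, beta=1.0, lam=1.0, rng=None):
    """Draw one approximate sample of w from p(w) ∝ exp(-beta * L(w)).

    Hypothetical toy loss: L(w) = 0.5 * ||X w - y||^2 + 0.5 * lam * ||w||^2.
    Each step is a gradient descent step plus Gaussian noise scaled by
    sqrt(2 * step_size / beta) (the unadjusted Langevin update).
    """
    rng = np.random.default_rng() if rng is None else rng
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        grad = X.T @ (X @ w - y) + lam * w          # gradient of the toy loss
        noise = rng.standard_normal(w.shape)        # fresh Gaussian noise
        w = w - step_size * grad + np.sqrt(2.0 * step_size / beta) * noise
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(50)
    # One approximate posterior sample of the weights.
    print(lmc_sample(X, y, rng=rng))
```

For this quadratic loss the target is Gaussian, so repeated samples should scatter around the ridge-regression solution. In the deep RL setting described in the abstract, the loss would instead be defined on the Q function, and the abstract reports using the Adam optimizer for the gradient updates, which this sketch does not attempt to reproduce.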