skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Shanmugam, Karthikeyan"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. We study a variant of the contextual bandit problem where an agent can intervene through a set of stochastic expert policies. Given a fixed context, each expert samples actions from a fixed conditional distribution. The agent seeks to remain competitive with the β€œbest” among the given set of experts. We propose the Divergence-based Upper Confidence Bound (D-UCB) algorithm that uses importance sampling to share information across experts and provide horizon-independent constant regret bounds that only scale linearly in the number of experts. We also provide the Empirical D-UCB (ED-UCB) algorithm that can function with only approximate knowledge of expert distributions. Further, we investigate the episodic setting where the agent interacts with an environment that changes over episodes. Each episode can have different context and reward distributions resulting in the best expert changing across episodes. We show that by bootstrapping from\(\mathcal {O}(N\log (NT^2\sqrt {E}))\)samples, ED-UCB guarantees a regret that scales as\(\mathcal {O}(E(N+1) + \frac{N\sqrt {E}}{T^2})\)forNexperts overEepisodes, each of lengthT. We finally empirically validate our findings through simulations. 
    more » « less
    Free, publicly-accessible full text available September 30, 2025
  2. This paper develops a unified Lyapunov framework for finite-sample analysis of a Markovian stochastic approximation (SA) algorithm under a contraction operator with respect to an arbitrary norm. The main novelty lies in the construction of a valid Lyapunov function called the generalized Moreau envelope. The smoothness and an approximation property of the generalized Moreau envelope enable us to derive a one-step Lyapunov drift inequality, which is the key to establishing the finite-sample bounds. Our SA result has wide applications, especially in the context of reinforcement learning (RL). Specifically, we show that a large class of value-based RL algorithms can be modeled in the exact form of our Markovian SA algorithm. Therefore, our SA results immediately imply finite-sample guarantees for popular RL algorithms such as n-step temporal difference (TD) learning, TD(πœ†), off-policy V-trace, and Q-learning. As byproducts, by analyzing the convergence bounds of n-step TD and TD(πœ†), we provide theoretical insight into the problem about the efficiency of bootstrapping. Moreover, our finite-sample bounds of off-policy V-trace explicitly capture the tradeoff between the variance of the stochastic iterates and the bias in the limit. 
    more » « less
  3. null (Ed.)
  4. null (Ed.)
  5. Coded multicasting has been shown to be a promising approach to significantly improve the performance of content delivery networks with multiple caches downstream of a common multicast link. However, the schemes that have been shown to achieve order-optimal performance require content items to be partitioned into several packets that grows exponentially with the number of caches, leading to codes of exponential complexity that jeopardize their promising performance benefits. In this paper, we address this crucial performance-complexity tradeoff in a heterogeneous caching network setting, where edge caches with possibly different storage capacity collect multiple content requests that may follow distinct demand distributions. We extend the asymptotic (in the number of packets per file) analysis of shared link caching networks to heterogeneous network settings, and present novel coded multicast schemes, based on local graph coloring, that exhibit polynomial-time complexity in all the system parameters, while preserving the asymptotically proven multiplicative caching gain even for finite file packetization. We further demonstrate that the packetization order (the number of packets each file is split into) can be traded-off with the number of requests collected by each cache, while preserving the same multiplicative caching gain. Simulation results confirm the superiority of the proposed schemes and illustrate the interesting request aggregation vs. packetization order tradeoff within several practical settings. Our results provide a compelling step towards the practical achievability of the promising multiplicative caching gain in next generation access networks. 
    more » « less
  6. We consider the problem of learning causal structures with latent variables using interventions. Our objective is not only to learn the causal graph between the observed variables, but to locate unobserved variables that could confound the relationship between observables. Our approach is stage-wise: We first learn the observable graph, i.e., the induced graph between observable variables. Next, we learn the existence and location of the latent variables given the observable graph. We propose an efficient randomized algorithm that can learn the observable graph using O(d log2n) interventions where d is the degree of the graph. We further propose an efficient deterministic variant which uses O(log n + l) interventions, where l is the longest directed path in the graph. Next, we propose an algorithm that uses only O(d2 log n) interventions that can learn the latents between both nonadjacent and adjacent variables. While a naive baseline approach would require O(n2) interventions, our combined algorithm can learn the causal graph with latents using O(d log2n + d2 log (n)) interventions. 
    more » « less
  7. We consider the problem of non-parametric Conditional Independence testing (CI testing) for continuous random variables. Given i.i.d samples from the joint distribution f (x, y, z) of continuous random vectors X, Y and Z, we determine whether X is independent Y |Z. We approach this by converting the conditional independence test into a classification problem. This allows us to harness very powerful classifiers like gradient-boosted trees and deep neural networks. These models can handle complex probability distributions and allow us to perform significantly better compared to the prior state of the art, for high-dimensional CI testing. The main technical challenge in the classification problem is the need for samples from the conditional product distribution fCI(x,y,z) = f(x|z)f(y|z)f(z) – the joint distribution if and only if X is independent Y |Z. – when given access only to i.i.d. samples from the true joint distribution f (x, y, z). To tackle this problem we propose a novel nearest neighbor bootstrap procedure and theoretically show that our generated samples are indeed close to f^{CI} in terms of total variational distance. We then develop theoretical results regarding the generalization bounds for classification for our problem, which translate into error bounds for CI testing. We provide a novel analysis of Rademacher type classification bounds in the presence of non-i.i.d near- independent samples. We empirically validate the performance of our algorithm on simulated and real datasets and show performance gains over previous methods. 
    more » « less