
Search for: All records

Creators/Authors contains: "Wang, H"

Note: Clicking a Digital Object Identifier (DOI) link will take you to an external site maintained by the publisher. Some full-text articles may not yet be available free of charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. We study a model-free federated linear quadratic regulator (LQR) problem where M agents with unknown, distinct yet similar dynamics collaboratively learn an optimal policy to minimize an average quadratic cost while keeping their data private. To exploit the similarity of the agents' dynamics, we propose to use federated learning (FL) to allow the agents to periodically communicate with a central server to train policies by leveraging a larger dataset from all the agents. With this setup, we seek to understand the following questions: (i) Is the learned common policy stabilizing for all agents? (ii) How close is the learned common policy to each agent's own optimal policy? (iii) Can each agent learn its own optimal policy faster by leveraging data from all agents? To answer these questions, we propose a federated and model-free algorithm named FedLQR. Our analysis overcomes numerous technical challenges, such as heterogeneity in the agents' dynamics, multiple local updates, and stability concerns. We show that FedLQR produces a common policy that, at each iteration, is stabilizing for all agents. We provide bounds on the distance between the common policy and each agent's local optimal policy. Furthermore, we prove that when learning each agent's optimal policy, FedLQR achieves a sample complexity reduction proportional to the number of agents M in a low-heterogeneity regime, compared to the single-agent setting.
    Free, publicly-accessible full text available August 1, 2024
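    A minimal sketch of the federated, model-free policy-gradient pattern this record describes. The zeroth-order cost estimator, smoothing radius, step sizes, and plain averaging at the server are illustrative assumptions chosen for readability, not the paper's FedLQR algorithm or its constants.

        import numpy as np

        def lqr_cost(K, A, B, Q, R, x0s, horizon=50):
            """Empirical finite-horizon quadratic cost of the policy u = -K x.
            Stand-in for the model-free cost evaluation an agent would obtain
            by rolling out its own (unknown) system."""
            cost = 0.0
            for x0 in x0s:
                x = x0.copy()
                for _ in range(horizon):
                    u = -K @ x
                    cost += x @ Q @ x + u @ R @ u
                    x = A @ x + B @ u
            return cost / len(x0s)

        def zeroth_order_grad(K, cost_fn, radius=0.05, n_samples=20):
            """Two-point zeroth-order gradient estimate (uses only cost queries)."""
            grad = np.zeros_like(K)
            d = K.size
            for _ in range(n_samples):
                U = np.random.randn(*K.shape)
                U /= np.linalg.norm(U)
                grad += (cost_fn(K + radius * U) - cost_fn(K - radius * U)) / (2 * radius) * d * U
            return grad / n_samples

        def fedlqr_sketch(systems, K0, rounds=10, local_steps=5, lr=1e-3):
            """Server keeps a common gain K; each round, every agent takes a few
            local policy-gradient steps on its own system and the server averages
            the locally updated gains (FedAvg-style aggregation)."""
            K = K0.copy()
            for _ in range(rounds):
                local_gains = []
                for (A, B, Q, R, x0s) in systems:
                    K_i = K.copy()
                    for _ in range(local_steps):
                        g = zeroth_order_grad(K_i, lambda Kp: lqr_cost(Kp, A, B, Q, R, x0s))
                        K_i = K_i - lr * g
                    local_gains.append(K_i)
                K = np.mean(local_gains, axis=0)
            return K

    Only cost evaluations of an agent's own system are used, so each agent shares its locally updated gain with the server rather than raw trajectory data.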
  2. Free, publicly-accessible full text available June 1, 2024
  3. Matni, N.; Morari, M.; Pappas, G. (Eds.)
    We study the problem of learning a linear system model from the observations of M clients. The catch: each client observes data from a different dynamical system. This work addresses the question of how multiple clients can collaboratively learn dynamical models in the presence of heterogeneity. We pose this problem as a federated learning problem and characterize the tension between achievable performance and system heterogeneity. Furthermore, our federated sample complexity result provides a constant-factor improvement over the single-agent setting. Finally, we describe a meta federated learning algorithm, FedSysID, that leverages existing federated algorithms at the client level.
    Free, publicly-accessible full text available June 1, 2024
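    The federated system-identification setup in this record admits a compact FedAvg-style sketch. The local least-squares loss, step size, and plain averaging below are illustrative assumptions, not the FedSysID algorithm itself, and the client data interface (state and input trajectories) is assumed for concreteness.

        import numpy as np

        def local_sysid_step(theta, xs, us, lr=1e-3, local_steps=5):
            """A few local gradient steps on the least-squares identification loss
            || x_{t+1} - [A B] [x_t; u_t] ||^2 for one client's trajectory.
            xs has shape (T+1, n), us has shape (T, m), theta has shape (n+m, n)."""
            Z = np.hstack([xs[:-1], us])   # regressors [x_t, u_t]
            Y = xs[1:]                     # targets x_{t+1}
            for _ in range(local_steps):
                grad = (Z.T @ (Z @ theta - Y)) / len(Y)
                theta = theta - lr * grad
            return theta

        def fedsysid_sketch(clients, n, m, rounds=20):
            """Each round, every client refines the common model on its own data
            and the server averages the results (plain FedAvg; the paper's method
            may weight or regularize the aggregation differently)."""
            theta = np.zeros((n + m, n))
            for _ in range(rounds):
                theta = np.mean([local_sysid_step(theta, xs, us) for xs, us in clients], axis=0)
            return theta.T[:, :n], theta.T[:, n:]   # recover (A_hat, B_hat)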
  4. We address the problem of learning linear system models from observing multiple trajectories from different system dynamics. This framework encompasses a collaborative scenario where several systems seeking to estimate their dynamics are partitioned into clusters according to their system similarity. Thus, the systems within the same cluster can benefit from the observations made by the others. Considering this framework, we present an algorithm where each system alternately estimates its cluster identity and performs an estimation of its dynamics. These local estimates are then aggregated to update the model of each cluster. We show that under mild assumptions, our algorithm correctly estimates the cluster identities and achieves an approximate sample complexity that scales inversely with the number of systems in the cluster, thus facilitating a more efficient and personalized system identification process.
    Free, publicly-accessible full text available April 1, 2024
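    A minimal sketch of the alternating scheme described above, under assumptions made only for illustration: cluster identities are estimated by one-step prediction error against each cluster's current model, and each cluster model is refit by pooled least squares over its members' data. The `cluster_models` argument is a hypothetical list of initial [A B]^T guesses, one per cluster.

        import numpy as np

        def prediction_error(theta, xs, us):
            """One-step prediction error of a candidate model theta = [A B]^T
            on a system's trajectory (xs, us)."""
            Z = np.hstack([xs[:-1], us])
            return np.linalg.norm(Z @ theta - xs[1:]) ** 2

        def clustered_sysid_sketch(systems, cluster_models, iters=10):
            """Alternate between (1) each system picking the cluster whose model
            predicts its data best and (2) refitting each cluster model from the
            pooled data of its current members."""
            for _ in range(iters):
                labels = [int(np.argmin([prediction_error(th, xs, us) for th in cluster_models]))
                          for xs, us in systems]
                for k in range(len(cluster_models)):
                    members = [systems[i] for i, l in enumerate(labels) if l == k]
                    if not members:
                        continue
                    Z = np.vstack([np.hstack([xs[:-1], us]) for xs, us in members])
                    Y = np.vstack([xs[1:] for xs, us in members])
                    cluster_models[k], *_ = np.linalg.lstsq(Z, Y, rcond=None)
            return cluster_models, labels

    Pooling the members' data in step (2) is what lets the per-system error scale inversely with the cluster size, in the spirit of the sample complexity result quoted above.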
  5. Free, publicly-accessible full text available May 1, 2024
  6. We initiate the study of federated reinforcement learning under environmental heterogeneity by considering a policy evaluation problem. Our setup involves agents interacting with environments that share the same state and action space but differ in their reward functions and state transition kernels. Assuming agents can communicate via a central server, we ask: Does exchanging information expedite the process of evaluating a common policy? To answer this question, we provide the first comprehensive finite-time analysis of a federated temporal difference (TD) learning algorithm with linear function approximation, while accounting for Markovian sampling, heterogeneity in the agents' environments, and multiple local updates to save communication. Our analysis crucially relies on several novel ingredients: (i) deriving perturbation bounds on TD fixed points as a function of the heterogeneity in the agents' underlying Markov decision processes (MDPs); (ii) introducing a virtual MDP to closely approximate the dynamics of the federated TD algorithm; and (iii) using the virtual MDP to make explicit connections to federated optimization. Putting these pieces together, we rigorously prove that in a low-heterogeneity regime, exchanging model estimates leads to linear convergence speedups in the number of agents.
    Free, publicly-accessible full text available February 1, 2024
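    A compact sketch of federated TD(0) with linear function approximation in the spirit of this record: each agent runs a few local TD updates along its own Markovian trajectory in its own (heterogeneous) environment, and the server averages the iterates. The agent interface (keys "env_step", "policy", "phi", "init_state") and the step sizes are assumptions for illustration, not the paper's algorithm.

        import numpy as np

        def local_td_updates(w, env_step, policy, phi, s, steps=10, alpha=0.05, gamma=0.99):
            """A few local TD(0) updates with linear value estimate V(s) ~ phi(s) @ w,
            following this agent's own Markovian trajectory."""
            for _ in range(steps):
                a = policy(s)
                s_next, r = env_step(s, a)
                td_error = r + gamma * phi(s_next) @ w - phi(s) @ w
                w = w + alpha * td_error * phi(s)
                s = s_next
            return w, s

        def fed_td_sketch(agents, dim, rounds=100):
            """Server averages the agents' TD iterates after each round of local
            updates; only model estimates are exchanged, not trajectories."""
            w = np.zeros(dim)
            states = [agent["init_state"] for agent in agents]
            for _ in range(rounds):
                results = [local_td_updates(w.copy(), a["env_step"], a["policy"], a["phi"], s)
                           for a, s in zip(agents, states)]
                w = np.mean([wi for wi, _ in results], axis=0)
                states = [si for _, si in results]
            return w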
  7. Experiments have been conducted in the DIII-D tokamak to explore the in-situ growth of silicon-rich layers as a potential technique for real-time replenishment of surface coatings on plasma-facing components (PFCs) during steady-state long-pulse reactor operation. Silicon (Si) pellets of 1 mm diameter were injected into low- and high-confinement (L-mode and H-mode) plasma discharges with densities ranging from 3.9–7.5 × 10¹⁹ m⁻³ and input powers ranging from 5.5 to 9 MW. The small Si pellets were delivered with the impurity granule injector at frequencies ranging from 4 to 16 Hz, corresponding to mass flow rates of 5–19 mg s⁻¹ (1–4.2 × 10²⁰ Si s⁻¹) and cumulative amounts of up to 34 mg of Si per five-second discharge. Graphite samples were exposed to the scrape-off layer and private flux region plasmas through the divertor material evaluation system to evaluate the Si deposition on the divertor targets. The Si II emission at the sample correlates with silicon injection and suggests net surface Si deposition in measurable amounts. Post-mortem analysis showed Si-rich coatings containing silicon oxides, of which SiO₂ is the dominant component. No evidence of SiC was found, which is attributed to low divertor surface temperatures. The in-situ and ex-situ analyses found that Si-rich coatings of at least 0.4–1.2 nm thickness have been deposited at 0.4–0.7 nm s⁻¹. The technique is estimated to coat a surface area of at least 0.94 m² on the outer divertor. These results demonstrate the potential of using real-time material injection to form Si-enriched layers on divertor PFCs during reactor operation.

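    The quoted conversion from pellet mass flow to atom flux can be cross-checked with silicon's molar mass (about 28.09 g/mol) and Avogadro's number; the snippet below is only an arithmetic check of the figures above.

        # Cross-check: 5-19 mg/s of Si should correspond to roughly 1-4.2e20 atoms/s.
        N_A = 6.022e23       # Avogadro's number [1/mol]
        M_SI = 28.0855       # molar mass of silicon [g/mol]

        for mg_per_s in (5, 19):
            atoms_per_s = mg_per_s * 1e-3 / M_SI * N_A
            print(f"{mg_per_s} mg/s -> {atoms_per_s:.2e} Si atoms/s")
        # Prints ~1.07e+20 and ~4.07e+20, consistent with the 1-4.2e20 Si/s quoted above.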
  8. Free, publicly-accessible full text available June 21, 2024
  9. Free, publicly-accessible full text available February 1, 2024
  10. Free, publicly-accessible full text available December 1, 2023