skip to main content

Title: Robust Testing and Estimation under Manipulation Attacks
We study robust testing and estimation of discrete distributions in the strong contamination model. Our results cover both centralized setting and distributed setting with general local information constraints including communication and LDP constraints. Our technique relates the strength of manipulation attacks to the earth-mover distance using Hamming distance as the metric between messages (samples) from the users. In the centralized setting, we provide optimal error bounds for both learning and testing. Our lower bounds under local information constraints build on the recent lower bound methods in distributed inference. In the communication constrained setting, we develop novel algorithms based on random hashing and an L1-L1 isometry.  more » « less
Award ID(s):
1815893 1846300
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of Machine Learning Research
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper studies distributed Q-learning for Linear Quadratic Regulator (LQR) in a multi-agent network. The existing results often assume that agents can observe the global system state, which may be infeasible in large-scale systems due to privacy concerns or communication constraints. In this work, we consider a setting with unknown system models and no centralized coordinator. We devise a state tracking (ST) based Q-learning algorithm to design optimal controllers for agents. Specifically, we assume that agents maintain local estimates of the global state based on their local information and communications with neighbors. At each step, every agent updates its local global state estimation, based on which it solves an approximate Q-factor locally through policy iteration. Assuming a decaying injected excitation noise during the policy evaluation, we prove that the local estimation converges to the true global state, and establish the convergence of the proposed distributed ST-based Q-learning algorithm. The experimental studies corroborate our theoretical results by showing that our proposed method achieves comparable performance with the centralized case. 
    more » « less
  2. We consider information-theoretic bounds on the expected generalization error for statistical learning problems in a network setting. In this setting, there are K nodes, each with its own independent dataset, and the models from the K nodes have to be aggregated into a final centralized model. We consider both simple averaging of the models as well as more complicated multi-round algorithms. We give upper bounds on the expected generalization error for a variety of problems, such as those with Bregman divergence or Lipschitz continuous losses, that demonstrate an improved dependence of 1/K on the number of nodes. These “per node” bounds are in terms of the mutual information between the training dataset and the trained weights at each node and are therefore useful in describing the generalization properties inherent to having communication or privacy constraints at each node. 
    more » « less
  3. Distributed learning platforms for processing large scale data-sets are becoming increasingly prevalent. In typical distributed implementations, a centralized master node breaks the data-set into smaller batches for parallel processing across distributed workers to achieve speed-up and efficiency. Several computational tasks are of sequential nature, and involve multiple passes over the data. At each iteration over the data, it is common practice to randomly re-shuffle the data at the master node, assigning different batches for each worker to process. This random re-shuffling operation comes at the cost of extra communication overhead, since at each shuffle, new data points need to be delivered to the distributed workers. In this paper, we focus on characterizing the information theoretically optimal communication overhead for the distributed data shuffling problem. We propose a novel coded data delivery scheme for the case of no excess storage, where every worker can only store the assigned data batches under processing. Our scheme exploits a new type of coding opportunity and is applicable to any arbitrary shuffle, and for any number of workers. We also present information theoretic lower bounds on the minimum communication overhead for data shuffling, and show that the proposed scheme matches this lower bound for the worst-case communication overhead. 
    more » « less
  4. null (Ed.)
    A quickest change detection problem is considered in a sensor network with observations whose statistical dependency structure across the sensors before and after the change is described by a decomposable graphical model (DGM). Distributed computation methods for this problem are proposed that are capable of producing the optimum centralized test statistic. The DGM leads to the proper way to collect nodes into local groups equivalent to cliques in the graph, such that a clique statistic which summarizes all the clique sensor data can be computed within each clique. The clique statistics are transmitted to a decision maker to produce the optimum centralized test statistic. In order to further improve communication efficiency, an ordered transmission approach is proposed where transmissions of the clique statistics to the fusion center are ordered and then adaptively halted when sufficient information is accumulated. This procedure is always guaranteed to provide the optimal change detection performance, despite not transmitting all the statistics from all the cliques. A lower bound on the average number of transmissions saved by ordered transmissions is provided and for the case where the change seldom occurs the lower bound approaches approximately half the number of cliques provided a well behaved distance measure between the distributions of the sensor observations before and after the change is sufficiently large. We also extend the approach to the case when the graph structure is different under each hypothesis. Numerical results show significant savings using the ordered transmission approach and validate the theoretical findings. 
    more » « less
  5. We consider a distributed empirical risk minimization (ERM) optimization problem with communication efficiency and privacy requirements, motivated by the federated learn- ing (FL) framework. We propose a distributed communication-efficient and local differentially private stochastic gradient descent (CLDP-SGD) algorithm and analyze its communication, privacy, and convergence trade-offs. Since each iteration of the CLDP- SGD aggregates the client-side local gradients, we develop (optimal) communication-efficient schemes for mean estimation for several lp spaces under local differential privacy (LDP). To overcome performance limitation of LDP, CLDP-SGD takes advantage of the inherent privacy amplification provided by client sub- sampling and data subsampling at each se- lected client (through SGD) as well as the recently developed shuffled model of privacy. For convex loss functions, we prove that the proposed CLDP-SGD algorithm matches the known lower bounds on the centralized private ERM while using a finite number of bits per iteration for each client, i.e., effectively get- ting communication efficiency for “free”. We also provide preliminary experimental results supporting the theory. 
    more » « less