Parallel tempering (PT), also known as replica exchange, is the go-to workhorse for simulations of multi-modal distributions. The key to the success of PT is to adopt efficient swap schemes. The popular deterministic even-odd (DEO) scheme exploits the non-reversibility property and has successfully reduced the communication cost from quadratic to linear given the sufficiently many chains. However, such an innovation largely disappears in big data due to the limited chains and few bias-corrected swaps. To handle this issue, we generalize the DEO scheme to promote non-reversibility and propose a few solutions to tackle the underlying bias caused by the geometric stopping time. Notably, in big data scenarios, we obtain a nearly linear communication cost based on the optimal window size. In addition, we also adopt stochastic gradient descent (SGD) with large and constant learning rates as exploration kernels. Such a user-friendly nature enables us to conduct approximation tasks for complex posteriors without much tuning costs.
more » « less- NSF-PAR ID:
- 10540504
- Publisher / Repository:
- Association for the Advancement of Artificial Intelligence - MIT Press
- Date Published:
- Journal Name:
- Proceedings of the AAAI Conference on Artificial Intelligence
- Volume:
- 37
- Issue:
- 6
- ISSN:
- 2159-5399
- Page Range / eLocation ID:
- 7332 to 7339
- Subject(s) / Keyword(s):
- ML: Bayesian Learning, ML: Probabilistic Methods, RU: Stochastic Models & Probabilistic Inference, RU: Stochastic Optimization, RU: Uncertainty Representations
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Parallel tempering (PT), also known as replica exchange, is the go-to workhorse for simulations of multi-modal distributions. The key to the success of PT is to adopt efficient swap schemes. The popular deterministic even-odd (DEO) scheme exploits the non-reversibility property and has successfully reduced the communication cost from O(P 2) to O(P) given sufficient many P chains. However, such an innovation largely disappears in big data problems due to the limited chains and extremely few bias-corrected swaps. To handle this issue, we generalize the DEO scheme to promote the non-reversibility and obtain an appealing communication cost O(P log P) based on the optimal window size. In addition, we also analyze the bias when we adopt stochastic gradient descent (SGD) with large and constant learning rates as exploration kernels. Such a user-friendly nature enables us to conduct large-scale uncertainty approximation tasks without much tuning costs.more » « less
-
Federated bilevel learning has received increasing attention in various emerging machine learning and communication applications. Recently, several Hessian-vector-based algorithms have been proposed to solve the federated bilevel optimization problem. However, several important properties in federated learning such as the partial client participation and the linear speedup for convergence (i.e., the convergence rate and complexity are improved linearly with respect to the number of sampled clients) in the presence of non-i.i.d.~datasets, still remain open. In this paper, we fill these gaps by proposing a new federated bilevel algorithm named FedMBO with a novel client sampling scheme in the federated hypergradient estimation. We show that FedMBO achieves a convergence rate of $\mathcal{O}\big(\frac{1}{\sqrt{nK}}+\frac{1}{K}+\frac{\sqrt{n}}{K^{3/2}}\big)$ on non-i.i.d.~datasets, where $n$ is the number of participating clients in each round, and $K$ is the total number of iteration. This is the first theoretical linear speedup result for non-i.i.d.~federated bilevel optimization. Extensive experiments validate our theoretical results and demonstrate the effectiveness of our proposed method.more » « less
-
Abstract Smart grid is an intelligent cyber physical system (CPS). The CPS generates a massive amount of data for efficient grid operation. In this paper, a big data‐driven, cloud‐based information and communication technology (ICT) framework for smart grid CPS is proposed. The proposed ICT framework deploys hybrid cloud servers to enhance scalability and reliability of smart grid communication infrastructure. Because the data in the ICT framework contains much privacy of customers and important data for automated controlling, the security of data transmission must be ensured. In order to secure the communications over the Internet in the system, identity‐based schemes are proposed especially because of their advantage in key management. Specifically, an identity‐based signcryption (IBSC) scheme is proposed to provide confidentiality, non‐repudiation, and data integrity. For practical purposes, an identity‐based signature scheme is relaxed from the proposed IBSC to provide non‐repudiation only. Moreover, identity‐based schemes are also proposed to achieve signature delegation within the ICT framework. Security of the proposed IBSC scheme is rigorously analyzed in this work. Efficiency of the proposed IBSC scheme is demonstrated with an implementation using modified Weil pairing over an elliptic curve. Copyright © 2016 John Wiley & Sons, Ltd.
-
This paper introduces arithmetic sketching, an abstraction of a primitive that several previous works use to achieve lightweight, low-communication zero-knowledge verification of secret-shared vectors. An arithmetic sketching scheme for a language L ⊆ F^n consists of (1) a randomized linear function compressing a long input x to a short “sketch,” and (2) a small arithmetic circuit that accepts the sketch if and only if x ∈ L, up to some small error. If the language L has an arithmetic sketching scheme with short sketches, then it is possible to test membership in L using an arithmetic circuit with few multiplication gates. Since multiplications are the dominant cost in protocols for computation on secret-shared, encrypted, and committed data, arithmetic sketching schemes give rise to lightweight protocols in each of these settings. Beyond the formalization of arithmetic sketching, our contributions are: – A general framework for constructing arithmetic sketching schemes from algebraic varieties. This framework unifies schemes from prior work and gives rise to schemes for useful new languages and with improved soundness error. – The first arithmetic sketching schemes for languages of sparse vectors: vectors with bounded Hamming weight, bounded L1 norm, and vectors whose few non-zero values satisfy a given predicate. – A method for “compiling” any arithmetic sketching scheme for a language L into a low-communication malicious-secure multi-server protocol for securely testing that a client-provided secret-shared vector is in L. We also prove the first nontrivial lower bounds showing limits on the sketch size for certain languages (e.g., vectors of Hamming-weight one) and proving the non-existence of arithmetic sketching schemes for others (e.g., the language of all vectors that contain a specific value).more » « less
-
We propose a federated averaging Langevin algorithm (FA-LD) for uncertainty quantification and mean predictions with distributed clients. In particular, we generalize beyond normal posterior distributions and consider a general class of models. We develop theoretical guarantees for FA-LD for strongly log-concave distributions with non-i.i.d data and study how the injected noise and the stochastic-gradient noise, the heterogeneity of data, and the varying learning rates affect the convergence. Such an analysis sheds light on the optimal choice of local updates to minimize the communication cost. Important to our approach is that the communication efficiency does not deteriorate with the injected noise in the Langevin algorithms. In addition, we examine in our FA-LD algorithm both independent and correlated noise used over different clients. We observe that there is a trade-off between the pairs among communication, accuracy, and data privacy. As local devices may become inactive in federated networks, we also show convergence results based on different averaging schemes where only partial device updates are available. In such a case, we discover an additional bias that does not decay to zero.more » « less