Solving computationally hard problems using conventional computing architectures is often slow and energetically inefficient. Quantum computing may help with these challenges, but it is still in the early stages of development. A quantum-inspired alternative is to build domain-specific architectures with classical hardware. Here we report a sparse Ising machine that achieves massive parallelism where the flips per second—the key figure of merit—scales linearly with the number of probabilistic bits. Our sparse Ising machine architecture, prototyped on a field-programmable gate array, is up to six orders of magnitude faster than standard Gibbs sampling on a central processing unit, and offers 5–18 times improvements in sampling speed compared with approaches based on tensor processing units and graphics processing units. Our sparse Ising machine can reliably factor semi-primes up to 32 bits and it outperforms competition-winning Boolean satisfiability solvers in approximate optimization. Moreover, our architecture can find the correct ground state, even when inexact sampling is made with faster clocks. Our problem encoding and sparsification techniques could be applied to other classical and quantum Ising machines, and our architecture could potentially be scaled to 1,000,000 or more p-bits using analogue silicon or nanodevice technologies.
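For readers unfamiliar with the baseline the paper compares against, the sketch below is a minimal Gibbs sampler over a sparse Ising model with probabilistic bits represented as ±1 spins; the random graph, couplings, fields, and inverse temperature are illustrative assumptions, and this is not the paper's FPGA architecture.

```python
import numpy as np

def gibbs_sweep(s, J, h, beta, rng):
    """One sequential Gibbs sweep over all spins of an Ising model.

    s    : spin configuration in {-1, +1}
    J    : symmetric coupling matrix with zero diagonal (sparse in practice)
    h    : local fields
    beta : inverse temperature
    """
    for i in range(len(s)):
        # Effective field on spin i; only nonzero couplings contribute,
        # which is the sparsity a sparse Ising machine exploits.
        field = h[i] + J[i] @ s
        p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * field))
        s[i] = 1 if rng.random() < p_up else -1
    return s

# Illustrative example: a small random sparse problem (assumed sizes and values).
rng = np.random.default_rng(0)
n = 64
mask = rng.random((n, n)) < 0.05               # roughly 5% dense coupling graph
J = np.triu(rng.normal(size=(n, n)) * mask, k=1)
J = J + J.T                                    # symmetric, zero diagonal
h = rng.normal(scale=0.1, size=n)
s = rng.choice([-1, 1], size=n)
for _ in range(1000):
    s = gibbs_sweep(s, J, h, beta=1.0, rng=rng)
print("final energy:", -0.5 * s @ J @ s - h @ s)
```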
Passel: Improved Scalability and Efficiency of Distributed SVM using a Cacheless PGAS Migrating Thread Architecture
Stochastic Gradient Descent (SGD) is a valuable algorithm for large-scale machine learning, but it has proven difficult to parallelize on conventional architectures because of communication and memory access issues. The HogWild series of mixed logically distributed and physically multi-threaded algorithms overcomes these issues for problems with sparse characteristics by using multiple local model vectors with asynchronous atomic updates. While this approach has proven effective for several reported examples, there are others, especially very sparse cases, that do not scale as well. This paper discusses an SGD Support Vector Machine (SVM) on a cacheless migrating thread architecture, using the HogWild algorithms as a framework. Our implementations on this novel architecture achieved superior hardware efficiency and scalability over those of a conventional cluster using MPI. Furthermore, these improvements were gained using naive data partitioning techniques and hardware with substantially less compute capability than that present in conventional systems.
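To make the HogWild-style framework concrete, here is a minimal sketch of lock-free asynchronous SGD for a linear SVM with hinge loss, where several threads update one shared weight vector without synchronization; the data, hyperparameters, and thread count are illustrative assumptions, and this is not the Passel migrating-thread implementation (CPython's GIL also means the threads only illustrate the update pattern, not true parallelism).

```python
import threading
import numpy as np

def hogwild_svm(X, y, n_threads=4, epochs=5, lr=0.01, lam=1e-4, seed=0):
    """Lock-free asynchronous SGD for a linear SVM (hinge loss + L2 penalty).

    Every thread samples examples and writes to the shared weight vector
    without locks, in the spirit of HogWild; with sparse features the
    conflicting updates this allows are rare.
    """
    n, d = X.shape
    w = np.zeros(d)                          # shared model vector

    def worker(tid):
        rng = np.random.default_rng(seed + tid)
        for _ in range(epochs * n // n_threads):
            i = rng.integers(n)
            xi, yi = X[i], y[i]
            margin = yi * (xi @ w)
            grad = lam * w - (yi * xi if margin < 1.0 else 0.0)
            w[:] = w - lr * grad             # unsynchronized in-place update

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w

# Illustrative usage on random, roughly separable data (assumed shapes).
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 50))
y = np.sign(X @ rng.normal(size=50))
w = hogwild_svm(X, y)
print("training accuracy:", np.mean(np.sign(X @ w) == y))
```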
- Award ID(s): 1822939
- PAR ID: 10402605
- Date Published:
- Journal Name: 12th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA)
- Page Range / eLocation ID: 27 to 34
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Sparse matrix-matrix multiplication (SpMM) is a critical computational kernel in numerous scientific and machine learning applications. SpMM involves massive irregular memory accesses and poses great challenges to conventional cache-based computer architectures. Recently dedicated SpMM accelerators have been proposed to enhance SpMM performance. However, current SpMM accelerators still face challenges in adapting to varied sparse patterns, fully exploiting inherent parallelism, and optimizing cache performance. To address these issues, we introduce ACES, a novel SpMM accelerator in this study. First, ACES features an adaptive execution flow that dynamically adjusts to diverse sparse patterns. The adaptive execution flow balances parallel computing efficiency and data reuse. Second, ACES incorporates locality-concurrency co-optimizations within the global cache. ACES utilizes a concurrency-aware cache management policy, which considers data locality and concurrency for optimal replacement decisions. Additionally, the integration of a non-blocking buffer with the global cache enhances concurrency and reduces computational stalls. Third, the hardware architecture of ACES is designed to integrate all innovations. The architecture ensures efficient support across the adaptive execution flow, advanced cache optimizations, and fine-grained parallel processing. Our performance evaluation demonstrates that ACES significantly outperforms existing solutions, providing a 2.1× speedup and marking a substantial advancement in SpMM acceleration.
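The irregular, data-dependent accesses that motivate such accelerators are already visible in a plain row-wise (Gustavson) sparse-sparse multiply on CSR operands; the sketch below is a generic software reference, not the ACES design, and the operand sizes and densities are illustrative assumptions.

```python
import numpy as np
from scipy.sparse import csr_matrix, random as sparse_random

def spmm_gustavson(A: csr_matrix, B: csr_matrix) -> csr_matrix:
    """Row-wise (Gustavson) sparse-sparse multiply on CSR operands.

    For every nonzero A[i, k] the kernel jumps to row k of B, so accesses
    into B are data-dependent and irregular, the pattern that cache-based
    architectures handle poorly and SpMM accelerators target.
    """
    n, m = A.shape[0], B.shape[1]
    indptr, indices, data = [0], [], []
    for i in range(n):
        acc = {}                                         # sparse accumulator for row i
        for a_pos in range(A.indptr[i], A.indptr[i + 1]):
            k, a_val = A.indices[a_pos], A.data[a_pos]
            for b_pos in range(B.indptr[k], B.indptr[k + 1]):
                j = B.indices[b_pos]
                acc[j] = acc.get(j, 0.0) + a_val * B.data[b_pos]
        for j in sorted(acc):                            # emit row i in column order
            indices.append(j)
            data.append(acc[j])
        indptr.append(len(indices))
    return csr_matrix((data, indices, indptr), shape=(n, m))

# Illustrative check against SciPy on small random operands (assumed sizes/density).
A = sparse_random(200, 300, density=0.02, format="csr", random_state=0)
B = sparse_random(300, 150, density=0.02, format="csr", random_state=1)
C = spmm_gustavson(A, B)
print("max abs error vs SciPy:", np.abs((C - A @ B).toarray()).max())
```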
-
Gaussian process (GP) is a staple in the toolkit of a spatial statistician. Well‐documented computing roadblocks in the analysis of large geospatial datasets using GPs have now largely been mitigated via several recent statistical innovations. Nearest neighbor Gaussian process (NNGP) has emerged as one of the leading candidates for such massive‐scale geospatial analysis owing to its empirical success. This article reviews the connection of NNGP to sparse Cholesky factors of the spatial precision (inverse‐covariance) matrix. The focus of the review is on these sparse Cholesky matrices, which are versatile and have recently found many diverse applications beyond the primary usage of NNGP for fast parameter estimation and prediction in spatial (generalized) linear models. In particular, we discuss applications of sparse NNGP Cholesky matrices to address multifaceted computational issues in spatial bootstrapping, simulation of large‐scale realizations of Gaussian random fields, and extensions to nonparametric mean function estimation of a GP using random forests. We also review a sparse‐Cholesky‐based model for areal (geographically aggregated) data that addresses long‐established interpretability issues of existing areal models. Finally, we highlight some yet‐to‐be‐addressed issues of such sparse Cholesky approximations that warrant further research. This article is categorized under: Algorithms and Computational Methods > Algorithms; Algorithms and Computational Methods > Numerical Methods.
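A minimal sketch of the sparse factor described above, assuming an exponential covariance, a simple coordinate-based ordering, and m = 10 neighbors (all illustrative choices rather than any particular package's defaults): each location is regressed on its nearest previously ordered neighbors, giving a strictly lower-triangular B and diagonal F with implied precision (I - B)' F^{-1} (I - B), which is then used to simulate a realization by forward substitution.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse import csr_matrix, identity
from scipy.sparse.linalg import spsolve_triangular

def exp_cov(d, sigma2=1.0, phi=3.0):
    """Exponential covariance; an illustrative kernel choice."""
    return sigma2 * np.exp(-phi * d)

def nngp_factors(coords, m=10):
    """NNGP factors: strictly lower-triangular B and diagonal F so that the
    implied precision is (I - B)' F^{-1} (I - B), i.e. a sparse Cholesky-type
    factor with at most m + 1 nonzeros per row."""
    n = coords.shape[0]
    rows, cols, vals = [], [], []
    F = np.empty(n)
    F[0] = exp_cov(0.0)
    for i in range(1, n):
        # Conditioning set: up to m nearest neighbors among earlier locations.
        d_prev = np.linalg.norm(coords[:i] - coords[i], axis=1)
        nbrs = np.argsort(d_prev)[:m]
        C_NN = exp_cov(cdist(coords[nbrs], coords[nbrs]))
        C_iN = exp_cov(d_prev[nbrs])
        b = np.linalg.solve(C_NN, C_iN)          # kriging weights on the neighbors
        F[i] = exp_cov(0.0) - C_iN @ b           # conditional (residual) variance
        rows.extend([i] * len(nbrs))
        cols.extend(nbrs)
        vals.extend(b)
    return csr_matrix((vals, (rows, cols)), shape=(n, n)), F

# Illustrative use: simulate a GP realization at 2,000 locations via the factor.
rng = np.random.default_rng(0)
coords = rng.random((2000, 2))
coords = coords[np.argsort(coords[:, 0])]        # a simple coordinate ordering
B, F = nngp_factors(coords, m=10)
# Forward substitution solves (I - B) y = sqrt(F) * z, so y ~ N(0, Q^{-1}).
z = rng.standard_normal(coords.shape[0])
L = identity(coords.shape[0], format="csr") - B
y = spsolve_triangular(L, np.sqrt(F) * z, lower=True)
print("simulated field mean/sd:", round(y.mean(), 3), round(y.std(), 3))
```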
-
Stochastic gradient descent (SGD) and its variants have established themselves as the go-to algorithms for large-scale machine learning problems with independent samples due to their generalization performance and intrinsic computational advantage. However, the fact that the stochastic gradient is a biased estimator of the full gradient with correlated samples has led to a lack of theoretical understanding of how SGD behaves under correlated settings and has hindered its use in such cases. In this paper, we focus on the Gaussian process (GP) and take a step towards breaking the barrier by proving that minibatch SGD converges to a critical point of the full loss function and recovers model hyperparameters with rate O(1/K) up to a statistical error term depending on the minibatch size. Numerical studies on both simulated and real datasets demonstrate that minibatch SGD has better generalization over state-of-the-art GP methods while reducing the computational burden and opening a new, previously unexplored, data size regime for GPs.
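As a rough illustration of the minibatch idea, the sketch below runs SGD on the negative log marginal likelihood of random minibatches to recover the lengthscale and noise variance of a squared-exponential GP; the kernel, parameterization, step size, and simulated data are illustrative assumptions and do not reproduce the paper's estimator or its O(1/K) analysis.

```python
import numpy as np

def sq_exp_kernel(X, ls):
    """Squared-exponential kernel with unit signal variance."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def minibatch_nll_grad(Xb, yb, log_ls, log_noise):
    """Per-datum gradient of the minibatch negative log marginal likelihood
    w.r.t. log-lengthscale and log-noise, using the identity
    dNLL/dtheta = -0.5 * tr((alpha alpha' - K^{-1}) dK/dtheta)."""
    ls, noise = np.exp(log_ls), np.exp(log_noise)
    b = len(yb)
    K = sq_exp_kernel(Xb, ls) + noise * np.eye(b)
    Kinv = np.linalg.inv(K)
    alpha = Kinv @ yb
    A = np.outer(alpha, alpha) - Kinv
    d2 = ((Xb[:, None, :] - Xb[None, :, :]) ** 2).sum(-1)
    dK_dlog_ls = sq_exp_kernel(Xb, ls) * d2 / ls ** 2
    dK_dlog_noise = noise * np.eye(b)
    return (-0.5 * np.trace(A @ dK_dlog_ls) / b,
            -0.5 * np.trace(A @ dK_dlog_noise) / b)

# Illustrative data from a GP with lengthscale 0.5 and noise 0.1 (assumed truth).
rng = np.random.default_rng(0)
N, batch = 2000, 64
X = rng.random((N, 1))
y = np.linalg.cholesky(sq_exp_kernel(X, 0.5) + 0.1 * np.eye(N)) @ rng.standard_normal(N)

log_ls, log_noise, lr = np.log(1.0), np.log(1.0), 0.1
for _ in range(500):
    idx = rng.choice(N, size=batch, replace=False)
    g_ls, g_noise = minibatch_nll_grad(X[idx], y[idx], log_ls, log_noise)
    log_ls -= lr * g_ls
    log_noise -= lr * g_noise
print("estimated lengthscale:", np.exp(log_ls), "noise variance:", np.exp(log_noise))
```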
-
We propose the Sparse Abstract Machine (SAM), an abstract machine model for targeting sparse tensor algebra to reconfigurable and fixed-function spatial dataflow accelerators. SAM defines a streaming dataflow abstraction with sparse primitives that encompass a large space of scheduled tensor algebra expressions. SAM dataflow graphs naturally separate tensor formats from algorithms and are expressive enough to incorporate arbitrary iteration orderings and many hardware-specific optimizations. We also present Custard, a compiler from a high-level language to SAM that demonstrates SAM's usefulness as an intermediate representation. We automatically bind from SAM to a streaming dataflow simulator. We evaluate the generality and extensibility of SAM, explore the performance space of sparse tensor algebra optimizations using SAM, and show SAM's ability to represent dataflow hardware.
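A loose software analogue of one of the sparse streaming primitives such an abstraction relies on, intersecting two sorted coordinate streams, which is what a sparse-sparse dot product reduces to; the stream encoding and function names below are illustrative assumptions, not SAM's actual token format or primitive set.

```python
def coord_stream(indices, values):
    """Yield (coordinate, value) pairs from one compressed tensor level,
    analogous to a level scanner emitting a sorted coordinate stream."""
    yield from zip(indices, values)

def intersect(stream_a, stream_b):
    """Co-iterate two sorted coordinate streams and emit only matching
    coordinates, the 'intersect' behavior behind sparse-sparse multiply."""
    a = next(stream_a, None)
    b = next(stream_b, None)
    while a is not None and b is not None:
        if a[0] == b[0]:
            yield a[0], a[1], b[1]
            a, b = next(stream_a, None), next(stream_b, None)
        elif a[0] < b[0]:
            a = next(stream_a, None)
        else:
            b = next(stream_b, None)

def sparse_dot(idx_a, val_a, idx_b, val_b):
    """Sparse vector dot product expressed as intersect + multiply + reduce."""
    return sum(va * vb for _, va, vb in
               intersect(coord_stream(idx_a, val_a), coord_stream(idx_b, val_b)))

# Illustrative vectors in coordinate form (assumed data).
print(sparse_dot([1, 4, 7, 9], [2.0, 1.0, 3.0, 5.0],
                 [0, 4, 7, 8], [4.0, 2.0, 0.5, 1.0]))  # 1*2 + 3*0.5 = 3.5
```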