The explosive growth in computation and energy cost of artificial intelligence has spurred interest in alternative computing modalities to conventional electronic processors. Photonic processors, which use photons instead of electrons, promise optical neural networks with ultralow latency and power consumption. However, existing optical neural networks, limited by their designs, have not achieved the recognition accuracy of modern electronic neural networks. In this work, we bridge this gap by embedding parallelized optical computation into flat camera optics that perform neural network computations during capture, before recording on the sensor. We leverage large kernels and propose a spatially varying convolutional network learned through a low-dimensional reparameterization. We instantiate this network inside the camera lens with a nanophotonic array with angle-dependent responses. Combined with a lightweight electronic back-end of about 2K parameters, our reconfigurable nanophotonic neural network achieves 72.76% accuracy on CIFAR-10, surpassing AlexNet (72.64%), and advancing optical neural networks into the deep learning era.
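The spatially varying convolution described above, learned through a low-dimensional reparameterization, can be illustrated with a toy sketch: instead of storing a full kernel per pixel, each pixel's kernel is a mixture of a small number of shared basis kernels. The function name and shapes below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def spatially_varying_conv(img, basis, coeffs):
    """Hedged sketch of a spatially varying convolution with a low-
    dimensional kernel basis. basis: (B, k, k) shared kernels;
    coeffs: (H, W, B) per-pixel mixing weights, so each pixel gets its
    own effective kernel without storing H*W full kernels.
    (Cross-correlation, as in deep-learning 'convolutions'.)"""
    B, k, _ = basis.shape
    pad = k // 2
    H, W = img.shape
    padded = np.pad(img, pad)
    out = np.zeros((H, W))
    for b in range(B):
        # Apply each basis kernel once over the whole image...
        resp = np.zeros((H, W))
        for i in range(k):
            for j in range(k):
                resp += basis[b, i, j] * padded[i:i + H, j:j + W]
        # ...then mix the B responses with per-pixel weights.
        out += coeffs[:, :, b] * resp
    return out

# Sanity check: a single delta basis kernel with unit weights is identity.
img = np.arange(16.0).reshape(4, 4)
basis = np.zeros((1, 3, 3)); basis[0, 1, 1] = 1.0
print(np.allclose(spatially_varying_conv(img, basis, np.ones((4, 4, 1))), img))
```

The point of the reparameterization is the parameter count: B kernels plus H×W×B scalars instead of H×W full k×k kernels.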
An optical neural network using less than 1 photon per multiplication
Abstract Deep learning has become a widespread tool in both science and industry. However, continued progress is hampered by the rapid growth in energy costs of ever-larger deep neural networks. Optical neural networks provide a potential means to solve the energy-cost problem faced by deep learning. Here, we experimentally demonstrate an optical neural network based on optical dot products that achieves 99% accuracy on handwritten-digit classification using ~3.1 detected photons per weight multiplication and ~90% accuracy using ~0.66 photons (~2.5 × 10⁻¹⁹ J of optical energy) per weight multiplication. The fundamental principle enabling our sub-photon-per-multiplication demonstration—noise reduction from the accumulation of scalar multiplications in dot-product sums—is applicable to many different optical-neural-network architectures. Our work shows that optical neural networks can achieve accurate results using extremely low optical energies.
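The noise-reduction principle named in the abstract, accumulating many scalar multiplications in a dot-product sum, can be sketched with a shot-noise simulation: each product is read out as a Poisson-distributed photon count, and the relative error of the accumulated sum shrinks as ~1/√(n·p) for vector length n and p photons per multiplication. The encoding below (mean photon count proportional to each product) is a simplifying assumption, not the paper's experimental setup.

```python
import numpy as np

def shot_noise_dot(w, x, photons_per_mult, rng):
    """Estimate w·x when each elementwise product w_i * x_i is carried by
    an optical signal detected with Poisson (shot) noise, averaging
    `photons_per_mult` photons per multiplication."""
    products = w * x                          # assume non-negative signals
    scale = photons_per_mult / products.mean()
    counts = rng.poisson(products * scale)    # shot noise per multiplication
    return counts.sum() / scale               # accumulate, then rescale

rng = np.random.default_rng(0)
for n in (10, 1000):
    w = rng.uniform(0.5, 1.0, n)
    x = rng.uniform(0.5, 1.0, n)
    exact = w @ x
    est = np.array([shot_noise_dot(w, x, 1.0, rng) for _ in range(200)])
    # Relative error shrinks roughly as 1/sqrt(n) at fixed photons/mult.
    print(n, np.std(est) / exact)
```

At 1 photon per multiplication the individual products are extremely noisy, yet the 1000-element dot product is accurate to a few percent; that is the sub-photon-per-MAC regime the abstract describes.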
- Award ID(s): 1918549
- PAR ID: 10320630
- Date Published:
- Journal Name: Nature Communications
- Volume: 13
- Issue: 1
- ISSN: 2041-1723
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Abstract Energy efficiency in computation is ultimately limited by noise, with quantum limits setting the fundamental noise floor. Analog physical neural networks hold promise for improved energy efficiency compared to digital electronic neural networks. However, they are typically operated in a relatively high-power regime so that the signal-to-noise ratio (SNR) is large (>10), and the noise can be treated as a perturbation. We study optical neural networks where all layers except the last are operated in the limit that each neuron can be activated by just a single photon, and as a result the noise on neuron activations is no longer merely perturbative. We show that by using a physics-based probabilistic model of the neuron activations in training, it is possible to perform accurate machine-learning inference in spite of the extremely high shot noise (SNR ~ 1). We experimentally demonstrated MNIST handwritten-digit classification with a test accuracy of 98% using an optical neural network with a hidden layer operating in the single-photon regime; the optical energy used to perform the classification corresponds to just 0.038 photons per multiply-accumulate (MAC) operation. Our physics-aware stochastic training approach might also prove useful with non-optical ultra-low-power hardware.
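The single-photon activation regime above can be sketched by modeling each neuron's detected output as a Poisson draw whose rate is the mean optical signal, then training against the activation distribution rather than a single noisy draw. This is a hedged illustration of the physics-based probabilistic model, not the paper's training pipeline.

```python
import numpy as np

def stochastic_layer(z, rng):
    """Forward pass of a neuron bank in the single-photon regime: the mean
    optical signal z sets a Poisson rate, and the detected photon count
    (0, 1, 2, ...) is the activation the next layer actually sees."""
    rate = np.clip(z, 0.0, None)   # optical intensities are non-negative
    return rng.poisson(rate)

# Physics-aware training idea (sketch): average many stochastic forward
# passes so the loss reflects the activation distribution, whose mean
# equals the underlying rate, rather than one SNR ~ 1 sample.
rng = np.random.default_rng(1)
z = np.array([0.2, 1.0, 3.0])
samples = np.stack([stochastic_layer(z, rng) for _ in range(5000)])
print(samples.mean(axis=0))   # close to z: a Poisson mean equals its rate
```

A single draw here has SNR ≈ √rate ≈ 1, matching the regime the abstract describes; accuracy comes from modeling that noise during training rather than suppressing it with more power.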
The rapidly increasing size of deep-learning models has renewed interest in alternatives to digital-electronic computers as a means to dramatically reduce the energy cost of running state-of-the-art neural networks. Optical matrix-vector multipliers are best suited to performing computations with very large operands, which suggests that large Transformer models could be a good target for them. In this paper, we investigate, through a combination of simulations and experiments on prototype optical hardware, the feasibility and potential energy benefits of running Transformer models on future optical accelerators that perform matrix-vector multiplication. We use simulations, with noise models validated by small-scale optical experiments, to show that optical accelerators for matrix-vector multiplication should be able to accurately run a typical Transformer architecture model for language processing. We demonstrate that optical accelerators can achieve the same (or better) perplexity as digital-electronic processors at 8-bit precision, provided that the optical hardware uses sufficiently many photons per inference, which translates directly to a requirement on optical energy per inference. We studied numerically how the requirement on optical energy per inference changes as a function of the Transformer width d and found that the optical energy per multiply-accumulate (MAC) scales approximately as 1/d, giving an asymptotic advantage over digital systems. We also analyze the total system energy costs for optical accelerators running Transformers, including both optical and electronic costs, as a function of model size.
We predict that well-engineered, large-scale optical hardware should be able to achieve a 100× energy-efficiency advantage over current digital-electronic processors in running some of the largest current Transformer models, and if both the models and the optical hardware are scaled to the quadrillion-parameter regime, optical accelerators could have a >8,000× energy-efficiency advantage. Under plausible assumptions about future improvements to electronics and Transformer quantization techniques (5× cheaper memory access, double the digital-to-analog conversion efficiency, and 4-bit precision), we estimate that the energy advantage for optical processors versus electronic processors operating at 300 fJ/MAC could grow to >100,000×.
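The 1/d scaling claim above has a simple back-of-envelope consequence worth making explicit: if optical energy per MAC falls as c/d while a digital MAC costs a fixed energy, then for a d×d matrix-vector product (d² MACs) the optical energy grows only linearly in d and the efficiency advantage grows linearly in d. The constants below (photon energy from the related abstract, c and the digital cost per MAC) are illustrative assumptions, not the paper's measured values.

```python
E_PHOTON = 2.5e-19   # J per near-IR photon (order of magnitude, from the abstract)
C = 10.0             # assumed constant: photons/MAC = C / d -- illustrative only

def optical_energy_per_matvec(d):
    macs = d * d
    photons_per_mac = C / d
    return macs * photons_per_mac * E_PHOTON   # = C * d * E_PHOTON, linear in d

def digital_energy_per_matvec(d, j_per_mac=3e-13):
    # Assumed fixed digital cost per MAC (~0.3 pJ), so total grows as d^2.
    return d * d * j_per_mac

for d in (1_000, 100_000):
    advantage = digital_energy_per_matvec(d) / optical_energy_per_matvec(d)
    print(d, advantage)   # advantage = d * j_per_mac / (C * E_PHOTON): linear in d
```

Under these toy constants the advantage ratio grows 100× when d grows 100×, which is the asymptotic-advantage argument in miniature; the paper's concrete 100× to >100,000× figures additionally account for electronic overheads and memory access.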
Abstract In recent years, the computational demands of deep learning applications have necessitated the introduction of energy-efficient hardware accelerators. Optical neural networks are a promising option; however, thus far they have been largely limited by the lack of energy-efficient nonlinear optical functions. Here, we experimentally demonstrate an all-optical Rectified Linear Unit (ReLU), which is the most widely used nonlinear activation function for deep learning, using a periodically-poled thin-film lithium niobate nanophotonic waveguide and achieve ultra-low energies in the regime of femtojoules per activation with near-instantaneous operation. Our results provide a clear and practical path towards truly all-optical, energy-efficient nanophotonic deep learning.
Artificial Intelligence (AI) has permeated various domains but is limited by the bottlenecks imposed by data transfer latency inherent in contemporary memory technologies. Matrix multiplication, crucial for neural network training and inference, can be significantly expedited with a complexity of O(1) using Resistive RAM (RRAM) technology, instead of the conventional complexity of O(n²). This positions RRAM as a promising candidate for the efficient hardware implementation of machine learning and neural networks through in-memory computation. However, RRAM manufacturing technology remains in its infancy, rendering it susceptible to soft errors, potentially compromising neural network accuracy and reliability. In this paper, we propose a syndrome-based error correction scheme that employs selective weighted checksums to correct double adjacent column errors in RRAM. The error correction is done on the output of the matrix multiplication, thus ensuring correct operation for any number of errors in two adjacent columns. The proposed codes have low redundancy and low decoding latency, making them suitable for high-throughput applications. This scheme uses a repeating weight-based structure that makes it scalable to large RRAM matrix sizes.
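The checksum idea above can be sketched with a simpler cousin of the paper's code: a classic weighted-checksum (ABFT-style) scheme that corrects a single erroneous element of the matrix-vector output. Two extra rows are appended to the weight matrix (a plain sum and an index-weighted sum), the crossbar computes them for free alongside the product, and two syndromes then locate and cancel the error. The paper's selective weighted checksums extend this to double adjacent column errors; the version here is only illustrative.

```python
import numpy as np

def encode(W):
    """Append two checksum rows: the column sums and the index-weighted
    column sums of W. The hardware computes these like any other rows."""
    idx = np.arange(1, W.shape[0] + 1)
    return np.vstack([W, W.sum(axis=0), idx @ W])

def correct(y_ext):
    """Given y_ext = encode(W) @ x, use the two checksum entries to
    detect, locate, and correct one erroneous output element."""
    y, s1, s2 = y_ext[:-2].copy(), y_ext[-2], y_ext[-1]
    idx = np.arange(1, len(y) + 1)
    d1 = s1 - y.sum()      # syndrome 1: error magnitude
    d2 = s2 - idx @ y      # syndrome 2: magnitude * (1-based position)
    if not np.isclose(d1, 0.0):
        y[int(round(d2 / d1)) - 1] += d1   # locate via d2/d1, then cancel
    return y

W = np.array([[1., 2.], [3., 4.], [5., 6.]])
x = np.array([1., 1.])
y_ext = encode(W) @ x
y_ext[1] += 0.5            # inject a soft error into one output element
print(correct(y_ext))      # recovers the exact product [3., 7., 11.]
```

Because correction operates on the output vector, it works regardless of how many cell-level faults inside the affected column produced the error, which mirrors the paper's point about correcting on the matrix-multiplication output.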