
Title: An optical neural network using less than 1 photon per multiplication
Abstract: Deep learning has become a widespread tool in both science and industry. However, continued progress is hampered by the rapid growth in energy costs of ever-larger deep neural networks. Optical neural networks provide a potential means to solve the energy-cost problem faced by deep learning. Here, we experimentally demonstrate an optical neural network based on optical dot products that achieves 99% accuracy on handwritten-digit classification using ~3.1 detected photons per weight multiplication and ~90% accuracy using ~0.66 photons (~2.5 × 10⁻¹⁹ J of optical energy) per weight multiplication. The fundamental principle enabling our sub-photon-per-multiplication demonstration—noise reduction from the accumulation of scalar multiplications in dot-product sums—is applicable to many different optical-neural-network architectures. Our work shows that optical neural networks can achieve accurate results using extremely low optical energies.
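The principle named in the abstract, noise reduction from accumulating scalar multiplications into dot-product sums, can be illustrated with a short Monte Carlo sketch. This is our illustration under assumed Poisson (shot-noise-limited) detection, not the authors' code; only the ~0.66 photons-per-multiplication budget comes from the abstract.

```python
# Minimal sketch: Poisson shot noise grows as sqrt(signal), so the
# *relative* error of a sum of N products falls roughly as 1/sqrt(N).
# The 0.66 photons/multiplication budget is from the abstract; the
# non-negative encoding and everything else are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def optical_dot(x, w, photons_per_mult=0.66, trials=1000):
    """Estimate w.x when each product w_i*x_i is encoded in the mean
    photon count seen by a shot-noise-limited detector."""
    x, w = np.abs(x), np.abs(w)              # assume non-negative encoding
    products = w * x                          # ideal scalar multiplications
    # Scale so the average detected photons per multiplication matches budget
    scale = photons_per_mult * len(products) / products.sum()
    mean_counts = products * scale            # mean photon count per product
    counts = rng.poisson(mean_counts, size=(trials, len(products)))
    return counts.sum(axis=1) / scale         # accumulate, then undo scaling

for n in (10, 100, 1000):
    x, w = rng.random(n), rng.random(n)
    est, true = optical_dot(x, w), w @ x
    print(f"N={n:5d}  relative RMS error = {np.std(est - true) / true:.3f}")
```

With the total photon budget growing linearly in N, the relative shot-noise error falls as roughly 1/sqrt(0.66 N), which is why long dot products can stay accurate even below one detected photon per multiplication.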
Award ID(s): 1918549
NSF-PAR ID: 10320630
Author(s) / Creator(s): ; ; ; ; ;
Date Published:
Journal Name: Nature Communications
Volume: 13
Issue: 1
ISSN: 2041-1723
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Large-scale deep neural networks (DNNs) are both compute- and memory-intensive. As the size of DNNs continues to grow, it is critical to improve their energy efficiency and performance while maintaining accuracy. For DNNs, model size is an important factor affecting performance, scalability, and energy efficiency. Weight pruning achieves good compression ratios but suffers from three drawbacks: 1) the irregular network structure after pruning, which hurts performance and throughput; 2) increased training complexity; and 3) the lack of a rigorous guarantee on compression ratio and inference accuracy. To overcome these limitations, this paper proposes CirCNN, a principled approach to representing weights and processing neural networks using block-circulant matrices. CirCNN utilizes Fast Fourier Transform (FFT)-based fast multiplication, simultaneously reducing the computational complexity (in both inference and training) from O(n²) to O(n log n) and the storage complexity from O(n²) to O(n), with negligible accuracy loss. Compared to other approaches, CirCNN is distinct due to its mathematical rigor: DNNs based on CirCNN can converge to the same "effectiveness" as DNNs without compression. We propose the CirCNN architecture, a universal DNN inference engine that can be implemented on various hardware/software platforms with a configurable network architecture (e.g., layer type, size, scales). In the CirCNN architecture: 1) due to its recursive property, the FFT can be used as the key computing kernel, which ensures universal and small-footprint implementations; 2) the compressed but regular network structure avoids the pitfalls of network pruning and facilitates high performance and throughput with a highly pipelined and parallel design. To demonstrate the performance and energy efficiency, we test CirCNN on FPGA, ASIC, and embedded processors. Our results show that the CirCNN architecture achieves very high energy efficiency and performance with a small hardware footprint. Based on the FPGA implementation and ASIC synthesis results, CirCNN achieves 6×–102× energy-efficiency improvements compared with the best state-of-the-art results.
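As a quick illustration of the FFT shortcut at the heart of CirCNN: a circulant matrix is diagonalized by the discrete Fourier transform, so a circulant matrix-vector product reduces to three FFTs plus an element-wise product. The NumPy sketch below is a minimal, self-contained demonstration of that identity, not CirCNN's actual kernel.

```python
# Circulant matrix-vector product in O(n log n) via the FFT, versus the
# explicit O(n^2) construction. Illustrative only; CirCNN composes many
# such circulant blocks inside a full DNN inference engine.
import numpy as np

def circulant_matvec(c, x):
    """y = C @ x, where C is the circulant matrix with first column c."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

n = 8
rng = np.random.default_rng(1)
c, x = rng.random(n), rng.random(n)
C = np.array([np.roll(c, i) for i in range(n)]).T  # explicit circulant matrix
assert np.allclose(C @ x, circulant_matvec(c, x))  # same result, fewer ops
```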
  2. Photonic neural networks (PNNs) are a promising alternative to electronic GPUs for performing machine-learning tasks. The PNNs' value proposition originates from: i) near-zero energy consumption for vector-matrix multiplication once trained; ii) short interconnect delays of 10-100 ps; and iii) the weak optical nonlinearity required, which can be provided by fJ/bit-efficient emerging electro-optic devices. Furthermore, photonic integrated circuits (PICs) offer high data bandwidth at low latency, with competitive footprints and synergies with microelectronics architectures, such as foundry access. This talk discusses recent advances in photonic neuromorphic networks and provides a vision for photonic information processors. Details include: 1) a comparison of compute-performance technologies with respect to compute efficiency (i.e., MAC/J) and compute speed (i.e., MAC/s); 2) a discussion of photonic neurons, i.e., perceptrons; 3) architectural network implementations; 4) a broadcast-and-weight protocol; 5) nonlinear activation functions provided via electro-optic modulation; and 6) experimental demonstrations of early-stage prototypes. The talk opens by answering why neural networks are of interest and concludes with the application regimes in which PNN processors reside: deep learning, nonlinear optimization, and real-time processing.
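For readers unfamiliar with the broadcast-and-weight protocol mentioned in item 4 of the talk outline, the toy model below captures its arithmetic: signals broadcast on separate WDM wavelengths are weighted by tunable filter transmissions and summed at a photodetector. The function names, the balanced-detection sign handling, and the tanh activation are our assumptions for illustration, not the speaker's code.

```python
# Toy numerical model of one broadcast-and-weight photonic neuron:
# filter transmissions in [0, 1] apply the weight magnitudes (sign via
# balanced detection), and photodetection sums the weighted powers,
# performing the multiply-accumulate in the optical/analog domain.
import numpy as np

def broadcast_and_weight(inputs, weights):
    """Weighted sum via filter banks + balanced photodetection, followed
    by an electro-optic nonlinearity (modeled here as tanh)."""
    t_plus = np.clip(weights, 0, 1)           # transmission, positive arm
    t_minus = np.clip(-weights, 0, 1)         # transmission, negative arm
    photocurrent = np.dot(t_plus, inputs) - np.dot(t_minus, inputs)
    return np.tanh(photocurrent)              # stand-in for the EO activation

x = np.array([0.2, 0.9, 0.5])                 # optical powers per wavelength
w = np.array([0.7, -0.4, 1.0])                # trained weights in [-1, 1]
print(broadcast_and_weight(x, w))
```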
  3. Optical neural networks (ONNs), implemented on an array of cascaded Mach–Zehnder interferometers (MZIs), have recently been proposed as a possible replacement for conventional deep learning hardware. They potentially offer higher energy efficiency and computational speed than their electronic counterparts. By utilizing tunable phase shifters, one can adjust the output of each MZI to enable emulation of arbitrary matrix–vector multiplication. These phase shifters are central to the programmability of ONNs, but they require a large footprint and are relatively slow. Here we propose an ONN architecture that utilizes parity–time (PT) symmetric couplers as its building blocks. Instead of modulating phase, gain–loss contrasts across the array are adjusted as a means to train the network. We demonstrate that PT-symmetric ONNs (PT-ONNs) are adequately expressive by performing the digit-recognition task on the Modified National Institute of Standards and Technology (MNIST) dataset. Compared to conventional ONNs, the PT-ONN achieves comparable accuracy (67% versus 71%) while circumventing the problems associated with changing phase. Our approach may lead to new and alternative avenues for fast training in chip-scale ONNs.
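For context on the conventional building block that PT-ONNs replace: the sketch below models a standard MZI as two 50:50 couplers with internal and external phase shifters, yielding a tunable 2×2 unitary; meshes of such blocks compose full matrix-vector multipliers. This uses one common convention and is our illustration, not the paper's code.

```python
# A tunable 2x2 unitary from a Mach-Zehnder interferometer:
# coupler -> internal phase (theta) -> coupler -> external phase (phi).
import numpy as np

BS = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)   # 50:50 directional coupler

def phase(theta):
    """Phase shifter on the upper arm only."""
    return np.diag([np.exp(1j * theta), 1.0])

def mzi(theta, phi):
    """MZI transfer matrix; theta sets the power splitting, phi the phase."""
    return phase(phi) @ BS @ phase(theta) @ BS

U = mzi(0.8, 1.3)
assert np.allclose(U @ U.conj().T, np.eye(2))    # unitarity check
print(np.abs(U) ** 2)                            # tunable power splitting
```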
  4. Nowadays, most research in visual recognition using convolutional neural networks (CNNs) follows a "deeper model with deeper confidence" belief to gain higher recognition accuracy, but a deeper model also brings heavier computation. On the other hand, for a large share of recognition challenges, a system can classify images correctly using simple models, or so-called shallow networks. Moreover, implementations of CNNs face size, weight, and energy constraints on embedded devices. In this paper, we implement adaptive switching between shallow and deep networks to reach the highest throughput on a resource-constrained MPSoC with a CPU and an FPGA. To this end, we develop and present a novel architecture for CNNs in which a gate decides whether using the deeper model is beneficial. Due to resource limitations on the FPGA, partial reconfiguration is used to accommodate deep CNNs within the FPGA resources. We report experimental results on the CIFAR-10, CIFAR-100, and SVHN datasets to validate our approach. Using a confidence metric as the decision-making factor, only 69.8%, 71.8%, and 43.8% of the computation in the deepest network is performed for CIFAR-10, CIFAR-100, and SVHN, respectively, while the desired accuracy is maintained with a throughput of around 400 images per second on the SVHN dataset. https://github.com/mfarhadi/AHCNN.
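The gating logic can be sketched in a few lines. The snippet below is an assumed minimal interface (the released implementation is at https://github.com/mfarhadi/AHCNN): classify with the shallow network first, and fall back to the deep network only when the shallow softmax confidence is below a threshold.

```python
# Confidence-gated shallow/deep switching (interface is assumed for
# illustration): cheap shallow CNN first, expensive deep CNN on demand.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(x, shallow_net, deep_net, threshold=0.9):
    probs = softmax(shallow_net(x))
    if probs.max() >= threshold:               # gate: shallow net is confident
        return probs.argmax(), "shallow"
    return softmax(deep_net(x)).argmax(), "deep"  # fall back to deep model

# Stand-in models: any callables mapping an image to class logits work here
shallow = lambda x: np.array([2.5, 0.1, 0.2])
deep = lambda x: np.array([0.3, 3.1, 0.2])
print(classify(np.zeros((32, 32, 3)), shallow, deep))
```

Raising the threshold routes more images to the deep network, trading throughput for accuracy; the reported 43.8%-69.8% deep-computation fractions reflect where that trade-off was set per dataset.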
  5. In this paper, we propose a new dynamic reliability technique using an accuracy-reconfigurable stochastic computing (ARSC) framework for deep learning computing. Unlike conventional stochastic computing, which conducts design-time accuracy-power/energy trade-offs, the new ARSC design can adjust the bit-width of the data at run time. Hence, ARSC can mitigate long-term aging effects by slowing the system clock frequency while maintaining the inference throughput by reducing the data bit-width, at a small cost in accuracy. We show how to implement the recently proposed counter-based SC multiplication and bit-width reduction in a layer-wise quantization scheme for CNNs with dynamic fixed-point data. We validate an ARSC-based five-layer convolutional neural network design for the MNIST dataset using Vivado HLS with constraints from the Xilinx Zynq-7000 family xc7z045 platform. Experimental results show that the new ARSC DNN can sufficiently compensate for NBTI-induced aging effects over 10 years with marginal classification-accuracy loss while maintaining or even exceeding the pre-aging computing throughput. At the same time, the proposed ARSC computing framework also reduces active power consumption due to frequency scaling, which can further improve system reliability through reduced temperature.
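For background on the stochastic-computing primitive involved, the sketch below shows classic unipolar SC multiplication, where a single AND gate over random bitstreams approximates a product and the stream length plays the role of bit-width. The paper's counter-based variant differs in implementation, so treat this purely as an illustration of the accuracy-versus-length trade-off that ARSC tunes at run time.

```python
# Classic unipolar stochastic-computing multiplication: a value p in [0, 1]
# is a bitstream with P(bit=1) = p, and an AND gate multiplies two
# independent streams. Shorter streams cost accuracy but save time/energy.
import numpy as np

rng = np.random.default_rng(7)

def sc_multiply(a, b, stream_len):
    """Approximate a*b (a, b in [0, 1]) with an AND over random bitstreams."""
    sa = rng.random(stream_len) < a        # stochastic bitstream for a
    sb = rng.random(stream_len) < b        # independent bitstream for b
    return np.mean(sa & sb)                # AND gate + counter

for bits in (64, 256, 4096):               # longer stream => higher accuracy
    print(bits, sc_multiply(0.8, 0.5, bits))   # exact product is 0.40
```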