

Title: Knowledge distillation circumvents nonlinearity for optical convolutional neural networks

In recent years, convolutional neural networks (CNNs) have enabled ubiquitous image processing applications. As such, CNNs require fast forward propagation runtime to process high-resolution visual streams in real time. This is still a challenging task even with state-of-the-art graphics and tensor processing units. The bottleneck in computational efficiency primarily occurs in the convolutional layers. Performing convolutions in the Fourier domain is a promising way to accelerate forward propagation since it transforms convolutions into elementwise multiplications, which are considerably faster to compute for large kernels. Furthermore, such computation could be implemented using an optical 4f system with orders of magnitude faster operation. However, a major challenge in using this spectral approach, as well as in an optical implementation of CNNs, is the inclusion of a nonlinearity between each convolutional layer, without which CNN performance drops dramatically. Here, we propose a spectral CNN linear counterpart (SCLC) network architecture and its optical implementation. We propose a hybrid platform with an optical front end to perform a large number of linear operations, followed by an electronic back end. The key contribution is to develop a knowledge distillation (KD) approach to circumvent the need for nonlinear layers between the convolutional layers and successfully train such networks. While the KD approach is known in machine learning as an effective process for network pruning, we adapt the approach to transfer the knowledge from a nonlinear network (teacher) to a linear counterpart (student), where we can exploit the inherent parallelism of light. We show that the KD approach can achieve performance that easily surpasses the standard linear version of a CNN and could approach the performance of the nonlinear network.
Our simulations show that the possibility of increasing the resolution of the input image allows our proposed 4f optical linear network to perform more efficiently than a nonlinear network with the same accuracy on two fundamental image processing tasks: (i) object classification and (ii) semantic segmentation.
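The claim that a convolution becomes an elementwise multiplication in the Fourier domain can be checked numerically. The following is a minimal sketch and not the authors' implementation: it assumes circular (wrap-around) boundary conditions, a single-channel image, and NumPy. The function names, the 8×8 image size, and the 3×3 kernel size are chosen only for illustration.

```python
import numpy as np

def circular_conv2d_direct(image, kernel):
    """Circular 2D convolution computed straight from the definition."""
    out = np.zeros(image.shape)
    kh, kw = kernel.shape
    for i in range(kh):
        for j in range(kw):
            # np.roll shifts the image by (i, j) with wrap-around, so this
            # accumulates sum_k kernel[k] * image[n - k] (indices modulo size).
            out += kernel[i, j] * np.roll(image, shift=(i, j), axis=(0, 1))
    return out

def circular_conv2d_fft(image, kernel):
    """Same convolution via an elementwise product of 2D spectra."""
    # s=image.shape zero-pads the kernel to the image size before the FFT.
    kernel_f = np.fft.fft2(kernel, s=image.shape)
    image_f = np.fft.fft2(image)
    # The inverse transform is real up to rounding error for real inputs.
    return np.real(np.fft.ifft2(image_f * kernel_f))

# Illustrative data (assumed sizes, random values).
rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))

direct = circular_conv2d_direct(image, kernel)
spectral = circular_conv2d_fft(image, kernel)
print("results agree:", np.allclose(direct, spectral))
```

The direct version costs one pass over the image per kernel element, while the spectral version costs a fixed number of transforms plus one elementwise product regardless of kernel size, which is why the spectral route pays off for large kernels.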

 
PAR ID:
10531250
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Optical Society of America
Date Published:
Journal Name:
Applied Optics
Volume:
61
Issue:
9
ISSN:
1559-128X; APOPAI
Format(s):
Medium: X Size: Article No. 2173
Sponsoring Org:
National Science Foundation
More Like this
  1. The discrete Fourier transform (DFT) is of fundamental interest in photonic quantum information, yet the ability to scale it to high dimensions depends heavily on the physical encoding, with practical recipes lacking in emerging platforms such as frequency bins. In this article, we show that $d$-point frequency-bin DFTs can be realized with a fixed three-component quantum frequency processor (QFP), simply by adding to the electro-optic modulation signals one radio-frequency harmonic per each incremental increase in $d$. We verify gate fidelity $F_W > 0.9997$ and success probability $P_W > 0.965$ up to $d = 10$ in numerical simulations, and experimentally implement the solution for $d = 3$, utilizing measurements with parallel DFTs to quantify entanglement and perform tomography of multiple two-photon frequency-bin states. Our results furnish new opportunities for high-dimensional frequency-bin protocols in quantum communications and networking.

     
  2. We call a surface that appears undistorted when viewed in a curved mirror an eigensurface and the mirror an eigenmirror. Such pairs are described by a first-order nonlinear partial differential equation of the form $a_0 + a_1 u_x + a_2 u_y + a_3 u_x u_y + a_4 u_x^2 + a_5 u_y^2 = 0$, where $a_i = a_i(x, y, u)$, which we call the anti-eikonal equation. We give examples of symbolic and numerical solutions, including pairs that are geometrically congruent. Ray tracing simulations are included that visually confirm the unusual properties of these surfaces.

     
  3. In a dynamic far-field diffraction experiment, we calculate the largest Lyapunov exponent of a time series obtained from the optical fluctuations in a dynamic diffraction pattern. The time series is used to characterize the locomotory predictability of an oversampled microscopic species. We use a live nematode, Caenorhabditis elegans, as a model organism to demonstrate our method. The time series is derived from the intensity at one point in the diffraction pattern. This single time series displays chaotic markers in the locomotion of the Caenorhabditis elegans by reconstructing the multidimensional phase space. The average largest Lyapunov exponent (base e) associated with the dynamic diffraction of 10 adult wildtype (N2) Caenorhabditis elegans is $1.27 \pm 0.03\ \mathrm{s}^{-1}$.

     
  4. Images captured from a long distance suffer from dynamic image distortion due to turbulent flow of air cells with random temperatures, and thus refractive indices. This phenomenon, known as image dancing, is commonly characterized by its refractive-index structure constant $C_n^2$ as a measure of the turbulence strength. For many applications, such as atmospheric forecast models, long-range and astronomical imaging, aviation safety, and optical communication technology, $C_n^2$ estimation is critical for accurately sensing the turbulent environment. Previous methods for $C_n^2$ estimation include estimation from meteorological data (temperature, relative humidity, wind shear, etc.) for single-point measurements, two-ended pathlength measurements from an optical scintillometer for path-averaged $C_n^2$, and, more recently, estimating $C_n^2$ from passive video cameras for lower cost and hardware complexity. In this paper, we present a comparative analysis of classical image gradient methods for $C_n^2$ estimation and modern deep learning-based methods leveraging convolutional neural networks. To enable this, we collect a dataset of video captures along with reference scintillometer measurements for ground truth, and we release this unique dataset to the scientific community. We observe that deep learning methods can achieve higher accuracy when trained on similar data, but suffer from generalization errors on other, unseen imagery as compared to classical methods. To overcome this trade-off, we present a novel physics-based network architecture that combines learned convolutional layers with a differentiable image gradient method and that maintains high accuracy while being generalizable across image datasets.

     
  5. A spatial channel network (SCN) was recently proposed toward the forthcoming spatial division multiplexing (SDM) era, in which the optical layer is explicitly evolved to the hierarchical SDM and wavelength division multiplexing layers, and an optical node is decoupled into a spatial cross-connect (SXC) and wavelength cross-connect to achieve an ultrahigh-capacity optical network in a highly economical manner. In this paper, we report feasibility demonstrations of an evolution scenario regarding the SCN architecture to enhance the flexibility and functionality of spatial channel networking from a simple fixed-core-access and directional spatial channel ring network to a multidegree, any-core-access, nondirectional, and core-contentionless mesh SCN. As key building blocks of SXCs, we introduce what we believe to be novel optical devices: a $1 \times 2$ multicore fiber (MCF) splitter, a core selector (CS), and a core and port selector (CPS). We construct free-space optics-based prototypes of these devices using five-core MCFs. Detailed performance evaluations of the prototypes in terms of the insertion loss (IL), polarization-dependent loss (PDL), and intercore cross talk (XT) are conducted. The results show that the prototypes provide satisfactorily low levels of IL, PDL, and XT. We construct a wide variety of reconfigurable spatial add/drop multiplexers (RSADMs) and SXCs in terms of node degree, interport cross-connection architecture, and add/drop port connectivity flexibilities. Such RSADMs/SXCs include a fixed-core-access and directional RSADM using a $1 \times 2$ MCF splitter; an any-core-access, nondirectional SXC with core-contention using a CS; and an any-core-access, nondirectional SXC without core-contention using a CPS. Bit error rate performance measurements for SDM signals that traverse the RSADMs/SXCs confirm that there is no or a very slight optical signal-to-noise-ratio penalty from back-to-back performance.
We also experimentally show that the flexibilities in the add/drop port of the SXCs allow us to recover from a single or concurrent double link failure with a wide variety of options in terms of availability and cost-effectiveness.

     