Title: Statistical Mechanics of Deep Learning
The recent striking success of deep neural networks in machine learning raises profound questions about the theoretical principles underlying their success. For example, what can such deep networks compute? How can we train them? How does information propagate through them? Why can they generalize? And how can we teach them to imagine? We review recent work in which methods of physical analysis rooted in statistical mechanics have begun to provide conceptual insights into these questions. These insights yield connections between deep learning and diverse physical and mathematical topics, including random landscapes, spin glasses, jamming, dynamical phase transitions, chaos, Riemannian geometry, random matrix theory, free probability, and nonequilibrium statistical mechanics. Indeed, the fields of statistical mechanics and machine learning have long enjoyed a rich history of strongly coupled interactions, and recent advances at the intersection of statistical mechanics and deep learning suggest these interactions will only deepen going forward.
Award ID(s):
1845166
PAR ID:
10291285
Author(s) / Creator(s):
Date Published:
Journal Name:
Annual Review of Condensed Matter Physics
Volume:
11
Issue:
1
ISSN:
1947-5454
Page Range / eLocation ID:
501 to 528
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract: There has been a great deal of recent interest in the development of spatial prediction algorithms for very large datasets and/or prediction domains. These methods have primarily been developed in the spatial statistics community, but there has been growing interest in the machine learning community for such methods, primarily driven by the success of deep Gaussian process regression approaches and deep convolutional neural networks. These methods are often computationally expensive to train and implement, and consequently there has been a resurgence of interest in random projections and deep learning models based on random weights — so-called reservoir computing methods. Here, we combine several of these ideas to develop the random ensemble deep spatial (REDS) approach to predict spatial data. The procedure uses random Fourier features as inputs to an extreme learning machine (a deep neural model with random weights), and with calibrated ensembles of outputs from this model based on different random weights, it provides a simple uncertainty quantification. The REDS method is demonstrated on simulated data and on a classic large satellite data set.
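The REDS recipe described in this abstract — random Fourier features feeding a random-weight network whose only trained layer is a ridge-regression readout, ensembled over random seeds for uncertainty — can be sketched end to end. Everything below (the toy spatial data, feature counts, lengthscale, ridge penalty, and ensemble size) is an illustrative assumption, not a detail taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy spatial data: a noisy function of 2-D coordinates.
n = 500
coords = rng.uniform(0, 10, size=(n, 2))
y = np.sin(coords[:, 0]) * np.cos(coords[:, 1]) + 0.1 * rng.standard_normal(n)

def random_fourier_features(X, n_feat, lengthscale, seed):
    """Random Fourier features approximating a Gaussian kernel."""
    r = np.random.default_rng(seed)
    W = r.standard_normal((X.shape[1], n_feat)) / lengthscale
    b = r.uniform(0, 2 * np.pi, n_feat)
    return np.sqrt(2.0 / n_feat) * np.cos(X @ W + b)

def elm_fit_predict(Xtr, ytr, Xte, seed, n_hidden=128, ridge=1e-3):
    """Extreme learning machine: random hidden weights, ridge-solved readout."""
    r = np.random.default_rng(seed)
    Phi_tr = random_fourier_features(Xtr, 64, 2.0, seed)
    Phi_te = random_fourier_features(Xte, 64, 2.0, seed)
    W = r.standard_normal((Phi_tr.shape[1], n_hidden))  # never trained
    H_tr, H_te = np.tanh(Phi_tr @ W), np.tanh(Phi_te @ W)
    beta = np.linalg.solve(H_tr.T @ H_tr + ridge * np.eye(n_hidden),
                           H_tr.T @ ytr)               # only trained weights
    return H_te @ beta

# An ensemble over random seeds gives a simple uncertainty estimate.
Xte = rng.uniform(0, 10, size=(50, 2))
preds = np.stack([elm_fit_predict(coords, y, Xte, seed=s) for s in range(20)])
mean, std = preds.mean(axis=0), preds.std(axis=0)
```

The ensemble spread `std` plays the role of the paper's calibrated uncertainty; a faithful implementation would calibrate it against held-out data rather than use the raw standard deviation.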
  2. While there are several bottlenecks in hybrid organic–inorganic perovskite (HOIP) solar cell production steps, including composition screening, fabrication, material stability, and device performance, machine learning approaches have begun to tackle each of these issues in recent years. Different algorithms have successfully been adopted to solve the unique problems at each step of HOIP development. Specifically, high-throughput experimentation produces the vast amounts of training data required to effectively implement machine learning methods. Here, we present an overview of machine learning models, including linear regression, neural networks, deep learning, and statistical forecasting. Experimental examples from the literature, where machine learning is applied to HOIP composition screening, thin film fabrication, thin film characterization, and full device testing, are discussed. These paradigms give insights into the future of HOIP solar cell research. As databases expand and computational power improves, increasingly accurate predictions of HOIP behavior are becoming possible.
  3. Abstract: Deep-learning models have become pervasive tools in science and engineering. However, their energy requirements now increasingly limit their scalability [1]. Deep-learning accelerators [2–9] aim to perform deep learning energy-efficiently, usually targeting the inference phase and often by exploiting physical substrates beyond conventional electronics. Approaches so far [10–22] have been unable to apply the backpropagation algorithm to train unconventional novel hardware in situ. The advantages of backpropagation have made it the de facto training method for large-scale neural networks, so this deficiency constitutes a major impediment. Here we introduce a hybrid in situ–in silico algorithm, called physics-aware training, that applies backpropagation to train controllable physical systems. Just as deep learning realizes computations with deep neural networks made from layers of mathematical functions, our approach allows us to train deep physical neural networks made from layers of controllable physical systems, even when the physical layers lack any mathematical isomorphism to conventional artificial neural network layers. To demonstrate the universality of our approach, we train diverse physical neural networks based on optics, mechanics and electronics to experimentally perform audio and image classification tasks. Physics-aware training combines the scalability of backpropagation with the automatic mitigation of imperfections and noise achievable with in situ algorithms. Physical neural networks have the potential to perform machine learning faster and more energy-efficiently than conventional electronic processors and, more broadly, can endow physical systems with automatically designed physical functionalities, for example, for robotics [23–26], materials [27–29] and smart sensors [30–32].
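The core idea of physics-aware training — run the forward pass on the physical system, but compute gradients through a differentiable digital model of it — can be illustrated with a synthetic stand-in for the hardware. The scalar map, noise level, learning rate, and iteration count below are assumptions made for this sketch, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Target behaviour we want the "physical" layer to learn to reproduce.
x = np.linspace(0.0, 1.0, 40)
theta_true = 2.0
y_target = np.sin(theta_true * x)

def physical_forward(x, theta):
    # Stand-in for real hardware: same map plus measurement noise,
    # treated as a black box with no gradients available.
    return np.sin(theta * x) + 0.01 * rng.standard_normal(x.shape)

def model_grad(x, theta):
    # Differentiable digital model of the physics: d/dtheta of sin(theta*x).
    return x * np.cos(theta * x)

theta, lr = 0.5, 0.5
for _ in range(200):
    y_phys = physical_forward(x, theta)       # forward pass "on hardware"
    residual = y_phys - y_target              # loss uses physical outputs
    grad = np.mean(2.0 * residual * model_grad(x, theta))  # backward in silico
    theta -= lr * grad
```

Because the loss is evaluated on the noisy physical outputs, device imperfections are folded into training automatically, which is the mitigation property the abstract highlights.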
  4. This work-in-progress paper presents a joint effort by engineering education and machine learning researchers to develop automated methods for analyzing student responses to challenging conceptual questions in mechanics. These open-ended questions, which emphasize understanding of physical principles rather than calculations, are widely used in large STEM classes to support active learning strategies that have been shown to improve student outcomes. Despite their benefits, written justifications are not commonly used, largely because evaluating them is time-consuming for both instructors and researchers. This study explores the potential of large pre-trained generative sequence-to-sequence language models to streamline the analysis and coding of these student responses. 
  5. With the rise of Artificial Intelligence (AI) systems in society, our children have routine interactions with these technologies. It has become increasingly important for them to understand how these technologies are trained, what their limitations are, and how they work. To introduce children to AI and Machine Learning (ML) concepts, recent efforts have introduced tools that integrate ML concepts with physical computing and robotics. However, some of these tools cannot be easily integrated into building projects, and the high price of robotics kits can be a limiting factor for many schools. We address these limitations by offering a low-cost hardware and software toolkit, which we call the Smart Motor, to introduce supervised machine learning to elementary school students. Our Smart Motor uses the nearest neighbor algorithm and utilizes visualizations to highlight the underlying decision-making of the model. We conducted a one-week study using Smart Motors with 9- to 12-year-old students and measured their learning through observation, questioning, and examining what they built. We found that students were able to integrate the Smart Motors into their building projects, but some students struggled to understand how the underlying model functioned. In this paper we discuss these findings and insights for future directions for the Smart Motor.
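The Smart Motor's decision rule — nearest neighbor over a handful of stored examples — fits in a few lines, which is part of what makes it teachable. The sensor readings and action labels below are made up for illustration and are not taken from the paper:

```python
import math

# Toy training set: light-sensor reading -> motor action, as a child might
# record with the Smart Motor (values are illustrative, not from the study).
training = [
    (120, "stop"),   # bright light
    (200, "stop"),
    (620, "spin"),   # dim light
    (700, "spin"),
]

def nearest_neighbor(reading, examples):
    """Return the label of the closest stored example (1-nearest neighbor)."""
    best_label, best_dist = None, math.inf
    for value, label in examples:
        dist = abs(reading - value)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

print(nearest_neighbor(150, training))  # -> stop
print(nearest_neighbor(650, training))  # -> spin
```

Visualizing which stored example "won" for a given reading is the kind of display the authors use to surface the model's decision-making to students.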