Title: Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets
Mode connectivity (Garipov et al., 2018; Draxler et al., 2018) is a surprising phenomenon in the loss landscape of deep nets. Optima (at least those discovered by gradient-based optimization) turn out to be connected by simple paths on which the loss function is almost constant. Often, these paths can be chosen to be piecewise linear, with as few as two segments. We give mathematical explanations for this phenomenon, assuming generic properties (such as dropout stability and noise stability) of well-trained deep nets, which have previously been identified as part of understanding the generalization properties of deep nets. Our explanation holds for realistic multilayer nets, and experiments are presented to verify the theory.
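To make "a path on which the loss is almost constant" concrete, the sketch below measures the loss barrier (the maximum loss along a path minus the larger of the two endpoint losses) for a straight segment and for a two-segment piecewise-linear path between the same two minima. It assumes a hypothetical two-parameter toy loss $f(w) = (w_1^2 + w_2^2 - 1)^2$, whose minimizers form the unit circle, and a hand-picked waypoint. Neither comes from the paper, and the paper's own path construction for multilayer nets is not reproduced here.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def toy_loss(w: Sequence[float]) -> float:
    """Hypothetical toy loss whose minimizers form the unit circle w1^2 + w2^2 = 1."""
    return (w[0] ** 2 + w[1] ** 2 - 1.0) ** 2


def lerp(a: Sequence[float], b: Sequence[float], t: float) -> Vector:
    """Linear interpolation (1 - t) * a + t * b between two parameter vectors."""
    return [(1.0 - t) * ai + t * bi for ai, bi in zip(a, b)]


def path_barrier(
    loss: Callable[[Sequence[float]], float],
    waypoints: Sequence[Sequence[float]],
    points_per_segment: int = 100,
) -> float:
    """Max loss sampled along the piecewise-linear path through `waypoints`,
    minus the larger of the two endpoint losses."""
    highest = float("-inf")
    # Walk each linear segment and record the worst loss seen on a uniform grid.
    for start, end in zip(waypoints[:-1], waypoints[1:]):
        for i in range(points_per_segment + 1):
            t = i / points_per_segment
            highest = max(highest, loss(lerp(start, end, t)))
    endpoint_loss = max(loss(waypoints[0]), loss(waypoints[-1]))
    return highest - endpoint_loss


theta_a = [1.0, 0.0]   # one minimum of the toy loss
theta_b = [-1.0, 0.0]  # another minimum
bend = [0.0, 1.0]      # hypothetical waypoint, itself a minimum

straight = path_barrier(toy_loss, [theta_a, theta_b])
two_segment = path_barrier(toy_loss, [theta_a, bend, theta_b])
print(f"straight-line barrier: {straight:.3f}")     # 1.000 (the segment crosses the origin)
print(f"two-segment barrier:   {two_segment:.3f}")  # 0.250
```

In this toy case, bending the path through a third minimum lowers the barrier from 1 to 0.25 but does not remove it. The claim in the abstract is much stronger: for well-trained multilayer nets with properties such as dropout stability, a path with as few as two segments can keep the loss almost constant.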
Award ID(s):
1704656 1845171
PAR ID:
10161654
Author(s) / Creator(s):
; ; ; ; ; ; ;
Date Published:
Journal Name:
NeurIPS 2019
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Stochastic Gradient Descent (SGD) based methods have been widely used for training large-scale machine learning models that also generalize well in practice. Several explanations have been offered for this generalization performance, a prominent one being algorithmic stability (Hardt et al. [2016]). However, there are no known examples of smooth loss functions for which the analysis can be shown to be tight. Furthermore, apart from properties of the loss function, the data distribution has also been shown to be an important factor in generalization performance. This raises the question: is the stability analysis of Hardt et al. [2016] tight for smooth functions, and if not, for what kinds of loss functions and data distributions can the stability analysis be improved? In this paper we first settle open questions regarding the tightness of bounds in the data-independent setting: we show that for general datasets, the existing analysis for convex and strongly convex loss functions is tight, but it can be improved for non-convex loss functions. Next, we give novel and improved data-dependent bounds: we show stability upper bounds for a large class of convex regularized loss functions, with negligible regularization parameters, and improve existing data-dependent bounds in the non-convex setting. We hope that our results will initiate further efforts to better understand the data-dependent setting under non-convex loss functions, leading to an improved understanding of the generalization abilities of deep networks.
  2. The "deep image prior" proposed by Ulyanov et al. is an intriguing property of neural nets: a convolutional encoder-decoder network can be used as a prior for natural images. The network architecture implicitly introduces a bias: if we train the model to map white noise to a corrupted image, this bias guides the model to fit the true image before fitting the corrupted regions. This paper explores why the deep image prior helps in denoising natural images. We present a novel method to analyze trajectories generated by the deep image prior optimization and demonstrate that (i) the convolution layers of an encoder-decoder decouple the frequency components of the image, learning each at a different rate, and (ii) the model fits lower frequencies first, making early stopping behave as a low-pass filter. The experiments study an extension of Cheng et al., which showed that at initialization, the deep image prior is equivalent to a stationary Gaussian process.
  3. Recently, researchers observed that gradient descent for deep neural networks operates in an "edge-of-stability" (EoS) regime: the sharpness (maximum eigenvalue of the Hessian) is often larger than the stability threshold $2/\eta$ (where $\eta$ is the step size). Despite this, the loss oscillates and converges in the long run, and the sharpness at the end is just slightly below $2/\eta$. While many other well-understood nonconvex objectives, such as matrix factorization or two-layer networks, can also converge despite large sharpness, there is often a larger gap between the sharpness of the endpoint and $2/\eta$. In this paper, we study the EoS phenomenon by constructing a simple function that has the same behavior. We give a rigorous analysis of its training dynamics in a large local region and explain why the final converging point has sharpness close to $2/\eta$. Globally, we observe that the training dynamics for our example have an interesting bifurcating behavior, which was also observed in the training of neural nets.
  4. This paper presents a summary of the element test simulations (calibration simulations) submitted by 11 numerical simulation (prediction) teams that participated in the LEAP-2017 prediction exercise. A significant number of monotonic and cyclic triaxial (Vasko, An investigation into the behavior of Ottawa sand through monotonic and cyclic shear tests. Masters Thesis, The George Washington University, 2015; Vasko et al., LEAP-GWU-2015 Laboratory Tests. DesignSafe-CI, Dataset, 2018; El Ghoraiby et al., LEAP 2017: Soil characterization and element tests for Ottawa F65 sand. The George Washington University, Washington, DC, 2017; El Ghoraiby et al., LEAP-2017 GWU Laboratory Tests. DesignSafe-CI, Dataset, 2018; El Ghoraiby et al., Physical and mechanical properties of Ottawa F65 Sand. In B. Kutter et al. (Eds.), Model tests and numerical simulations of liquefaction and lateral spreading: LEAP-UCD-2017. New York: Springer, 2019) and direct simple shear tests (Bastidas, Ottawa F-65 Sand Characterization. PhD Dissertation, University of California, Davis, 2016) are available for Ottawa F-65 sand. The focus of this element test simulation exercise is to assess the performance of the constitutive models used by the participating teams in simulating the results of undrained stress-controlled cyclic triaxial tests on Ottawa F-65 sand for three different void ratios (El Ghoraiby et al., LEAP 2017: Soil characterization and element tests for Ottawa F65 sand. The George Washington University, Washington, DC, 2017; El Ghoraiby et al., LEAP-2017 GWU Laboratory Tests. DesignSafe-CI, Dataset, 2018; El Ghoraiby et al., Physical and mechanical properties of Ottawa F65 Sand. In B. Kutter et al. (Eds.), Model tests and numerical simulations of liquefaction and lateral spreading: LEAP-UCD-2017. New York: Springer, 2019).
The simulated stress paths, stress-strain responses, and liquefaction strength curves show that the majority of the models used in this exercise provide a reasonably good match to the liquefaction strength curve for the highest void ratio (0.585), but the differences between simulations and experiments become larger for the lower void ratios (0.542 and 0.515).
  5. Monazite-(Ce) and xenotime-(Y) occur as secondary minerals in iron-oxide-apatite (IOA) deposits, and their stability and composition are important indicators of the timing and conditions of metasomatism. Both minerals occur as replacements of apatite and display slight but important variations in light (e.g. La, Ce, Pr, Nd) and heavy (e.g. Y, Er, Dy, Yb) REE concentrations [1,2]. The causes of these chemical variations can be quantified by combining thermodynamic modeling with field observations. The major challenges in determining the stability of these minerals in hydrothermal solutions lie in the underlying models for calculating the thermodynamic properties of REE-bearing mineral solid solutions and aqueous species as a function of temperature and pressure. The thermodynamic properties of monazite and xenotime have been determined using several calorimetric methods [3], but only a few hydrothermal solubility studies, which test the reliability and compatibility of both the calorimetric data and the thermodynamic properties of the associated REE aqueous species, have been undertaken [4,5]. Here, we evaluate the conditions of REE metasomatism in the Pea Ridge IOA-REE deposit in Missouri, and combine newly available experimental solubility data to simulate the speciation of LREE vs. HREE and the partitioning of REE as a function of varying fluid compositions and temperatures. Our new experimental data will be implemented in the MINES thermodynamic database (http://tdb.mines.edu) for modeling the chemistry of crustal fluid-rock equilibria [6]. [1] Harlov et al. (2016), Econ. Geol. 111, 1963-1984; [2] Hofstra et al. (2016), Econ. Geol. 111, 1985-2016; [3] Navrotsky et al. (2015), J. Chem. Thermodyn. 88, 126-141; [4] Gysi et al. (2015), Chem. Geol. 83-95; [5] Gysi et al. (2018), Geochim. Cosmochim. Acta 242, 143-164; [6] Gysi (2017), Pure and Appl. Chem. 89, 581-596.
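The stability threshold $2/\eta$ in item 3 comes from the classical analysis of gradient descent on a quadratic. On $f(x) = \tfrac{\lambda}{2}x^2$, whose sharpness is $\lambda$, one step gives $x_{k+1} = x_k - \eta\lambda x_k = (1 - \eta\lambda)\,x_k$, so the iterates shrink if and only if $|1 - \eta\lambda| < 1$, i.e. $0 < \lambda < 2/\eta$. The sketch below checks this numerically. The step size and sharpness values are arbitrary choices for illustration, and the code does not reproduce the constructed function or the edge-of-stability dynamics studied in that paper.

```python
def gd_on_quadratic(sharpness: float, step_size: float, x0: float = 1.0, steps: int = 50) -> float:
    """Run gradient descent on f(x) = 0.5 * sharpness * x**2 and return the last iterate."""
    x = x0
    for _ in range(steps):
        grad = sharpness * x       # f'(x) = sharpness * x
        x = x - step_size * grad   # same as x *= (1 - step_size * sharpness)
    return x


eta = 0.1                 # step size, so the threshold 2 / eta is 20
threshold = 2.0 / eta
for sharpness in (15.0, 19.0, 21.0, 25.0):
    final = gd_on_quadratic(sharpness, eta)
    regime = "below" if sharpness < threshold else "above"
    print(f"sharpness {sharpness:4.1f} ({regime} 2/eta): |x_50| = {abs(final):.3g}")
```

Below $2/\eta$ the iterate decays geometrically by a factor $|1 - \eta\lambda|$ per step; above it the iterate grows. This simple picture is why a sharpness that sits above $2/\eta$ while the loss still converges, as described in item 3, is surprising.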