Title: On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent
Recent work has highlighted the role of initialization scale in determining the structure of the solutions that gradient methods converge to. In particular, it was shown that large initialization leads to the neural tangent kernel regime solution, whereas small initialization leads to so-called “rich regimes”. However, the initialization structure is richer than the overall scale alone and involves relative magnitudes of different weights and layers in the network. Here we show that these relative scales, which we refer to as initialization shape, play an important role in determining the learned model. We develop a novel technique for deriving the inductive bias of gradient flow and use it to obtain closed-form implicit regularizers for multiple cases of interest.
Award ID(s):
1764032
NSF-PAR ID:
10286846
Journal Name:
Proceedings of Machine Learning Research
Volume:
139
ISSN:
2640-3498
Page Range / eLocation ID:
468 - 477
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Recent years have witnessed significant progress in understanding the relationship between the connectivity of a deep network's architecture as a graph and the network's performance. A few prior works connected deep architectures to expander graphs or Ramanujan graphs; in particular, [7] demonstrated the use of such graph connectivity measures in ranking the relative performance of various obtained sparse sub-networks (i.e., models with prune masks) without the need for training. However, no prior work explicitly explores the role of parameters in the graph's connectivity, leaving the graph-based understanding of prune masks and the magnitude/gradient-based pruning practice isolated from one another. This paper strives to fill this gap by analyzing the Weighted Spectral Gap of Ramanujan structures in sparse neural networks and investigating its correlation with final performance. We specifically examine the evolution of sparse structures under a popular dynamic sparse-to-sparse network training scheme, and intriguingly find that the generated random topologies inherently maximize Ramanujan graphs. We also identify a strong correlation between masks, performance, and the weighted spectral gap. Leveraging this observation, we propose to construct a new "full-spectrum coordinate" aiming to comprehensively characterize a sparse neural network's promise. Concretely, it consists of the classical Ramanujan's gap (structure), our proposed weighted spectral gap (parameters), and the constituent nested regular graphs within. In this new coordinate system, a sparse subnetwork's L2-distance from its original initialization is found to be nearly linearly correlated with its performance. Finally, we apply this unified perspective to develop a new actionable pruning method that samples sparse masks to maximize the L2-coordinate distance. Our method can be augmented with the "pruning at initialization" (PaI) method, and significantly outperforms existing PaI methods.
With only a few iterations of training (e.g., 500 iterations), we can obtain LTH-comparable performance to that yielded via "pruning after training", significantly saving pre-training costs. Code can be found at: https://github.com/VITA-Group/FullSpectrum-PAI.
  2. Abstract Aim

    While we understand broad climate drivers of insect distributions throughout the Arctic, less is known about the role of spatial processes in determining these relationships. As such, there is a need to understand how spatial controls may influence our interpretations of chironomid–environment relationships. Here, we evaluated whether the distribution of chironomids followed spatial gradients or was primarily controlled by environmental factors.

    Location

    Eastern Canadian Arctic, Greenland, Iceland.

    Taxon

    Non‐biting midges (Chironomidae).

    Methods

    We examined chironomid assemblages from 239 lakes in the western North Atlantic Arctic region (specifically from the Arctic Archipelago of Canada, two parts of west Greenland (the southwest and central west) and northwest Iceland). We used a combination of unconstrained ordination (Self Organizing Maps), a simple method with only one data matrix (community data), and constrained ordination (Redundancy Analysis), a canonical ordination with two datasets from which we extracted community structure related to environmental data. These methods allowed us to model chironomid assemblages across a large bioregional dimension and identify specific differences between regions that were defined by common taxa represented across all regions in high frequencies, as well as rare taxa distinctive to each region found in low frequencies. We then evaluated the relative importance of spatial processes versus local environmental factors.

    Results

    We find that environmental controls explained the largest amount of variation in chironomid assemblages within each region, and that spatial controls are only significant when crossing between regions. Broad‐scale biogeographic effects on chironomid distributions are reflected by the distinct differences among chironomid assemblages of Iceland, central‐west Greenland, and eastern Canada, defined by the presence of certain common and low‐frequency, rare taxa for each region. Environmental gradients, especially temperature, defined species distributions within each region, whereas spatial processes combine with environmental gradients in determining what mix of species characterizes each broad and geographically distinct island region in our study.

    Main conclusions

    While biogeographic context is important for defining interpretations of environmental controls on species distributions, the primary control on distributions within regions is environmental. These influences are fundamentally important for reconstructing past environmental change and better understanding historical distributions of these insect indicators.

     
  3. Abstract

    Many animal–environment interactions are mediated by the physical forms of the environment, especially in tropical forests, where habitats are structurally complex and highly diverse. Higher structural complexity, measured as habitat surface area, may provide increased resource availability for animals, leading to higher animal diversity. Greater habitat surface area supports increased animal diversity in other systems, such as coral reefs and forest canopies, but it is uncertain how this relationship translates to communities of highly mobile, terrestrial mammal species inhabiting forest floors. We tested the relative importance of forest floor habitat structure, encompassing vegetation and topographic structure, in determining species occupancy and functional diversity of medium to large mammals using data from a tropical forest in the Udzungwa Mountains of Tanzania. We related species occupancies and diversity obtained from a multispecies occupancy model with ground‐level habitat structure measurements obtained from a novel head‐mounted active remote sensing device, the Microsoft HoloLens. We found that habitat surface area was a significant predictor of mean species occupancy and had a significant positive relationship with functional dispersion. The positive relationships indicate that surface area of tropical forest floors may play an important role in promoting mammal occupancy and functional diversity at the microhabitat scale. In particular, habitat surface area had higher mean effects on occupancy for carnivorous and social species. These results support a habitat surface area–diversity relationship on tropical forest floors for mammals.

     
  4. Abstract

    Research on animal microbiomes is increasingly aimed at determining the evolutionary and ecological factors that govern host–microbiome dynamics, which are invariably intertwined and potentially synergistic. We present three empirical studies related to this topic, each of which relies on the diversity of Malagasy lemurs (representing a total of 19 species) and the comparative approach applied across scales of analysis. In Study 1, we compare gut microbial membership across 14 species in the wild to test the relative importance of host phylogeny and feeding strategy in mediating microbiome structure. Whereas host phylogeny strongly predicted community composition, the same feeding strategies shared by distant relatives did not produce convergent microbial consortia, but rather shaped microbiomes in host lineage‐specific ways, particularly in folivores. In Study 2, we compare 14 species of wild and captive folivores, frugivores, and omnivores, to highlight the importance of captive populations for advancing gut microbiome research. We show that the perturbational effect of captivity is mediated by host feeding strategy and can be mitigated, in part, by modified animal management. In Study 3, we examine various scent‐gland microbiomes across three species in the wild or captivity and show them to vary by host species, sex, body site, and a proxy of social status. These rare data provide support for the bacterial fermentation hypothesis in olfactory signal production and implicate steroid hormones as mediators of microbial community structure. We conclude by discussing the role of scale in comparative microbial studies, the links between feeding strategy and host–microbiome coadaptation, the underappreciated benefits of captive populations for advancing conservation research, and the need to consider the entirety of an animal's microbiota. Ultimately, these studies will help move the field from exploratory to hypothesis‐driven research.

     
  5. Understanding the learning dynamics and inductive bias of neural networks (NNs) is hindered by the opacity of the relationship between NN parameters and the function represented. In part, this is due to symmetries inherent in the NN parameterization, which allow multiple different parameter settings to yield an identical output function, resulting in both an unclear relationship and redundant degrees of freedom. The NN parameterization is invariant under two symmetries: permutation of the neurons and a continuous family of transformations of the scale of weight and bias parameters. We propose taking a quotient with respect to the second symmetry group and reparametrizing ReLU NNs as continuous piecewise linear splines. Using this spline lens, we study learning dynamics in shallow univariate ReLU NNs, finding unexpected insights and explanations for several perplexing phenomena. We develop a surprisingly simple and transparent view of the structure of the loss surface, including its critical and fixed points, Hessian, and Hessian spectrum. We also show that standard weight initializations yield very flat initial functions, and that this flatness, together with overparametrization and the initial weight scale, is responsible for the strength and type of implicit regularization, consistent with previous work. Our implicit regularization results are complementary to recent work showing, via a kernel-based argument, that initialization scale critically controls implicit regularization. Overall, removing the weight scale symmetry enables us to prove these results more simply, to prove new results, and to gain new insights, while offering a far more transparent and intuitive picture. Looking forward, our quotiented spline-based approach will extend naturally to the multivariate and deep settings, and alongside the kernel-based view, we believe it will play a foundational role in efforts to understand neural networks.
Videos of learning dynamics using a spline-based visualization are available at http://shorturl.at/tFWZ2 . 