

Title: On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent
Recent work has highlighted the role of initialization scale in determining the structure of the solutions that gradient methods converge to. In particular, it was shown that large initialization leads to the neural tangent kernel regime solution, whereas small initialization leads to so-called “rich regimes”. However, the initialization structure is richer than the overall scale alone and involves relative magnitudes of different weights and layers in the network. Here we show that these relative scales, which we refer to as initialization shape, play an important role in determining the learned model. We develop a novel technique for deriving the inductive bias of gradient flow and use it to obtain closed-form implicit regularizers for multiple cases of interest.
Award ID(s):
1764032
NSF-PAR ID:
10286846
Author(s) / Creator(s):
; ; ; ; ; ;
Date Published:
Journal Name:
Proceedings of Machine Learning Research
Volume:
139
ISSN:
2640-3498
Page Range / eLocation ID:
468 - 477
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Children’s automatic speech recognition (ASR) is always difficult due to, in part, the data scarcity problem, especially for kindergarten-aged kids. When data are scarce, the model might overfit to the training data, and hence good starting points for training are essential. Recently, meta-learning was proposed to learn model initialization (MI) for ASR tasks of different languages. This method leads to good performance when the model is adapted to an unseen language. However, MI is vulnerable to overfitting on training tasks (learner overfitting). It is also unknown whether MI generalizes to other low-resource tasks. In this paper, we validate the effectiveness of MI in children’s ASR and attempt to alleviate the problem of learner overfitting. To achieve model-agnostic meta-learning (MAML), we regard children’s speech at each age as a different task. In terms of learner overfitting, we propose a task-level augmentation method by simulating new ages using frequency warping techniques. Detailed experiments are conducted to show the impact of task augmentation on each age for kindergarten-aged speech. As a result, our approach achieves a relative word error rate (WER) improvement of 51% over the baseline system with no augmentation or initialization.
  2. Abstract

    Environmental filtering and dispersal limitation are important processes within the metacommunity concept. Non‐random species turnover occurs in places where environmental filtering plays the key role in determining local community structure, whereas dispersal limitation causes nested patterns of species assemblages organized by non‐random colonization processes. However, factors that modify the relative importance of these processes remain unclear for many ecosystems. We tested whether salinity gradients affect the relative importance of environmental filtering and dispersal limitation for structuring epifaunal and infaunal communities in three lagoons in Hokkaido, Japan, that have different salinity gradients. Specifically, we compared patterns of species diversity and similarity of eelgrass‐associated invertebrate assemblages across space. Beta diversity (i.e., species turnover among different sites in each lagoon) was highest in Akkeshi, the lagoon with a clear salinity gradient. Variation partitioning of similarity components showed that spatial variation in the community assemblage pattern was mostly explained by environmental filtering in Akkeshi, but that it was explained more by species dispersal patterns and the difference in eelgrass biomass and shoot density in Notoro and Saroma, the lagoons without a clear salinity gradient. Redundancy analysis showed that spatial variation in community structure was related to salinity and eelgrass biomass in Akkeshi, and to eelgrass aboveground biomass in Notoro and Saroma. Our findings highlight the effects of environmental heterogeneity on beta diversity and community structure and indicate that environmental gradients can be a key factor causing a shift in the relative importance of different metacommunity processes and the role of the foundation species in provisioning habitat.

     
  3. Abstract

    Many animal–environment interactions are mediated by the physical forms of the environment, especially in tropical forests, where habitats are structurally complex and highly diverse. Higher structural complexity, measured as habitat surface area, may provide increased resource availability for animals, leading to higher animal diversity. Greater habitat surface area supports increased animal diversity in other systems, such as coral reefs and forest canopies, but it is uncertain how this relationship translates to communities of highly mobile, terrestrial mammal species inhabiting forest floors. We tested the relative importance of forest floor habitat structure, encompassing vegetation and topographic structure, in determining species occupancy and functional diversity of medium to large mammals using data from a tropical forest in the Udzungwa Mountains of Tanzania. We related species occupancies and diversity obtained from a multispecies occupancy model with ground‐level habitat structure measurements obtained from a novel head‐mounted active remote sensing device, the Microsoft HoloLens. We found that habitat surface area was a significant predictor of mean species occupancy and had a significant positive relationship with functional dispersion. The positive relationships indicate that surface area of tropical forest floors may play an important role in promoting mammal occupancy and functional diversity at the microhabitat scale. In particular, habitat surface area had higher mean effects on occupancy for carnivorous and social species. These results support a habitat surface area–diversity relationship on tropical forest floors for mammals.

     
  4. Lithium-ion battery cathode slurries have a microstructure that depends sensitively on how they are processed due to carbon black's (CB) evolving structure when subjected to coating flows. While polyvinylidene difluoride (PVDF), one of the main components of the cathode slurry, plays an important role in modifying the structure and rheology of CB, a quantitative understanding is lacking. In this work, we explore the role of PVDF in determining the structural evolution of Super C65 CB in N-methyl-2-pyrrolidinone (NMP) with rheo-electric measurements. We find that PVDF enhances the viscosity of NMP, resulting in a more extensive structural erosion of CB agglomerates with increasing polymer concentration and molecular weight. We also show that the relative viscosity of all suspensions can be collapsed by the fluid Mason number (Mnf), which compares the hydrodynamic forces imposed by the medium to cohesive forces holding CB agglomerates together. Using simultaneous rheo-electric measurements, we find that at high Mnf, the dielectric strength (Δε) scales with Mnf, and the power-law scaling can be quantitatively predicted by considering the self-similar breakup of CB agglomerates. The collapse of the relative viscosity and scaling of Δε both suggest that PVDF increases the hydrodynamic force of the suspending medium without directly changing the CB agglomerate structure. These findings are valuable for optimizing the rheology of lithium-ion battery cathode slurries. We also anticipate that these findings can be extended to understand the microstructure of similar systems under flow.

     
  5. Understanding the learning dynamics and inductive bias of neural networks (NNs) is hindered by the opacity of the relationship between NN parameters and the function represented. Partially, this is due to symmetries inherent within the NN parameterization, allowing multiple different parameter settings to result in an identical output function, resulting in both an unclear relationship and redundant degrees of freedom. The NN parameterization is invariant under two symmetries: permutation of the neurons and a continuous family of transformations of the scale of weight and bias parameters. We propose taking a quotient with respect to the second symmetry group and reparametrizing ReLU NNs as continuous piecewise linear splines. Using this spline lens, we study learning dynamics in shallow univariate ReLU NNs, finding unexpected insights and explanations for several perplexing phenomena. We develop a surprisingly simple and transparent view of the structure of the loss surface, including its critical and fixed points, Hessian, and Hessian spectrum. We also show that standard weight initializations yield very flat initial functions, and that this flatness, together with overparametrization and the initial weight scale, is responsible for the strength and type of implicit regularization, consistent with previous work. Our implicit regularization results are complementary to recent work, showing that initialization scale critically controls implicit regularization via a kernel-based argument. Overall, removing the weight scale symmetry enables us to prove these results more simply and enables us to prove new results and gain new insights while offering a far more transparent and intuitive picture. Looking forward, our quotiented spline-based approach will extend naturally to the multivariate and deep settings, and alongside the kernel-based view, we believe it will play a foundational role in efforts to understand neural networks. 
Videos of learning dynamics using a spline-based visualization are available at http://shorturl.at/tFWZ2 . 
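
The scale symmetry described in item 5 can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes a shallow univariate ReLU network of the form f(x) = sum_i v_i * relu(w_i * x + b_i), and the names `shallow_relu_net` and `rescale` are hypothetical, chosen only for illustration. Rescaling each neuron as (w, b, v) -> (c*w, c*b, v/c) with c > 0 should leave the output function unchanged up to floating-point rounding.

```python
import random


def relu(z):
    return max(0.0, z)


def shallow_relu_net(x, params):
    """Assumed form: f(x) = sum_i v_i * relu(w_i * x + b_i)."""
    return sum(v * relu(w * x + b) for (w, b, v) in params)


def rescale(params, scales):
    """Per-neuron rescaling (w, b, v) -> (c*w, c*b, v/c), with each c > 0."""
    return [(c * w, c * b, v / c) for (w, b, v), c in zip(params, scales)]


rng = random.Random(0)
# Eight hidden neurons with random input weight w, bias b, output weight v.
params = [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(8)]
# One positive scale factor per neuron.
scales = [rng.uniform(0.1, 10.0) for _ in params]
rescaled = rescale(params, scales)

# Compare the two parameter settings on a grid of inputs in [-3, 3].
xs = [i / 10 for i in range(-30, 31)]
max_gap = max(
    abs(shallow_relu_net(x, params) - shallow_relu_net(x, rescaled)) for x in xs
)
print("largest output difference:", max_gap)
```

The two parameter settings differ in every coordinate, yet the printed difference should be at the level of rounding error, which is the redundancy that taking a quotient by this symmetry is meant to remove.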