The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks
More Like this
Braverman, Mark (Ed.) A longstanding open problem in coding theory is to determine the best (asymptotic) rate R₂(δ) of binary codes with minimum constant (relative) distance δ. An existential lower bound was given by Gilbert and Varshamov in the 1950s. On the impossibility side, in the 1970s McEliece, Rodemich, Rumsey and Welch (MRRW) proved an upper bound by analyzing Delsarte's linear programs. To date these results remain the best known lower and upper bounds on R₂(δ), with no improvement even for the important class of linear codes. Asymptotically, these bounds differ by an exponential factor in the blocklength. In this work, we introduce a new hierarchy of linear programs (LPs) that converges to the true size A^{Lin}₂(n,d) of an optimum linear binary code (in fact, over any finite field) of a given blocklength n and distance d. This hierarchy has several notable features: 1) It is a natural generalization of the Delsarte LPs used in the first MRRW bound. 2) It is a hierarchy of linear programs rather than semi-definite programs, potentially making it more amenable to theoretical analysis. 3) It is complete in the sense that the optimum code size can be retrieved from level O(n²). 4) It provides an answer, in the form of a hierarchy (in larger dimensional spaces), to the question of how to cut Delsarte's LP polytopes to approximate the true size of linear codes. We obtain our hierarchy by generalizing the Krawtchouk polynomials and MacWilliams inequalities to a suitable "higher-order" version that takes into account interactions of ℓ words. Our method also generalizes to translation schemes under mild assumptions.
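For orientation, the classical (first-order) objects that the abstract says are generalized can be illustrated on a small code. The following is a minimal sketch, not the higher-order hierarchy itself: it evaluates binary Krawtchouk polynomials and applies the MacWilliams transform to the weight distribution of the [7,4] Hamming code. The function names and the choice of generator matrix are assumptions made for this illustration only.

```python
from itertools import product
from math import comb


def krawtchouk(n, k, x):
    """Binary Krawtchouk polynomial K_k(x) for blocklength n."""
    return sum((-1) ** j * comb(x, j) * comb(n - x, k - j) for j in range(k + 1))


def span(generators, n):
    """All codewords of the binary linear code spanned by the generator rows."""
    words = set()
    for coeffs in product((0, 1), repeat=len(generators)):
        word = tuple(
            sum(c * g[i] for c, g in zip(coeffs, generators)) % 2 for i in range(n)
        )
        words.add(word)
    return words


def weight_distribution(words, n):
    """dist[i] = number of codewords of Hamming weight i."""
    dist = [0] * (n + 1)
    for w in words:
        dist[sum(w)] += 1
    return dist


def macwilliams_transform(dist, n):
    """Weight distribution of the dual code: B_k = (1/|C|) * sum_i A_i K_k(i)."""
    size = sum(dist)
    return [
        sum(dist[i] * krawtchouk(n, k, i) for i in range(n + 1)) // size
        for k in range(n + 1)
    ]


# Illustrative choice: a systematic generator matrix of the [7,4] Hamming code.
HAMMING_GENERATORS = [
    (1, 0, 0, 0, 1, 1, 0),
    (0, 1, 0, 0, 1, 0, 1),
    (0, 0, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 1, 1, 1),
]

if __name__ == "__main__":
    n = 7
    A = weight_distribution(span(HAMMING_GENERATORS, n), n)
    B = macwilliams_transform(A, n)
    print("code weights:", A)
    print("dual weights:", B)
```

Every entry of the transformed vector is nonnegative. Those nonnegativity conditions are the constraints of Delsarte's LP, and for a linear code the transform yields the weight distribution of the dual code.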