Existing gradient-based optimization methods update parameters locally, in a direction that minimizes the loss function. We study a different approach, symmetry teleportation, which allows parameters to travel a large distance on the loss level set in order to improve the convergence speed of subsequent steps. Teleportation exploits symmetries in the loss landscape of optimization problems. We derive loss-invariant group actions for test functions in optimization and for multi-layer neural networks, and prove a necessary condition for teleportation to improve the convergence rate. We also show that our algorithm is closely related to second-order methods. Experimentally, we show that teleportation improves the convergence speed of gradient descent and AdaGrad on several optimization problems, including test functions, multi-layer regressions, and MNIST classification.
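To make the kind of symmetry this abstract refers to concrete, here is a minimal sketch (not the paper's algorithm) for a two-layer linear regression, where the loss depends on the weights only through the product W2 @ W1. The group action (W1, W2) → (g·W1, W2·g⁻¹) for invertible g then leaves the loss unchanged, and a crude "teleportation" step searches over random g for a point on the same level set with a larger gradient norm. All names, sizes, and the random-search heuristic are illustrative assumptions.

```python
import numpy as np

# Toy two-layer linear regression: the loss depends on the weights only
# through W2 @ W1, so (W1, W2) -> (g @ W1, W2 @ inv(g)) is loss-invariant.
rng = np.random.default_rng(0)
X, Y = rng.normal(size=(100, 4)), rng.normal(size=(100, 2))
W1, W2 = rng.normal(size=(3, 4)), rng.normal(size=(2, 3))

def loss(W1, W2):
    return 0.5 * np.sum((X @ (W2 @ W1).T - Y) ** 2)

def grad_norm(W1, W2):
    P = W2 @ W1                       # effective weight (invariant under the action)
    dP = (X @ P.T - Y).T @ X          # dL/dP
    dW1, dW2 = W2.T @ dP, dP @ W1.T   # chain rule through P = W2 @ W1
    return np.sqrt(np.sum(dW1 ** 2) + np.sum(dW2 ** 2))

# Crude "teleportation": random search over invertible g for a point on the
# same loss level set with a larger gradient norm (a stand-in for the paper's
# optimization over the symmetry group).
best_W1, best_W2, best_gn = W1, W2, grad_norm(W1, W2)
for _ in range(200):
    g = np.eye(3) + 0.2 * rng.normal(size=(3, 3))
    cand_W1, cand_W2 = g @ W1, W2 @ np.linalg.inv(g)
    gn = grad_norm(cand_W1, cand_W2)
    if gn > best_gn:
        best_W1, best_W2, best_gn = cand_W1, cand_W2, gn

assert np.isclose(loss(W1, W2), loss(best_W1, best_W2))  # same level set
```

A subsequent gradient step taken from the teleported point then starts where the loss surface is steeper, which is the intuition behind the claimed speed-up.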
On the α-loss Landscape in the Logistic Model
We analyze the optimization landscape of a recently introduced tunable class of loss functions called α-loss, α ∈ (0, ∞], in the logistic model. This family encapsulates the exponential loss (α = 1/2), the log-loss (α = 1), and the 0-1 loss (α = ∞) and contains compelling properties that enable the practitioner to discern among a host of operating conditions relevant to emerging learning methods. Specifically, we study the evolution of the optimization landscape of α-loss with respect to α using tools drawn from the study of strictly-locally-quasi-convex functions in addition to geometric techniques. We interpret these results in terms of optimization complexity via normalized gradient descent.
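The abstract does not restate the definition; in the α-loss literature, the loss assigned to the probability p that the model places on the true label is usually written ℓ_α(p) = (α/(α−1))(1 − p^{1−1/α}) for α ∉ {1, ∞}, with ℓ_1(p) = −log p and ℓ_∞(p) = 1 − p (treat the exact constants here as an assumption). A small sketch evaluating this in the logistic model, where p = σ(y·w·x):

```python
import numpy as np

def alpha_loss(p, alpha):
    """alpha-loss of the probability p assigned to the true label.

    Assumed form: (alpha/(alpha-1)) * (1 - p**(1 - 1/alpha)) for alpha not in {1, inf},
    -log(p) at alpha = 1, and 1 - p at alpha = inf.
    """
    p = np.asarray(p, dtype=float)
    if alpha == 1:
        return -np.log(p)
    if np.isinf(alpha):
        return 1.0 - p
    return (alpha / (alpha - 1.0)) * (1.0 - p ** (1.0 - 1.0 / alpha))

# Logistic model: p = sigmoid(y * w.x) for a label y in {-1, +1} (toy numbers).
w, x, y = np.array([0.5, -1.0]), np.array([1.0, 2.0]), -1
p = 1.0 / (1.0 + np.exp(-y * np.dot(w, x)))
for a in (0.5, 1.0, 2.0, np.inf):
    print(a, alpha_loss(p, a))
# alpha = 1/2 recovers 1/p - 1 (an exponential-type loss), alpha = 1 the
# log-loss, and alpha -> inf the soft 0-1 loss 1 - p.
```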
- Award ID(s): 1901243
- PAR ID: 10232226
- Date Published:
- Journal Name: 2020 IEEE International Symposium on Information Theory
- Page Range / eLocation ID: 2700 to 2705
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
We consider a problem of guessing, wherein an adversary is interested in knowing the value of the realization of a discrete random variable X on observing another correlated random variable Y. The adversary can make multiple (say, k) guesses. The adversary's guessing strategy is assumed to minimize α-loss, a class of tunable loss functions parameterized by α. It has been shown before that this loss function captures well-known loss functions, including the exponential loss (α = 1/2), the log-loss (α = 1), and the 0-1 loss (α = ∞). We completely characterize the optimal adversarial strategy and the resulting expected α-loss, thereby recovering known results for α = ∞. We define an information leakage measure from the k-guesses setup and derive a condition under which the leakage is unchanged from a single guess.
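As a concrete illustration of the α = ∞ (0-1 loss) endpoint that this abstract says is recovered, the optimal k-guess strategy there is simply to guess the k most probable values of X given the observation, and the expected loss is the posterior mass left uncovered. The sketch below uses a made-up posterior; the general-α strategy characterized in the paper is not reproduced here.

```python
import numpy as np

def topk_guess_loss(posterior, k):
    """Expected 0-1 loss (alpha = inf) of the optimal k-guess strategy:
    guess the k most probable values, miss with the remaining mass."""
    top_k_mass = np.sort(posterior)[::-1][:k].sum()
    return 1.0 - top_k_mass

# Toy posterior P(X | Y = y) over five values (made-up numbers):
posterior = np.array([0.40, 0.25, 0.15, 0.12, 0.08])
for k in range(1, 4):
    print(k, topk_guess_loss(posterior, k))
# k = 1 -> 0.60, k = 2 -> 0.35, k = 3 -> 0.20
```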
We consider the vulnerability of fairness-constrained learning to malicious noise in the training data. Konstantinov and Lampert (2021) initiated the study of this question and proved that any proper learner can exhibit high vulnerability when group sizes are imbalanced. Here, we present a more optimistic view, showing that if we allow randomized classifiers, then the landscape is much more nuanced. For example, for Demographic Parity we need only incur a Θ(α) loss in accuracy, where α is the malicious noise rate, matching the best possible even without fairness constraints. For Equal Opportunity, we show we can incur an O(√α) loss and give a matching Ω(√α) lower bound. For Equalized Odds and Predictive Parity, however, an adversary can indeed force an Ω(1) loss. The key technical novelty of our work is how randomization can bypass simple 'tricks' an adversary can use to amplify its power. These results provide a more fine-grained view of the sensitivity of fairness-constrained learning to adversarial noise in training data.
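To ground the terminology, the sketch below computes the quantity a Demographic Parity constraint controls, the gap in acceptance rates between two groups, for a deterministic classifier and for a randomized one described by per-example acceptance probabilities. The data, threshold, and names are made up; this is not the paper's construction.

```python
import numpy as np

def demographic_parity_gap(accept_prob, group):
    """|P(accept | group 0) - P(accept | group 1)| for a (possibly randomized)
    classifier described by its per-example acceptance probabilities."""
    return abs(accept_prob[group == 0].mean() - accept_prob[group == 1].mean())

rng = np.random.default_rng(0)
group = rng.integers(0, 2, size=1000)                          # two groups (toy)
scores = np.clip(rng.uniform(size=1000) + 0.2 * group, 0.0, 1.0)

hard = (scores > 0.5).astype(float)   # deterministic thresholding
soft = scores                         # randomized: accept with probability = score
print(demographic_parity_gap(hard, group), demographic_parity_gap(soft, group))
```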
Significant advances have been made recently in training neural networks, where the main challenge is solving an optimization problem with abundant critical points. However, existing approaches to this issue rely crucially on a restrictive assumption: that the training data are drawn from a Gaussian distribution. In this paper, we provide a novel unified framework for designing loss functions with desirable landscape properties for a wide range of general input distributions. On these loss functions, remarkably, stochastic gradient descent theoretically recovers the true parameters with global initializations and empirically outperforms existing approaches. Our loss function design bridges the notion of score functions with the topic of neural network optimization. Central to our approach is the task of estimating the score function from samples, which is of basic and independent interest to theoretical statistics. Traditional estimation methods (e.g., kernel-based) fail right at the outset; we bring statistical methods of local likelihood to design a novel estimator of score functions that provably adapts to the local geometry of the unknown density.
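Here the score function means ∇_x log p(x) of the input density. As a toy illustration only (the abstract's estimator is based on local likelihood, which is not reproduced here), the sketch below forms a kernel-density-based score estimate from samples of a one-dimensional Gaussian, where the exact score −(x − μ)/σ² is available for comparison.

```python
import numpy as np

def kde_score(x_eval, samples, h=0.1):
    """Estimate d/dx log p_hat(x) from a Gaussian-kernel density estimate.
    Illustrative only; not the local-likelihood estimator in the abstract."""
    diffs = x_eval[:, None] - samples[None, :]   # (m, n) pairwise differences
    k = np.exp(-0.5 * (diffs / h) ** 2)          # unnormalized kernel values
    dk = -(diffs / h ** 2) * k                   # their derivatives in x
    return dk.sum(axis=1) / k.sum(axis=1)        # d/dx log of the kernel sum

rng = np.random.default_rng(0)
mu, sigma = 1.0, 0.5
samples = rng.normal(mu, sigma, size=5000)
x = np.linspace(0.0, 2.0, 5)

est = kde_score(x, samples)
true = -(x - mu) / sigma ** 2    # exact Gaussian score for comparison
print(np.round(est, 2), np.round(true, 2))
```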
We study optimal regularity and the free boundary for minimizers of an energy functional arising in cohesive zone models for fracture mechanics. Under smoothness assumptions on the boundary conditions and on the fracture energy density, we show that minimizers are C^{1, 1/2}, and that near non-degenerate points the fracture set is C^{1, α} for some α ∈ (0, 1).