Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to nonfederal websites. Their policies may differ from this site.

A popular line of recent research incorporates ML advice in the design of online algorithms to improve their performance in typical instances. These papers treat the ML algorithm as a blackbox, and redesign online algorithms to take advantage of ML predictions. In this paper, we ask the complementary question: can we redesign ML algorithms to provide better predictions for online algorithms? We explore this question in the context of the classic rentorbuy problem, and show that incorporating optimization benchmarks directly in ML loss functions leads to significantly better performance, while maintaining a worstcase adversarial result when the advice is completely wrong. We support this finding both through theoretical bounds and numerical simulations, and posit that “learning for optimization” is a fertile area for future research.Free, publiclyaccessible full text available July 1, 2023

In the Mixup training paradigm, a model is trained using convex combinations of data points and their associated labels. Despite seeing very few true data points during training, models trained using Mixup seem to still minimize the original empirical risk and exhibit better generalization and robustness on various tasks when compared to standard training. In this paper, we investigate how these benefits of Mixup training rely on properties of the data in the context of classification. For minimizing the original empirical risk, we compute a closed form for the Mixupoptimal classification, which allows us to construct a simple dataset on which minimizing the Mixup loss can provably lead to learning a classifier that does not minimize the empirical loss on the data. On the other hand, we also give sufficient conditions for Mixup training to also minimize the original empirical risk. For generalization, we characterize the margin of a Mixup classifier, and use this to understand why the decision boundary of a Mixup classifier can adapt better to the full structure of the training data when compared to standard training. In contrast, we also show that, for a large class of linear models and linearly separable datasets, Mixup training leadsmore »Free, publiclyaccessible full text available July 1, 2023

Meanfield limit has been successfully applied to neural networks, leading to many results in optimizing overparametrized networks. However, existing works often focus on twolayer networks and/or require large number of neurons. We give a new framework for extending the meanfield limit to multilayer network, and show that a polynomialsize threelayer network in our framework can learn the function constructed by Safran et al. (2019) – which is known to be not approximable by any twolayer networks

In this paper we study the training dynamics for gradient flow on overparametrized tensor decomposition problems. Empirically, such training process often first fits larger components and then discovers smaller components, which is similar to a tensor deflation process that is commonly used in tensor decomposition algorithms. We prove that for orthogonally decomposable tensor, a slightly modified version of gradient flow would follow a tensor deflation process and recover all the tensor components. Our proof suggests that for orthogonal tensors, gradient flow dynamics works similarly as greedy lowrank learning in the matrix setting, which is a first step towards understanding the implicit regularization effect of overparametrized models for lowrank tensors.