We study the problem of high-dimensional robust mean estimation in the presence of a constant fraction of adversarial outliers. A recent line of work has provided sophisticated polynomial-time algorithms for this problem with dimension-independent error guarantees for a range of natural distribution families. In this work, we show that a natural non-convex formulation of the problem can be solved directly by gradient descent. Our approach leverages a novel structural lemma, roughly showing that any approximate stationary point of our non-convex objective gives a near-optimal solution to the underlying robust estimation task. Our work establishes an intriguing connection between algorithmic high-dimensional robust statistics and non-convex optimization, which may have broader applications to other robust estimation tasks.
Inverting Deep Generative Models, One Layer at a Time
We study the problem of inverting a deep generative model with ReLU activations. Inversion corresponds to finding a latent code vector that explains the observed measurements as well as possible. Most prior work performs this step by attempting to solve a non-convex optimization problem involving the generator. In this paper we obtain several novel theoretical results for the inversion problem. We show that in the realizable case, single-layer inversion can be performed exactly in polynomial time by solving a linear program. Further, we show that for multiple layers, inversion is NP-hard and the preimage set can be non-convex. For generative models of arbitrary depth, we show that exact recovery is possible in polynomial time with high probability if the layers are expanding and the weights are randomly selected. Very recent work analyzed the same problem for gradient descent inversion; that analysis requires significantly higher expansion (logarithmic in the latent dimension), whereas our proposed algorithm can provably reconstruct even with constant-factor expansion. We also provide provable error bounds in different norms for reconstructing noisy observations. Our empirical validation demonstrates that we obtain better reconstructions when the latent dimension is large.
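The single-layer result can be illustrated concretely. For an observation y = ReLU(Wz + b) in the realizable case, the active coordinates of y (those with y_i > 0) pin down linear equalities on z, while the zero coordinates give linear inequalities, so a preimage is exactly a feasible point of a linear program. Below is a minimal sketch of this idea (function and variable names are ours, not the paper's), using SciPy's `linprog` as the LP solver:

```python
import numpy as np
from scipy.optimize import linprog

def invert_relu_layer(W, b, y):
    """Recover z from y = ReLU(W z + b) in the realizable case via an LP.

    Active units (y_i > 0) give equalities  W_i z + b_i = y_i;
    inactive units (y_i = 0) give inequalities  W_i z + b_i <= 0.
    Any feasible point of this linear program is an exact preimage.
    """
    on = y > 0
    res = linprog(
        c=np.zeros(W.shape[1]),           # pure feasibility problem: zero objective
        A_eq=W[on], b_eq=y[on] - b[on],   # active-unit equalities
        A_ub=W[~on], b_ub=-b[~on],        # inactive-unit inequalities
        bounds=[(None, None)] * W.shape[1],  # z is unconstrained in sign
        method="highs",
    )
    if not res.success:
        raise ValueError("no preimage: observation is not realizable")
    return res.x

# Expanding random layer: 30 hidden units, 5-dimensional latent code.
rng = np.random.default_rng(0)
W = rng.standard_normal((30, 5))
b = rng.standard_normal(30)
z_true = rng.standard_normal(5)
y = np.maximum(W @ z_true + b, 0.0)

z_hat = invert_relu_layer(W, b, y)
# With enough active units, the equalities determine z uniquely,
# so z_hat matches z_true up to solver tolerance.
```

With an expanding, randomly weighted layer, far more than 5 units are typically active, so the equality system is (consistently) overdetermined and the recovered point is the unique preimage.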
Award ID(s): 1901281
Publication Date:
NSF-PAR ID: 10176162
Journal Name: Advances in Neural Information Processing Systems
Volume: 32
ISSN: 1049-5258
Sponsoring Org: National Science Foundation
More Like this


We study the training of Convolutional Neural Networks (CNNs) with ReLU activations and introduce exact convex optimization formulations with polynomial complexity in the number of data samples, the number of neurons, and the data dimension. More specifically, we develop a convex analytic framework utilizing semi-infinite duality to obtain equivalent convex optimization problems for several two- and three-layer CNN architectures. We first prove that two-layer CNNs can be globally optimized via an ℓ2-norm regularized convex program. We then show that multi-layer circular CNN training problems with a single ReLU layer are equivalent to an ℓ1-regularized convex program that encourages sparsity in the spectral domain. We also extend these results to three-layer CNNs with two ReLU layers. Furthermore, we present extensions of our approach to different pooling methods, which elucidate the implicit architectural bias as convex regularizers.

This paper studies the fundamental problem of learning deep generative models that consist of multiple layers of latent variables organized in top-down architectures. Such models have high expressivity and allow for learning hierarchical representations. Learning such a generative model requires inferring the latent variables for each training example based on the posterior distribution of these latent variables. The inference typically requires Markov chain Monte Carlo (MCMC), which can be time-consuming. In this paper, we propose to use noise-initialized, non-persistent, short-run MCMC, such as finite-step Langevin dynamics initialized from the prior distribution of the latent variables, as an approximate inference engine, where the step size of the Langevin dynamics is variationally optimized by minimizing the Kullback-Leibler divergence between the distribution produced by the short-run MCMC and the posterior distribution. Our experiments show that the proposed method outperforms the variational autoencoder (VAE) in terms of reconstruction error and synthesis quality. An advantage of the proposed method is that it is simple and automatic, with no need to design an inference model.
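As a toy illustration of the short-run inference idea described above (the setup and names here are ours, not the paper's), consider a linear generator x = Wz + noise with a Gaussian prior on z, so the posterior gradient is available in closed form. A few hundred Langevin steps initialized from the prior already move z close to the posterior:

```python
import numpy as np

rng = np.random.default_rng(0)
k, m, sigma = 5, 50, 0.5            # latent dim, data dim, observation noise std

# Toy generator x = W z + eps with a Gaussian prior z ~ N(0, I).
W = rng.standard_normal((m, k)) / np.sqrt(m)
z_true = rng.standard_normal(k)
x = W @ z_true + sigma * rng.standard_normal(m)

def grad_log_posterior(z):
    # d/dz [ log N(z; 0, I) + log N(x; W z, sigma^2 I) ]
    return -z + W.T @ (x - W @ z) / sigma**2

# Non-persistent short-run Langevin: start from the prior, run K steps.
s, K = 0.01, 200                    # step size and number of steps
z = rng.standard_normal(k)          # initialized from the prior
samples = []
for _ in range(K):
    z = z + 0.5 * s * grad_log_posterior(z) + np.sqrt(s) * rng.standard_normal(k)
    samples.append(z.copy())

z_est = np.mean(samples[-50:], axis=0)  # crude posterior-mean estimate
# z_est typically lands much closer to z_true than the prior draw did.
```

In the paper's setting the generator is a deep network and the step size s is itself tuned variationally; here s is fixed by hand purely for illustration.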

We give the first statistical-query lower bounds for agnostically learning any non-polynomial activation with respect to Gaussian marginals (e.g., ReLU, sigmoid, sign). For the specific problem of ReLU regression (equivalently, agnostically learning a ReLU), we show that any statistical-query algorithm with tolerance n^(−(1/ϵ)^b) must use at least 2^(n^(cϵ)) queries for some constants b, c > 0, where n is the dimension and ϵ is the accuracy parameter. Our results rule out general (as opposed to correlational) SQ learning algorithms, which is unusual for real-valued learning problems. Our techniques involve a gradient-boosting procedure for "amplifying" recent lower bounds due to Diakonikolas et al. (COLT 2020) and Goel et al. (ICML 2020) on the SQ dimension of functions computed by two-layer neural networks. The crucial new ingredient is the use of a non-standard convex functional during the boosting procedure. This also yields a best-possible reduction between two commonly studied models of learning: agnostic learning and probabilistic concepts.

Motivated by practical concerns in applying information design to markets and service systems, we consider a persuasion problem between a sender and a receiver where the receiver may not be an expected utility maximizer. In particular, the receiver's utility may be nonlinear in her belief; we deem such receivers risk-conscious. Such utility models arise, for example, when the receiver exhibits sensitivity to the variability and the risk in the payoff of choosing an action (e.g., waiting time for a service). In the presence of such nonlinearity, the standard approach of using revelation-principle-style arguments fails to characterize the set of signals needed in the optimal signaling scheme. Our main contribution is to provide a theoretical framework, using results from convex analysis, to overcome this technical challenge. In particular, in general persuasion settings with risk-conscious agents, we prove that the sender's problem can be reduced to a convex optimization program. Furthermore, using this characterization, we obtain a bound on the number of signals needed in the optimal signaling scheme. We apply our methods to study a specific setting, namely binary persuasion, where the receiver has two possible actions (0 and 1), and the sender always prefers the receiver taking action […]