-
Data poisoning--the process by which an attacker takes control of a model by making imperceptible changes to a subset of the training data--is an emerging threat in the context of neural networks. Existing attacks for data poisoning have relied on hand-crafted heuristics. Instead, we pose crafting poisons more generally as a bi-level optimization problem, where the inner level corresponds to training a network on a poisoned dataset and the outer level corresponds to updating those poisons to achieve a desired behavior on the trained model. We then propose MetaPoison, a first-order method to solve this optimization quickly. MetaPoison is effective: it outperforms previous clean-label poisoning methods by a large margin under the same setting. MetaPoison is robust: its poisons transfer to a variety of victims with unknown hyperparameters and architectures. MetaPoison is also general-purpose, working not only in fine-tuning scenarios, but also for end-to-end training from scratch with remarkable success, e.g. causing a target image to be misclassified 90% of the time via manipulating just 1% of the dataset. Additionally, MetaPoison can achieve arbitrary adversary goals not previously possible--like using poisons of one class to make a target image don the label of another arbitrarily chosen class. Finally, MetaPoison works in the real world.
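For intuition, the bi-level structure can be sketched in a few lines: the inner level unrolls a handful of differentiable training steps on the poisoned data, and the outer level backpropagates an adversarial loss on the target image through those unrolled steps into the poison perturbations. The sketch below is illustrative only, with a softmax-regression "victim" and made-up tensors; it is not the authors' MetaPoison implementation.

```python
# Minimal sketch of the bi-level poisoning idea (illustrative, not MetaPoison itself).
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, k, n = 32, 10, 64                      # feature dim, classes, poisoned-subset size
x_poison_base = torch.rand(n, d)          # clean inputs the attacker may perturb
y_poison = torch.randint(0, k, (n,))      # their unchanged ("clean") labels
x_target = torch.rand(1, d)               # the victim image
y_adv = torch.tensor([3])                 # label the attacker wants it to receive
delta = torch.zeros_like(x_poison_base, requires_grad=True)  # poison perturbations
eps = 8 / 255                             # imperceptibility budget

opt = torch.optim.Adam([delta], lr=1e-2)
for outer_step in range(100):
    # Inner level: unroll a few differentiable SGD steps of "victim training".
    W = torch.zeros(d, k, requires_grad=True)
    for _ in range(3):
        inner_loss = F.cross_entropy((x_poison_base + delta) @ W, y_poison)
        (gW,) = torch.autograd.grad(inner_loss, W, create_graph=True)
        W = W - 0.5 * gW
    # Outer level: make the unrolled model misclassify the target as y_adv.
    outer_loss = F.cross_entropy(x_target @ W, y_adv)
    opt.zero_grad()
    outer_loss.backward()
    opt.step()
    with torch.no_grad():
        delta.clamp_(-eps, eps)           # keep the perturbations imperceptible
```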
-
We empirically evaluate common assumptions about neural networks that are widely held by practitioners and theorists alike. In this work, we: (1) prove the widespread existence of suboptimal local minima in the loss landscape of neural networks, and we use our theory to find examples; (2) show that small-norm parameters are not optimal for generalization; (3) demonstrate that ResNets do not conform to wide-network theories, such as the neural tangent kernel, and that the interaction between skip connections and batch normalization plays a role; (4) find that rank does not correlate with generalization or robustness in a practical setting.
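As a rough illustration of two of the per-layer quantities these claims concern, the sketch below measures weight-matrix norms and the stable rank ||W||_F^2 / ||W||_2^2 of a toy network. These are common proxies for "norm" and "rank" in practice, not necessarily the exact definitions used in the paper.

```python
# Hedged sketch: per-layer norms and stable rank of a toy network (common proxies only).
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(128, 256), torch.nn.ReLU(), torch.nn.Linear(256, 10))

for name, p in model.named_parameters():
    if p.ndim < 2:
        continue                                            # skip bias vectors
    W = p.detach().reshape(p.shape[0], -1)
    fro = torch.linalg.norm(W).item()                       # Frobenius norm
    spec = torch.linalg.matrix_norm(W, ord=2).item()        # spectral norm
    stable_rank = (fro / spec) ** 2                         # ||W||_F^2 / ||W||_2^2
    print(f"{name}: frobenius={fro:.2f}  spectral={spec:.2f}  stable rank={stable_rank:.2f}")
```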
-
This paper studies how neural network architecture affects the speed of training. We introduce a simple concept called gradient confusion to help formally analyze this. When gradient confusion is high, stochastic gradients produced by different data samples may be negatively correlated, slowing down convergence. But when gradient confusion is low, data samples interact harmoniously, and training proceeds quickly. Through theoretical and experimental results, we demonstrate how the neural network architecture affects gradient confusion, and thus the efficiency of training. Our results show that, for popular initialization techniques, increasing the width of neural networks leads to lower gradient confusion, and thus faster model training. On the other hand, increasing the depth of neural networks has the opposite effect. Our results indicate that alternate initialization techniques or networks using both batch normalization and skip connections help reduce the training burden of very deep networks.
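The notion can be made concrete by looking at pairwise inner products of per-sample gradients: confusion is high when some of them are strongly negative. The sketch below estimates this quantity for a toy network; the model and data are placeholders, and the estimator is a direct reading of the definition rather than the paper's experimental protocol.

```python
# Hedged sketch: estimate gradient confusion as the most negative pairwise inner product
# between per-sample gradients of a toy model.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Sequential(
    torch.nn.Linear(20, 64), torch.nn.ReLU(), torch.nn.Linear(64, 5))
x, y = torch.randn(16, 20), torch.randint(0, 5, (16,))

def flat_grad(xi, yi):
    """Flattened gradient of the loss on a single sample."""
    loss = F.cross_entropy(model(xi.unsqueeze(0)), yi.unsqueeze(0))
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return torch.cat([g.reshape(-1) for g in grads])

g = torch.stack([flat_grad(xi, yi) for xi, yi in zip(x, y)])
inner = g @ g.T                                             # pairwise inner products
off_diag = inner[~torch.eye(len(x), dtype=torch.bool)]
min_inner = off_diag.min().item()                           # most negative pair
eta = max(0.0, -min_inner)                                  # gradient confusion estimate
print(f"min pairwise inner product: {min_inner:.4f}   confusion estimate: {eta:.4f}")
```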
-
Randomized smoothing, using just a simple isotropic Gaussian distribution, has been shown to produce good robustness guarantees against ℓ2-norm bounded adversaries. In this work, we show that extending the smoothing technique to defend against other attack models can be challenging, especially in the high-dimensional regime. In particular, for a vast class of i.i.d. smoothing distributions, we prove that the largest ℓp-radius that can be certified decreases as O(1/d^(1/2−1/p)) with dimension d for p > 2. Notably, for p ≥ 2, this dependence on d is no better than that of the ℓp-radius that can be certified using isotropic Gaussian smoothing, essentially putting a matching lower bound on the robustness radius. When restricted to generalized Gaussian smoothing, these two bounds can be shown to be within a constant factor of each other in an asymptotic sense, establishing that Gaussian smoothing provides the best possible results, up to a constant factor, when p ≥ 2. We present experimental results on CIFAR to validate our theory. For other smoothing distributions, such as a uniform distribution within an ℓ1 or an ℓ∞-norm ball, we show upper bounds of the form O(1/d) and O(1/d^(1−1/p)) respectively, which have an even worse dependence on d.
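A quick numeric illustration of the dimension dependence, assuming the standard Gaussian-smoothing ℓ2 certificate R = σ·Φ⁻¹(pA) (one-sided form): an ℓ2 certificate of radius R only implies an ℓ∞ certificate of radius R/√d, which matches the O(1/d^(1/2−1/p)) rate at p = ∞.

```python
# Dimension dependence of the implied l_inf radius under Gaussian smoothing (illustrative).
import math
from statistics import NormalDist

sigma, p_A = 0.5, 0.9                        # smoothing scale, lower bound on top-class probability
R2 = sigma * NormalDist().inv_cdf(p_A)       # certified l2 radius (dimension-independent)
for d in (3 * 32 * 32, 3 * 224 * 224):       # CIFAR-sized and ImageNet-sized inputs
    print(f"d = {d:6d}   l2 radius = {R2:.3f}   implied l_inf radius = {R2 / math.sqrt(d):.5f}")
```

At CIFAR dimensionality (d = 3072) the implied ℓ∞ radius is already smaller than the ℓ2 radius by a factor of about 55.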
-
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks. However, the use of stochastic gradient descent combined with the nonconvexity of the underlying optimization problems renders parameter learning susceptible to initialization. To address this issue, a variety of methods that rely on random parameter initialization or knowledge distillation have been proposed in the past. In this paper, we propose FuseInit, a novel method to initialize shallower networks by fusing neighboring layers of deeper networks that are trained with random initialization. We develop theoretical results and efficient algorithms for mean-square error (MSE)-optimal fusion of neighboring dense-dense, convolutional-dense, and convolutional-convolutional layers. We show experiments for a range of classification and regression datasets, which suggest that deeper neural networks are less sensitive to initialization and shallower networks can perform better (sometimes as well as their deeper counterparts) if initialized with FuseInit.
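The flavor of dense-dense fusion can be illustrated with a least-squares stand-in: fit a single dense layer to reproduce the output of two trained layers (with a nonlinearity in between) over sample activations. This is only a sketch of the idea, not the paper's closed-form MSE-optimal derivation.

```python
# Illustrative stand-in for dense-dense layer fusion: least-squares fit of one dense layer
# to the output of a trained two-layer block over sample inputs.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out, n = 64, 128, 32, 4096
W1 = rng.standard_normal((d_hidden, d_in)) / np.sqrt(d_in)
b1 = rng.standard_normal(d_hidden)
W2 = rng.standard_normal((d_out, d_hidden)) / np.sqrt(d_hidden)
b2 = rng.standard_normal(d_out)

X = rng.standard_normal((n, d_in))                  # sample inputs to the layer pair
H = np.maximum(X @ W1.T + b1, 0.0)                  # ReLU hidden activations
Y = H @ W2.T + b2                                   # output of the two-layer block

X_aug = np.hstack([X, np.ones((n, 1))])             # append a bias column
theta, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)   # MSE-optimal single dense layer
W_fused, b_fused = theta[:-1].T, theta[-1]          # would initialize the shallower network

mse = np.mean((X_aug @ theta - Y) ** 2)
print(f"fusion MSE over the sample inputs: {mse:.4f}")
```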
-
Defenses against adversarial attacks can be classified into certified and non-certified. Certifiable defenses make networks robust within a certain ℓp-bounded radius, so that it is impossible for the adversary to make adversarial examples within the certificate bound. We present an attack that maintains the imperceptibility property of adversarial examples while being outside of the certified radius. Furthermore, the proposed "Shadow Attack" can fool certifiably robust networks by producing an imperceptible adversarial example that gets misclassified and produces a strong "spoofed" certificate.
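The general recipe can be sketched as follows: optimize a perturbation that is large in norm (so it can sit outside a small certified radius) yet perceptually mild, by penalizing total variation and the mean per-channel shift instead of constraining a small ℓp ball. The penalty terms and weights below are illustrative, not the paper's exact objective.

```python
# Hedged sketch of a "large-norm but imperceptible" perturbation search (illustrative weights).
import torch
import torch.nn.functional as F

def shadow_like_attack(model, x, y_true, steps=200, lr=0.05, lam_tv=0.3, lam_c=1.0):
    """x: image batch of shape (1, C, H, W) with values in [0, 1]."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        logits = model((x + delta).clamp(0, 1))
        attack_loss = -F.cross_entropy(logits, y_true)       # push away from the true label
        tv = (delta[..., 1:, :] - delta[..., :-1, :]).abs().mean() \
           + (delta[..., :, 1:] - delta[..., :, :-1]).abs().mean()
        color = delta.mean(dim=(-2, -1)).abs().mean()         # keep per-channel shift small
        loss = attack_loss + lam_tv * tv + lam_c * color
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x + delta).clamp(0, 1).detach()
```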