Robust Attribution Regularization
An emerging problem in trustworthy machine learning is to train models that produce robust interpretations for their predictions. We take a step towards solving this problem through the lens of axiomatic attribution of neural networks. Our theory is grounded in Integrated Gradients (IG) [STY17], a recent method for axiomatically attributing a neural network's output change to its input change. We propose training objectives in classic robust optimization models to achieve robust IG attributions. Our objectives give principled generalizations of previous objectives designed for robust predictions, and they naturally degenerate to classic soft-margin training for one-layer neural networks. We also generalize previous theory and prove that the objectives for different robust optimization models are closely related. Experiments demonstrate the effectiveness of our method and also point to intriguing problems which hint at the need for better optimization techniques or better neural network architectures for robust attribution training.
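For readers unfamiliar with the attribution method the objectives build on, here is a minimal sketch of the Integrated Gradients computation from [STY17], approximated with a midpoint Riemann sum. The helper `model_grad`, the baseline `x_base`, and the toy quadratic model in the usage lines are illustrative assumptions, not code from the paper.

```python
import numpy as np

def integrated_gradients(x, x_base, model_grad, steps=50):
    """Approximate IG attributions along the straight line from the
    baseline x_base to the input x (midpoint Riemann-sum estimate).

    model_grad(z) must return dF/dz for a scalar model output F.
    """
    alphas = (np.arange(steps) + 0.5) / steps          # midpoints in (0, 1)
    path_grads = [model_grad(x_base + a * (x - x_base)) for a in alphas]
    avg_grad = np.mean(path_grads, axis=0)
    # Completeness axiom: attributions sum to roughly F(x) - F(x_base).
    return (x - x_base) * avg_grad

# Toy usage with F(x) = (w . x)^2, whose gradient is 2 (w . x) w.
w = np.array([1.0, -2.0, 0.5])
F = lambda z: float(np.dot(w, z)) ** 2
grad_F = lambda z: 2.0 * np.dot(w, z) * w
x, x0 = np.ones(3), np.zeros(3)
attr = integrated_gradients(x, x0, grad_F)
assert np.isclose(attr.sum(), F(x) - F(x0))            # completeness check
```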
- Award ID(s): 1804648
- PAR ID: 10174876
- Date Published:
- Journal Name: Conference on Neural Information Processing Systems
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Incremental gradient (IG) methods, such as stochastic gradient descent and its variants, are commonly used for large-scale optimization in machine learning. Despite sustained efforts to make IG methods more data-efficient, it remains an open question how to select a training data subset that can, in theory and in practice, perform on par with the full dataset. Here we develop CRAIG, a method that selects a weighted subset (or coreset) of training data which closely estimates the full gradient, by maximizing a submodular function. We prove that applying IG to this subset is guaranteed to converge to the (near-)optimal solution at the same convergence rate as IG on the full data for convex optimization. As a result, CRAIG achieves a speedup inversely proportional to the size of the subset. To our knowledge, this is the first rigorous method for data-efficient training of general machine learning models. Our extensive experiments show that CRAIG, while reaching practically the same solution, speeds up various IG methods by up to 6x for logistic regression and 3x for training deep neural networks. (A sketch of the selection step follows this list.)
- Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification. To defend against such attacks, an effective and popular approach, known as adversarial training (AT), mitigates the negative impact of adversarial attacks via a min-max robust training method. While effective, it remains unclear whether AT can successfully be adapted to the distributed learning context. The power of distributed optimization over multiple machines enables us to scale up robust training over large models and datasets. Spurred by that, we propose distributed adversarial training (DAT), a large-batch adversarial training framework implemented over multiple machines. We show that DAT is general: it supports training over labeled and unlabeled data, multiple types of attack generation methods, and gradient compression operations favored for distributed optimization. Theoretically, we provide, under standard conditions in optimization theory, the convergence rate of DAT to first-order stationary points in general non-convex settings. Empirically, we demonstrate that DAT either matches or outperforms state-of-the-art robust accuracies and achieves a graceful training speedup (e.g., on ResNet-50 under ImageNet). (A single-step sketch follows this list.)
- VISION: Robust and Interpretable Code Vulnerability Detection Leveraging Counterfactual Augmentation. Automated detection of vulnerabilities in source code is an essential cybersecurity challenge, underpinning trust in digital systems and services. Graph Neural Networks (GNNs) have emerged as a promising approach as they can learn the structural and logical code relationships in a data-driven manner. However, the performance of GNNs is severely limited by training data imbalances and label noise. GNNs can often learn "spurious" correlations due to superficial code similarities in the training data, leading to detectors that do not generalize well to unseen real-world data. In this work, we propose a new unified framework for robust and interpretable vulnerability detection, called VISION, that mitigates spurious correlations by systematically augmenting a counterfactual training dataset. Counterfactuals are samples with minimal semantic modifications that have opposite prediction labels. Our complete framework includes: (i) generating effective counterfactuals by prompting a Large Language Model (LLM); (ii) targeted GNN model training on synthetically paired code examples with opposite labels; and (iii) graph-based interpretability to identify the truly crucial code statements relevant for vulnerability predictions while ignoring the spurious ones. We find that our framework reduces spurious learning and enables more robust and generalizable vulnerability detection, as demonstrated by improvements in overall accuracy (from 51.8% to 97.8%), pairwise contrast accuracy (from 4.5% to 95.8%), and worst-group accuracy (from 0.7% to 85.5%) on the widely popular Common Weakness Enumeration (CWE)-20 vulnerability. We also demonstrate improvements using our proposed metrics, namely intra-class attribution variance, inter-class attribution distance, and node score dependency. We provide a new benchmark for vulnerability detection, CWE-20-CFA, comprising 27,556 samples from functions affected by the high-impact and frequently occurring CWE-20 vulnerability, including both real and counterfactual examples. Furthermore, our approach advances the societal objectives of transparent and trustworthy AI-based cybersecurity systems through interactive visualization for human-in-the-loop analysis. (A sketch of the pairwise metric follows this list.)
- This paper introduces LeTO, a method for learning constrained visuomotor policies with differentiable trajectory optimization. Our approach integrates a differentiable optimization layer into the neural network. By formulating the optimization layer as a trajectory optimization problem, we enable the model to generate actions end-to-end in a safe and constraint-controlled fashion without extra modules. Our method allows constraint information to be introduced during training, thereby balancing the objectives of satisfying constraints, smoothing the trajectories, and minimizing error relative to demonstrations. This "gray box" method marries optimization-based safety and interpretability with the powerful representational abilities of neural networks. We evaluate LeTO quantitatively in simulation and on a real robot. The results demonstrate that LeTO performs well in both simulated and real-world tasks. In addition, it generates trajectories that are less uncertain, higher in quality, and smoother than those of existing imitation learning methods. LeTO thus provides a practical example of how to integrate neural networks with trajectory optimization. We release our code at https://github.com/ZhengtongXu/LeTO. (A sketch of the unrolled layer follows this list.)
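The CRAIG item above reduces coreset selection to maximizing a submodular facility-location function over per-example gradients. Below is a minimal sketch of that greedy step under simplifying assumptions: full per-example gradient vectors `G` are given explicitly, and plain Euclidean distance stands in for the gradient-difference upper bounds used in the paper.

```python
import numpy as np

def craig_select(G, k):
    """Greedy facility-location maximization over gradient vectors.

    G: (n, d) array of per-example gradients (or cheap proxies for them).
    Returns k selected indices and integer weights; the weight of element j
    counts the examples whose nearest selected gradient is j, so the weighted
    coreset gradient approximates the full-dataset gradient sum.
    """
    n = G.shape[0]
    dist = np.linalg.norm(G[:, None, :] - G[None, :, :], axis=-1)
    sim = dist.max() - dist              # similarities; higher means closer
    best = np.zeros(n)                   # best coverage achieved so far
    selected = []
    for _ in range(k):
        # Facility-location marginal gain of adding each candidate column j.
        gains = np.maximum(sim - best[:, None], 0.0).sum(axis=0)
        j = int(np.argmax(gains))
        selected.append(j)
        best = np.maximum(best, sim[:, j])
    assign = np.asarray(selected)[np.argmax(sim[:, selected], axis=1)]
    weights = np.array([(assign == j).sum() for j in selected])
    return selected, weights
```

During training, an IG step on the coreset would scale the gradient of each selected point by its weight.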
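For the distributed adversarial training item, this sketch shows one DAT-style step on a logistic-regression loss: each shard plays the role of one machine, solves the inner maximization with a few PGD steps, and the outer-minimization gradients are averaged as an all-reduce would. Gradient compression and the unlabeled-data path are omitted; the step sizes and perturbation budget are made-up values.

```python
import numpy as np

def pgd_attack(w, X, y, eps=0.3, alpha=0.1, steps=5):
    """Inner maximization: L_inf PGD on the logistic loss
    log(1 + exp(-y * (w . x))), ascending the input gradient."""
    X_adv = X.copy()
    for _ in range(steps):
        margins = (X_adv @ w) * y
        coef = -y / (1.0 + np.exp(margins))        # d loss / d (w . x)
        X_adv = X_adv + alpha * np.sign(coef[:, None] * w[None, :])
        X_adv = np.clip(X_adv, X - eps, X + eps)   # project onto the eps-ball
    return X_adv

def dat_step(w, shards, lr=0.1):
    """One DAT-style update: every shard (one per 'machine') attacks its
    minibatch, then the robust-loss gradients are averaged and applied."""
    grads = []
    for X, y in shards:                            # y entries are +/-1
        X_adv = pgd_attack(w, X, y)
        coef = -y / (1.0 + np.exp((X_adv @ w) * y))
        grads.append((coef[:, None] * X_adv).mean(axis=0))
    return w - lr * np.mean(grads, axis=0)         # stand-in for all-reduce
```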
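The VISION item reports "pairwise contrast accuracy" over (original, counterfactual) pairs. The exact definition is not spelled out in the abstract; a plausible reading, shown below as a hypothetical helper, is the fraction of pairs on which a detector classifies both the real sample and its opposite-label counterfactual correctly.

```python
def pairwise_contrast_accuracy(predict, pairs):
    """Fraction of counterfactual pairs where the detector is right on BOTH
    sides. Each pair holds (code, label) twins with opposite labels; predict
    maps a code sample to a 0/1 vulnerability label. This is a hypothetical
    reconstruction of the metric from the abstract, not the paper's code."""
    hits = sum(
        predict(code_a) == y_a and predict(code_b) == y_b
        for (code_a, y_a), (code_b, y_b) in pairs
    )
    return hits / len(pairs)
```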
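Finally, for the LeTO item: the paper's layer is a differentiable trajectory optimization embedded in the network; as a rough, forward-pass-only sketch of the idea, the snippet below refines a policy-proposed trajectory by unrolled gradient steps on a tracking-plus-smoothness objective with a soft per-step motion bound. All weights, the bound `lim`, and the unrolling depth are invented for illustration and differ from the paper's formulation.

```python
import numpy as np

def trajopt_layer(ref, n_iters=30, lr=0.1, w_smooth=1.0, lim=1.0):
    """Refine a network-proposed trajectory ref of shape (T, d) by unrolled
    gradient descent on: ||traj - ref||^2
    + w_smooth * sum_t ||traj[t+1] - traj[t]||^2
    + hinge penalty when per-step motion exceeds lim (soft constraint)."""
    traj = ref.copy()
    for _ in range(n_iters):
        diff = np.diff(traj, axis=0)             # consecutive step deltas
        g = 2.0 * (traj - ref)                   # tracking-term gradient
        g[:-1] -= 2.0 * w_smooth * diff          # smoothness term, x_t side
        g[1:] += 2.0 * w_smooth * diff           # smoothness term, x_{t+1} side
        viol = np.clip(np.abs(diff) - lim, 0.0, None) * np.sign(diff)
        g[:-1] -= 2.0 * viol                     # soft motion-bound penalty
        g[1:] += 2.0 * viol
        traj = traj - lr * g
    return traj

# Usage: smooth a jerky 1-D action sequence while staying near it.
ref = np.array([[0.0], [2.0], [1.0], [3.0], [2.5]])
print(trajopt_layer(ref))
```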