NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

The Dataset Multiplicity Problem: How Unreliable Data Impacts Predictions

https://doi.org/10.1145/3593013.3593988

Meyer, Anna P.; Albarghouthi, Aws; D'Antoni, Loris (June 2023, ACM)

Full Text Available
EFFICIENT DISCRETE MULTI-MARGINAL OPTIMAL TRANSPORT REGULARIZATION

Mehta, Ronak; Kline, Jeffery; Lokhande, Vishnu Suresh; Fung, Glenn; Singh, Vikas (February 2023, OpenReview.net)

Optimal transport has emerged as a powerful tool for a variety of problems in machine learning, and it is frequently used to enforce distributional constraints. In this context, existing methods often use either a Wasserstein metric, or else they apply concurrent barycenter approaches when more than two distributions are considered. In this paper, we leverage multi-marginal optimal transport (MMOT), where we take advantage of a procedure that computes a generalized earth mover’s distance as a sub-routine. We show that not only is our algorithm computationally more efficient compared to other barycentric-based distance methods, but it has the additional advantage that gradients used for backpropagation can be efficiently computed during the forward pass computation itself, which leads to substantially faster model training. We provide technical details about this new regularization term and its properties, and we present experimental demonstrations of faster runtimes when compared to standard Wasserstein-style methods. Finally, on a range of experiments designed to assess effectiveness at enforcing fairness, we demonstrate our method compares well with alternatives.
more » « less
Full Text Available
BagFlip: A Certified Defense Against Data Poisoning

Zhang, Yuhao; Albarghouthi, Aws; D'Antoni, Loris (October 2022, OpenReview.net)

Machine learning models are vulnerable to data-poisoning attacks, in which an attacker maliciously modifies the training set to change the prediction of a learned model. In a trigger-less attack, the attacker can modify the training set but not the test inputs, while in a backdoor attack the attacker can also modify test inputs. Existing model-agnostic defense approaches either cannot handle backdoor attacks or do not provide effective certificates (i.e., a proof of a defense). We present BagFlip, a model-agnostic certified approach that can effectively defend against both trigger-less and backdoor attacks. We evaluate BagFlip on image classification and malware detection datasets. BagFlip is equal to or more effective than the state-of-the-art approaches for trigger-less attacks and more effective than the state-of-the-art approaches for backdoor attacks.
more » « less
Full Text Available
Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets

https://doi.org/2203.15234

Lokhande, Vishnu S.; Chakraborty, Rudrasis; Ravi, Sathya N.; Singh, Vikas (January 2022, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR))

Full Text Available
Deep Unlearning via Randomized Conditionally Independent Hessians

Mehta, Ronak R.; Pal, Sourav; Singh, Vikas; Ravi, Sathya N. (January 2022, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition)

Full Text Available
Certifying Robustness to Programmable Data Bias in Decision Trees

Anna Meyer, Aws Albarghouthi (January 2021, Advances in Neural Information Processing Systems)

Datasets can be biased due to societal inequities, human biases, under-representation of minorities, etc. Our goal is to certify that models produced by a learning algorithm are pointwise-robust to dataset biases. This is a challenging problem: it entails learning models for a large, or even infinite, number of datasets, ensuring that they all produce the same prediction. We focus on decision-tree learning due to the interpretable nature of the models. Our approach allows programmatically specifying \emph{bias models} across a variety of dimensions (e.g., label-flipping or missing data), composing types of bias, and targeting bias towards a specific group. To certify robustness, we use a novel symbolic technique to evaluate a decision-tree learner on a large, or infinite, number of datasets, certifying that each and every dataset produces the same prediction for a specific test point. We evaluate our approach on datasets that are commonly used in the fairness literature, and demonstrate our approach's viability on a range of bias models.
more » « less
Full Text Available
Differentiable Optimization of Generalized Nondecomposable Functions using Linear Programs

Meng, Zihang; Mukherjee, Lopamudra; Wu, Yichao; Singh, Vikas; Ravi, Sathya (January 2021, Advances in Neural Information Processing Systems 34 (NeurIPS 2021))

Full Text Available
Learning Invariant Representations using Inverse Contrastive Loss

Kumar Akash, Aditya; Suresh Lokhande, Vishnu; Ravi, Sathya N.; Singh, Vikas (January 2021, Proceedings of the AAAI Conference on Artificial Intelligence)
null (Ed.)
Learning invariant representations is a critical first step in a number of machine learning tasks. A common approach is given by the so-called information bottleneck principle in which an application dependent function of mutual information is carefully chosen and optimized. Unfortunately, in practice, these functions are not suitable for optimization purposes since these losses are agnostic of the metric structure of the parameters of the model. In our paper, we introduce a class of losses for learning representations that are invariant to some extraneous variable of interest by inverting the class of contrastive losses, i.e., inverse contrastive loss (ICL). We show that if the extraneous variable is binary, then optimizing ICL is equivalent to optimizing a regularized MMD divergence. More generally, we also show that if we are provided a metric on the sample space, our formulation of ICL can be decomposed into a sum of convex functions of the given distance metric. Our experimental results indicate that models obtained by optimizing ICL achieve significantly better invariance to the extraneous variable for a fixed desired level of accuracy. In a variety of experimental settings, we show applicability of ICL for learning invariant representations for both continuous and discrete protected/extraneous variables. The project page with code is available at https://github.com/adityakumarakash/ICL
more » « less
Full Text Available
Proving data-poisoning robustness in decision trees

https://doi.org/10.1145/3385412.3385975

Drews, Samuel; Albarghouthi, Aws; D'Antoni, Loris (June 2020, Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation)

Full Text Available
Robustness to Programmable String Transformations via Augmented Abstract Training

Zhang, Yuhao; Albarghouthi, Aws; D'Antoni, Loris (January 2020, Thirty-seventh International Conference on Machine Learning)

Full Text Available

« Prev Next »

Search for: All records