NSF PAR Search | NSF Public Access Repository

Distributionally Robust Optimization with Bias and Variance Reduction

Mehta, Ronak; Roulet, Vincent; Pillutla, Krishna; Harchaoui, Zaid (May 2024, OpenReview)
OpenReview (Ed.)
We consider the distributionally robust optimization (DRO) problem with spectral risk-based uncertainty set and f-divergence penalty. This formulation includes common risk-sensitive learning objectives such as regularized conditional value-at-risk (CVaR) and average top-k loss. We present Prospect, a stochastic gradient-based algorithm that only requires tuning a single learning rate hyperparameter, and prove that it enjoys linear convergence for smooth regularized losses. This contrasts with previous algorithms that either require tuning multiple hyperparameters or potentially fail to converge due to biased gradient estimates or inadequate regularization. Empirically, we show that Prospect can converge 2-3× faster than baselines such as stochastic gradient and stochastic saddle-point methods on distribution shift and fairness benchmarks spanning tabular, vision, and language domains.
Full Text Available
Influence Diagnostics under Self-concordance

Fisher, Jillian; Liu, Lang; Pillutla, Krishna; Choi, Yejin; Harchaoui, Zaid (April 2023, Proceedings of The 26th International Conference on Artificial Intelligence and Statistics)
Ruiz, Francisco; Dy, Jennifer; van de Meent, Jan-Willem (Ed.)
Influence diagnostics such as influence functions and approximate maximum influence perturbations are popular in machine learning and in AI domain applications. Influence diagnostics are powerful statistical tools to identify influential datapoints or subsets of datapoints. We establish finite-sample statistical bounds, as well as computational complexity bounds, for influence functions and approximate maximum influence perturbations using efficient inverse-Hessian-vector product implementations. We illustrate our results with generalized linear models and large attention based models on synthetic and real data.
Full Text Available
Stochastic Optimization for Spectral Risk Measures

Mehta, Ronak; Roulet, Vincent; Pillutla, Krishna; Liu, Lang; Harchaoui, Zaid (April 2023, Proceedings of The 26th International Conference on Artificial Intelligence and Statistics)
Ruiz, Francisco; Dy, Jennifer; van de Meent, Jan-Willem (Ed.)
Spectral risk objectives – also called L-risks – allow for learning systems to interpolate between optimizing average-case performance (as in empirical risk minimization) and worst-case performance on a task. We develop LSVRG, a stochastic algorithm to optimize these quantities by characterizing their subdifferential and addressing challenges such as biasedness of subgradient estimates and non-smoothness of the objective. We show theoretically and experimentally that out-of-the-box approaches such as stochastic subgradient and dual averaging can be hindered by bias, whereas our approach exhibits linear convergence.
Full Text Available
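
As an illustration of the spectral risk (L-risk) objective named in the abstract above, here is a minimal sketch, not the authors' LSVRG code. It assumes the standard definition: sort the per-example losses in ascending order and take a weighted sum with nonnegative, nondecreasing weights that sum to one. The function names `spectral_risk` and `top_k_weights` and the sample losses are hypothetical.

```python
import numpy as np


def spectral_risk(losses, weights):
    """Weighted sum of the sorted losses (smallest loss gets weights[0])."""
    sorted_losses = np.sort(np.asarray(losses, dtype=float))
    weights = np.asarray(weights, dtype=float)
    return float(np.dot(weights, sorted_losses))


def top_k_weights(n, k):
    """Put equal weight 1/k on the k largest losses (average top-k loss)."""
    w = np.zeros(n)
    w[n - k:] = 1.0 / k
    return w


losses = [0.2, 1.5, 0.7, 3.0]  # illustrative per-example losses
n = len(losses)

# Uniform weights recover the average loss (empirical risk minimization).
mean_risk = spectral_risk(losses, np.full(n, 1.0 / n))

# All weight on the largest loss recovers the worst case.
worst_risk = spectral_risk(losses, top_k_weights(n, 1))

# Weight on the two largest losses gives the average top-2 loss.
top2_risk = spectral_risk(losses, top_k_weights(n, 2))
```

Varying the weight vector between the uniform and the single-largest-loss choices is what lets a spectral risk interpolate between average-case and worst-case performance.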
MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers

Pillutla, Krishna; Swayamdipta, Swabha; Zellers, Rowan; Thickstun, John; Welleck, Sean; Choi, Yejin; Harchaoui, Zaid (January 2022, Advances in neural information processing systems)

As major progress is made in open-ended text generation, measuring how close machine-generated text is to human language remains a critical open problem. We introduce MAUVE, a comparison measure for open-ended text generation, which directly compares the learnt distribution from a text generation model to the distribution of human-written text using divergence frontiers. MAUVE scales up to modern text generation models by computing information divergences in a quantized embedding space. Through an extensive empirical study on three open-ended generation tasks, we find that MAUVE identifies known properties of generated text, scales naturally with model size, and correlates with human judgments, with fewer restrictions than existing distributional evaluation metrics.
Full Text Available
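
To make the divergence-frontier idea in the MAUVE abstract above concrete, here is a simplified sketch, not the MAUVE implementation. It assumes the two text distributions have already been quantized into histograms `p` and `q` over the same bins, and traces the pairs of KL divergences from each histogram to their mixtures. The function names, the number of mixture points, and the toy histograms are all illustrative assumptions.

```python
import numpy as np


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p == 0 contribute 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def divergence_frontier(p, q, num_points=25):
    """Return (KL(q || r), KL(p || r)) for mixtures r = lam * p + (1 - lam) * q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Exclude the endpoints so every mixture covers the support of both p and q.
    lambdas = np.linspace(0.0, 1.0, num_points + 2)[1:-1]
    points = []
    for lam in lambdas:
        r = lam * p + (1.0 - lam) * q
        points.append((kl_divergence(q, r), kl_divergence(p, r)))
    return points


# Toy histograms standing in for quantized model text (q) and human text (p).
p = np.array([0.5, 0.5, 0.0])
q = np.array([0.0, 0.5, 0.5])

frontier = divergence_frontier(p, q)
identical = divergence_frontier(p, p)
```

When the two histograms coincide, every point on the frontier is (0, 0); the further the distributions drift apart, the further the curve moves from the origin.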
A Superquantile Approach to Federated Learning with Heterogeneous Devices

https://doi.org/10.1109/CISS50987.2021.9400318

Laguel, Yassine; Pillutla, Krishna; Malick, Jerome; Harchaoui, Zaid (March 2021, IEEE Xplore)

Full Text Available
LLC: Accurate, Multi-purpose Learnt Low-dimensional Binary Codes

Kusupati, Aditya; Wallingford, Matthew; Ramanujan, Vivek; Somani, Raghav; Park, Jae Sung; Pillutla, Krishna; Jain, Prateek; Kakade, Sham; Farhadi, Ali (January 2021, Advances in neural information processing systems)

Full Text Available
