NSF PAR Search | NSF Public Access Repository

SCoRe: Submodular Combinatorial Representation Learning for Real-World Class-Imbalanced Settings

Majee, Anay; Kothawade, Suraj; Killamsetty, Krishnateja; Iyer, Rishabh (July 2024, International Conference on Machine Learning, ICML 2024)

Full Text Available
Beyond active learning: Leveraging the full potential of human interaction via auto-labeling, human correction, and human verification

Beck, Nathan; Killamsetty, Krishnateja; Kothawade, Suraj; Iyer, Rishabh (January 2024, Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision)

Full Text Available
DITTO: Data-efficient and fair targeted subset selection for ASR accent adaptation

Kothawade, Suraj; Mekala, Anmol; Kothyari, Mayank; Iyer, Rishabh; Ramakrishnan, Ganesh; Jyothi, Preethi (June 2023, Association for Computational Linguistics (ACL) 2023)

Full Text Available
PLATINUM: Semi-Supervised Model Agnostic Meta-Learning using Submodular Mutual Information

Li, Changbin; Kothawade, Suraj; Chen, Feng; Iyer, Rishabh (September 2022, Proceedings of Machine Learning Research)

Full Text Available
PRISM: A Rich Class of Parameterized Submodular Information Measures for Guided Data Subset Selection.

https://doi.org/10.1609/aaai.v36i9.21264

Kothawade, Suraj and (April 2022, Proceedings of the AAAI Conference on Artificial Intelligence)

With ever-increasing dataset sizes, subset selection techniques are becoming increasingly important for a plethora of tasks. It is often necessary to guide the subset selection to achieve certain desiderata, which includes focusing or targeting certain data points, while avoiding others. Examples of such problems include: i) targeted learning, where the goal is to find subsets with rare classes or rare attributes on which the model is underperforming, and ii) guided summarization, where data (e.g., image collection, text, document or video) is summarized for quicker human consumption with specific additional user intent. Motivated by such applications, we present PRISM, a rich class of PaRameterIzed Submodular information Measures. Through novel functions and their parameterizations, PRISM offers a variety of modeling capabilities that enable a trade-off between desired qualities of a subset like diversity or representation and similarity/dissimilarity with a set of data points. We demonstrate how PRISM can be applied to the two real-world problems mentioned above, which require guided subset selection. In doing so, we show that PRISM interestingly generalizes some past work, therein reinforcing its broad utility. Through extensive experiments on diverse datasets, we demonstrate the superiority of PRISM over the state-of-the-art in targeted learning and in guided image-collection summarization.
Full Text Available
PLATINUM: Semi-Supervised Model Agnostic Meta-Learning using Submodular Mutual Information

Li, Changbin; Kothawade, Suraj; Chen, Feng; Iyer, Rishabh (July 2022, Proceedings of Machine Learning Research)

Few-shot classification (FSC) requires training models using a few (typically one to five) data points per class. Meta-learning has proven to be able to learn a parametrized model for FSC by training on various other classification tasks. In this work, we propose PLATINUM (semi-suPervised modeL Agnostic meTa-learnIng usiNg sUbmodular Mutual information), a novel semi-supervised model agnostic meta-learning framework that uses the submodular mutual information (SMI) functions to boost the performance of FSC. PLATINUM leverages unlabeled data in the inner and outer loop using SMI functions during meta-training and obtains richer meta-learned parameterizations for meta-test. We study the performance of PLATINUM in two scenarios: 1) where the unlabeled data points belong to the same set of classes as the labeled set of a certain episode, and 2) where there exist out-of-distribution classes that do not belong to the labeled set. We evaluate our method on various settings on the miniImageNet, tieredImageNet and Fewshot-CIFAR100 datasets. Our experiments show that PLATINUM outperforms MAML and semi-supervised approaches like pseudo-labeling for semi-supervised FSC, especially for a small ratio of labeled examples per class.
Full Text Available
PLATINUM: Semi-Supervised Model Agnostic Meta-Learning using Submodular Mutual Information

Li, Changbin; Kothawade, Suraj; Chen, Feng; Iyer, Rishabh K. (July 2022, International Conference on Machine Learning)
Chaudhuri, Kamalika; Jegelka, Stefanie; Song, Le; Szepesvari, Csaba; Niu, Gang; Sabato, Sivan (Ed.)
Few-shot classification (FSC) requires training models using a few (typically one to five) data points per class. Meta-learning has proven to be able to learn a parametrized model for FSC by training on various other classification tasks. In this work, we propose PLATINUM (semi-suPervised modeL Agnostic meTa-learnIng usiNg sUbmodular Mutual information), a novel semi-supervised model agnostic meta-learning framework that uses the submodular mutual information (SMI) functions to boost the performance of FSC. PLATINUM leverages unlabeled data in the inner and outer loop using SMI functions during meta-training and obtains richer meta-learned parameterizations. We study the performance of PLATINUM in two scenarios: 1) where the unlabeled data points belong to the same set of classes as the labeled set of a certain episode, and 2) where there exist out-of-distribution classes that do not belong to the labeled set. We evaluate our method on various settings on the miniImageNet, tieredImageNet and CIFAR-FS datasets. Our experiments show that PLATINUM outperforms MAML and semi-supervised approaches like pseudo-labeling for semi-supervised FSC, especially for a small ratio of labeled to unlabeled samples.
Full Text Available
PRISM: A Rich Class of Parameterized Submodular Information Measures for Guided Data Subset Selection

Kothawade, Suraj; Kaushal, Vishal; Ramakrishnan, Ganesh; Bilmes, Jeff; Iyer, Rishabh (February 2022, Proceedings of the AAAI Conference on Artificial Intelligence)

With ever-increasing dataset sizes, subset selection techniques are becoming increasingly important for a plethora of tasks. It is often necessary to guide the subset selection to achieve certain desiderata, which includes focusing or targeting certain data points, while avoiding others. Examples of such problems include: i) targeted learning, where the goal is to find subsets with rare classes or rare attributes on which the model is underperforming, and ii) guided summarization, where data (e.g., image collection, text, document or video) is summarized for quicker human consumption with specific additional user intent. Motivated by such applications, we present PRISM, a rich class of PaRameterIzed Submodular information Measures. Through novel functions and their parameterizations, PRISM offers a variety of modeling capabilities that enable a trade-off between desired qualities of a subset like diversity or representation and similarity/dissimilarity with a set of data points. We demonstrate how PRISM can be applied to the two real-world problems mentioned above, which require guided subset selection. In doing so, we show that PRISM interestingly generalizes some past work, therein reinforcing its broad utility. Through extensive experiments on diverse datasets, we demonstrate the superiority of PRISM over the state-of-the-art in targeted learning and in guided image-collection summarization.
Full Text Available
SIMILAR: Submodular information measures based active learning in realistic scenarios

Kothawade, Suraj; Beck, Nathan; Killamsetty, Krishnateja; Iyer, Rishabh (December 2021, Advances in neural information processing systems)

Active learning has proven to be useful for minimizing labeling costs by selecting the most informative samples. However, existing active learning methods do not work well in realistic scenarios such as imbalance or rare classes, out-of-distribution data in the unlabeled set, and redundancy. In this work, we propose SIMILAR (Submodular Information Measures based actIve LeARning), a unified active learning framework using recently proposed submodular information measures (SIM) as acquisition functions. We argue that SIMILAR not only works in standard active learning, but also easily extends to the realistic settings considered above and acts as a one-stop solution for active learning that is scalable to large real-world datasets. Empirically, we show that SIMILAR significantly outperforms existing active learning algorithms by as much as ~5% - 18% in the case of rare classes and ~5% - 10% in the case of out-of-distribution data on several image classification tasks like CIFAR-10, MNIST, and ImageNet. SIMILAR is available as a part of the DISTIL toolkit.
Full Text Available