Modern recognition systems require large amounts of supervision to achieve accuracy. Adapting to new domains requires significant data from experts, which is onerous and can become too expensive. Zero-shot learning requires an annotated set of attributes for a novel category. Annotating the full set of attributes for a novel category proves to be a tedious and expensive task in deployment. This is especially the case when the recognition domain is an expert domain. We introduce a new field-guide-inspired approach to zero-shot annotation where the learner model interactively asks for the most useful attributes that define a class. We evaluate our method on classification benchmarks with attribute annotations like CUB, SUN, and AWA2 and show that our model achieves the performance of a model with full annotations at the cost of significantly fewer number of annotations. Since the time of experts is precious, decreasing annotation cost can be very valuable for real-world deployment.
more »
« less
Tight Lower Bounds on Worst-Case Guarantees for Zero-Shot Learning with Attributes
We develop a rigorous mathematical analysis of zero-shot learning with attributes. In this setting, the goal is to label novel classes with no training data, only detectors for attributes and a description of how those attributes are correlated with the target classes, called the class-attribute matrix. We develop the first non-trivial lower bound on the worst-case error of the best map from attributes to classes for this setting, even with perfect attribute detectors. The lower bound characterizes the theoretical intrinsic difficulty of the zero-shot problem based on the available information---the class-attribute matrix---and the bound is practically computable from it. Our lower bound is tight, as we show that we can always find a randomized map from attributes to classes whose expected error is upper bounded by the value of the lower bound. We show that our analysis can be predictive of how standard zero-shot methods behave in practice, including which classes will likely be confused with others.
more »
« less
- Award ID(s):
- 1813444
- PAR ID:
- 10451730
- Date Published:
- Journal Name:
- Advances in neural information processing systems
- ISSN:
- 1049-5258
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Recognizing the attributes of objects and their parts is central to many computer vision applications. Although great progress has been made to apply object-level recognition, recognizing the attributes of parts remains less applicable since the training data for part attributes recognition is usually scarce especially for internet-scale applications. Furthermore, most existing part attribute recognition methods rely on the part annotations which are more expensive to obtain. In order to solve the data insufficiency problem and get rid of dependence on the part annotation, we introduce a novel Concept Sharing Network (CSN) for part attribute recognition. A great advantage of CSN is its capability of recognizing the part attribute (a combination of part location and appearance pattern) that has insufficient or zero training data, by learning the part location and appearance pattern respectively from the training data that usually mix them in a single label. Extensive experiments on CUB, Celeb A, and a newly proposed human attribute dataset demonstrate the effectiveness of CSN and its advantages over other methods, especially for the attributes with few training samples. Further experiments show that CSN can also perform zero-shot part attribute recognition.more » « less
-
Machine learning models—including prominent zero-shot models—are often trained on datasets whose labels are only a small proportion of a larger label space. Such spaces are commonly equipped with a metric that relates the labels via distances between them. We propose a simple approach to exploit this information to adapt the trained model to reliably predict new classes—or, in the case of zero-shot prediction, to improve its performance—without any additional training. Our technique is a drop-in replacement of the standard prediction rule, swapping arg max with the Fréchet mean. We provide a comprehensive theoretical analysis for this approach, studying (i) learning-theoretic results trading off label space diameter, sample complexity, and model dimension, (ii) characterizations of the full range of scenarios in which it is possible to predict any unobserved class, and (iii) an optimal active learning-like next class selection procedure to obtain optimal training classes for when it is not possible to predict the entire range of unobserved classes. Empirically, using easily-available external metrics, our proposed approach, LOKI, gains up to 29.7% relative improvement over SimCLR on ImageNet and scales to hundreds of thousands of classes. When no such metric is available, LOKI can use self-derived metrics from class embeddings and obtains a 10.5% improvement on pretrained zero-shot models such as CLIP.more » « less
-
Large-scale object detection and instance segmentation face a severe data imbalance. The finer-grained object classes become, the less frequent they appear in our datasets. However, at test-time, we expect a detector that performs well for all classes and not just the most frequent ones. In this paper, we provide a theoretical understanding of the long-trail detection problem. We show how the commonly used mean average precision evaluation metric on an unknown test set is bound by a margin-based binary classification error on a long-tailed object detection training set. We optimize margin-based binary classification error with a novel surrogate objective called \textbf{Effective Class-Margin Loss} (ECM). The ECM loss is simple, theoretically well-motivated, and outperforms other heuristic counterparts on LVIS v1 benchmark over a wide range of architecture and detectors.more » « less
-
Raz, Ran (Ed.)We give upper and lower bounds on the power of subsystems of the Ideal Proof System (IPS), the algebraic proof system recently proposed by Grochow and Pitassi, where the circuits comprising the proof come from various restricted algebraic circuit classes. This mimics an established research direction in the boolean setting for subsystems of Extended Frege proofs whose lines are circuits from restricted boolean circuit classes. Essentially all of the subsystems considered in this paper can simulate the well-studied Nullstellensatz proof system, and prior to this work there were no known lower bounds when measuring proof size by the algebraic complexity of the polynomials (except with respect to degree, or to sparsity). Our main contributions are two general methods of converting certain algebraic lower bounds into proof complexity ones. Both require stronger arithmetic lower bounds than common, which should hold not for a specific polynomial but for a whole family defined by it. These may be likened to some of the methods by which Boolean circuit lower bounds are turned into related proof-complexity ones, especially the "feasible interpolation" technique. We establish algebraic lower bounds of these forms for several explicit polynomials, against a variety of classes, and infer the relevant proof complexity bounds. These yield separations between IPS subsystems, which we complement by simulations to create a partial structure theory for IPS systems. Our first method is a functional lower bound, a notion of Grigoriev and Razborov, which is a function f' from n-bit strings to a field, such that any polynomial f agreeing with f' on the boolean cube requires large algebraic circuit complexity. We develop functional lower bounds for a variety of circuit classes (sparse polynomials, depth-3 powering formulas, read-once algebraic branching programs and multilinear formulas) where f'(x) equals 1/p(x) for a constant-degree polynomial p depending on the relevant circuit class. We believe these lower bounds are of independent interest in algebraic complexity, and show that they also imply lower bounds for the size of the corresponding IPS refutations for proving that the relevant polynomial p is non-zero over the boolean cube. In particular, we show super-polynomial lower bounds for refuting variants of the subset-sum axioms in these IPS subsystems. Our second method is to give lower bounds for multiples, that is, to give explicit polynomials whose all (non-zero) multiples require large algebraic circuit complexity. By extending known techniques, we give lower bounds for multiples for various restricted circuit classes such sparse polynomials, sums of powers of low-degree polynomials, and roABPs. These results are of independent interest, as we argue that lower bounds for multiples is the correct notion for instantiating the algebraic hardness versus randomness paradigm of Kabanets and Impagliazzo. Further, we show how such lower bounds for multiples extend to lower bounds for refutations in the corresponding IPS subsystem.more » « less