-
Existing object recognition models have been shown to lack robustness in diverse geographical scenarios due to domain shifts in design and context. Class representations need to be adapted to more accurately reflect an object concept under these shifts. In the absence of training data from target geographies, we hypothesize that geographically diverse descriptive knowledge of categories can enhance robustness. For this purpose, we explore the feasibility of probing a large language model for geography-based object knowledge, and we examine the effects of integrating knowledge into zero-shot and learnable soft prompting with CLIP. Within this exploration, we propose geography knowledge regularization to ensure that soft prompts trained on a source set of geographies generalize to an unseen target set. Accuracy gains over prompting baselines on DollarStreet while training only on Europe data are up to +2.8/1.2/1.6 on target data from Africa/Asia/Americas, and +4.6 overall on the hardest classes. Competitive performance is shown vs. few-shot target training, and analysis is provided to direct future study of geographical robustness.
Free, publicly-accessible full text available June 17, 2025
-
Free, publicly-accessible full text available January 4, 2025
-
State-of-the-art object recognition methods do not generalize well to unseen domains. Work in domain generalization has attempted to bridge domains by increasing feature compatibility, but has focused on standard, appearance-based representations. We show the potential of shape-based representations to increase domain robustness. We compare two types of shape-based representations: one trains a convolutional network over edge features, and another computes a soft, dense medial axis transform. We show the complementary strengths of these representations for different types of domains, and the effect of the amount of texture that is preserved. We show that our shape-based techniques better leverage data augmentations for domain generalization, and are more effective at texture bias mitigation than shape-inducing augmentations. Finally, we show that when the convolutional network in state-of-the-art domain generalization methods is replaced with one that explicitly captures shape, we obtain improved results.