NSF PAR Search | NSF Public Access Repository

Elusive Images: Beyond Coarse Analysis for Fine-Grained Recognition

https://doi.org/10.1109/WACV57701.2024.00088

Anderson, Connor; Gwilliam, Matt; Gaskin, Evelyn; Farrell, Ryan (January 2024, IEEE)

While the community has seen many advances in recent years to address the challenging problem of Fine-grained Visual Categorization (FGVC), progress seems to be slowing—new state-of-the-art methods often distinguish themselves by improving top-1 accuracy by mere tenths of a percent. However, across all of the now-standard FGVC datasets, there remain sizeable portions of the test data that none of the current state-of-the-art (SOTA) models can successfully predict. This paper provides a framework for identifying and studying the errors that current methods make across diverse fine-grained datasets. Three models of difficulty—Prediction Overlap, Prediction Rank and Pair-wise Class Confusion—are employed to highlight the most challenging sets of images and classes. Extensive experiments apply a range of standard and SOTA methods, evaluating them on multiple FGVC domains and datasets. Insights acquired from coupling these difficulty paradigms with the careful analysis of experimental results suggest crucial areas for future FGVC research, focusing critically on the set of elusive images that none of the current models can correctly classify. Code is available at catalys1.github.io/elusive-images-fgvc.
Full Text Available
Improving Fractal Pre-training

https://doi.org/10.1109/WACV51458.2022.00247

Anderson, Connor; Farrell, Ryan (January 2022, 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV))

The deep neural networks used in modern computer vision systems require enormous image datasets to train them. These carefully-curated datasets typically have a million or more images, across a thousand or more distinct categories. The process of creating and curating such a dataset is a monumental undertaking, demanding extensive effort and labelling expense and necessitating careful navigation of technical and social issues such as label accuracy, copyright ownership, and content bias. What if we had a way to harness the power of large image datasets but with few or none of the major issues and concerns currently faced? This paper extends the recent work of Kataoka et al. [15], proposing an improved pre-training dataset based on dynamically-generated fractal images. Challenging issues with large-scale image datasets become points of elegance for fractal pre-training: perfect label accuracy at zero cost; no need to store/transmit large image archives; no privacy/demographic bias/concerns of inappropriate content, as no humans are pictured; limitless supply and diversity of images; and the images are free/open-source. Perhaps surprisingly, avoiding these difficulties imposes only a small penalty in performance. Leveraging a newly-proposed pre-training task, multi-instance prediction, our experiments demonstrate that fine-tuning a network pre-trained using fractals attains 92.7-98.1% of the accuracy of an ImageNet pre-trained network. Our code is publicly available.
Full Text Available
Fair Comparison: Quantifying Variance in Results for Fine-grained Visual Categorization

https://doi.org/10.1109/WACV48630.2021.00335

Gwilliam, Matthew; Teuscher, Adam; Anderson, Connor; Farrell, Ryan (January 2021, 2021 IEEE Winter Conference on Applications of Computer Vision (WACV))

For the task of image classification, researchers work arduously to develop the next state-of-the-art (SOTA) model, each bench-marking their own performance against that of their predecessors and of their peers. Unfortunately, the metric used most frequently to describe a model’s performance, average categorization accuracy, is often used in isolation. As the number of classes increases, such as in fine-grained visual categorization (FGVC), the amount of information conveyed by average accuracy alone dwindles. While its most glaring weakness is its failure to describe the model’s performance on a class-by-class basis, average accuracy also fails to describe how performance may vary from one trained model of the same architecture, on the same dataset, to another (both averaged across all categories and at the per-class level). We first demonstrate the magnitude of these variations across models and across class distributions based on attributes of the data, comparing results on different visual domains and different per-class image distributions, including long-tailed distributions and few-shot subsets. We then analyze the impact various FGVC methods have on overall and per-class variance. From this analysis, we both highlight the importance of reporting and comparing methods based on information beyond overall accuracy, as well as point out techniques that mitigate variance in FGVC results.
Full Text Available
Have Fun Storming the Castle(s)!

https://doi.org/10.1109/WACV48630.2021.00375

Anderson, Connor; Teuscher, Adam; Anderson, Elizabeth; Larsen, Alysia; Shirley, Josh; Farrell, Ryan (January 2021, IEEE Winter Conference on Applications of Computer Vision (WACV))

In recent years, large-scale datasets, each typically tailored to a particular problem, have become a critical factor in fueling rapid progress in the field of computer vision. This paper describes a valuable new dataset that should accelerate research efforts on problems such as fine-grained classification, instance recognition and retrieval, and geolocalization. The dataset, comprising more than 2400 individual castles, palaces and fortresses from more than 90 countries, contains more than 770K images in total. This paper details the dataset's construction process and its characteristics, including annotations such as location (geotagged latitude/longitude and country label), construction date, Google Maps link, and estimated per-class and per-image difficulty. An experimental section provides baseline experiments for important vision tasks including classification, instance retrieval and geolocalization (estimating global location from an image's visual appearance). The dataset is publicly available at vision.cs.byu.edu/castles.
Full Text Available
