StarCraftImage: A Dataset For Prototyping Spatial Reasoning Methods For Multi-Agent Environments
                        
                    
    
Spatial reasoning tasks in multi-agent environments such as event prediction, agent type identification, or missing data imputation are important for multiple applications (e.g., autonomous surveillance over sensor networks and subtasks for reinforcement learning (RL)). StarCraft II game replays encode intelligent (and adversarial) multi-agent behavior and could provide a testbed for these tasks; however, extracting simple and standardized representations for prototyping these tasks is laborious and hinders reproducibility. In contrast, MNIST and CIFAR10, despite their extreme simplicity, have enabled rapid prototyping and reproducibility of ML methods. Following the simplicity of these datasets, we construct a benchmark spatial reasoning dataset based on StarCraft II replays that exhibit complex multi-agent behaviors, while still being as easy to use as MNIST and CIFAR10. Specifically, we carefully summarize a window of 255 consecutive game states to create 3.6 million summary images from 60,000 replays, including all relevant metadata such as game outcome and player races. We develop three formats of decreasing complexity: Hyperspectral images that include one channel for every unit type (similar to multispectral geospatial images), RGB images that mimic CIFAR10, and grayscale images that mimic MNIST. We show how this dataset can be used for prototyping spatial reasoning methods. All datasets, code for extraction, and code for dataset loading can be found at https://starcraftdata.davidinouye.com/.
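Because the dataset is meant to be as easy to use as MNIST and CIFAR10, loading it should follow the familiar torchvision-style pattern. Below is a minimal sketch of what that workflow might look like; the `starcraftimage` package name, the `StarCraftCIFAR10` class, and its constructor arguments are assumptions for illustration only, so consult the link above for the actual loading code.

```python
# Minimal loading sketch in the CIFAR10 style the abstract describes.
# NOTE: the `starcraftimage` package name, the `StarCraftCIFAR10` class, and
# its constructor arguments are hypothetical placeholders; the real loading
# code is distributed at https://starcraftdata.davidinouye.com/.
from torch.utils.data import DataLoader

from starcraftimage import StarCraftCIFAR10  # hypothetical import

# Mirrors torchvision.datasets.CIFAR10 usage: each sample is an RGB summary
# of a 255-game-state window, with metadata (e.g., game outcome) as the label.
train_set = StarCraftCIFAR10(root="./data", train=True, download=True)
loader = DataLoader(train_set, batch_size=128, shuffle=True)

images, labels = next(iter(loader))
print(images.shape, labels.shape)  # e.g., (128, 3, H, W) and (128,)
```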
- Award ID(s): 2212097
- PAR ID: 10442392
- Publisher / Repository: figshare
- Date Published:
- Subject(s) / Keyword(s): Autonomous agents and multiagent systems; Intelligent robotics; Planning and decision making; Active sensing; Computer vision; Image processing; Pattern recognition; Stream and sensor data; Cyberphysical systems and internet of things; Mobile computing; Operating systems; Deep learning
- Format(s): Medium: X; Size: 10915008969 Bytes (≈10.9 GB)
- Size(s): 10915008969 Bytes
- Sponsoring Org: National Science Foundation
More Like this
- Key recognition tasks such as fine-grained visual categorization (FGVC) have benefited from increasing attention among computer vision researchers. The development and evaluation of new approaches rely heavily on benchmark datasets; such datasets are generally built primarily with categories that have images readily available, omitting categories with insufficient data. This paper takes a step back and rethinks dataset construction, focusing on intelligent image collection driven by: (i) the inclusion of all desired categories, and (ii) the recognition performance on those categories. Based on a small, author-provided initial dataset, the proposed system recommends which categories the authors should prioritize collecting additional images for, with the intent of optimizing overall categorization accuracy (a minimal sketch of one such prioritization rule appears after this list). We show that mock datasets built using this method outperform datasets built without such a guiding framework. Additional experiments give prospective dataset creators intuition into how, based on their circumstances and goals, a dataset should be constructed.
- While the community has seen many advances in recent years to address the challenging problem of Fine-grained Visual Categorization (FGVC), progress seems to be slowing: new state-of-the-art methods often distinguish themselves by improving top-1 accuracy by mere tenths of a percent. However, across all of the now-standard FGVC datasets, there remain sizeable portions of the test data that none of the current state-of-the-art (SOTA) models can successfully predict. This paper provides a framework for identifying and studying the errors that current methods make across diverse fine-grained datasets. Three models of difficulty, Prediction Overlap, Prediction Rank, and Pair-wise Class Confusion, are employed to highlight the most challenging sets of images and classes (a minimal sketch of Prediction Overlap appears after this list). Extensive experiments apply a range of standard and SOTA methods, evaluating them on multiple FGVC domains and datasets. Insights acquired from coupling these difficulty paradigms with the careful analysis of experimental results suggest crucial areas for future FGVC research, focusing critically on the set of elusive images that none of the current models can correctly classify. Code is available at catalys1.github.io/elusive-images-fgvc.
- Herbarium sheets present a unique view of the world's botanical history, evolution, and biodiversity. This makes them an all-important data source for botanical research. With the increased digitization of herbaria worldwide and advances in the domain of fine-grained visual classification, which can facilitate automatic identification of herbarium specimen images, there are many opportunities for supporting and expanding research in this field. However, existing datasets are either too small or not diverse enough in terms of represented taxa, geographic distribution, and imaging protocols. Furthermore, aggregating datasets is difficult as taxa are recognized under a multitude of names and must be aligned to a common reference. We introduce the Herbarium 2021 Half-Earth dataset: the largest and most diverse dataset of herbarium specimen images, to date, for automatic taxon recognition. We also present the results of the Herbarium 2021 Half-Earth challenge, a competition that was part of the Eighth Workshop on Fine-Grained Visual Categorization (FGVC8) and hosted by Kaggle to encourage the development of models to automatically identify taxa from herbarium sheet images.
- A novel approach was proposed and implemented to assess the confidence of the individual class predictions made by convolutional neural networks trained to identify the type of fracture in metals. This approach involves utilizing contextual evidence in the form of contextual fracture images and contextual scores, which serve as indicators for determining the certainty of the predictions. The approach was first tested on both shallow and deep convolutional neural networks employing four publicly available image datasets (MNIST, EMNIST, FMNIST, and CIFAR10), and subsequently validated on an in-house steel fracture dataset, FRAC, containing ductile and brittle fracture images. The effectiveness of the method is validated by producing contextual images and scores for the fracture image data and other image datasets to assess the confidence of selected predictions from the datasets. The CIFAR10 dataset yielded the lowest mean contextual score of 78 for the shallow model, with over 50% of representative test instances receiving a score below 90, indicating lower confidence in the model's predictions. In contrast, the CNN model used for the fracture dataset achieved a mean contextual score of 99, with 0% of representative test instances receiving a score below 90, suggesting a high level of confidence in its predictions. This approach enhances the interpretability of trained convolutional neural networks and provides greater insight into the confidence of their outputs.
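For the dataset-construction abstract above, the prioritization rule is not spelled out here, but one plausible instantiation is to rank categories by per-class validation accuracy on the small seed dataset and recommend collecting images for the weakest classes first. The sketch below implements that stand-in rule; the function name and the accuracy-based criterion are assumptions, not the paper's actual method.

```python
# Illustrative stand-in for performance-driven collection priorities: rank
# categories by per-class validation accuracy on the seed dataset, weakest
# first. This is an assumed criterion, not the paper's exact rule.
import numpy as np

def collection_priorities(y_true, y_pred, class_names):
    """Return class names sorted by validation accuracy, weakest first."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    acc = {}
    for c, name in enumerate(class_names):
        mask = y_true == c
        # A class with no validation images defaults to accuracy 0, so it is
        # recommended for collection first.
        acc[name] = float((y_pred[mask] == c).mean()) if mask.any() else 0.0
    return sorted(acc, key=acc.get)

print(collection_priorities([0, 0, 1, 1, 2], [0, 1, 1, 1, 0],
                            ["sparrow", "finch", "wren"]))
# -> ['wren', 'sparrow', 'finch']: collect more wren images first.
```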
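For the FGVC-errors abstract, Prediction Overlap can be read as asking which test images no model in a pool predicts correctly (the "elusive" images). Here is a minimal sketch under that reading, with the array shapes and names assumed for illustration:

```python
# Sketch of the "Prediction Overlap" idea: find the test images that NO model
# in a pool predicts correctly. Shapes and names are illustrative assumptions.
import numpy as np

def elusive_indices(predictions, y_true):
    """predictions: (n_models, n_images) predicted labels; y_true: (n_images,).

    Returns the indices of images that every model gets wrong.
    """
    predictions = np.asarray(predictions)
    correct_by_any = (predictions == np.asarray(y_true)).any(axis=0)
    return np.flatnonzero(~correct_by_any)

# Toy example with 3 models and 4 images: image 3 defeats every model.
preds = [[0, 1, 2, 0],
         [0, 1, 0, 1],
         [1, 1, 2, 2]]
print(elusive_indices(preds, [0, 1, 2, 3]))  # -> [3]
```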