Vocal Call Locator Benchmark (VCL) for localizing rodent vocalizations from multi-channel audio
Understanding the behavioral and neural dynamics of social interactions is a goal of contemporary neuroscience. Many machine learning methods have emerged in recent years to make sense of complex video and neurophysiological data that result from these experiments. Less focus has been placed on understanding how animals process acoustic information, including social vocalizations. A critical step toward bridging this gap is determining the senders and receivers of acoustic information in social interactions. While sound source localization (SSL) is a classic problem in signal processing, existing approaches are limited in their ability to localize animal-generated sounds in standard laboratory environments. Advances in deep learning methods for SSL are likely to help address these limitations; however, there are currently no publicly available models, datasets, or benchmarks for systematically evaluating SSL algorithms in the domain of bioacoustics. Here, we present the VCL Benchmark: the first large-scale dataset for benchmarking SSL algorithms in rodents. We acquired synchronized video and multi-channel audio recordings of 767,295 sounds with annotated ground-truth sources across 9 conditions. The dataset provides benchmarks that evaluate SSL performance on real data, simulated acoustic data, and a mixture of real and simulated data. We intend for this benchmark to facilitate knowledge transfer between the neuroscience and acoustic machine learning communities, which have had limited overlap.
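The deep learning models evaluated in the benchmark are not reproduced here, but as a rough illustration of the underlying task, below is a minimal sketch of a classic signal-processing baseline for multi-channel SSL: estimating the time difference of arrival (TDOA) between two microphones with GCC-PHAT and converting it to a far-field bearing. The sample rate, microphone spacing, and synthetic test signal are illustrative assumptions, not properties of the VCL recordings.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in room-temperature air


def gcc_phat(sig: np.ndarray, ref: np.ndarray, fs: int, max_tau: float) -> float:
    """Estimate the delay of `sig` relative to `ref` in seconds using GCC-PHAT."""
    n = sig.size + ref.size
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12            # PHAT weighting: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)
    max_shift = int(fs * max_tau)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / float(fs)


def bearing_from_tdoa(tau: float, mic_distance: float) -> float:
    """Convert a two-microphone TDOA (s) into a far-field bearing in degrees."""
    cos_theta = np.clip(tau * SPEED_OF_SOUND / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))


if __name__ == "__main__":
    fs = 250_000                  # assumed sample rate, typical for ultrasonic recordings
    mic_distance = 0.30           # assumed 30 cm spacing between the two microphones
    t = np.arange(int(0.01 * fs)) / fs
    chirp = np.sin(2 * np.pi * (40_000 + 2e6 * t) * t)   # synthetic 10 ms upward sweep
    delay = 25                    # samples; the source is closer to microphone 1
    ch1 = np.pad(chirp, (0, delay))
    ch2 = np.pad(chirp, (delay, 0))
    tau = gcc_phat(ch2, ch1, fs, max_tau=mic_distance / SPEED_OF_SOUND)
    print(f"TDOA: {tau * 1e6:.1f} us, bearing: {bearing_from_tdoa(tau, mic_distance):.1f} deg")
```

A full array-based localizer would combine such pairwise delays across more than two microphones; this two-channel version is only meant to make the geometry of the task concrete.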
- Award ID(s): 1922658
- PAR ID: 10649768
- Publisher / Repository: Neural Information Processing Systems Foundation, Inc. (NeurIPS)
- Date Published:
- Page Range / eLocation ID: 106370 to 106382
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Modern machine learning algorithms typically require large amounts of labeled training data to fit a reliable model. To minimize the cost of data collection, researchers often employ techniques such as crowdsourcing and web scraping. However, web data and human annotations are known to exhibit high margins of error, resulting in sizable amounts of incorrect labels. Poorly labeled training data can cause models to overfit to the noise distribution, crippling performance in real-world applications. In this work, we investigate the viability of using data augmentation in conjunction with semi-supervised learning to improve the label noise robustness of image classification models. We conduct several experiments using noisy variants of the CIFAR-10 image classification dataset to benchmark our method against existing algorithms. Experimental results show that our augmentative SSL approach improves upon the state-of-the-art. (An illustrative sketch of this kind of label-noise setup appears after this list.)
- Urban dynamics is complex and interconnected across various social and environmental systems. To better understand such dynamics, this study proposes a scalable and flexible video machine learning framework for spatiotemporal analysis of urban dynamics. The framework is based on a space-time cube representation and decomposes the cube structure along the temporal dimension into a sequence of time-series spatial aggregations, similar to a video. State-of-the-art video machine learning models, including ConvLSTM, predRNN, predRNN-V2, and E3D-LSTM, are utilized for spatiotemporal modeling and prediction. The scalability of this cyberGIS-enabled framework is shown by its applicability to diverse geographic regions, its ability to address various urban problems, and its capacity to integrate heterogeneous geospatial data. Moreover, the framework's flexibility is further enhanced by adjustable spatial and temporal granularity. The framework's effectiveness is validated through two case studies: (1) a real-world urban heat analysis in Cook County, Illinois, USA in 2018, which achieved an RMSE of 0.60535°C, representing a 46% improvement over established benchmarks; and (2) a simulated dataset analysis demonstrating the framework's adaptability to spatial heterogeneity and temporal changes. A series of evaluations demonstrates the effectiveness of the proposed framework in spatiotemporal analysis of complex urban dynamics. (An illustrative sketch of the space-time cube aggregation step appears after this list.)
- Establishing open and general benchmarks has been a critical driving force behind the success of modern machine learning techniques. As machine learning is being applied to broader domains and tasks, there is a need to establish richer and more diverse benchmarks to better reflect the reality of the application scenarios. Graph learning is an emerging field of machine learning that urgently needs more and better benchmarks. To accommodate this need, we introduce Graph Learning Indexer (GLI), a benchmark curation platform for graph learning. In comparison to existing graph learning benchmark libraries, GLI highlights two novel design objectives. First, GLI is designed to incentivize dataset contributors. In particular, we incorporate various measures to minimize the effort of contributing and maintaining a dataset, increase the usability of the contributed dataset, and encourage attribution to the dataset's different contributors. Second, GLI is designed to curate a knowledge base, instead of a plain collection, of benchmark datasets. We use multiple sources of meta information to augment the benchmark datasets with rich characteristics, so that they can be easily selected and used in downstream research or development. The source code of GLI is available at https://github.com/Graph-Learning-Benchmarks/gli.
- While current vision algorithms excel at many challenging tasks, it is unclear how well they understand the physical dynamics of real-world environments. Here we introduce Physion, a dataset and benchmark for rigorously evaluating the ability to predict how physical scenarios will evolve over time. Our dataset features realistic simulations of a wide range of physical phenomena, including rigid- and soft-body collisions, stable multi-object configurations, rolling, sliding, and projectile motion, thus providing a more comprehensive challenge than previous benchmarks. We used Physion to benchmark a suite of models varying in their architecture, learning objective, input-output structure, and training data. In parallel, we obtained precise measurements of human prediction behavior on the same set of scenarios, allowing us to directly evaluate how well any model could approximate human behavior. We found that vision algorithms that learn object-centric representations generally outperform those that do not, yet still fall far short of human performance. On the other hand, graph neural networks with direct access to physical state information both perform substantially better and make predictions that are more similar to those made by humans. These results suggest that extracting physical representations of scenes is the main bottleneck to achieving human-level and human-like physical understanding in vision algorithms. We have publicly released all data and code to facilitate the use of Physion to benchmark additional models in a fully reproducible manner, enabling systematic evaluation of progress towards vision algorithms that understand physical environments as robustly as people do.
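The last item above compares model predictions with human judgments on the same physical scenarios. The Physion data loaders and evaluation protocol are not reproduced here; the sketch below only illustrates, on synthetic stand-in data, two simple measures such a comparison can report: accuracy against ground truth and the correlation between a model's predictions and per-scenario human response rates. The scenario count, rater pool, and error rates are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 200 scenarios, ground truth of whether two objects end up
# in contact, one model's binary predictions, and votes from a pool of 50 human raters.
n_scenarios, n_raters = 200, 50
ground_truth = rng.integers(0, 2, size=n_scenarios)
model_pred = np.where(rng.random(n_scenarios) < 0.8, ground_truth, 1 - ground_truth)
human_votes = np.where(
    rng.random((n_raters, n_scenarios)) < 0.85, ground_truth, 1 - ground_truth
)
human_rate = human_votes.mean(axis=0)          # per-scenario fraction of "yes" responses

model_acc = (model_pred == ground_truth).mean()
human_acc = (human_votes == ground_truth).mean()

# Pearson correlation between the model's predictions and human response rates,
# a simple proxy for how "human-like" the model's pattern of answers is.
corr = np.corrcoef(model_pred, human_rate)[0, 1]

print(f"model accuracy: {model_acc:.3f}")
print(f"human accuracy: {human_acc:.3f}")
print(f"model-human correlation: {corr:.3f}")
```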
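The second item above aggregates urban observations into a space-time cube before applying sequence models such as ConvLSTM. Those models are not sketched here; the snippet below only shows, under assumed inputs (point observations with coordinates and timestamps normalized to [0, 1)), the aggregation step that turns timestamped point measurements into a T×H×W cube of cell means. The function name, grid sizes, and synthetic data are illustrative, not part of the framework described in that work.

```python
import numpy as np


def build_space_time_cube(times, xs, ys, values,
                          t_bins: int, h_bins: int, w_bins: int) -> np.ndarray:
    """Average point observations into a (T, H, W) cube of per-cell means."""
    # inputs are assumed normalized to [0, 1); clip guards the boundary value 1.0
    t_idx = np.clip((np.asarray(times) * t_bins).astype(int), 0, t_bins - 1)
    y_idx = np.clip((np.asarray(ys) * h_bins).astype(int), 0, h_bins - 1)
    x_idx = np.clip((np.asarray(xs) * w_bins).astype(int), 0, w_bins - 1)

    sums = np.zeros((t_bins, h_bins, w_bins))
    counts = np.zeros((t_bins, h_bins, w_bins))
    np.add.at(sums, (t_idx, y_idx, x_idx), values)
    np.add.at(counts, (t_idx, y_idx, x_idx), 1)
    return np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 10_000
    # hypothetical observations: normalized time and location, temperature in Celsius
    times, xs, ys = rng.random(n), rng.random(n), rng.random(n)
    temps = 25 + 5 * rng.standard_normal(n)
    cube = build_space_time_cube(times, xs, ys, temps, t_bins=24, h_bins=32, w_bins=32)
    print(cube.shape)          # (24, 32, 32): a video-like sequence of spatial rasters
```

The resulting cube can be treated like a single-channel video, which is what makes video prediction models applicable to this kind of spatiotemporal data.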
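The first item above trains on noisy variants of CIFAR-10. As a hedged illustration of how such a corrupted training set can be produced, the sketch below injects symmetric label noise into a CIFAR-10-sized label vector at a chosen rate; the uniform noise model, the 40% rate, and the function name are generic assumptions rather than the specific protocol used in that work.

```python
import numpy as np


def inject_symmetric_noise(labels: np.ndarray, noise_rate: float,
                           num_classes: int = 10, seed: int = 0) -> np.ndarray:
    """Flip a fraction `noise_rate` of labels to a uniformly chosen *different* class."""
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    n_flip = int(round(noise_rate * labels.size))
    flip_idx = rng.choice(labels.size, size=n_flip, replace=False)
    # draw an offset in 1..num_classes-1 so the new label always differs from the old one
    offsets = rng.integers(1, num_classes, size=n_flip)
    noisy[flip_idx] = (labels[flip_idx] + offsets) % num_classes
    return noisy


if __name__ == "__main__":
    clean = np.random.default_rng(1).integers(0, 10, size=50_000)  # CIFAR-10-sized label vector
    noisy = inject_symmetric_noise(clean, noise_rate=0.4)
    print(f"actual corruption rate: {(noisy != clean).mean():.3f}")   # ~0.4 by construction
```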