Title: Differential involvement of EEG oscillatory components in sameness vs. spatial-relation visual reasoning tasks
The development of deep convolutional neural networks (CNNs) has recently led to great successes in computer vision, and CNNs have become de facto computational models of vision. However, a growing body of work suggests that they exhibit critical limitations beyond image categorization. Here, we study one such fundamental limitation: judging whether two simultaneously presented items are the same or different (SD), compared to a baseline assessment of their spatial relationship (SR). In both human subjects and artificial neural networks, we test the prediction that SD tasks recruit additional cortical mechanisms which underlie critical aspects of visual cognition that are not explained by current computational models. We thus recorded EEG signals from human participants engaged in the same tasks as the computational models. Importantly, in humans the two tasks were matched in difficulty by an adaptive psychometric procedure. Yet, on top of a modulation of evoked potentials, our results revealed higher activity in the low-beta (16–24 Hz) band in the SD condition than in the SR condition. We surmise that these oscillations reflect the crucial involvement of additional mechanisms, such as working memory and attention, which are missing in current feed-forward CNNs.
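As a rough illustration of the band-power comparison the abstract describes, here is a minimal Python sketch that computes mean spectral power in the low-beta (16–24 Hz) band for two sets of epoched trials. The sampling rate, array shapes, and random data are stand-ins, not the study's actual recordings or analysis pipeline.

```python
# Minimal sketch: compare low-beta (16-24 Hz) power between two task
# conditions. Assumes epoched EEG as NumPy arrays of shape
# (n_trials, n_channels, n_samples); sfreq and shapes are illustrative.
import numpy as np
from scipy.signal import welch

SFREQ = 500.0          # sampling rate in Hz (assumed)
BAND = (16.0, 24.0)    # low-beta band reported in the abstract

def band_power(epochs, sfreq=SFREQ, band=BAND):
    """Mean PSD within `band`, averaged over trials and channels."""
    freqs, psd = welch(epochs, fs=sfreq, nperseg=256, axis=-1)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return psd[..., mask].mean()

# sd_epochs / sr_epochs would come from the recorded EEG (random here)
sd_epochs = np.random.randn(100, 64, 1000)  # same/different trials
sr_epochs = np.random.randn(100, 64, 1000)  # spatial-relation trials
print("SD low-beta power:", band_power(sd_epochs))
print("SR low-beta power:", band_power(sr_epochs))
```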
Award ID(s):
1912280
PAR ID:
10205785
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
eNeuro
ISSN:
2373-2822
Page Range / eLocation ID:
ENEURO.0267-20.2020
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Modern neural networks have revolutionized the fields of computer vision (CV) and natural language processing (NLP). They are widely used for solving complex CV and NLP tasks such as image classification, image generation, and machine translation. Most state-of-the-art neural networks are over-parameterized and incur a high computational cost. One straightforward solution is to replace the layers of the network with low-rank tensor approximations using different tensor decomposition methods. This article reviews six tensor decomposition methods and illustrates their ability to compress the model parameters of convolutional neural networks (CNNs), recurrent neural networks (RNNs), and Transformers. The accuracy of some compressed models can even exceed that of the original versions. Evaluations indicate that tensor decompositions can achieve significant reductions in model size, run time, and energy consumption, and are well suited for implementing neural networks on edge devices.
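A minimal sketch of the simplest case of the idea reviewed here: factorizing a dense layer's weight matrix with a truncated SVD into two smaller layers. The layer sizes and rank below are illustrative; the reviewed methods (CP, Tucker, tensor-train, and so on) generalize this to convolutional and recurrent weight tensors.

```python
# Sketch: replace a dense layer's (out x in) weight with a rank-r
# truncated SVD, yielding two smaller layers. Shapes/rank are assumed.
import torch
import torch.nn as nn

def low_rank_factorize(linear: nn.Linear, rank: int) -> nn.Sequential:
    """Approximate linear.weight with U_r diag(S_r) V_r^T as two layers."""
    W = linear.weight.data                      # (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    first = nn.Linear(linear.in_features, rank, bias=False)
    second = nn.Linear(rank, linear.out_features,
                       bias=linear.bias is not None)
    first.weight.data = S[:rank].sqrt().unsqueeze(1) * Vh[:rank]  # (rank, in)
    second.weight.data = U[:, :rank] * S[:rank].sqrt()            # (out, rank)
    if linear.bias is not None:
        second.bias.data = linear.bias.data.clone()
    return nn.Sequential(first, second)

layer = nn.Linear(1024, 1024)
compressed = low_rank_factorize(layer, rank=64)  # ~8x fewer parameters
x = torch.randn(8, 1024)
print((layer(x) - compressed(x)).abs().max())    # approximation error
```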
  2. Abstract Drones are increasingly popular for collecting behaviour data of group‐living animals, offering inexpensive and minimally disruptive observation methods. Imagery collected by drones can be rapidly analysed using computer vision techniques to extract information, including behaviour classification, habitat analysis and identification of individual animals. While computer vision techniques can rapidly analyse drone‐collected data, the success of these analyses often depends on careful mission planning that considers downstream computational requirements, a critical factor frequently overlooked in current studies.

We present a comprehensive summary of research in the growing AI‐driven animal ecology (ADAE) field, which integrates data collection with automated computational analysis, focusing on aerial imagery for collective animal behaviour studies. We systematically analyse current methodologies, technical challenges and emerging solutions in this field, from drone mission planning to behavioural inference. We illustrate computer vision pipelines that infer behaviour from drone imagery and present the computer vision tasks used for each step. We map specific computational tasks to their ecological applications, providing a framework for future research design.

Our analysis reveals that AI‐driven animal ecology studies of collective animal behaviour using drone imagery focus on detection and classification computer vision tasks. While convolutional neural networks (CNNs) remain dominant for detection and classification, newer architectures such as transformer‐based models and specialised video analysis networks (e.g. X3D, I3D, SlowFast) designed for temporal pattern recognition are gaining traction for pose estimation and behaviour inference. However, reported model accuracy varies widely by computer vision task, species, habitat and evaluation metric, complicating meaningful comparisons between studies.

Based on current trends, we conclude that semi‐autonomous drone missions will be increasingly used to study collective animal behaviour. While manual drone operation remains prevalent, autonomous drone manoeuvres, powered by edge AI, can scale and standardise collective animal behaviour studies while reducing the risk of disturbance and improving data quality. We propose guidelines for AI‐driven animal ecology drone studies adaptable to various computer vision tasks, species and habitats. This approach aims to collect high‐quality behaviour data while minimising disruption to the ecosystem.
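To make the pipeline shape concrete, here is a skeletal Python sketch of the detection → tracking → behaviour-inference flow the review describes. Every function is a placeholder stub standing in for a real model (detector, tracker, video classifier); none of this is a specific published system.

```python
# Skeleton of a drone-imagery behaviour pipeline; all stubs are assumed.
from dataclasses import dataclass

@dataclass
class Detection:
    frame_idx: int
    box: tuple  # (x1, y1, x2, y2) in pixels
    species: str

def detect_animals(frame_idx, frame):
    """Stub per-frame detector (a CNN or transformer in practice)."""
    return [Detection(frame_idx, (0.0, 0.0, 10.0, 10.0), "zebra")]

def associate_tracks(detections):
    """Stub tracker: group detections into per-individual tracks."""
    return {0: detections}  # one dummy track

def infer_behaviour(track):
    """Stub video model (X3D/I3D/SlowFast-style) over a track's clip."""
    return "grazing"

def run_pipeline(frames):
    detections = [d for i, f in enumerate(frames)
                  for d in detect_animals(i, f)]
    tracks = associate_tracks(detections)
    return {tid: infer_behaviour(trk) for tid, trk in tracks.items()}

print(run_pipeline(frames=[None, None, None]))  # -> {0: 'grazing'}
```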
  3. Deformable Convolutional Networks (DCNs) have been proposed as a powerful tool for boosting the representation power of convolutional neural networks (CNNs) in computer vision tasks via adaptive sampling of the input feature map. Much like vision transformers, DCNs use a more flexible inductive bias than standard CNNs and have been shown to improve the performance of particular models. For example, drop-in DCN layers were shown to increase the AP score of Mask R-CNN by 10.6 points while introducing only 1% additional parameters and FLOPs, improving on the state-of-the-art model at the time of publication. However, despite evidence that placing more DCN layers earlier in the network can further improve performance, this scaling trend has not continued for deformations in CNNs, unlike for vision transformers. Benchmarking experiments show that a realistically sized DCN layer (64H × 64W, 64 input/output channels) incurs a 4× slowdown on a GPU platform, discouraging more ubiquitous use of deformations in CNNs. These slowdowns are caused by the irregular, input-dependent access patterns of the bilinear interpolation operator, which has a disproportionately low arithmetic intensity (AI) compared to the rest of the DCN. To address this disproportionate slowdown and enable the expanded use of DCNs in CNNs, we propose DefT, a series of workload-aware optimizations for DCN kernels. DefT identifies performance bottlenecks in DCNs and fuses specific operators that are observed to limit DCN AI. Our approach also uses statistical information about DCN workloads to adapt the workload tiling to the DCN layer dimensions, minimizing costly out-of-bounds input accesses. Experimental results show that DefT mitigates up to half of the DCN slowdown relative to the current state-of-the-art PyTorch implementation. This translates to a layerwise speedup of up to 134% and a 46% reduction in normalized training time on a fully DCN-enabled ResNet model.
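For intuition, the following sketch shows the bilinear-interpolation sampling at the heart of a deformable convolution, whose data-dependent gathers cause the irregular access patterns described above. This is an illustrative NumPy scalar version, not the DefT kernel itself.

```python
# Each deformable-conv tap reads the feature map at a fractional,
# input-dependent location: four scattered loads per sample, little
# arithmetic -> the low arithmetic intensity the abstract describes.
import numpy as np

def bilinear_sample(feat, y, x):
    """Sample feat (H, W) at fractional coords (y, x) with bilinear weights."""
    H, W = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    y0, x0 = max(y0, 0), max(x0, 0)
    wy, wx = y - np.floor(y), x - np.floor(x)
    # Four irregular, data-dependent loads per sample -> poor locality.
    return ((1 - wy) * (1 - wx) * feat[y0, x0] + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0] + wy * wx * feat[y1, x1])

feat = np.arange(16.0).reshape(4, 4)
offset = (0.3, 0.7)                          # learned offset (illustrative)
print(bilinear_sample(feat, 1 + offset[0], 2 + offset[1]))  # -> 7.9
```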
  4. Abstract The remarkable successes of convolutional neural networks (CNNs) in modern computer vision are by now well known, and they are increasingly being explored as computational models of the human visual system. In this paper, we ask whether CNNs might also provide a basis for modeling higher‐level cognition, focusing on the core phenomena of similarity and categorization. The most important advance comes from the ability of CNNs to learn high‐dimensional representations of complex naturalistic images, substantially extending the scope of traditional cognitive models that were previously only evaluated with simple artificial stimuli. In all cases, the most successful combinations arise when CNN representations are used with cognitive models that have the capacity to transform them to better fit human behavior. One consequence of these insights is a toolkit for the integration of cognitively motivated constraints back into CNN training paradigms in computer vision and machine learning, and we review cases where this leads to improved performance. A second consequence is a roadmap for how CNNs and cognitive models can be more fully integrated in the future, allowing for flexible end‐to‐end algorithms that can learn representations from data while still retaining the structured behavior characteristic of human cognition. 
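One simple, hypothetical instance of the recipe this abstract describes: keep CNN features fixed and learn a transformation (here, a diagonal reweighting fit by ridge regression) so that feature similarities better match human judgments. The features and ratings below are random stand-ins, not the paper's data or code.

```python
# Sketch: learn a reweighting of fixed CNN features to fit human
# pairwise similarity ratings; all data here are random placeholders.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n_images, n_features = 50, 512
feats = rng.standard_normal((n_images, n_features))  # CNN embeddings (stub)
pairs = list(combinations(range(n_images), 2))
human_sim = rng.uniform(0, 1, len(pairs))            # human ratings (stub)

# Weighted dot product: rows are elementwise feature products per pair,
# so ridge regression fits a per-dimension similarity weight w.
X = np.array([feats[i] * feats[j] for i, j in pairs])
lam = 1.0
w = np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ human_sim)

pred = X @ w
corr = np.corrcoef(pred, human_sim)[0, 1]
print(f"fit to human similarity: r = {corr:.2f}")
```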
  5. We use multimodal deep neural networks to identify sites of multimodal integration in the human brain and investigate how well these networks model integration in the brain. Sites of multimodal integration are regions where a multimodal language-vision model is better at predicting neural recordings (stereoelectroencephalography, SEEG) than a unimodal language model, a unimodal vision model, or a linearly integrated language-vision model. We use a range of state-of-the-art models spanning different architectures, including Transformers and CNNs, with different multimodal integration approaches to model the SEEG signal while subjects watched movies. As a key enabling step, we first demonstrate that the approach has the resolution to distinguish trained from randomly initialized models for both language and vision; the inability to do so would fundamentally hinder further analysis. We show that trained models systematically outperform randomly initialized models in their ability to predict the SEEG signal. We then compare unimodal and multimodal models against one another. Since the models all differ in architecture, number of parameters, and training set, which can obscure the results, we also carry out a test between two controlled models, SLIP-Combo and SLIP-SimCLR, which keep all of these attributes the same aside from the multimodal input. Our first key contribution identifies neural sites (on average 141 of 1,090 total sites, or 12.94%) and brain regions where multimodal integration occurs. Our second key contribution finds that, among the multimodal integration methods analyzed, CLIP-style training is best suited for modeling multimodal integration in the brain.
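The comparison logic lends itself to a short sketch: fit a cross-validated ridge encoding model per electrode from unimodal versus multimodal features, and flag electrodes where the multimodal features predict held-out activity better. All arrays below are random stand-ins, and the variable names are assumptions; real features would come from model activations (e.g. SLIP-Combo vs. SLIP-SimCLR).

```python
# Sketch of per-electrode encoding-model comparison; data are random.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_timepoints, n_electrodes = 600, 20
uni_feats = rng.standard_normal((n_timepoints, 128))    # unimodal features
multi_feats = rng.standard_normal((n_timepoints, 128))  # multimodal features
seeg = rng.standard_normal((n_timepoints, n_electrodes))

def predictivity(features, y):
    """Mean cross-validated R^2 of a ridge encoding model."""
    scores = cross_val_score(Ridge(alpha=10.0), features, y,
                             cv=5, scoring="r2")
    return scores.mean()

# An electrode counts as a candidate integration site when multimodal
# features out-predict unimodal ones on held-out timepoints.
multimodal_sites = [e for e in range(n_electrodes)
                    if predictivity(multi_feats, seeg[:, e])
                    > predictivity(uni_feats, seeg[:, e])]
print(f"{len(multimodal_sites)}/{n_electrodes} candidate integration sites")
```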