
This content will become publicly available on August 5, 2024

Title: On Human-like Biases in Convolutional Neural Networks for the Perception of Slant from Texture
Depth estimation is fundamental to 3D perception, and humans are known to make biased estimates of depth. This study investigates whether convolutional neural networks (CNNs) exhibit similar biases when predicting the sign of curvature and the depth of textured surfaces under different viewing conditions (field of view) and surface parameters (slant and texture irregularity). The hypothesis is motivated by the idea that texture gradients described by local neighborhoods, a cue identified in the human vision literature, are also representable within CNNs. To this end, we trained both unsupervised and supervised CNN models on renderings of slanted surfaces with random polka-dot patterns and analyzed their internal latent representations. The results show that the unsupervised models exhibit prediction biases similar to those of humans across all experiments, while the supervised CNN models do not. The latent spaces of the unsupervised models can be linearly separated into axes representing field of view and optical slant. For supervised models, this ability varies substantially with model architecture and the kind of supervision (continuous slant vs. sign of slant). Although this study establishes no shared mechanism, these findings suggest that unsupervised CNN models can make predictions similar to those of the human visual system.
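The abstract reports that the unsupervised models' latent spaces can be linearly separated into axes for field of view and optical slant. A minimal sketch of such a linear probe, using synthetic latent codes as stand-ins (the dimensions, parameter ranges, and data here are assumptions, not the paper's):

```python
import numpy as np

# Hypothetical sketch: test whether two scene variables (field of view and
# optical slant) are linearly decodable from a model's latent codes.
# The latents below are fabricated stand-ins, not the paper's data.
rng = np.random.default_rng(0)

n, d = 500, 16
fov = rng.uniform(10, 60, n)     # field of view in degrees (assumed range)
slant = rng.uniform(20, 70, n)   # optical slant in degrees (assumed range)

# Fabricate latents in which two directions encode FOV and slant linearly,
# plus a small amount of noise.
basis = rng.normal(size=(2, d))
latents = np.outer(fov, basis[0]) + np.outer(slant, basis[1])
latents += 0.1 * rng.normal(size=(n, d))

# Linear probe: least-squares regression from latents to (fov, slant).
X = np.hstack([latents, np.ones((n, 1))])   # append a bias column
Y = np.stack([fov, slant], axis=1)
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
pred = X @ W

# Coefficient of determination per variable; values near 1 mean the
# variable is linearly decodable from the latent space.
r2 = 1 - ((Y - pred) ** 2).sum(0) / ((Y - Y.mean(0)) ** 2).sum(0)
print(r2)
```

A high R² for both variables is what "linearly separable into axes" amounts to operationally; a low R² for the supervised models would mirror the architecture- and supervision-dependent results described above.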
Journal Name: ACM Transactions on Applied Perception
Sponsoring Org: National Science Foundation
More Like this
  1. Visually guided movements can show surprising accuracy even when the perceived three-dimensional (3D) shape of the target is distorted. One explanation of this paradox is that an evolutionarily specialized “vision-for-action” system provides accurate shape estimates by relying selectively on stereo information and ignoring less reliable sources of shape information like texture and shading. However, the key support for this hypothesis has come from studies that analyze average behavior across many visuomotor interactions where available sensory feedback reinforces stereo information. The present study, which carefully accounts for the effects of feedback, shows that visuomotor interactions with slanted surfaces are actually planned using the same cue-combination function as slant perception and that apparent dissociations can arise due to two distinct supervised learning processes: sensorimotor adaptation and cue reweighting. In two experiments, we show that when a distorted slant cue biases perception (e.g., surfaces appear flattened by a fixed amount), sensorimotor adaptation rapidly adjusts the planned grip orientation to compensate for this constant error. However, when the distorted slant cue is unreliable, leading to variable errors across a set of objects (i.e., some slants are overestimated, others underestimated), then relative cue weights are gradually adjusted to reduce the misleading effect of the unreliable cue, consistent with previous perceptual studies of cue reweighting. The speed and flexibility of these two forms of learning provide an alternative explanation of why perception and action are sometimes found to be dissociated in experiments where some 3D shape cues are consistent with sensory feedback while others are faulty.
NEW & NOTEWORTHY: When interacting with three-dimensional (3D) objects, sensory feedback is available that could improve future performance via supervised learning.
Here we confirm that natural visuomotor interactions lead to sensorimotor adaptation and cue reweighting, two distinct learning processes uniquely suited to resolve errors caused by biased and noisy 3D shape cues. These findings explain why perception and action are often found to be dissociated in experiments where some cues are consistent with sensory feedback while others are faulty. 
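The two learning processes described above can be illustrated with a toy simulation. This is a sketch under assumed parameters (cue noise levels, learning rates, slant ranges are all invented for illustration), not the authors' model:

```python
import numpy as np

# Illustrative sketch of the two learning processes: (1) sensorimotor
# adaptation corrects a constant perceptual bias; (2) cue reweighting
# down-weights an unreliable cue. All parameters are assumptions.

def combine(stereo, texture, w_stereo):
    """Weighted average of two slant cues (degrees)."""
    return w_stereo * stereo + (1 - w_stereo) * texture

# (1) Sensorimotor adaptation: surfaces look 5 deg flatter than they are
# (constant bias), and trial-by-trial feedback shifts the planned grip
# orientation until the constant error is absorbed.
true_slant = 30.0
perceived = true_slant - 5.0          # constant perceptual bias
adaptation = 0.0
for _ in range(50):
    planned = perceived + adaptation
    error = true_slant - planned      # feedback signal after the grasp
    adaptation += 0.2 * error         # gradual correction
print(round(planned, 2))              # converges toward the true slant

# (2) Cue reweighting: one cue produces variable errors across objects,
# so its relative weight is gradually reduced.
rng = np.random.default_rng(1)
w_stereo = 0.5
for _ in range(200):
    slant = rng.uniform(20, 60)
    stereo = slant + rng.normal(0, 1)     # reliable cue
    texture = slant + rng.normal(0, 8)    # unreliable cue
    err_s = (stereo - slant) ** 2
    err_t = (texture - slant) ** 2
    # shift weight toward whichever cue erred less on this trial
    w_stereo += 0.01 * np.sign(err_t - err_s)
    w_stereo = float(np.clip(w_stereo, 0.0, 1.0))
print(round(w_stereo, 2))             # drifts toward the reliable cue
```

The key contrast is the one drawn in the abstract: adaptation removes a fixed offset quickly, while reweighting is a slower statistical adjustment driven by error variability.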
  2.
    In recent decades, computer vision has proven remarkably effective in addressing diverse issues in public health, from determining the diagnosis, prognosis, and treatment of diseases in humans to predicting infectious disease outbreaks. Here, we investigate whether convolutional neural networks (CNNs) can also demonstrate effectiveness in classifying the environmental stages of parasites of public health importance and their invertebrate hosts. We used schistosomiasis as a reference model. Schistosomiasis is a debilitating parasitic disease transmitted to humans via snail intermediate hosts. The parasite affects more than 200 million people in tropical and subtropical regions. We trained our CNN, a feed-forward neural network, on a limited dataset of 5,500 images of snails and 5,100 images of cercariae obtained from schistosomiasis transmission sites in the Senegal River Basin, a region in western Africa that is hyper-endemic for the disease. The image set included both images of two snail genera relevant to schistosomiasis transmission, Bulinus spp. and Biomphalaria pfeifferi, and images of snails that are non-component hosts for human schistosomiasis. Cercariae shed from Bi. pfeifferi and Bulinus spp. snails were classified into 11 categories, of which only two, S. haematobium and S. mansoni, are major etiological agents of human schistosomiasis. The algorithms, trained on 80% of the snail and parasite dataset, achieved 99% and 91% accuracy for snail and parasite classification, respectively, when used on the hold-out validation dataset, a performance comparable to that of experienced parasitologists. The promising results of this proof-of-concept study suggest that this CNN model, and potentially similar replicable models, have the potential to support the classification of snails and parasites of medical importance.
In remote field settings where machine learning algorithms can be deployed on cost-effective and widely used mobile devices, such as smartphones, these models can be a valuable complement to laboratory identification by trained technicians. Future efforts must be dedicated to increasing dataset sizes for model training and validation, as well as testing these algorithms in diverse transmission settings and geographies. 
  3. We introduce a convolutional neural network model for unsupervised learning of depth and ego-motion from cylindrical panoramic video. Panoramic depth estimation is an important technology for applications such as virtual reality, 3D modeling, and autonomous robotic navigation. In contrast to previous approaches for applying convolutional neural networks to panoramic imagery, we use the cylindrical panoramic projection, which allows the use of traditional CNN layers such as convolutional filters and max pooling without modification. Our evaluation on synthetic and real data shows that unsupervised learning of depth and ego-motion on cylindrical panoramic images can produce high-quality depth maps and that an increased field-of-view improves ego-motion estimation accuracy. We also introduce Headcam, a novel dataset of panoramic video collected from a helmet-mounted camera while biking in an urban setting.
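One property that makes the cylindrical projection convenient for standard CNN layers is that the azimuth axis is periodic, so the horizontal border can be handled by wrap-around padding. The sketch below illustrates this idea with a naive 2D convolution; it is an illustration of the general technique, not necessarily this paper's implementation:

```python
import numpy as np

# Sketch (assumed detail, not necessarily the paper's code): a cylindrical
# panorama wraps horizontally, so before an ordinary "valid" convolution we
# pad the width circularly and the height with zeros. Standard conv layers
# then operate on the padded image unmodified.

def cylindrical_pad(img, pad):
    """img: (H, W) array; pad: border size for a (2*pad+1) square kernel."""
    # wrap left/right columns (azimuth is periodic on a cylinder)
    img = np.concatenate([img[:, -pad:], img, img[:, :pad]], axis=1)
    # zero-pad top/bottom (no wrap along the cylinder axis)
    return np.pad(img, ((pad, pad), (0, 0)))

def conv2d_valid(img, kernel):
    """Naive 'valid' cross-correlation, for illustration only."""
    kh, kw = kernel.shape
    H = img.shape[0] - kh + 1
    W = img.shape[1] - kw + 1
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
    return out

pano = np.arange(20.0).reshape(4, 5)   # toy 4x5 "panorama"
kernel = np.ones((3, 3)) / 9.0         # 3x3 box filter
out = conv2d_valid(cylindrical_pad(pano, 1), kernel)
print(out.shape)                       # spatial size preserved
```

With this padding, the filtered output has the same spatial size as the input and the left and right image borders see each other's pixels, matching the cylinder's topology.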
  4. Abstract

    This paper presents a novel application of convolutional neural network (CNN) models for filtering the intraseasonal variability of the tropical atmosphere. In this deep learning filter, two convolutional layers are applied sequentially in a supervised machine learning framework to extract the intraseasonal signal from the total daily anomalies. The CNN-based filter can be tailored for each field similarly to fast Fourier transform filtering methods. When applied to two different fields (zonal wind stress and outgoing longwave radiation), the index of agreement between the filtered signal obtained using the CNN-based filter and a conventional weight-based filter is between 95% and 99%. The advantage of the CNN-based filter over conventional filters is its applicability to time series whose length is comparable to the period of the signal being extracted.
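The supervised setup above (learn a convolutional filter that reproduces a conventionally filtered target) can be sketched in simplified form. As a stand-in for the paper's two-layer CNN, the example fits a single linear FIR filter by least squares against an FFT band-pass target; the synthetic signal, band edges, and filter length are assumptions for illustration:

```python
import numpy as np

# Simplified sketch: fit one convolutional (FIR) filter so that its output
# matches a conventional 20-100-day band-pass of daily anomalies. The
# paper's filter is a two-layer CNN; this linear stand-in only illustrates
# the supervised "mimic the conventional filter" framing.
rng = np.random.default_rng(0)

T = 2000                               # days of synthetic daily anomalies
t = np.arange(T)
signal = np.sin(2 * np.pi * t / 45)    # intraseasonal (45-day) component
noise = np.sin(2 * np.pi * t / 5) + 0.3 * rng.normal(size=T)
x = signal + noise

# Conventional target: FFT band-pass keeping 20-100-day periods.
freqs = np.fft.rfftfreq(T, d=1.0)
keep = (freqs >= 1 / 100) & (freqs <= 1 / 20)
target = np.fft.irfft(np.fft.rfft(x) * keep, n=T)

# Fit a length-121 FIR filter mapping x -> target by least squares.
L = 121
pad = L // 2
xp = np.pad(x, (pad, pad))
A = np.stack([xp[i:i + T] for i in range(L)], axis=1)  # lagged copies of x
w, *_ = np.linalg.lstsq(A, target, rcond=None)
filtered = A @ w

# Agreement between the learned filter and the FFT target.
corr = np.corrcoef(filtered, target)[0, 1]
print(round(corr, 3))
```

The high agreement here plays the role of the 95-99% index of agreement reported in the abstract; the CNN's advantage over such fixed-weight filters, per the abstract, is that it remains applicable when the series is barely longer than the period being extracted.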

    Significance Statement

    This study proposes a new method for discovering hidden connections in data representative of tropical atmosphere variability. The method makes use of an artificial intelligence (AI) algorithm that combines a mathematical operation known as convolution with a mathematical model, known as an artificial neural network, built to reflect the behavior of the human brain. Our results show that the filtered data produced by the AI-based method are consistent with the results obtained using conventional mathematical algorithms. The advantage of the AI-based method is that it can be applied to cases for which the conventional methods have limitations, such as forecast (hindcast) data or real-time monitoring of tropical variability in the 20–100-day range.

  5. Given earth imagery with spectral features on a terrain surface, this paper studies surface segmentation based on both explanatory features and surface topology. The problem is important in many spatial and spatiotemporal applications such as flood extent mapping in hydrology. The problem is uniquely challenging for several reasons: first, the size of earth imagery on a terrain surface is often much larger than the input size of popular deep convolutional neural networks; second, there exists topological structure dependency between pixel classes on the surface, and such dependency can follow an unknown and non-linear distribution; third, there are often limited training labels. Existing methods for earth imagery segmentation often divide the imagery into patches and treat elevation as an additional feature channel. These methods do not fully incorporate the spatial topological structural constraint within and across surface patches and thus often show poor results, especially when training labels are limited. Existing methods for semi-supervised and unsupervised learning on earth imagery often focus on learning representations without explicitly incorporating surface topology. In contrast, we propose a novel framework that explicitly models the topological skeleton of a terrain surface with a contour tree from computational topology, guided by a physical constraint (e.g., water flow direction on terrains). Our framework consists of two neural networks: a convolutional neural network (CNN) to learn spatial contextual features on a 2D image grid, and a graph neural network (GNN) to learn the statistical distribution of physics-guided spatial topological dependency on the contour tree. The two models are co-trained via variational EM. Evaluations on real-world flood-mapping datasets show that the proposed models outperform baseline methods in classification accuracy, especially when training labels are limited.
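The physical constraint invoked above (water flow respects terrain topology) can be illustrated on a toy grid. This sketch is only an illustration of the constraint, not the paper's contour-tree/GNN model: it post-processes sparse flood labels by propagating them downhill until no pixel lower than a flooded neighbor remains unflooded.

```python
import numpy as np

# Toy illustration of the physics-guided topological constraint (not the
# paper's model): on a terrain, a pixel that lies at or below a flooded
# 4-neighbor should itself be flooded. We propagate flood labels downhill
# to a fixed point, which can fill in pixels a per-pixel classifier missed.

def enforce_topology(elev, flooded):
    """Iteratively flood any pixel at or below a flooded 4-neighbor."""
    flooded = flooded.copy()
    H, W = elev.shape
    changed = True
    while changed:
        changed = False
        for i in range(H):
            for j in range(W):
                if flooded[i, j]:
                    continue
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < H and 0 <= nj < W
                            and flooded[ni, nj]
                            and elev[i, j] <= elev[ni, nj]):
                        flooded[i, j] = True
                        changed = True
                        break
    return flooded

elev = np.array([[3, 2, 1],
                 [4, 2, 1],
                 [5, 4, 3]], float)     # toy elevation grid
seed = np.zeros_like(elev, dtype=bool)
seed[0, 1] = True                       # classifier is confident only here
out = enforce_topology(elev, seed)
print(out.astype(int))
```

In the paper's framework this kind of dependency is not hand-coded but learned by the GNN over the contour tree; the toy rule above only conveys why topology adds information beyond per-pixel features, which is what helps when training labels are limited.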