Visually guided movements can show surprising accuracy even when the perceived three-dimensional (3D) shape of the target is distorted. One explanation of this paradox is that an evolutionarily specialized “vision-for-action” system provides accurate shape estimates by relying selectively on stereo information and ignoring less reliable sources of shape information like texture and shading. However, the key support for this hypothesis has come from studies that analyze average behavior across many visuomotor interactions where available sensory feedback reinforces stereo information. The present study, which carefully accounts for the effects of feedback, shows that visuomotor interactions with slanted surfaces are actually planned using the same cue-combination function as slant perception and that apparent dissociations can arise due to two distinct supervised learning processes: sensorimotor adaptation and cue reweighting. In two experiments, we show that when a distorted slant cue biases perception (e.g., surfaces appear flattened by a fixed amount), sensorimotor adaptation rapidly adjusts the planned grip orientation to compensate for this constant error. However, when the distorted slant cue is unreliable, leading to variable errors across a set of objects (i.e., some slants are overestimated, others underestimated), then relative cue weights are gradually adjusted to reduce the misleading effect of the unreliable cue, consistent with previous perceptual studies of cue reweighting. The speed and flexibility of these two forms of learning provide an alternative explanation of why perception and action are sometimes found to be dissociated in experiments where some 3D shape cues are consistent with sensory feedback while others are faulty. NEW & NOTEWORTHY When interacting with three-dimensional (3D) objects, sensory feedback is available that could improve future performance via supervised learning. Here we confirm that natural visuomotor interactions lead to sensorimotor adaptation and cue reweighting, two distinct learning processes uniquely suited to resolve errors caused by biased and noisy 3D shape cues. These findings explain why perception and action are often found to be dissociated in experiments where some cues are consistent with sensory feedback while others are faulty.
more »
« less
The case against probabilistic inference: a new deterministic theory of 3D visual processing
How the brain derives 3D information from inherently ambiguous visual input remains the fundamental question of human vision. The past two decades of research have addressed this question as a problem of probabilistic inference, the dominant model being maximum-likelihood estimation (MLE). This model assumes that independent depth-cue modules derive noisy but statistically accurate estimates of 3D scene parameters that are combined through a weighted average. Cue weights are adjusted based on the system representation of each module's output variability. Here I demonstrate that the MLE model fails to account for important psychophysical findings and, importantly, misinterprets the just noticeable difference, a hallmark measure of stimulus discriminability, to be an estimate of perceptual uncertainty. I propose a new theory, termed Intrinsic Constraint, which postulates that the visual system does not derive the most probable interpretation of the visual input, but rather, the most stable interpretation amid variations in viewing conditions. This goal is achieved with the Vector Sum model, which represents individual cue estimates as components of a multi-dimensional vector whose norm determines the combined output. This model accounts for the psychophysical findings cited in support of MLE, while predicting existing and new findings that contradict the MLE model. This article is part of a discussion meeting issue ‘New approaches to 3D vision’.
more »
« less
- Award ID(s):
- 2120610
- PAR ID:
- 10442586
- Date Published:
- Journal Name:
- Philosophical Transactions of the Royal Society B: Biological Sciences
- Volume:
- 378
- Issue:
- 1869
- ISSN:
- 0962-8436
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Depth estimation is fundamental to 3D perception, and humans are known to have biased estimates of depth. This study investigates whether convolutional neural networks (CNNs) can be biased when predicting the sign of curvature and depth of surfaces of textured surfaces under different viewing conditions (field of view) and surface parameters (slant and texture irregularity). This hypothesis is drawn from the idea that texture gradients described by local neighborhoods—a cue identified in human vision literature—are also representable within convolutional neural networks. To this end, we trained both unsupervised and supervised CNN models on the renderings of slanted surfaces with random Polka dot patterns and analyzed their internal latent representations. The results show that the unsupervised models have similar prediction biases as humans across all experiments, while supervised CNN models do not exhibit similar biases. The latent spaces of the unsupervised models can be linearly separated into axes representing field of view and optical slant. For supervised models, this ability varies substantially with model architecture and the kind of supervision (continuous slant vs. sign of slant). Even though this study says nothing of any shared mechanism, these findings suggest that unsupervised CNN models can share similar predictions to the human visual system. Code: github.com/brownvc/Slant-CNN-Biasesmore » « less
-
Many sign languages are bona fide natural languages with grammatical rules and lexicons hence can benefit from machine translation methods. Similarly, since sign language is a visual-spatial language, it can also benefit from computer vision methods for encoding it. With the advent of deep learning methods in recent years, significant advances have been made in natural language processing (specifically neural machine translation) and in computer vision methods (specifically image and video captioning). Researchers have therefore begun expanding these learning methods to sign language understanding. Sign language interpretation is especially challenging, because it involves a continuous visual-spatial modality where meaning is often derived based on context. The focus of this article, therefore, is to examine various deep learning–based methods for encoding sign language as inputs, and to analyze the efficacy of several machine translation methods, over three different sign language datasets. The goal is to determine which combinations are sufficiently robust for sign language translation without any gloss-based information. To understand the role of the different input features, we perform ablation studies over the model architectures (input features + neural translation models) for improved continuous sign language translation. These input features include body and finger joints, facial points, as well as vector representations/embeddings from convolutional neural networks. The machine translation models explored include several baseline sequence-to-sequence approaches, more complex and challenging networks using attention, reinforcement learning, and the transformer model. We implement the translation methods over multiple sign languages—German (GSL), American (ASL), and Chinese sign languages (CSL). From our analysis, the transformer model combined with input embeddings from ResNet50 or pose-based landmark features outperformed all the other sequence-to-sequence models by achieving higher BLEU2-BLEU4 scores when applied to the controlled and constrained GSL benchmark dataset. These combinations also showed significant promise on the other less controlled ASL and CSL datasets.more » « less
-
The paper discusses a machine learning vision and nonlinear control approach for autonomous ship landing of vertical flight aircraft without utilizing GPS signal. The central idea involves automating the Navy helicopter ship landing procedure where the pilot utilizes the ship as the visual reference for long-range tracking, but refers to a standardized visual cue installed on most Navy ships called the ”horizon bar” for the final approach and landing phases. This idea is implemented using a uniquely designed nonlinear controller integrated with machine vision. The vision system utilizes machine learning based object detection for long-range ship tracking, and classical computer vision for object detection and the estimation of aircraft relative position and orientation during the final approach and landing phases. The nonlinear controller operates based on the information estimated by the vision system and has demonstrated robust tracking performance even in the presence of uncertainties. The developed autonomous ship landing system is implemented on a quad-rotor vertical take-off and landing (VTOL) capable unmanned aerial vehicle (UAV) equipped with an onboard camera and was demonstrated on a moving deck, which imitates realistic ship deck motions using a Stewart platform and a visual cue equivalent to the horizon bar. Extensive simulations and flight tests are conducted to demonstrate vertical landing safety, tracking capability, and landing accuracy while the deck is in motion.more » « less
-
Abstract The Bradley–Terry–Luce (BTL) model is a benchmark model for pairwise comparisons between individuals. Despite recent progress on the first-order asymptotics of several popular procedures, the understanding of uncertainty quantification in the BTL model remains largely incomplete, especially when the underlying comparison graph is sparse. In this paper, we fill this gap by focusing on two estimators that have received much recent attention: the maximum likelihood estimator (MLE) and the spectral estimator. Using a unified proof strategy, we derive sharp and uniform non-asymptotic expansions for both estimators in the sparsest possible regime (up to some poly-logarithmic factors) of the underlying comparison graph. These expansions allow us to obtain: (i) finite-dimensional central limit theorems for both estimators; (ii) construction of confidence intervals for individual ranks; (iii) optimal constant of $$\ell _2$$ estimation, which is achieved by the MLE but not by the spectral estimator. Our proof is based on a self-consistent equation of the second-order remainder vector and a novel leave-two-out analysis.more » « less
An official website of the United States government

