Title: Pay Attention! - Robustifying a Deep Visuomotor Policy Through Task-Focused Visual Attention
Several recent studies have demonstrated the promise of deep visuomotor policies for robot manipulator control. Despite impressive progress, these systems are known to be vulnerable to physical disturbances, such as accidental or adversarial bumps that make them drop the manipulated object. They also tend to be distracted by visual disturbances such as objects moving in the robot’s field of view, even if the disturbance does not physically prevent the execution of the task. In this paper, we propose an approach for augmenting a deep visuomotor policy trained through demonstrations with Task Focused visual Attention (TFA). The manipulation task is specified with a natural language text such as “move the red bowl to the left”. This allows the visual attention component to concentrate on the current object that the robot needs to manipulate. We show that even in benign environments, the TFA allows the policy to consistently outperform a variant with no attention mechanism. More importantly, the new policy is significantly more robust: it regularly recovers from severe physical disturbances (such as bumps causing it to drop the object) from which the baseline policy, i.e. with no visual attention, almost never recovers. In addition, we show that the proposed policy performs correctly in the presence of a wide class of visual disturbances, exhibiting a behavior reminiscent of human selective visual attention experiments.
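The mechanism the abstract describes — an attention component that uses an embedding of the instruction to concentrate on task-relevant regions of the visual input — can be sketched minimally as follows. This is an illustrative reconstruction, not the authors' implementation: the dot-product scoring, the feature-map shapes, and the stand-in instruction embedding are all assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def task_focused_attention(feat_map, text_emb):
    """Weight CNN feature-map locations by similarity to a task embedding.

    feat_map: (H, W, D) convolutional features of the camera image.
    text_emb: (D,) embedding of the instruction, e.g. "move the red bowl".
    Returns the (H, W) attention map and the attended (D,) feature vector.
    """
    H, W, D = feat_map.shape
    flat = feat_map.reshape(H * W, D)
    scores = flat @ text_emb / np.sqrt(D)   # dot-product similarity per location
    attn = softmax(scores)                  # distribution over spatial locations
    attended = attn @ flat                  # attention-weighted feature pooling
    return attn.reshape(H, W), attended

# Toy example: plant a feature matching the instruction at cell (2, 1);
# attention should peak there.
rng = np.random.default_rng(0)
feat = rng.normal(size=(4, 4, 8))
emb = np.full(8, 3.0)            # stand-in instruction embedding (hypothetical)
feat[2, 1] += emb
attn_map, pooled = task_focused_attention(feat, emb)
loc = tuple(int(i) for i in np.unravel_index(attn_map.argmax(), attn_map.shape))
print(loc)  # (2, 1)
```

The attended feature vector, rather than the full feature map, would then feed the policy's motor layers, which is one plausible way such conditioning suppresses visual distractors elsewhere in the frame.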
Award ID(s): 1741431
PAR ID: 10111643
Journal Name: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Page Range / eLocation ID: 4254-4262
Sponsoring Org: National Science Foundation
More Like this
  1. Feature-based attention is known to enhance visual processing globally across the visual field, even at task-irrelevant locations. Here, we asked whether attention to object categories, in particular faces, shows similar location-independent tuning. Using EEG, we measured the face-selective N170 component of the EEG signal to examine neural responses to faces at task-irrelevant locations while participants attended to faces at another task-relevant location. Across two experiments, we found that visual processing of faces was amplified at task-irrelevant locations when participants attended to faces relative to when participants attended to either buildings or scrambled face parts. The fact that we see this enhancement with the N170 suggests that these attentional effects occur at the earliest stage of face processing. Two additional behavioral experiments showed that it is easier to attend to the same object category across the visual field relative to two distinct categories, consistent with object-based attention spreading globally. Together, these results suggest that attention to high-level object categories shows similar spatially global effects on visual processing as attention to simple, individual, low-level features. 
  2.
    According to the influential “Two Visual Pathways” hypothesis, the cortical visual system is segregated into two pathways, with the ventral, occipitotemporal pathway subserving object perception, and the dorsal, occipitoparietal pathway subserving the visuomotor control of action. However, growing evidence suggests that the dorsal pathway also plays a functional role in object perception. In the current article, we present evidence that the dorsal pathway contributes uniquely to the perception of a range of visuospatial attributes that are not redundant with representations in ventral cortex. We describe how dorsal cortex is recruited automatically during perception, even when no explicit visuomotor response is required. Importantly, we propose that dorsal cortex may selectively process visual attributes that can inform the perception of potential actions on objects and environments, and we consider plausible developmental and cognitive mechanisms that might give rise to these representations. As such, we consider whether naturalistic stimuli, such as real-world solid objects, might engage dorsal cortex more so than simplified or artificial stimuli such as images that do not afford action, and how the use of suboptimal stimuli might limit our understanding of the functional contribution of dorsal cortex to visual perception. 
  3. Attention and emotion are fundamental psychological systems. It is well established that emotion intensifies attention. Three experiments reported here (N = 235) demonstrated the reversed causal direction: Voluntary visual attention intensifies perceived emotion. In Experiment 1, participants repeatedly directed attention toward a target object during sequential search. Participants subsequently perceived their emotional reactions to target objects as more intense than their reactions to control objects. Experiments 2 and 3 used a spatial-cuing procedure to manipulate voluntary visual attention. Spatially cued attention increased perceived emotional intensity. Participants perceived spatially cued objects as more emotionally intense than noncued objects even when participants were asked to mentally rehearse the name of noncued objects. This suggests that the intensifying effect of attention is independent of more extensive mental rehearsal. Across experiments, attended objects were perceived as more visually distinctive, which statistically mediated the effects of attention on emotional intensity.
  4. Traditionally, the exogenous control of gaze by external saliences and the endogenous control of gaze by knowledge and context have been viewed as competing systems, with late infancy seen as a period of strengthening top-down control over the vagaries of the input. Here we found that one-year-old infants control sustained attention through head movements that increase the visibility of the attended object. Freely moving one-year-old infants (n = 45) wore head-mounted eye trackers and head motion sensors while exploring sets of toys of the same physical size. The visual size of the objects, a well-documented salience, varied naturally with the infant's moment-to-moment posture and head movements. Sustained attention to an object was characterized by the tight control of head movements that created and then stabilized a visual size advantage for the attended object. The findings show collaboration between exogenous and endogenous attentional systems and suggest new hypotheses about the development of sustained visual attention.
  5.
    Robotic manipulation of deformable 1D objects such as ropes, cables, and hoses is challenging due to the lack of high-fidelity analytic models and large configuration spaces. Furthermore, learning end-to-end manipulation policies directly from images and physical interaction requires significant time on a robot and can fail to generalize across tasks. We address these challenges using interpretable deep visual representations for rope, extending recent work on dense object descriptors for robot manipulation. This facilitates the design of interpretable and transferable geometric policies built on top of the learned representations, decoupling visual reasoning and control. We present an approach that learns point-pair correspondences between initial and goal rope configurations, which implicitly encodes geometric structure, entirely in simulation from synthetic depth images. We demonstrate that the learned representation - dense depth object descriptors (DDODs) - can be used to manipulate a real rope into a variety of different arrangements either by learning from demonstrations or using interpretable geometric policies. In 50 trials of a knot-tying task with the ABB YuMi Robot, the system achieves a 66% knot-tying success rate from previously unseen configurations. See https://tinyurl.com/rope-learning for supplementary material and videos. 
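The point-pair correspondence step this abstract describes can be illustrated with a minimal nearest-neighbor matcher over dense descriptor maps. This is a sketch under assumed shapes, not the paper's DDOD training or policy pipeline; the `match_descriptors` function and the toy descriptor maps are hypothetical.

```python
import numpy as np

def match_descriptors(desc_init, desc_goal, points):
    """For each pixel in `points` of the initial image, find the pixel in the
    goal image whose descriptor is closest in Euclidean distance.

    desc_init, desc_goal: (H, W, D) dense descriptor maps.
    points: list of (row, col) pixels on the rope in the initial image.
    Returns a list of (row, col) correspondences in the goal image.
    """
    H, W, D = desc_goal.shape
    goal_flat = desc_goal.reshape(H * W, D)
    matches = []
    for (r, c) in points:
        d = desc_init[r, c]
        dist = np.linalg.norm(goal_flat - d, axis=1)  # distance to every goal pixel
        idx = np.unravel_index(dist.argmin(), (H, W))
        matches.append(tuple(int(i) for i in idx))
    return matches

# Toy check: with identical descriptor maps, each point matches itself.
rng = np.random.default_rng(1)
desc = rng.normal(size=(6, 6, 3))
pts = [(0, 0), (3, 4), (5, 5)]
print(match_descriptors(desc, desc, pts))  # [(0, 0), (3, 4), (5, 5)]
```

A geometric policy could then act on these matched point pairs (e.g., pick at an initial point, place at its goal correspondence), which is the sense in which such descriptors decouple visual reasoning from control.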