

Title: Experimental Autonomous Deep Learning-Based 3D Path Planning for a 7-DOF Robot Manipulator
In this paper, we examine the autonomous operation of a high-DOF robot manipulator. We investigate a pick-and-place task in which the position and orientation of an object, an obstacle, and a target pad are initially unknown and must be determined autonomously. To complete this task, we employ a combination of computer vision, deep learning, and control techniques. First, we locate the center of each item in two captured images using HSV-based scanning. Second, we apply stereo vision techniques to determine the 3D position of each item. Third, we implement a convolutional neural network to determine the orientation of the object. Finally, we use the calculated 3D positions to establish an obstacle-avoidance trajectory that lifts the object over the obstacle and onto the target pad. Our results demonstrate that this combination of techniques has minimal error, runs in real time, and performs the task reliably. Thus, we show that combining specialized autonomous techniques makes generalization to a complex autonomous task possible.
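The first two stages described above, HSV-based localization of an item's image center and stereo triangulation of its 3D position, can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's implementation: the HSV bounds, focal length, and baseline are made-up numbers, and a rectified parallel stereo rig is assumed so that depth reduces to Z = f·B/d.

```python
# Hypothetical sketch of two pipeline steps: (1) find an item's center
# via HSV thresholding, (2) recover depth from horizontal disparity
# between the two cameras. All numeric values are illustrative.

def hsv_centroid(image, lower, upper):
    """Return the (row, col) centroid of pixels whose (h, s, v) values
    fall inside [lower, upper]; image is a 2D grid of (h, s, v) tuples."""
    rows = cols = count = 0
    for r, row in enumerate(image):
        for c, pix in enumerate(row):
            if all(lo <= x <= hi for x, lo, hi in zip(pix, lower, upper)):
                rows += r
                cols += c
                count += 1
    if count == 0:
        return None
    return (rows / count, cols / count)

def stereo_depth(x_left, x_right, focal_px, baseline_m):
    """Depth Z = f * B / d for a rectified, parallel stereo pair."""
    disparity = x_left - x_right
    return focal_px * baseline_m / disparity

# Tiny synthetic frame: a 3x3 image with one "red" pixel at (1, 2).
red_lo, red_hi = (0, 200, 200), (10, 255, 255)
img = [[(60, 50, 50)] * 3 for _ in range(3)]
img[1][2] = (5, 230, 240)
center = hsv_centroid(img, red_lo, red_hi)   # -> (1.0, 2.0)

# The same feature seen at column 320 (left) and 300 (right), with a
# 700 px focal length and 0.1 m baseline: Z = 700 * 0.1 / 20 = 3.5 m.
z = stereo_depth(320, 300, 700, 0.1)
```

In practice these steps would be built on a vision library (e.g., OpenCV's `inRange` and stereo calibration routines) rather than per-pixel Python loops; the sketch only fixes the geometry.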
Award ID(s):
1823951 1823983
NSF-PAR ID:
10134879
Journal Name:
ASME 2019 Dynamic Systems and Control Conference
Volume:
2
Page Range / eLocation ID:
V002T14A002
Sponsoring Org:
National Science Foundation
More Like this
  1.

    Optical imaging techniques, such as light detection and ranging (LiDAR), are essential tools in remote sensing, robotic vision, and autonomous driving. However, the presence of scattering places fundamental limits on our ability to image through fog, rain, dust, or the atmosphere. Conventional approaches for imaging through scattering media operate at microscopic scales or require a priori knowledge of the target location for 3D imaging. We introduce a technique that co-designs single-photon avalanche diodes, ultra-fast pulsed lasers, and a new inverse method to capture 3D shape through scattering media. We demonstrate acquisition of shape and position for objects hidden behind a thick diffuser (≈6 transport mean free paths) at macroscopic scales. Our technique, confocal diffuse tomography, may be of considerable value to the aforementioned applications.

  2.
    Many recent video applications, including autonomous driving, traffic monitoring, drone analytics, large-scale surveillance networks, and virtual reality, require reasoning about, combining, and operating over many video streams, each with a distinct position and orientation. However, modern video data management systems are largely designed to process individual streams of video data as if they were independent and unrelated. In this paper, we present VisualWorldDB, a vision and an initial architecture for a new type of database management system optimized for multi-video applications. VisualWorldDB ingests video data from many perspectives and makes them queryable as a single multidimensional visual object. It incorporates new techniques for optimizing, executing, and storing multi-perspective video data. Our preliminary results suggest that this approach allows for faster queries and lower storage costs, improving the state of the art for applications that operate over this type of video data.
  3. Autonomous mobile robots (AMRs) have been widely utilized in industry to execute various on-board computer-vision applications, including autonomous guidance, security patrol, object detection, and face recognition. Most of the applications executed by an AMR involve the analysis of camera images through trained machine learning models. Many research studies on machine learning focus either on performance without considering energy efficiency or on techniques such as pruning and compression that make models more energy-efficient. However, most previous work does not study the root causes of energy inefficiency for the execution of those applications on AMRs. The computing stack on an AMR accounts for 33% of the total energy consumption and can thus highly impact the battery life of the robot. Because recharging an AMR may disrupt application execution, it is important to use the available energy efficiently to maximize battery life. In this paper, we first analyze the breakdown of power dissipation for the execution of computer-vision applications on AMRs and discover three main root causes of energy inefficiency: uncoordinated access to sensor data, performance-oriented model inference execution, and uncoordinated execution of concurrent jobs. To fix these three inefficiencies, we propose E2M, an energy-efficient middleware software stack for autonomous mobile robots. First, E2M regulates the access of different processes to sensor data, e.g., camera frames, so that the amount of data actually captured by concurrently executing jobs can be minimized. Second, based on a predefined per-process performance metric (e.g., safety, accuracy) and desired target, E2M manipulates the process execution period to find the best energy-performance trade-off. Third, E2M coordinates the execution of the concurrent processes to maximize the total contiguous sleep time of the computing hardware for maximized energy savings.
We have implemented a prototype of E2M on a real-world AMR. Our experimental results show that, compared to several baselines, E2M leads to 24% energy savings for the computing platform, which translates into an extra 11.5% of battery time and 14 extra minutes of robot runtime, with a performance degradation lower than 7.9% for safety and 1.84% for accuracy. 
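The first of E2M's three mechanisms, regulating concurrent access to sensor data, can be illustrated with a minimal sketch. This is not E2M's actual implementation; the `FrameBroker` class and its period parameter are hypothetical, and a real middleware would involve inter-process communication rather than a single in-process object. The idea shown is only that a frame captured once per period is shared by all jobs that ask within that period, instead of each job triggering its own capture.

```python
# Hedged sketch (not E2M's real code) of coordinated sensor access:
# one capture per period, shared by all concurrently executing jobs.

import time

class FrameBroker:
    """Caches the latest camera frame for one period so that concurrent
    jobs reuse it instead of each triggering a fresh capture."""

    def __init__(self, capture_fn, period_s):
        self.capture_fn = capture_fn  # e.g., a camera driver call
        self.period_s = period_s
        self._frame = None
        self._stamp = -float("inf")
        self.captures = 0             # bookkeeping: real captures made

    def get_frame(self, now=None):
        now = time.monotonic() if now is None else now
        if now - self._stamp >= self.period_s:
            self._frame = self.capture_fn()  # capture only when stale
            self._stamp = now
            self.captures += 1
        return self._frame

# Fake camera for illustration.
broker = FrameBroker(capture_fn=lambda: "frame", period_s=0.1)

# Three "concurrent" jobs requesting within the same 0.1 s period
# share a single capture.
for t in (0.0, 0.01, 0.02):
    broker.get_frame(now=t)
```

A request arriving after the period elapses (e.g., at t = 0.2 s) would trigger exactly one new capture, which is how the broker bounds sensor activity to one capture per period regardless of how many jobs are running.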
  4.
    Autonomous assembly is a crucial capability for robots in many applications. For this task, several problems such as obstacle avoidance, motion planning, and actuator control have been extensively studied in robotics. However, when it comes to task specification, the space of possibilities remains underexplored. Towards this end, we introduce a novel problem, single-image-guided 3D part assembly, along with a learning-based solution. We study this problem in the setting of furniture assembly from a given complete set of parts and a single image depicting the entire assembled object. Multiple challenges exist in this setting, including handling ambiguity among parts (e.g., slats in a chair back and leg stretchers) and 3D pose prediction for parts and part subassemblies, whether visible or occluded. We address these issues by proposing a two-module pipeline that leverages strong 2D-3D correspondences and assembly-oriented graph message-passing to infer part relationships. In experiments with a PartNet-based synthetic benchmark, we demonstrate the effectiveness of our framework as compared with three baseline approaches (code and data available at https://github.com/AntheaLi/3DPartAssembly). 
  5. The current study examined the neural correlates of spatial rotation in eight engineering undergraduates. Mastering engineering graphics requires students to mentally visualize in 3D and mentally rotate parts when developing 2D drawings. Students’ spatial rotation skills play a significant role in learning and mastering engineering graphics. Traditionally, the assessment of students’ spatial skills involves no measurements of neural activity during student performance of spatial rotation tasks. We used electroencephalography (EEG) to record neural activity while students performed the Revised Purdue Spatial Visualization Test: Visualization of Rotations (Revised PSVT:R). The two main objectives were to 1) determine whether high versus low performers on the Revised PSVT:R show differences in EEG oscillations and 2) identify EEG oscillatory frequency bands sensitive to item difficulty on the Revised PSVT:R.  Overall performance on the Revised PSVT:R determined whether participants were considered high or low performers: students scoring 90% or higher were considered high performers (5 students), whereas students scoring under 90% were considered low performers (3 students). Time-frequency analysis of the EEG data quantified power in several oscillatory frequency bands (alpha, beta, theta, gamma, delta) for comparison between low and high performers, as well as between difficulty levels of the spatial rotation problems.   Although we did not find any significant effects of performance type (high, low) on EEG power, we observed a trend in reduced absolute delta and gamma power for hard problems relative to easier problems. Decreases in delta power have been reported elsewhere for difficult relative to easy arithmetic calculations, and attributed to greater external attention (e.g., attention to the stimuli/numbers), and consequently, reduced internal attention (e.g., mentally performing the calculation). 
In the current task, a total of three spatial objects are presented. An example rotation stimulus is shown first, depicting a spatial object before and after rotation. A target stimulus, the spatial object before rotation, is then displayed. Students must choose one of five stimuli (multiple-choice options) that correctly represents the object after rotation. Reduced delta power in the current task implies that students paid greater attention to the example and target stimuli for the hard problems, relative to the moderate and easy problems. Therefore, preliminary findings suggest that students are less efficient at encoding the target stimuli (external attention) prior to mental rotation (internal attention) as task difficulty increases. Our findings indicate that delta power may be used to identify spatial rotation items that are especially challenging for students. We may then determine the efficacy of spatial rotation interventions among engineering education students, using increased delta power as an index of internal attention. Further, in future work, we will also use eye-tracking to assess whether our intervention decreases eye fixation (e.g., time spent viewing) toward the target stimulus on the Revised PSVT:R. By simultaneously using EEG and eye-tracking, we may identify changes in internal attention and encoding of the target stimuli that are predictive of improvements in spatial rotation skills among engineering education students.
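The band-power quantification described above can be sketched in simplified form. This is an illustrative reconstruction, not the study's analysis pipeline (which would use a time-frequency method such as a wavelet or short-time Fourier transform on real EEG epochs): here power in each canonical band is estimated by summing squared DFT magnitudes over the band's frequency bins, and the sampling rate, band edges, and synthetic test signal are all made-up parameters.

```python
# Hypothetical sketch of EEG band-power estimation: sum squared DFT
# magnitudes over the bins falling in each frequency band. The signal
# below is synthetic; band edges follow common EEG conventions.

import cmath
import math

BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 80)}  # Hz, illustrative edges

def band_power(signal, fs, band):
    """Sum of squared one-sided DFT magnitudes within [lo, hi) Hz."""
    n = len(signal)
    lo, hi = band
    total = 0.0
    for k in range(1, n // 2):          # skip DC; one-sided spectrum
        f = k * fs / n                  # frequency of bin k
        if lo <= f < hi:
            x_k = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                      for t in range(n))
            total += abs(x_k) ** 2
    return total

# Synthetic one-second "epoch" sampled at 256 Hz: a strong 2 Hz (delta)
# component plus a weaker 40 Hz (gamma) component.
fs = n = 256
sig = [math.sin(2 * math.pi * 2 * t / fs)
       + 0.3 * math.sin(2 * math.pi * 40 * t / fs)
       for t in range(n)]
delta = band_power(sig, fs, BANDS["delta"])
gamma = band_power(sig, fs, BANDS["gamma"])  # delta dominates this epoch
```

Comparing such per-band power values across conditions (high vs. low performers, easy vs. hard items) is the kind of contrast the analysis above reports; a production analysis would use an FFT-based routine (e.g., Welch's method) rather than this O(n) per-bin DFT.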