

Title: Light-Weight Object Detection and Decision Making via Approximate Computing in Resource-Constrained Mobile Robots
Most current solutions for autonomous flight in indoor environments rely on purely geometric maps (e.g., point clouds). There has been, however, growing interest in supplementing such maps with semantic information (e.g., object detections) produced by computer vision algorithms. Unfortunately, there is a disconnect between the relatively heavy computational requirements of these computer vision solutions and the limited computational capacity available on mobile autonomous platforms. In this paper, we propose to bridge this gap with a novel Markov Decision Process framework that adapts the parameters of the vision algorithms to the incoming video data rather than fixing them a priori. As a concrete example, we test our framework on an object detection and tracking task, showing significant benefits in terms of energy consumption without considerable loss in accuracy, using a combination of publicly available and novel datasets.
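The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: a precomputed MDP-style policy picks cheap or expensive vision parameters (input resolution, detection interval) from the current tracker state instead of fixing them in advance. The state discretization, the parameter grid, and the `run_detector`/tracker helpers are hypothetical, not the authors' method.

```python
# Illustrative sketch only: adapt detector parameters per frame via a policy
# over discrete tracker states, trading energy for accuracy.

ACTIONS = [
    {"resolution": (320, 240), "skip": 4},   # cheap: low resolution, detect rarely
    {"resolution": (640, 480), "skip": 2},   # medium
    {"resolution": (1280, 720), "skip": 1},  # expensive: full resolution, every frame
]

def discretize_state(tracking_confidence):
    """Map continuous tracker confidence to a small discrete state space."""
    if tracking_confidence > 0.8:
        return "stable"
    if tracking_confidence > 0.4:
        return "uncertain"
    return "lost"

# A policy table of the kind an MDP solver (e.g., value iteration) could produce:
# spend detection energy only when the tracker is likely to fail.
POLICY = {"stable": 0, "uncertain": 1, "lost": 2}

def process_frame(frame, tracker, frame_idx):
    """Choose vision parameters from the current state, then detect or track."""
    action = ACTIONS[POLICY[discretize_state(tracker.confidence)]]
    if frame_idx % action["skip"] == 0:
        detections = run_detector(frame, resolution=action["resolution"])  # hypothetical helper
        tracker.update(detections)
    else:
        tracker.predict()  # track-only frames cost far less energy than detection
    return tracker.state
```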
Award ID(s):
1734454
NSF-PAR ID:
10107917
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
IEEE/RSJ International Conference on Intelligent Robots and Systems
Page Range / eLocation ID:
6776 to 6781
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Autonomous mobile robots (AMRs) have been widely utilized in industry to execute various on-board computer-vision applications, including autonomous guidance, security patrol, object detection, and face recognition. Most of the applications executed by an AMR involve the analysis of camera images through trained machine learning models. Many research studies on machine learning focus either on performance without considering energy efficiency or on techniques such as pruning and compression to make the model more energy-efficient. However, most previous work does not study the root causes of energy inefficiency for the execution of those applications on AMRs. The computing stack on an AMR accounts for 33% of the total energy consumption and can thus highly impact the battery life of the robot. Because recharging an AMR may disrupt the application execution, it is important to efficiently utilize the available energy for maximized battery life. In this paper, we first analyze the breakdown of power dissipation for the execution of computer-vision applications on AMRs and discover three main root causes of energy inefficiency: uncoordinated access to sensor data, performance-oriented model inference execution, and uncoordinated execution of concurrent jobs. In order to fix these three inefficiencies, we propose E2M, an energy-efficient middleware software stack for autonomous mobile robots. First, E2M regulates the access of different processes to sensor data, e.g., camera frames, so that the amount of data actually captured by concurrently executing jobs can be minimized. Second, based on a predefined per-process performance metric (e.g., safety, accuracy) and desired target, E2M manipulates the process execution period to find the best energy-performance trade-off. Third, E2M coordinates the execution of the concurrent processes to maximize the total contiguous sleep time of the computing hardware for maximized energy savings. We have implemented a prototype of E2M on a real-world AMR. Our experimental results show that, compared to several baselines, E2M leads to 24% energy savings for the computing platform, which translates into an extra 11.5% of battery time and 14 extra minutes of robot runtime, with a performance degradation lower than 7.9% for safety and 1.84% for accuracy.
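E2M's internals are not spelled out in the abstract; the sketch below only illustrates the third idea, coordinating periodic jobs so that their releases line up and the processor's idle time becomes one long contiguous sleep rather than many short gaps. The job names, periods, and scheduling loop are assumptions for illustration, not the E2M implementation.

```python
import time

# Illustrative sketch: release periodic jobs at shared ticks so the idle time
# between releases is a single contiguous sleep interval.

JOBS = {                        # hypothetical per-process periods, in seconds
    "obstacle_detection": 0.5,
    "face_recognition":   1.0,
    "patrol_logging":     2.0,
}

def run_coordinated(duration_s, execute):
    """Run all jobs that are due on each tick, then sleep until the next tick."""
    tick = min(JOBS.values())                     # base scheduling quantum
    n_ticks = int(duration_s / tick)
    start = time.monotonic()
    for i in range(n_ticks):
        for name, period in JOBS.items():
            if i % round(period / tick) == 0:     # job due on this tick
                execute(name)                     # run its inference / processing
        # one long idle window per tick instead of many fragmented ones
        time.sleep(max(0.0, start + (i + 1) * tick - time.monotonic()))
```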
  2. Image data plays a pivotal role in the current data-driven era, particularly in applications such as computer vision, object recognition, and facial identification. Google Maps® stands out as a widely used platform that heavily relies on street-view images. To fulfill the pressing need for an effective and distributed mechanism for image data collection, we present a framework that utilizes smart contract technology and open-source robots to gather street-view image sequences. The proposed framework also includes a protocol for maintaining these sequences using a private blockchain capable of retaining different versions of street views while ensuring the integrity of collected data. With this framework, Google Maps® data can be securely collected, stored, and published on a private blockchain. By conducting tests with actual robots, we demonstrate the feasibility of the framework and its capability to seamlessly upload privately maintained blockchain image sequences to Google Maps® using the Google Street View® Publish API.
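To make the integrity argument concrete, here is a minimal hash-chain sketch: each block records digests of the collected images plus the hash of the previous block, so tampering with any stored frame invalidates the chain. This is a generic construction for illustration, not the paper's protocol and not the Street View Publish API.

```python
import hashlib
import json
import time

def image_digest(image_bytes):
    """SHA-256 digest of one collected street-view frame."""
    return hashlib.sha256(image_bytes).hexdigest()

def make_block(prev_hash, image_digests, metadata):
    """Build a block linking to the previous one; metadata is e.g. GPS track, robot id."""
    body = {
        "prev_hash": prev_hash,
        "images": image_digests,
        "metadata": metadata,
        "timestamp": time.time(),
    }
    block_hash = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return {"hash": block_hash, **body}

def verify_chain(chain):
    """Recompute every block hash and check the prev_hash links."""
    for i, block in enumerate(chain):
        body = {k: v for k, v in block.items() if k != "hash"}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if digest != block["hash"]:
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True
```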
  3. Computer vision has shown promising potential in wearable robotics applications (e.g., human grasping target prediction and context understanding). However, in practice, the performance of computer vision algorithms is challenged by insufficient or biased training data, observation noise, cluttered backgrounds, etc. By leveraging Bayesian deep learning (BDL), we have developed a novel, reliable vision-based framework to assist upper-limb prosthesis grasping during arm reaching. This framework can measure different types of uncertainty from the model and the data for grasping-target recognition in realistic and challenging scenarios. A probability calibration network was developed to fuse the uncertainty measures into one calibrated probability for online decision making. We formulated the problem as the prediction of the grasping target during arm reaching. Specifically, we developed a 3-D simulation platform to simulate and analyze the performance of vision algorithms under several common challenging scenarios in practice. In addition, we integrated our approach into a shared control framework of a prosthetic arm and demonstrated its potential for assisting human participants with fluent target reaching and grasping tasks.
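The abstract mentions measuring model and data uncertainty but not how; one common BDL technique for this is Monte Carlo dropout, sketched below for a grasp-target classifier. This is a generic illustration in PyTorch, not the paper's network or its calibration module.

```python
import torch

def mc_dropout_predict(model, image, n_samples=20):
    """Run the classifier several times with dropout active and return the mean
    class probabilities plus a dispersion-based (model) uncertainty per class."""
    model.train()                       # keep dropout layers stochastic at test time
    with torch.no_grad():
        probs = torch.stack([
            torch.softmax(model(image), dim=-1) for _ in range(n_samples)
        ])                              # shape: (n_samples, batch, n_classes)
    mean_probs = probs.mean(dim=0)
    model_uncertainty = probs.var(dim=0)    # spread across stochastic forward passes
    return mean_probs, model_uncertainty

# A downstream decision rule might defer (e.g., hand control back to the user)
# when the top-class probability is low or its uncertainty is high.
```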
  4. Autonomous driving in dense urban areas presents an especially difficult task. First, globally localizing information, such as GPS signal, often proves to be unreliable in such areas due to signal shadowing and multipath errors. Second, the high‐definition environmental maps with sufficient information for autonomous navigation require a large amount of data to be collected from these areas, significant postprocessing of this data to generate the map, and then continual maintenance of the map to account for changes in the environment. This paper addresses the issue of autonomous driving in urban environments by investigating algorithms and an architecture to enable fully functional autonomous driving with little to no reliance on map‐based measurements or GPS signals. An extended Kalman filter with odometry, compass, and sparse landmark measurements as inputs is used to provide localization. Real‐time detection and estimation of key roadway features are used to create an understanding of the surrounding static scene. Navigation is accomplished by a compass‐based navigation control law. Experimental scene understanding results are obtained using computer vision and estimation techniques and demonstrate the ability to probabilistically infer key features of an intersection in real time. Key results from Monte Carlo studies demonstrate the proposed localization and navigation methods. These tests provide success rates of urban navigation under different environmental conditions, such as landmark density, and show that the vehicle can navigate to a goal nearly 10 km away without any external pose update at all. Field tests validate these simulated results and demonstrate that, for given test conditions, an expected range can be determined for a given success rate.
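The localizer described above fuses odometry, compass, and sparse landmark measurements in an extended Kalman filter; a stripped-down 2-D pose EKF with the first two of those inputs is sketched below. The motion model, noise values, and state layout are illustrative assumptions, not the paper's filter.

```python
import numpy as np

x = np.zeros(3)                  # state: [x, y, heading]
P = np.eye(3) * 0.1              # state covariance
Q = np.diag([0.05, 0.05, 0.01])  # process noise (odometry uncertainty)

def predict(v, w, dt):
    """Propagate the pose with a unicycle odometry model (speed v, turn rate w)."""
    global x, P
    theta = x[2]
    x = x + np.array([v * np.cos(theta) * dt, v * np.sin(theta) * dt, w * dt])
    F = np.array([[1, 0, -v * np.sin(theta) * dt],
                  [0, 1,  v * np.cos(theta) * dt],
                  [0, 0,  1]])
    P = F @ P @ F.T + Q

def update_compass(z_heading, r=0.02):
    """Correct the heading with an absolute compass measurement of variance r."""
    global x, P
    H = np.array([[0.0, 0.0, 1.0]])
    y = np.array([z_heading - x[2]])
    S = H @ P @ H.T + r
    K = P @ H.T / S
    x = x + (K @ y).ravel()
    P = (np.eye(3) - K @ H) @ P
    # A range/bearing update to a known landmark would follow the same pattern
    # with a nonlinear measurement model linearized about the current pose.
```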
  5. Image segmentation is a fundamental task that has benefited from recent advances in machine learning. One type of segmentation of particular interest to computer vision is urban segmentation. Although recent solutions have leveraged deep neural networks, these approaches usually do not consider the regularities present in facade structures (e.g., windows often appear in groups with similar alignment, size, or spacing patterns), nor additional urban structures such as building footprints and roofs. Moreover, both satellite and street-view images are often noisy and occluded, so recovering the complete structure segmentation from a partial observation is difficult. Our key observations are that facades and other urban structures exhibit regular structures, and that additional views are often available. In this paper, we present a novel framework (RFCNet) that consists of three modules to achieve multiple goals. Specifically, we propose Regularization to improve the regularities given an initial segmentation, Fusion that fuses multiple views of the segmentation, and Completion that can infer the complete structure if necessary. Experimental results show that our method outperforms previous state-of-the-art methods quantitatively and qualitatively on multiple facade datasets. Furthermore, by applying our framework to other urban structures (e.g., building footprints and roofs), we demonstrate that our approach can be generalized to various pattern types.
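As a simple baseline for the "fuse multiple views" idea (not RFCNet's learned Fusion module), one can average per-pixel class probabilities from several views that have already been registered to a common image frame and take the argmax:

```python
import numpy as np

def fuse_views(prob_maps):
    """prob_maps: list of arrays of shape (H, W, n_classes), one per view,
    already aligned to the same facade image frame."""
    stacked = np.stack(prob_maps, axis=0)
    fused = stacked.mean(axis=0)        # simple probability averaging across views
    return fused.argmax(axis=-1)        # per-pixel label map
```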