Title: Real-Time 3D Object Detection and Recognition using a Smartphone
Real-time detection of 3D obstacles and recognition of humans and other objects are essential for blind or low-vision people to travel not only safely and independently but also confidently and interactively, especially in cluttered indoor environments. Most existing 3D obstacle detection techniques, widely applied in robotic applications and outdoor environments, require high-end devices to ensure real-time performance. There is a strong need for a low-cost, highly efficient technique for 3D obstacle detection and object recognition in indoor environments. This paper proposes an integrated 3D obstacle detection system implemented on a smartphone that combines deep-learning-based pre-trained 2D object detectors with ARKit-based point cloud data acquisition to predict and track the 3D positions of multiple objects (obstacles, humans, and other objects), and then alert users in real time. The system consists of four modules: 3D obstacle detection, 3D object tracking, 3D object matching, and information filtering. Preliminary tests in a small house setting indicated that the application could reliably detect large obstacles, together with their 3D positions and sizes in the real world, as well as the positions of small obstacles, without any expensive device besides an iPhone.
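The core idea of combining a 2D detector with ARKit point cloud data can be illustrated with a minimal sketch: project each point cloud point into the image through a pinhole camera model, keep the points that fall inside a detection's 2D bounding box, and take their per-axis median as the object's 3D position. The function name, data layout, and intrinsics tuple below are illustrative assumptions, not the paper's actual implementation.

```python
def estimate_object_3d(bbox, points_3d, intrinsics):
    """Estimate an object's 3D centroid from a 2D detection box and a point cloud.

    bbox       -- (xmin, ymin, xmax, ymax) in pixels, from a 2D object detector
    points_3d  -- list of (x, y, z) camera-frame points (e.g. an ARKit point cloud)
    intrinsics -- (fx, fy, cx, cy) pinhole camera parameters

    Sketch only: the per-axis median is used because it is robust to the
    background points that inevitably fall inside the detection box too.
    """
    fx, fy, cx, cy = intrinsics
    xmin, ymin, xmax, ymax = bbox
    hits = []
    for x, y, z in points_3d:
        if z <= 0:               # point is behind the camera
            continue
        u = fx * x / z + cx      # pinhole projection to pixel coordinates
        v = fy * y / z + cy
        if xmin <= u <= xmax and ymin <= v <= ymax:
            hits.append((x, y, z))
    if not hits:
        return None

    def median(vals):
        s = sorted(vals)
        n = len(s)
        return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2

    return tuple(median([p[i] for p in hits]) for i in range(3))
```

In a real pipeline this per-frame estimate would feed the 3D tracking and matching modules, which smooth positions over time and associate detections across frames.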
Award ID(s):
1827505 2131186 1737533
PAR ID:
10346705
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Proceedings of the 2nd International Conference on Image Processing and Vision Engineering - IMPROVE
Page Range / eLocation ID:
158 to 165
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1. A major challenge in monocular 3D object detection is the limited diversity and quantity of objects in real datasets. While augmenting real scenes with virtual objects holds promise to improve both the diversity and quantity of the objects, it remains elusive due to the lack of an effective 3D object insertion method in complex real captured scenes. In this work, we study augmenting complex real indoor scenes with virtual objects for monocular 3D object detection. The main challenge is to automatically identify plausible physical properties for virtual assets (e.g., locations, appearances, sizes, etc.) in cluttered real scenes. To address this challenge, we propose a physically plausible indoor 3D object insertion approach to automatically copy virtual objects and paste them into real scenes. The resulting objects in scenes have 3D bounding boxes with plausible physical locations and appearances. In particular, our method first identifies physically feasible locations and poses for the inserted objects to prevent collisions with the existing room layout. Subsequently, it estimates spatially-varying illumination for the insertion location, enabling the immersive blending of the virtual objects into the original scene with plausible appearances and cast shadows. We show that our augmentation method significantly improves existing monocular 3D object models and achieves state-of-the-art performance. For the first time, we demonstrate that a physically plausible 3D object insertion, serving as a generative data augmentation technique, can lead to significant improvements for discriminative downstream tasks such as monocular 3D object detection. 
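The "physically feasible locations" step in this insertion approach reduces, at its simplest, to rejecting candidate placements whose footprint overlaps an existing object. A minimal sketch of that collision filter, using 2D axis-aligned floor footprints (all names and the box convention are assumptions for illustration):

```python
def overlaps(a, b):
    """2D axis-aligned footprint overlap test; boxes are (xmin, ymin, xmax, ymax)."""
    return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])

def find_placement(candidates, existing_footprints):
    """Return the first candidate footprint that collides with no existing object,
    or None if every candidate is blocked (sketch of the feasibility check only;
    the paper's method also reasons about poses, room layout, and illumination)."""
    for c in candidates:
        if all(not overlaps(c, e) for e in existing_footprints):
            return c
    return None
```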
  2. This paper proposes an AR-based real-time mobile system for assistive indoor navigation with target segmentation (ARMSAINTS) for both sighted and blind or low-vision (BLV) users to safely explore and navigate an indoor environment. The solution comprises four major components: graph construction, hybrid modeling, real-time navigation, and target segmentation. The system utilizes an automatic graph construction method to generate a graph from a 2D floorplan and a Delaunay triangulation-based localization method to provide precise localization with negligible error. The 3D obstacle detection method integrates the existing capability of AR with a 2D object detector and a semantic target segmentation model to detect and track 3D bounding boxes of obstacles and people, increasing BLV users' safety and understanding when traveling in the indoor environment. The entire system requires no installation or maintenance of expensive infrastructure, runs in real time on a smartphone, and easily adapts to environmental changes.
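The point-location step behind a Delaunay triangulation-based localizer can be sketched with barycentric coordinates: test which triangle of the triangulated floorplan contains the current position estimate. This is a simplified stand-in (pure-Python, with assumed function names) for the localization method the abstract describes, which would first triangulate the floorplan graph.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates of point p with respect to triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    den = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    w1 = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / den
    w2 = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / den
    return w1, w2, 1.0 - w1 - w2

def locate(p, triangles):
    """Return the index of the triangle containing p, or None.

    All three barycentric weights are non-negative exactly when p lies inside
    (a small tolerance absorbs floating-point error on shared edges)."""
    for i, (a, b, c) in enumerate(triangles):
        if all(w >= -1e-9 for w in barycentric(p, a, b, c)):
            return i
    return None
```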
  3. Cloud computing infrastructures have become the de-facto platform for data-driven machine learning applications. However, these centralized models of computing are ill-suited to dispersed, high-volume, real-time edge data-intensive applications such as real-time object detection, where video streams may be captured at multiple geographical locations. While many recent advances in object detection have been made using Convolutional Neural Networks, these performance improvements focus only on a single contiguous object detection model. In this paper, we propose a distributed Edge-Cloud R-CNN that splits the model into components and dynamically distributes these components between the edge and the cloud for optimal real-time object detection performance. As a proof of concept, we evaluate the proposed system on a distributed computing platform encompassing cloud servers and edge embedded devices for real-time object detection on video streams.
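The edge-cloud split described here can be illustrated with a toy pipeline: an early stage that would run on the edge device produces a compact feature representation, and only that representation, rather than the raw frame, crosses the network to the cloud stage. The two stage functions below are deliberately trivial stand-ins (assumed names, toy arithmetic), not the paper's R-CNN components.

```python
def edge_stage(frame):
    """Early layers on the edge device: cheap feature extraction.
    (Stand-in for the first convolutional blocks of an R-CNN backbone;
    here, a single toy 'feature' is the mean pixel value.)"""
    return [sum(frame) / len(frame)]

def cloud_stage(features):
    """Remaining layers in the cloud: the heavy detection head.
    (Stand-in for region proposal + classification.)"""
    return [("object", f) for f in features]

def detect(frame):
    # Split execution: only the compact feature list crosses the network,
    # not the raw video frame -- the essence of the edge-cloud partitioning.
    return cloud_stage(edge_stage(frame))
```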
  4. Artificial Intelligence (AI) developments in recent years have allowed several new types of applications to emerge. In particular, detecting people and objects from sequences of pictures or videos has been an exciting field of research. Even with the emergence of sophisticated AI models, a specialized research effort is still needed to help people find misplaced items in a set of video sequences. In this paper, we leverage voice recognition and the YOLO (You Only Look Once) real-time object detection system to develop an AI-based solution that addresses this challenge. This solution assumes that the objects of interest have previously been recorded and stored in the dataset. To find a misplaced object, the user delivers a voice command that is in turn fed into the YOLO model to detect where and when the searched object was seen last. The outcome of this process is a picture that is provided as evidence. We used YOLOv7 for object detection thanks to its better accuracy and wider database, while leveraging the Google voice recognizer to translate the voice command into text. The initial results we obtained show promising potential for the success of our approach. Our findings can be extended to various other scenarios, ranging from detecting health risks for elderly people to assisting authorities in locating potential persons of interest.
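The "where and when was this object seen last" query reduces to a search over a detection log built by running the detector over archived video. A minimal sketch, assuming a simple record layout (timestamp, frame id, class name) and a query string parsed from the transcribed voice command; none of these names come from the paper:

```python
def last_sighting(query, detection_log):
    """Return the most recent (timestamp, frame_id) where the queried class was seen.

    detection_log -- iterable of (timestamp, frame_id, class_name) records,
                     e.g. produced by running a YOLO-style detector over archived video
    query         -- class name parsed from the transcribed voice command
    """
    best = None
    for ts, frame, cls in detection_log:
        if cls == query and (best is None or ts > best[0]):
            best = (ts, frame)
    return best
```

The returned frame id would then be used to retrieve the evidence picture the abstract mentions.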
  5. Martín-Sacristán, David; Garcia-Roger, David (Ed.)
    With the recent 5G communication technology deployment, Cellular Vehicle-to-Everything (C-V2X) significantly enhances road safety by enabling real-time exchange of critical traffic information among vehicles, pedestrians, infrastructure, and networks. However, further research is required to address real-time application latency and communication reliability challenges. This paper explores integrating cutting-edge C-V2X technology with environmental perception systems to enhance safety at intersections and crosswalks. We propose a multi-module architecture combining C-V2X with state-of-the-art perception technologies, GPS mapping methods, and the client–server module to develop a co-operative perception system for collision avoidance. The proposed system includes the following: (1) a hardware setup for C-V2X communication; (2) an advanced object detection module leveraging Deep Neural Networks (DNNs); (3) a client–server-based co-operative object detection framework to overcome computational limitations of edge computing devices; and (4) a module for mapping GPS coordinates of detected objects, enabling accurate and actionable GPS data for collision avoidance—even for detected objects not equipped with C-V2X devices. The proposed system was evaluated through real-time experiments at the GMMRC testing track at Kettering University. Results demonstrate that the proposed system enhances safety by broadcasting critical obstacle information with an average latency of 9.24 milliseconds, allowing for rapid situational awareness. Furthermore, the proposed system accurately provides GPS coordinates for detected obstacles, which is essential for effective collision avoidance. The technology integration in the proposed system offers high data rates, low latency, and reliable communication, which are key features that make it highly suitable for C-V2X-based applications. 
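The GPS mapping module's core operation, converting a detected obstacle's metric offset from the ego vehicle into absolute GPS coordinates, can be sketched with a flat-earth (equirectangular) approximation, which is adequate at the tens-of-metres scale of an intersection. The function name and east/north offset convention are assumptions for illustration, not the paper's implementation.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS-84 equatorial radius

def offset_to_gps(lat_deg, lon_deg, east_m, north_m):
    """Convert a detected object's metric offset from the ego position into GPS.

    Flat-earth approximation: one radian of latitude spans EARTH_RADIUS_M metres,
    and longitude spacing shrinks by cos(latitude)."""
    dlat = north_m / EARTH_RADIUS_M
    dlon = east_m / (EARTH_RADIUS_M * math.cos(math.radians(lat_deg)))
    return lat_deg + math.degrees(dlat), lon_deg + math.degrees(dlon)
```

Broadcasting these derived coordinates over C-V2X is what lets the system warn about obstacles that carry no C-V2X device themselves.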