Title: Real-Time 3D Object Detection and Recognition using a Smartphone
Real-time detection of 3D obstacles and recognition of humans and other objects is essential for blind or low-vision people to travel not only safely and independently but also confidently and interactively, especially in cluttered indoor environments. Most existing 3D obstacle detection techniques, widely applied in robotics and outdoor environments, require high-end devices to achieve real-time performance. There is therefore a strong need for a low-cost, highly efficient technique for 3D obstacle detection and object recognition in indoor environments. This paper proposes an integrated 3D obstacle detection system implemented on a smartphone that combines pre-trained deep-learning-based 2D object detectors with ARKit-based point cloud data acquisition to predict and track the 3D positions of multiple objects (obstacles, humans, and other objects), and then alert users in real time. The system consists of four modules: 3D obstacle detection, 3D object tracking, 3D object matching, and information filtering. Preliminary tests in a small house setting indicated that the application could reliably detect the 3D positions and sizes of large obstacles, as well as the positions of small ones, using no device more expensive than an iPhone.
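The abstract describes lifting 2D detections into 3D via ARKit. As a rough illustration of that step, the sketch below raycasts from the center of a detected 2D bounding box into the AR scene to recover a world-space position. This is a minimal Swift sketch, not the paper's published code; names such as `Detection` and `sceneView` are assumptions, and the paper's tracking, matching, and filtering modules are not reproduced.

```swift
import ARKit

// Hypothetical helper illustrating the 2D-to-3D lifting step: take a 2D
// detection from a pre-trained detector and raycast through its box center
// to estimate where the object sits in world space.
struct Detection {
    let label: String        // e.g. "chair", "person"
    let boundingBox: CGRect  // normalized image coordinates, origin bottom-left
}

func worldPosition(of detection: Detection,
                   in sceneView: ARSCNView) -> simd_float3? {
    // Convert the normalized box center to a screen-space point.
    // Image y grows upward while UIKit y grows downward, hence the flip.
    let size = sceneView.bounds.size
    let center = CGPoint(x: detection.boundingBox.midX * size.width,
                         y: (1 - detection.boundingBox.midY) * size.height)

    // Raycast against estimated planes to find the 3D hit point.
    guard let query = sceneView.raycastQuery(from: center,
                                             allowing: .estimatedPlane,
                                             alignment: .any),
          let hit = sceneView.session.raycast(query).first else { return nil }

    let t = hit.worldTransform.columns.3
    return simd_float3(t.x, t.y, t.z)
}
```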
Award ID(s):
1827505 2131186 1737533
NSF-PAR ID:
10346705
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Proceedings of the 2nd International Conference on Image Processing and Vision Engineering - IMPROVE
Page Range / eLocation ID:
158 to 165
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
1. This paper proposes an AR-based real-time mobile system for assistive indoor navigation with target segmentation (ARMSAINTS) for both sighted and blind or low-vision (BLV) users to safely explore and navigate an indoor environment. The solution comprises four major components: graph construction, hybrid modeling, real-time navigation, and target segmentation. The system uses an automatic graph construction method to generate a graph from a 2D floorplan and a Delaunay triangulation-based localization method to provide precise localization with negligible error. The 3D obstacle detection method integrates the existing capability of AR with a 2D object detector and a semantic target segmentation model to detect and track 3D bounding boxes of obstacles and people, increasing BLV users' safety and understanding when traveling in the indoor environment. The entire system requires no installation or maintenance of expensive infrastructure, runs in real time on a smartphone, and easily adapts to environmental changes.
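As a rough sketch of the navigation-graph idea (waypoints extracted from a 2D floorplan connected into a graph, with routes from a shortest-path search), the Swift snippet below implements just the routing core. The graph-extraction and Delaunay localization details are simplified away, and `NavGraph` is an illustrative assumption, not the ARMSAINTS code.

```swift
// Illustrative routing core: nodes are walkable waypoints taken from a 2D
// floorplan, edges connect adjacent waypoints, and routes come from
// Dijkstra's algorithm.
struct NavGraph {
    var adjacency: [Int: [(node: Int, cost: Double)]] = [:]

    mutating func addEdge(_ a: Int, _ b: Int, cost: Double) {
        adjacency[a, default: []].append((b, cost))
        adjacency[b, default: []].append((a, cost))
    }

    // Dijkstra without a priority queue; fine for floorplan-sized graphs.
    func shortestPath(from start: Int, to goal: Int) -> [Int]? {
        var dist: [Int: Double] = [start: 0]
        var prev: [Int: Int] = [:]
        var frontier: Set<Int> = [start]
        while let u = frontier.min(by: { dist[$0, default: .infinity] < dist[$1, default: .infinity] }) {
            frontier.remove(u)
            if u == goal { break }
            for (v, c) in adjacency[u] ?? [] {
                let alt = dist[u, default: .infinity] + c
                if alt < dist[v, default: .infinity] {
                    dist[v] = alt
                    prev[v] = u
                    frontier.insert(v)
                }
            }
        }
        guard dist[goal] != nil else { return nil }  // goal unreachable
        var path = [goal]
        while let p = prev[path[0]] { path.insert(p, at: 0) }
        return path
    }
}
```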
2. Artificial Intelligence (AI) developments in recent years have enabled several new types of applications. In particular, detecting people and objects in sequences of pictures or videos has been an exciting field of research. Despite notable achievements from increasingly sophisticated AI models, there has been little specialized research effort on helping people find misplaced items in a set of video sequences. In this paper, we leverage voice recognition and the YOLO (You Only Look Once) real-time object detection system to develop an AI-based solution that addresses this challenge. The solution assumes that the objects of interest have already been recorded and stored in the dataset. To find a misplaced object, the user issues a voice command, which is in turn fed into the YOLO model to detect where and when the searched object was last seen. The outcome of this process is a picture provided as evidence. We used YOLOv7 for object detection thanks to its better accuracy and broader training dataset, while leveraging Google's voice recognizer to translate the voice command into text. Our initial results show promising potential for the success of this approach. Our findings can be extended to various other scenarios, ranging from detecting health risks for elderly people to assisting authorities in locating potential persons of interest.
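The retrieval step described above reduces to searching a log of timestamped detections for the most recent sighting matching the transcribed query. A minimal Swift sketch, with `DetectionRecord` and the log as illustrative assumptions; in the paper's pipeline, Google's recognizer and YOLOv7 run upstream of this step.

```swift
import Foundation

// Illustrative retrieval step: the spoken query is assumed to be already
// transcribed to text, and detections already logged frame by frame.
struct DetectionRecord {
    let label: String     // class name from the detector, e.g. "keys"
    let timestamp: Date   // when the frame was captured
    let frameURL: URL     // saved frame returned as visual evidence
}

func lastSighting(of query: String, in log: [DetectionRecord]) -> DetectionRecord? {
    log.filter { $0.label.caseInsensitiveCompare(query) == .orderedSame }
       .max(by: { $0.timestamp < $1.timestamp })
}
```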
3. Mobile headsets should be capable of understanding 3D physical environments to offer a truly immersive experience for augmented/mixed reality (AR/MR). However, their small form factor and limited computation resources make it extremely challenging to execute 3D vision algorithms in real time, as these are known to be more compute-intensive than their 2D counterparts. In this paper, we propose DeepMix, a mobility-aware, lightweight, and hybrid 3D object detection framework for improving the user experience of AR/MR on mobile headsets. Motivated by our analysis and evaluation of state-of-the-art 3D object detection models, DeepMix intelligently combines edge-assisted 2D object detection with novel on-device 3D bounding box estimation that leverages depth data captured by the headset. This leads to low end-to-end latency and significantly boosts detection accuracy in mobile scenarios. A unique feature of DeepMix is that it fully exploits the mobility of headsets to fine-tune detection results and further boost detection accuracy. To the best of our knowledge, DeepMix is the first 3D object detection framework to achieve 30 FPS (i.e., an end-to-end latency well below the stringent 100 ms requirement of interactive AR/MR). We implement a prototype of DeepMix on Microsoft HoloLens and evaluate its performance via both extensive controlled experiments and a user study with 30+ participants. Compared to a baseline using existing 3D object detection models, DeepMix improves detection accuracy by 9.1--37.3% and reduces end-to-end latency by 2.68--9.15×.
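The hybrid estimation DeepMix describes (an edge-computed 2D box combined with on-device depth) can be illustrated with standard pinhole back-projection: sample the depth map inside the box, take a robust depth, and back-project the box center. A minimal Swift sketch under those assumptions; the intrinsics and the median-depth heuristic are illustrative, not DeepMix's actual estimator.

```swift
import simd

struct Intrinsics { let fx, fy, cx, cy: Float }   // pinhole camera parameters

// Pinhole back-projection: X = (u - cx) z / fx, Y = (v - cy) z / fy, Z = z.
func backProject(u: Float, v: Float, depth z: Float, k: Intrinsics) -> simd_float3 {
    simd_float3((u - k.cx) * z / k.fx, (v - k.cy) * z / k.fy, z)
}

// Estimate a 3D center for a 2D box: the median of the depth samples inside
// the box rejects background pixels bleeding into the detection.
func estimateCenter(box: (minX: Float, minY: Float, maxX: Float, maxY: Float),
                    depths: [Float],
                    k: Intrinsics) -> simd_float3? {
    guard !depths.isEmpty else { return nil }
    let z = depths.sorted()[depths.count / 2]
    return backProject(u: (box.minX + box.maxX) / 2,
                       v: (box.minY + box.maxY) / 2,
                       depth: z, k: k)
}
```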
4. Building embodied intelligent agents that can interact with 3D indoor environments has received increasing research attention in recent years. While most works focus on single-object or agent-object visual functionality and affordances, our work proposes to study a new kind of visual relationship that is also important to perceive and model: inter-object functional relationships (e.g., a switch on the wall turns a light on or off, a remote control operates the TV). Humans often infer these relationships with little or no effort, even when entering a new room, by drawing on strong prior knowledge (e.g., we know that buttons control electrical devices) or by using only a few exploratory interactions in cases of uncertainty (e.g., multiple switches and lights in the same room). In this paper, we take the first step toward building an AI system that learns inter-object functional relationships in 3D indoor environments, with key technical contributions in modeling prior knowledge by training over large-scale scenes and in designing interactive policies for effectively exploring the training scenes and quickly adapting to novel test scenes. We create a new benchmark based on the AI2Thor and PartNet datasets and perform extensive experiments demonstrating the effectiveness of our proposed method. Results show that our model successfully learns priors and fast-interactive-adaptation strategies for exploring inter-object functional relationships in complex 3D scenes. Several ablation studies further validate the usefulness of each proposed module.
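The exploration idea above can be caricatured as maintaining a belief over which trigger controls which responder, probing the most uncertain trigger, and pruning hypotheses from the observed effect. The toy Swift sketch below assumes a deterministic trigger-responder model; the paper's learned priors and policies are far richer, and `FunctionalBelief` is purely illustrative.

```swift
import Foundation

struct FunctionalBelief {
    // belief[t][r]: probability that trigger t (e.g. a switch) controls
    // responder r (e.g. a light); rows are kept normalized.
    var belief: [[Double]]

    // Choose the trigger whose row is most uncertain (highest entropy).
    func mostUncertainTrigger() -> Int {
        func entropy(_ p: [Double]) -> Double {
            p.reduce(0) { $1 > 0 ? $0 - $1 * log($1) : $0 }
        }
        // Assumes at least one trigger exists.
        return belief.indices.max(by: { entropy(belief[$0]) < entropy(belief[$1]) })!
    }

    // After toggling trigger t, rule out responders whose state did not
    // change and renormalize: a crude update under a deterministic model.
    mutating func update(trigger t: Int, changed: [Bool]) {
        for r in belief[t].indices where !changed[r] { belief[t][r] = 0 }
        let s = belief[t].reduce(0, +)
        if s > 0 { for r in belief[t].indices { belief[t][r] /= s } }
    }
}
```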
5. A critical learning outcome of undergraduate engineering mechanics courses is the ability to understand how a structure's internal forces and bending moment change in response to static and dynamic loads. One of the major challenges in both teaching and learning these concepts is the invisible nature of the internal effects. Although concentrated forces applied to the top of a beam are easily visualized, observing the corresponding changes in the shear and bending moment diagrams is not a trivial task. Nonetheless, proficiency in this concept is vital for students to succeed in subsequent mechanics courses and, ultimately, as professional practitioners. One promising technology for making these invisible internal effects visible is augmented reality (AR), where virtual or digital objects can be seen through a device such as a smartphone or headset. This paper describes the proof-of-concept development of a Unity®-based AR application called "AR Stairs" that allows students to visualize, in-situ, the relative magnitude of the internal bending moment in an actual structure. The app is specifically tailored to an existing 40-foot-long, 16-foot-high steel staircase at the authors' institution. This paper details the application design, analysis assumptions, calculations, technical challenges encountered, development environment, and content development. The key features of the app include: (a) coordinate system identification and placement, (b) automatic mapping of a stairs model in-situ, (c) creation of a virtual 2-dimensional staircase model, (d) object detection and tracking of people moving on the stairs, (e) image recognition to approximate people's weight, (f) overlays of virtual force vectors onto moving people, and (g) use of a chromatic scale to visually convey the relative intensity of the internal bending moment at nodes spaced along the length of the structure. It is the authors' intention to also give the reader an overall picture of the resources needed to develop AR applications for pedagogical settings, the design decision tradeoffs, and practical issues related to deployment. As AR technologies continue to improve, they are expected to become an integral part of the pedagogical toolset engineering educators use to improve the quality of education delivered to engineering students.
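The app itself is Unity-based, but the mechanics underlying features (f) and (g) are standard: compute the bending moment along an idealized beam under a person's weight, then map its magnitude onto a chromatic scale. A minimal sketch in Swift under a simply-supported-beam idealization, which is an assumption for illustration rather than the app's actual structural model:

```swift
/// Bending moment at position x along a simply supported beam of span L
/// under a point load P applied at distance a from the left support.
func bendingMoment(x: Double, span L: Double, load P: Double, at a: Double) -> Double {
    x <= a ? P * (L - a) * x / L : P * a * (L - x) / L
}

/// Map a moment magnitude onto a blue-to-red chromatic scale.
func color(for m: Double, maxM: Double) -> (r: Double, g: Double, b: Double) {
    let t = maxM > 0 ? min(max(m / maxM, 0), 1) : 0
    return (r: t, g: 0, b: 1 - t)   // blue = low moment, red = high
}

// Example: 40 ft span with a 180 lb person standing 12 ft from the left support.
let L = 40.0, P = 180.0, a = 12.0
let moments = stride(from: 0.0, through: L, by: 4.0).map {
    bendingMoment(x: $0, span: L, load: P, at: a)   // lb·ft at each node
}
let peak = moments.max() ?? 0   // P*a*(L-a)/L = 180*12*28/40 = 1512 lb·ft at x = 12 ft
let nodeColors = moments.map { color(for: $0, maxM: peak) }
```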