A major challenge in monocular 3D object detection is the limited diversity and quantity of objects in real datasets. While augmenting real scenes with virtual objects holds promise to improve both the diversity and quantity of the objects, it remains elusive due to the lack of an effective 3D object insertion method in complex real captured scenes. In this work, we study augmenting complex real indoor scenes with virtual objects for monocular 3D object detection. The main challenge is to automatically identify plausible physical properties for virtual assets (e.g., locations, appearances, sizes, etc.) in cluttered real scenes. To address this challenge, we propose a physically plausible indoor 3D object insertion approach to automatically copy virtual objects and paste them into real scenes. The resulting objects in scenes have 3D bounding boxes with plausible physical locations and appearances. In particular, our method first identifies physically feasible locations and poses for the inserted objects to prevent collisions with the existing room layout. Subsequently, it estimates spatially-varying illumination for the insertion location, enabling the immersive blending of the virtual objects into the original scene with plausible appearances and cast shadows. We show that our augmentation method significantly improves existing monocular 3D object models and achieves state-of-the-art performance. For the first time, we demonstrate that a physically plausible 3D object insertion, serving as a generative data augmentation technique, can lead to significant improvements for discriminative downstream tasks such as monocular 3D object detection.
more »
« less
Real-Time 3D Object Detection and Recognition using a Smartphone [Real-Time 3D Object Detection and Recognition using a Smartphone]
Real-time detection of 3D obstacles and recognition of humans and other objects is essential for blind or low- vision people to travel not only safely and independently but also confidently and interactively, especially in a cluttered indoor environment. Most existing 3D obstacle detection techniques that are widely applied in robotic applications and outdoor environments often require high-end devices to ensure real-time performance. There is a strong need to develop a low-cost and highly efficient technique for 3D obstacle detection and object recognition in indoor environments. This paper proposes an integrated 3D obstacle detection system implemented on a smartphone, by utilizing deep-learning-based pre-trained 2D object detectors and ARKit- based point cloud data acquisition to predict and track the 3D positions of multiple objects (obstacles, humans, and other objects), and then provide alerts to users in real time. The system consists of four modules: 3D obstacle detection, 3D object tracking, 3D object matching, and information filtering. Preliminary tests in a small house setting indicated that this application could reliably detect large obstacles and their 3D positions and sizes in the real world and small obstacles’ positions, without any expensive devices besides an iPhone.
more »
« less
- PAR ID:
- 10346705
- Date Published:
- Journal Name:
- Proceedings of the 2nd International Conference on Image Processing and Vision Engineering - IMPROVE
- Page Range / eLocation ID:
- 158 to 165
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
This paper proposes an AR-based real-time mobile system for assistive indoor navigation with target segmentation (ARMSAINTS) for both sighted and blind or low-vision (BLV) users to safely explore and navigate in an indoor environment. The solution comprises four major components: graph construction, hybrid modeling, real-time navigation and target segmentation. The system utilizes an automatic graph construction method to generate a graph from a 2D floorplan and the Delaunay triangulation-based localization method to provide precise localization with negligible error. The 3D obstacle detection method integrates the existing capability of AR with a 2D object detector and a semantic target segmentation model to detect and track 3D bounding boxes of obstacles and people to increase BLV safety and understanding when traveling in the indoor environment. The entire system does not require the installation and maintenance of expensive infrastructure, run in real-time on a smartphone, and can easily adapt to environmental changes.more » « less
-
Cloud computing infrastructures have become the de-facto platform for data driven machine learning applications. However, these centralized models of computing are unqualified for dispersed high volume real-time edge data intensive applications such as real time object detection, where video streams may be captured at multiple geographical locations. While many recent advancements in object detection have been made using Convolutional Neural Networks but these performance improvements only focus on a single contiguous object detection model. In this paper, we propose a distributed Edge-Cloud R-CNN by splitting the model into components and dynamically distributing these components in the cloud for optimal performance for real time object detection. As a proof of concept, we evaluate the performance of the proposed system on a distributed computing platform encompasses cloud servers and edge embedded devices for real-time object detection on video streams.more » « less
-
Artificial Intelligence (AI) developments in recent years have allowed several new types of applications to emerge. In particular, detecting people and objects from sequences of pictures or videos has been an exciting field of research. Even though there have been notable achievements with the emergence of sophisticated AI models, there needs to be a specialized research effort that helps people finding misplaced items from a set of video sequences. In this paper, we leverage voice recognition and Yolo (You Only Look Once) real-time object detection system to develop an AI-based solution that addresses this challenge. This solution assumes that previous recordings of the objects of interest and storing them in the dataset have already occurred. To find a misplaced object, the user delivers a voice command that is in turn fed into the Yolo model to detect where and when the searched object was seen last. The outcome of this process is a picture that is provided as evidence. We used Yolov7 for object detection thanks to its better accuracy and wider database while leveraging Google voice recognizer to translate the voice command into text. The initial results we obtained show a promising potential for the success of our approach. Our findings can be extended to be applied to various other scenarios ranging from detecting health risks for elderly people to assisting authorities in locating potential persons of interest.more » « less
-
Mobile headsets should be capable of understanding 3D physical environments to offer a truly immersive experience for augmented/mixed reality (AR/MR). However, their small form-factor and limited computation resources make it extremely challenging to execute in real-time 3D vision algorithms, which are known to be more compute-intensive than their 2D counterparts. In this paper, we propose DeepMix, a mobility-aware, lightweight, and hybrid 3D object detection framework for improving the user experience of AR/MR on mobile headsets. Motivated by our analysis and evaluation of state-of-the-art 3D object detection models, DeepMix intelligently combines edge-assisted 2D object detection and novel, on-device 3D bounding box estimations that leverage depth data captured by headsets. This leads to low end-to-end latency and significantly boosts detection accuracy in mobile scenarios. A unique feature of DeepMix is that it fully exploits the mobility of headsets to fine-tune detection results and boost detection accuracy. To the best of our knowledge, DeepMix is the first 3D object detection that achieves 30 FPS (i.e., an end-to-end latency much lower than the 100 ms stringent requirement of interactive AR/MR). We implement a prototype of DeepMix on Microsoft HoloLens and evaluate its performance via both extensive controlled experiments and a user study with 30+ participants. DeepMix not only improves detection accuracy by 9.1--37.3% but also reduces end-to-end latency by 2.68--9.15×, compared to the baseline that uses existing 3D object detection models.more » « less