Title: UnityPIC: Unity Point-Cloud Interactive Core
In this work, we present Unity Point-Cloud Interactive Core, a novel interactive point cloud rendering pipeline for the Unity Development Platform. The goal of the proposed pipeline is to expedite the development process for point cloud applications by encapsulating the rendering process as a standalone component, while maintaining flexibility through an implementable interface. The proposed pipeline allows for rendering arbitrarily large point clouds with improved performance and visual quality. First, a novel dynamic batching scheme is proposed to address the adaptive point sizing problem for level-of-detail (LOD) point cloud structures. Then, an approximate rendering algorithm is proposed to reduce overdraw by minimizing the overall number of fragment operations through an intermediate occlusion culling pass. For the purpose of analysis, the visual quality of renderings is quantified by comparison against a high-quality baseline. In the experiments, the proposed pipeline maintains above 90 FPS for a 20 million point budget while achieving greater than 90% visual quality during interaction when rendering a point cloud with more than 20 billion points.
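The two techniques named in the abstract can be sketched in a platform-neutral way. Below is a minimal Python sketch of budget-bounded LOD selection with adaptive point sizing, assuming a hypothetical octree whose point spacing halves at each level; the class, the priority rule, and the spacing formula are illustrative stand-ins, not UnityPIC's actual dynamic batching scheme.

```python
# Hypothetical sketch: traverse an LOD octree by screen-space priority,
# admit nodes until the point budget is spent, and size the points of each
# batch to cover the sampling gaps of its level (adaptive point sizing).
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Node:
    priority: float                 # e.g. negative projected size => largest first
    point_count: int = field(compare=False)
    depth: int = field(compare=False)
    children: list = field(compare=False, default_factory=list)

def select_batches(root: Node, budget: int, spacing0: float):
    """Return (node, point_size) batches whose total point count fits the budget."""
    selected, used, heap = [], 0, [root]
    while heap and used + heap[0].point_count <= budget:
        node = heapq.heappop(heap)
        used += node.point_count
        # Point spacing halves at each octree level, so render coarser
        # batches with larger points to keep the image hole-free.
        selected.append((node, spacing0 / (2 ** node.depth)))
        for child in node.children:
            heapq.heappush(heap, child)
    return selected
```

A real pipeline would recompute node priorities from the camera every frame and stream the selected batches to the GPU; the occlusion culling pass described in the abstract would then discard fragments of batches hidden behind closer geometry before shading.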
Award ID(s):
1827505 1737533
NSF-PAR ID:
10286826
Author(s) / Creator(s):
; ; ;
Editor(s):
M. Hadwiger, M. Larsen
Date Published:
Journal Name:
Parallel Graphics and Visualisation
ISSN:
1727-348X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. As augmented and virtual reality (AR/VR) technology matures, a method is desired to represent real-world persons visually and aurally in a virtual scene with high fidelity to craft an immersive and realistic user experience. Current technologies leverage camera and depth sensors to render visual representations of subjects through avatars, and microphone arrays are employed to localize and separate high-quality subject audio through beamforming. However, challenges remain in both realms. In the visual domain, avatars can only map key features (e.g., pose, expression) to a predetermined model, rendering them incapable of capturing the subjects’ full details. Alternatively, high-resolution point clouds can be utilized to represent human subjects. However, such three-dimensional data is computationally expensive to process. In the realm of audio, sound source separation requires prior knowledge of the subjects’ locations. However, it may take unacceptably long for sound source localization algorithms to provide this knowledge, which can still be error-prone, especially with moving objects. These challenges make it difficult for AR systems to produce real-time, high-fidelity representations of human subjects for applications such as AR/VR conferencing that mandate negligible system latency. We present Acuity, a real-time system capable of creating high-fidelity representations of human subjects in a virtual scene both visually and aurally. Acuity isolates subjects from high-resolution input point clouds. It reduces the processing overhead by performing background subtraction at a coarse resolution, then applying the detected bounding boxes to fine-grained point clouds. Meanwhile, Acuity leverages an audiovisual sensor fusion approach to expedite sound source separation. The estimated object location in the visual domain guides the acoustic pipeline to isolate the subjects’ voices without running sound source localization. Our results demonstrate that Acuity can isolate multiple subjects’ high-quality point clouds with a maximum latency of 70 ms and average throughput of over 25 fps, while separating audio in less than 30 ms. We provide the source code of Acuity at: https://github.com/nesl/Acuity. 
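The coarse-to-fine isolation step described above is easy to illustrate. The following is a minimal numpy sketch, assuming a precomputed set of coarse background voxel keys; the voxel size, margin, and function name are invented for the example rather than taken from Acuity.

```python
# Sketch: classify occupied voxels as foreground at a coarse resolution,
# then crop the full-resolution cloud with the padded foreground bounding box.
import numpy as np

def isolate_subject(points, background_voxels, voxel=0.10, margin=0.15):
    """points: (N, 3) float array; background_voxels: set of (i, j, k) tuples."""
    keys = np.floor(points / voxel).astype(np.int64)
    uniq, inverse = np.unique(keys, axis=0, return_inverse=True)
    fg_voxel = np.array([tuple(v) not in background_voxels for v in uniq])
    fg = fg_voxel[inverse]                   # back to per-point labels
    if not fg.any():
        return points[:0]                    # no subject detected
    lo = points[fg].min(axis=0) - margin     # detected bounding box,
    hi = points[fg].max(axis=0) + margin     # padded by a safety margin
    inside = np.all((points >= lo) & (points <= hi), axis=1)
    return points[inside]                    # fine-grained subject cloud
```

Testing unique voxels rather than raw points is what keeps the subtraction cheap; only the final bounding-box crop touches the full-resolution cloud, which matches the paper's stated goal of reducing processing overhead.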
  2. With the advent of 5G, supporting high-quality game streaming applications on edge devices has become a reality. This is evidenced by a recent surge in cloud gaming applications on mobile devices. In contrast to video streaming applications, interactive games require much more compute power for supporting improved rendering (such as 4K streaming) within the stipulated frames-per-second (FPS) constraints. This in turn consumes more battery power in a power-constrained mobile device. Thus, state-of-the-art gaming applications suffer from lower video quality (QoS) and/or energy efficiency. While there has been a plethora of recent works on optimizing game streaming applications, to our knowledge, there is no study that systematically investigates the design choices along the end-to-end game streaming pipeline across the cloud, network, and edge devices to understand the individual contributions of the different stages of the pipeline to the overall QoS and energy efficiency. In this context, this paper presents a comprehensive performance and power analysis of the entire game streaming pipeline consisting of the server/cloud side, network, and edge. Through extensive measurements with a high-end workstation mimicking the cloud end, an open-source platform (Moonlight-GameStreaming) emulating the edge device/mobile platform, and two network settings (WiFi and 5G), we conduct a detailed measurement-based study with seven representative games with different characteristics. We characterize the performance in terms of frame latency, QoS, bitrate, and energy consumption for different stages of the gaming pipeline. Our study shows that the rendering stage and the encoding stage at the cloud end are the bottlenecks for supporting 4K streaming. While 5G is certainly more suitable for supporting enhanced video quality with 4K streaming, it is more expensive in terms of power consumption compared to WiFi. Further, fluctuations in 5G network quality can lead to huge frame drops, thus affecting QoS, which needs to be addressed by a coordinated design between the edge device and the server. Finally, the network interface and the decoder units in a mobile platform need more energy-efficient designs to support high-quality games at a lower cost. These observations should help in designing more cost-effective future cloud gaming platforms.
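The paper's findings are empirical rather than algorithmic, but the per-stage accounting it relies on is simple to reproduce. Below is an illustrative Python sketch that breaks end-to-end frame latency into pipeline stages and checks an FPS target; the stage names and the 60 FPS target are assumptions for the example, not the paper's measurement harness.

```python
# Sketch: summarize per-frame, per-stage latency logs against a frame budget.
from statistics import mean, quantiles

STAGES = ["render", "encode", "network", "decode", "display"]

def summarize(frames, fps_target=60.0):
    """frames: list of dicts mapping stage name -> latency in milliseconds."""
    budget = 1000.0 / fps_target                  # per-frame time budget
    totals = [sum(f[s] for s in STAGES) for f in frames]
    for s in STAGES:                              # where does the time go?
        print(f"{s:8s} mean {mean(f[s] for f in frames):6.2f} ms")
    p99 = quantiles(totals, n=100)[98]            # tail latency drives frame drops
    verdict = "meets" if p99 <= budget else "misses"
    print(f"total    mean {mean(totals):6.2f} ms  p99 {p99:6.2f} ms  "
          f"{verdict} the {budget:.2f} ms budget")
```

A breakdown of this shape is how one would observe, for instance, that rendering and encoding dominate at the cloud end for 4K streaming, as the study reports.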
  3. An accurate understanding of omnidirectional environment lighting is crucial for high-quality virtual object rendering in mobile augmented reality (AR). In particular, to support reflective rendering, existing methods either leverage deep learning models to estimate lighting or use physical light probes to capture it, typically represented in the form of an environment map. However, these methods often fail to provide visually coherent details or require additional setups. For example, the commercial framework ARKit uses a convolutional neural network that can generate realistic environment maps; however, the corresponding reflective rendering might not match the physical environment. In this work, we present the design and implementation of a lighting reconstruction framework called LITAR that enables realistic and visually coherent rendering. LITAR addresses several challenges of supporting lighting information for mobile AR. First, to address the spatial variance problem, LITAR uses two-field lighting reconstruction to divide the lighting reconstruction task into spatial variance-aware near-field reconstruction and directional-aware far-field reconstruction. The corresponding environment map allows reflective rendering with correct color tones. Second, LITAR uses two noise-tolerant data capturing policies to ensure data quality, namely guided bootstrapped movement and motion-based automatic capturing. Third, to handle the mismatch between the mobile computation capability and the high computation requirement of lighting reconstruction, LITAR employs two novel real-time environment map rendering techniques called multi-resolution projection and anchor extrapolation. These two techniques effectively remove the need for time-consuming mesh reconstruction while maintaining visual quality. Lastly, LITAR provides several knobs to help mobile AR application developers make quality and performance trade-offs in lighting reconstruction. We evaluated the performance of LITAR using a small-scale testbed experiment and a controlled simulation. Our testbed-based evaluation shows that LITAR achieves more visually coherent rendering effects than ARKit. Our design of multi-resolution projection significantly reduces the time of point cloud projection from about 3 seconds to 14.6 milliseconds. Our simulation shows that LITAR, on average, achieves up to 44.1% higher PSNR than Xihe, a recent work, on two complex objects with physically-based materials.
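As a rough illustration of the two-field split described above, the sketch below separates a captured cloud by distance from the rendering anchor and projects the far field onto a small equirectangular environment map. The 1.5 m radius, the map resolution, and the projection are illustrative choices, not LITAR's actual reconstruction.

```python
# Sketch: the near field keeps full spatial structure; the far field is
# treated as direction-only and splatted into an equirectangular map.
import numpy as np

def split_fields(points, colors, anchor, near_radius=1.5):
    d = np.linalg.norm(points - anchor, axis=1)
    near = d < near_radius
    return (points[near], colors[near]), (points[~near], colors[~near])

def far_field_envmap(points, colors, anchor, width=64, height=32):
    v = points - anchor
    v = v / np.linalg.norm(v, axis=1, keepdims=True)   # view directions
    u = (np.arctan2(v[:, 0], v[:, 2]) / (2 * np.pi) + 0.5) * width
    t = (np.arccos(np.clip(v[:, 1], -1.0, 1.0)) / np.pi) * height
    u = u.astype(int) % width
    t = np.clip(t.astype(int), 0, height - 1)
    env = np.zeros((height, width, 3))
    env[t, u] = colors                                 # last point wins per texel
    return env
```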
  4. We describe an improvement to the recently developed view-independent rendering (VIR), and apply it to dynamic cube-mapped reflections. Standard multiview rendering (MVR) renders a scene six times for each cube map. VIR traverses the geometry once per frame to generate a point cloud optimized to many cube maps, using it to render reflected views in parallel. Our improvement, eye-resolution point rendering (EPR), is faster than VIR and makes cube maps faster than MVR, with comparable visual quality. We are currently improving EPR's run time by reducing point cloud size and per-point processing.
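Any single-traversal scheme like VIR or EPR ultimately has to route each generated point to the cube-map face it lands on. The sketch below shows just that routing step, the standard dominant-axis face selection for a direction vector; this is generic cube-map math, not the paper's EPR algorithm.

```python
# Sketch: map a direction (x, y, z) to a cube-map face index, using the
# common +X, -X, +Y, -Y, +Z, -Z face ordering.
def cube_face(x: float, y: float, z: float) -> int:
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:          # X axis dominates
        return 0 if x > 0 else 1
    if ay >= az:                       # Y axis dominates
        return 2 if y > 0 else 3
    return 4 if z > 0 else 5           # Z axis dominates
```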
  5. The recent explosion in the number and size of spatio-temporal data sets from urban environments and social sensors creates new opportunities for data-driven approaches to understand and improve cities. Visual analytics systems like Urbane aim to empower domain experts to explore multiple data sets, at different time and space resolutions. Since these systems rely on computationally intensive spatial aggregation queries that slice and summarize the data over different regions, an important challenge is how to attain interactivity. While traditional pre-aggregation approaches support interactive exploration, they are unsuitable in this setting because they do not support ad-hoc query constraints or polygons of arbitrary shapes. To address this limitation, we have recently proposed Raster Join, an approach that converts a spatial aggregation query into a set of drawing operations on a canvas and leverages the rendering pipeline of the graphics hardware (GPU). By doing so, Raster Join evaluates queries on the fly at interactive speeds on commodity laptops and desktops. In this demonstration, we showcase the efficiency of Raster Join by integrating it with Urbane and enabling interactivity. Demo visitors will interact with Urbane to filter and visualize several urban data sets over multiple resolutions.
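The core trick, turning a spatial aggregation into drawing operations on a shared canvas, can be imitated on the CPU. In the Python sketch below, a 2-D histogram stands in for the GPU's point rasterizer and matplotlib's Path stands in for polygon rasterization; the resolution, extent handling, and names are illustrative, and the real Raster Join executes this on graphics hardware.

```python
# Sketch: "draw" points and a query polygon onto the same pixel grid, then
# aggregate the point counts wherever the polygon mask covers the canvas.
import numpy as np
from matplotlib.path import Path

def raster_join_count(points, polygon, extent, res=512):
    (x0, y0), (x1, y1) = extent
    canvas, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=res,
                                  range=[[x0, x1], [y0, y1]])
    # Pixel-center coordinates for the polygon mask on the same grid.
    xs = np.linspace(x0, x1, res, endpoint=False) + (x1 - x0) / (2 * res)
    ys = np.linspace(y0, y1, res, endpoint=False) + (y1 - y0) / (2 * res)
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    mask = Path(polygon).contains_points(
        np.column_stack([gx.ravel(), gy.ravel()])).reshape(res, res)
    return int(canvas[mask].sum())      # approximate count inside the polygon
```

Because both the data and the query region are rasterized onto the same grid, arbitrary polygons and ad-hoc filters cost no more than simple ones, which is what makes the approach interactive.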