NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

GameStreamSR: Enabling Neural-Augmented Game Streaming on Commodity Mobile Platforms

https://doi.org/10.1109/ISCA59077.2024.00097

Bhuyan, Sandeepa; Ying, Ziyu; Kandemir, Mahmut T; Gowda, Mahanth; Das, Chita R (June 2024, IEEE)

Full Text Available
EdgePC: Efficient Deep Learning Analytics for Point Clouds on Edge Devices

https://doi.org/10.1145/3579371.3589113

Ying, Ziyu; Bhuyan, Sandeepa; Kang, Yan; Zhang, Yingtian; Kandemir, Mahmut T.; Das, Chita R. (January 2023, International Symposium on Computer Architecture 2023)

Recently, point cloud (PC) has gained popularity in modeling various 3D objects (including both synthetic and real-life) and has been extensively utilized in a wide range of applications such as AR/VR, 3D reconstruction, and autonomous driving. For such applications, it is critical to analyze/understand the surrounding scenes properly. To achieve this, deep learning based methods (e.g., convolutional neural networks (CNNs)) have been widely employed for higher accuracy. Unlike the deep learning on conventional 2D images/videos, where the feature computation (matrix multiplication) is the major bottleneck, in point cloud-based CNNs, the sample and neighbor search stages are the primary bottlenecks, and collectively contribute to 54% (up to 80%) of the overall execution latency on a typical edge device. While prior efforts have attempted to solve this issue by designing custom ASICs or pipelining the neighbor search with other stages, to our knowledge, none of them has tried to “structurize” the unstructured PC data for improving computational efficiency. In this paper, we first explore the opportunities of structurizing PC data using Morton code (which is originally designed to map data from a high dimensional space to one dimension, while preserving spatial locality) and observe that there is a huge scope to “skip” the sample and neighbor search computation by operating on the “structurized” PC data. Based on this, we propose two approximation techniques for the sampling and neighbor search stages. We implemented our proposals on an NVIDIA Jetson AGX Xavier edge GPU board. The evaluation results collected on six different workloads show that our design can accelerate the sample and neighbor search stages by 3.68× (up to 5.21×) with minimal impact on inference accuracy. This acceleration in turn results in 1.55× speedup in the end-to-end execution latency and saves 33% of energy expenditure.
more » « less
Full Text Available
Pushing Point Cloud Compression to the Edge

https://doi.org/10.1109/MICRO56248.2022.00031

Ying, Ziyu; Zhao, Shulin; Bhuyan, Sandeepa; Mishra, Cyan Subhra; Kandemir, Mahmut T.; Das, Chita R. (October 2022, International Symposium on Microarchitecture (MICRO) 2022)

As Point Clouds (PCs) gain popularity in processing millions of data points for 3D rendering in many applications, efficient data compression becomes a critical issue. This is because compression is the primary bottleneck in minimizing the latency and energy consumption of existing PC pipelines. Data compression becomes even more critical as PC processing is pushed to edge devices with limited compute and power budgets. In this paper, we propose and evaluate two complementary schemes, intra-frame compression and inter-frame compression, to speed up the PC compression, without losing much quality or compression efficiency. Unlike existing techniques that use sequential algorithms, our first design, intra-frame compression, exploits parallelism for boosting the performance of both geometry and attribute compression. The proposed parallelism brings around 43.7× performance improvement and 96.6% energy savings at a cost of 1.01× larger compressed data size. To further improve the compression efficiency, our second scheme, inter-frame compression, considers the temporal similarity among the video frames and reuses the attribute data from the previous frame for the current frame. We implement our designs on an NVIDIA Jetson AGX Xavier edge GPU board. Experimental results with six videos show that the combined compression schemes provide 34.0× speedup compared to a state-of-the-art scheme, with minimal impact on quality and compression ratio.
more » « less
Full Text Available
End-to-end Characterization of Game Streaming Applications on Mobile Platforms

https://doi.org/10.1145/3508030

Bhuyan, Sandeepa; Zhao, Shulin; Ying, Ziyu; Kandemir, Mahmut T.; Das, Chita R. (February 2022, Proceedings of the ACM on Measurement and Analysis of Computing Systems)

With the advent of 5G, supporting high-quality game streaming applications on edge devices has become a reality. This is evidenced by a recent surge in cloud gaming applications on mobile devices. In contrast to video streaming applications, interactive games require much more compute power for supporting improved rendering (such as 4K streaming) with the stipulated frames-per second (FPS) constraints. This in turn consumes more battery power in a power-constrained mobile device. Thus, the state-of-the-art gaming applications suffer from lower video quality (QoS) and/or energy efficiency. While there has been a plethora of recent works on optimizing game streaming applications, to our knowledge, there is no study that systematically investigates the design pairs on the end-to-end game streaming pipeline across the cloud, network, and edge devices to understand the individual contributions of the different stages of the pipeline for improving the overall QoS and energy efficiency. In this context, this paper presents a comprehensive performance and power analysis of the entire game streaming pipeline consisting of the server/cloud side, network, and edge. Through extensive measurements with a high-end workstation mimicking the cloud end, an open-source platform (Moonlight-GameStreaming) emulating the edge device/mobile platform, and two network settings (WiFi and 5G) we conduct a detailed measurement-based study with seven representative games with different characteristics. We characterize the performance in terms of frame latency, QoS, bitrate, and energy consumption for different stages of the gaming pipeline. Our study shows that the rendering stage and the encoding stage at the cloud end are the bottlenecks to support 4K streaming. While 5G is certainly more suitable for supporting enhanced video quality with 4K streaming, it is more expensive in terms of power consumption compared to WiFi. Further, fluctuations in 5G network quality can lead to huge frame drops thus affecting QoS, which needs to be addressed by a coordinated design between the edge device and the server. Finally, the network interface and the decoder units in a mobile platform need more energy-efficient design to support high quality games at a lower cost. These observations should help in designing more cost-effective future cloud gaming platforms.
more » « less
Full Text Available
Deja View: Spatio-Temporal Compute Reuse for Energy-Efficient 360° VR Video Streaming

https://doi.org/10.1109/ISCA45697.2020.00030

Zhao, Shulin; Zhang, Haibo; Bhuyan, Sandeepa; Mishra, Cyan S.; Ying, Ziyu; Kandemir, Mahmut T.; Sivasubramaniam, Anand; Das, Chita R. (January 2020, International Symposium on Computer Architecture (ISCA))

Full Text Available

Search for: All records