Edge-assisted video analytics is gaining momentum. In this work, we tackle an important problem: compressing video content live-streamed from the device to the edge without sacrificing the accuracy and timeliness of its video analytics. We find that on-device processing can be tuned over a larger configuration space for more video compression, an opportunity that has been largely overlooked. Inspired by our pilot study, we design VPPlus to realize this potential, compressing the video as much as possible while preserving analytical accuracy. VPPlus incorporates two core modules, offline profiling and online adaptation, to generate proper feedback automatically and quickly to tune on-device processing. We validate the effectiveness and efficiency of VPPlus using five object detection tasks over two popular datasets; VPPlus outperforms the state-of-the-art approaches in almost all cases.
                    
                            
                            Profiling-free Configuration Adaptation and Latency-Aware Resource Scheduling for Video Analytics
                        
                    
    
            With increasingly deployed cameras and rapid advances in computer vision, large-scale live video analytics has become feasible. However, analyzing videos is compute-intensive. In addition, live video analytics needs to be performed in real time. In this paper, we design an edge server system for live video analytics. We propose to perform configuration adaptation without profiling video online. We select configurations with a prediction model based on object movement features. In addition, we reduce the latency through resource orchestration on video analytics servers. The key idea of resource orchestration is to batch inference tasks that use the same CNN model, and to schedule tasks based on a priority value that estimates their impact on the total latency. We evaluate our system with two video analytics applications, road traffic monitoring and pose detection. The experimental results show that our profiling-free adaptation reduces the workload by 80% compared with the state-of-the-art adaptation without lowering the accuracy. The average serving latency is reduced by up to 95% compared with the profiling-based adaptation.
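The resource orchestration idea in the abstract (batching tasks that share a CNN model, then ordering work by a priority value) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `Task` fields, the `batch_and_order` helper, the `max_batch_size` parameter, and the rule that a batch inherits the priority of its most urgent task. The abstract does not specify how the priority value is computed, so the numbers below are placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    model: str        # name of the CNN model the task needs (hypothetical field)
    priority: float   # placeholder score: larger = larger estimated impact on total latency


def batch_and_order(tasks, max_batch_size=4):
    """Group tasks by model into batches, then order batches by priority."""
    by_model = defaultdict(list)
    for task in tasks:
        by_model[task.model].append(task)

    batches = []
    for group in by_model.values():
        # Within one model, place the highest-priority tasks first.
        group.sort(key=lambda t: t.priority, reverse=True)
        for i in range(0, len(group), max_batch_size):
            batches.append(group[i:i + max_batch_size])

    # Assumed rule: a batch is as urgent as its first (highest-priority) task.
    batches.sort(key=lambda b: b[0].priority, reverse=True)
    return batches


tasks = [
    Task("a", "detector", 0.9),
    Task("b", "pose", 0.4),
    Task("c", "detector", 0.2),
    Task("d", "pose", 0.7),
    Task("e", "detector", 0.5),
]
schedule = batch_and_order(tasks, max_batch_size=2)
order = [[t.task_id for t in batch] for batch in schedule]
print(order)
```

Each batch contains tasks for a single model, so one model invocation can serve the whole batch; a real scheduler would also have to account for deadlines and model loading cost, which this sketch ignores.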
        
    
                            - Award ID(s):
- 1908536
- PAR ID:
- 10465136
- Date Published:
- Journal Name:
- 2022 IEEE International Conference on Big Data (Big Data)
- Page Range / eLocation ID:
- 1202 to 1211
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            Serverless computing has become increasingly popular for cloud applications, due to its compelling properties of high-level abstractions, lightweight runtime, high elasticity and pay-per-use billing. In this revolutionary computing paradigm shift, challenges arise when adapting data analytics applications to the serverless environment, due to the lack of support for efficient state sharing, which attracts ever-growing research attention. In this paper, we aim to exploit the advantages of task-level orchestration and fine-grained resource provisioning for data analytics on serverless platforms, with the hope of fulfilling the promise of serverless deployment to the maximum extent. To this end, we present ACTS, an autonomous cost-efficient task orchestration framework for serverless analytics. ACTS judiciously schedules and coordinates function tasks to mitigate cold-start latency and state sharing overhead. In addition, ACTS explores the optimization space of fine-grained workload distribution and function resource configuration for cost efficiency. We have implemented and deployed ACTS on AWS Lambda and evaluated it with various data analytics workloads. Results from extensive experiments demonstrate that ACTS achieves up to 98% monetary cost reduction while maintaining superior job completion time performance, in comparison with the state-of-the-art baselines.
- 
            Video analytics has many applications in traffic control, security monitoring, action/event analysis, etc. With the adoption of deep neural networks, the accuracy of video analytics in video streams has been greatly improved. However, deep neural networks for performing video analytics are compute-intensive. In order to reduce processing time, many systems switch to a lower frame rate or resolution. State-of-the-art switching approaches adjust configurations by profiling video clips on a large configuration space. Multiple configurations are tested periodically and the cheapest one with a desired accuracy is adopted. In this paper, we propose a method that adapts the configuration by analyzing past video analytics results instead of profiling candidate configurations. Our method adopts a lower resolution or frame rate when objects move slowly and a higher one when they move fast. We train a model that automatically selects the best configuration. We evaluate our method with two real-world video analytics applications: traffic tracking and pose estimation. Compared to the periodic profiling method, our method achieves 3%-12% higher accuracy with the same resource cost and is 8-17x faster with comparable accuracy.
- 
            Compute heterogeneity is increasingly gaining prominence in modern datacenters due to the addition of accelerators like GPUs and FPGAs. We observe that datacenter schedulers are agnostic of these emerging accelerators, especially their resource utilization footprints, and thus not well equipped to dynamically provision them based on the application needs. We observe that the state-of-the-art datacenter schedulers fail to provide fine-grained resource guarantees for latency-sensitive tasks that are GPU-bound. Specifically for GPUs, this results in resource fragmentation and interference leading to poor utilization of allocated GPU resources. Furthermore, GPUs exhibit highly linear energy efficiency with respect to utilization and hence proactive management of these resources is essential to keep the operational costs low while ensuring the end-to-end Quality of Service (QoS) in case of user-facing queries. Towards addressing the GPU orchestration problem, we build Knots, a GPU-aware resource orchestration layer, and integrate it with the Kubernetes container orchestrator to build Kube-Knots. Kube-Knots can dynamically harvest spare compute cycles through dynamic container orchestration, enabling co-location of latency-critical and batch workloads while improving the overall resource utilization. We design and evaluate two GPU-based scheduling techniques to schedule datacenter-scale workloads through Kube-Knots on a ten-node GPU cluster. Our proposed Correlation Based Prediction (CBP) and Peak Prediction (PP) schemes together improve both average and 99th percentile cluster-wide GPU utilization by up to 80% in case of HPC workloads. In addition, CBP+PP improves the average job completion times (JCT) of deep learning workloads by up to 36% when compared to state-of-the-art schedulers. This leads to 33% cluster-wide energy savings on average for three different workloads compared to state-of-the-art GPU-agnostic schedulers. Further, the proposed PP scheduler guarantees the end-to-end QoS for latency-critical queries by reducing QoS violations by up to 53% when compared to state-of-the-art GPU schedulers.
- 
            The convergence of 5G wireless networks and edge computing enables new edge-native applications that are simultaneously bandwidth-hungry, latency-sensitive, and compute-intensive. Examples include deeply immersive augmented reality, wearable cognitive assistance, privacy-preserving video analytics, edge-triggered serendipity, and autonomous swarms of featherweight drones. Such edge-native applications require network-aware and load-aware orchestration of resources across the cloud (Tier-1), cloudlets (Tier-2), and device (Tier-3). This paper describes the architecture of Sinfonia, an open-source system for such cross-tier orchestration. Key attributes of Sinfonia include: support for multiple vendor-specific Tier-1 roots of orchestration, providing end-to-end runtime control that spans technical and non-technical criteria; use of third-party Kubernetes clusters as cloudlets, with unified treatment of telco-managed, hyperconverged, and just-in-time variants of cloudlets; masking of orchestration complexity from applications, thus lowering the barrier to creation of new edge-native applications. We describe an initial release of Sinfonia (https://github.com/cmusatyalab/sinfonia), and share our thoughts on evolving it in the future.
 An official website of the United States government