Search for: All records

Award ID contains: 1815115

Note: Clicking a Digital Object Identifier (DOI) number takes you to an external site maintained by the publisher. Some full-text articles may not yet be available free of charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. Pervasive deployment of surveillance cameras today poses enormous scalability challenges to video analytics systems operating over many camera feeds. Currently, there are few indexing tools to organize video feeds beyond what a standard file system provides. Recent video analytics systems implement application-specific frame profiling and sampling techniques to reduce the number of raw videos processed, leveraging frame-level redundancy or manually labeled spatial-temporal correlation between cameras. This paper presents Video-zilla, a standalone indexing layer between video query systems and a video store that organizes video data. We propose a video data unit abstraction, the semantic video stream (SVS), based on a notion of distance between objects in the video. An SVS implicitly captures scenes, a notion missing from current video content characterization, and serves as a middle ground between individual frames and an entire camera feed. We then build a hierarchical index that exposes semantic similarity both within and across camera feeds, so that Video-zilla can quickly cluster video feeds based on their content semantics without manual labeling. We implement and evaluate Video-zilla in three use cases: object identification queries, clustering for training specialized DNNs, and archival services. In all three cases, Video-zilla reduces the time complexity of inter-camera video analytics from linear in the number of cameras to sublinear, and reduces query resource usage by up to 14x compared to using the frame-level or spatial-temporal similarity built into existing query systems.
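The abstract above only sketches the hierarchical index in prose; the paper gives no code. The following is a minimal, hypothetical illustration of the idea: a distance over object-content embeddings defines semantic video streams, and a hierarchy over their centroids lets a query descend a few branches per level instead of scanning every camera feed. All names (SVS, IndexNode, svs_distance), the cosine-distance choice, and the branching factor are assumptions for illustration, not Video-zilla's actual API.

```python
# Hypothetical sketch of a distance-based SVS abstraction and a hierarchical
# index over SVS centroids; lookup visits only a few children per level.
from dataclasses import dataclass, field
import numpy as np

@dataclass
class SVS:
    camera_id: str
    embedding: np.ndarray  # e.g., mean feature vector of objects in the scene

def svs_distance(a: SVS, b: SVS) -> float:
    """Pairwise notion of distance between object content (cosine distance)."""
    denom = np.linalg.norm(a.embedding) * np.linalg.norm(b.embedding) + 1e-9
    return 1.0 - float(a.embedding @ b.embedding) / denom

@dataclass
class IndexNode:
    centroid: np.ndarray
    children: list["IndexNode"] = field(default_factory=list)
    members: list[SVS] = field(default_factory=list)  # populated at leaves

def query(node: IndexNode, target: np.ndarray, branch: int = 2) -> list[SVS]:
    # Descend only the `branch` closest children per level, so lookup cost
    # grows sublinearly with the total number of indexed SVSs.
    if not node.children:
        return node.members
    ranked = sorted(node.children,
                    key=lambda c: float(np.linalg.norm(c.centroid - target)))
    hits: list[SVS] = []
    for child in ranked[:branch]:
        hits.extend(query(child, target, branch))
    return hits
```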
  2. AI applications powered by deep learning inference are increasingly run natively on edge devices to provide a better interactive user experience. This often necessitates fitting a model originally designed and trained in the cloud to edge devices with a range of hardware capabilities, which so far has relied on time-consuming manual effort. In this paper, we quantify the challenges of manually generating a large number of compressed models and then build a system framework, Mistify, to automatically port a cloud-based model to a suite of models for edge devices targeting various points in the design space. Mistify adds an intermediate “layer” that decouples the model design and deployment phases. By exposing configuration APIs that obviate the need for code changes deeply embedded in the original model, Mistify hides run-time issues from model designers and hides the model internals from model users, reducing the expertise needed on either side. For better scalability, Mistify consolidates multiple model tailoring requests to minimize repeated computation. Further, Mistify leverages locally available edge data in a privacy-aware manner and performs run-time model adaptation to provide scalable edge support and accurate inference results. Extensive evaluation shows that Mistify reduces the DNN porting time by over 10x when catering to a wide spectrum of edge deployment scenarios, incurring orders of magnitude less manual effort.
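Since the abstract stays at the design level, here is a minimal, hypothetical sketch of the consolidation idea it mentions: derive each smaller edge variant from the previously tailored (larger) one, rather than re-compressing the full cloud model per request. TargetSpec, shrink_to, and the parameter-count budget are invented for illustration and are not Mistify's real configuration API.

```python
# Hypothetical sketch of consolidated model tailoring: sort targets by budget
# so each smaller model reuses the compression work done for the larger one.
from dataclasses import dataclass

@dataclass
class TargetSpec:
    name: str
    max_params: int  # hardware budget expressed as a parameter count

def shrink_to(model_params: int, budget: int) -> int:
    """Stand-in for one compression step (e.g., pruning or width scaling)."""
    return min(model_params, budget)

def port(cloud_model_params: int, targets: list[TargetSpec]) -> dict[str, int]:
    # Consolidation: descend through budgets, continuing from the previous
    # (larger) tailored model instead of restarting from the cloud model.
    results, current = {}, cloud_model_params
    for spec in sorted(targets, key=lambda t: t.max_params, reverse=True):
        current = shrink_to(current, spec.max_params)
        results[spec.name] = current
    return results

print(port(10_000_000, [TargetSpec("phone", 4_000_000),
                        TargetSpec("watch", 500_000),
                        TargetSpec("tablet", 8_000_000)]))
```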
  3. Despite extensive investigation of job scheduling in data-intensive computation frameworks, less consideration has been given to optimizing job partitioning for resource utilization and efficient processing. Instead, partitioning and job sizing remain a form of dark art, typically left to developer intuition and trial-and-error experimentation. In this work, we propose that just as job scheduling and resource allocation are outsourced to a trusted mechanism external to the workload, so too should be the responsibility for partitioning data as a determinant of task size. Job partitioning essentially involves determining the partition sizes that match the resource allocation at the finest granularity. This is a complex, multi-dimensional problem that is highly application-specific: resource allocation, computational runtime, shuffle and reduce communication requirements, and task startup overheads all strongly influence the most effective task size for efficient processing. Depending on the partition size, job completion time can differ by as much as 10x. Fortunately, we observe a general trend underlying the tradeoff between full resource utilization and system overhead across different settings: the optimal job partition size balances these two conflicting forces. Given this trend, we design Libra to automate job partitioning as a framework extension. We integrate Libra with Spark and evaluate its performance on EC2. Compared to state-of-the-art techniques, Libra reduces individual job execution time by 25% to 70%.
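To make the utilization-vs-overhead tradeoff concrete, below is a toy cost model, not Libra's actual one and with made-up constants: completion time is the number of task waves times per-task time, where per-task time shrinks as partitions multiply but every task pays a fixed startup cost.

```python
# Toy cost model for partition sizing: too few partitions leave slots idle,
# too many drown in per-task startup overhead. All constants are invented.
import math

SLOTS = 32              # executor slots allocated to the job
DATA_MB = 10_000        # total input size
RATE_MB_PER_S = 50.0    # per-task processing rate
STARTUP_S = 1.5         # fixed per-task startup/scheduling overhead

def completion_time(num_partitions: int) -> float:
    per_task = (DATA_MB / num_partitions) / RATE_MB_PER_S + STARTUP_S
    waves = math.ceil(num_partitions / SLOTS)  # tasks run in waves of SLOTS
    return waves * per_task

best = min(range(1, 2048), key=completion_time)
print(best, completion_time(best))
```

In this toy model the optimum lands at one full wave of SLOTS tasks; real workloads shift it further via shuffle costs, skew, and memory pressure, which is the multi-dimensional search the abstract describes.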
  4. Mobile applications have become increasingly sophisticated. Emerging cognitive assistance applications can involve multiple computationally intensive modules working continuously and concurrently, further straining the already limited resources on mobile devices. While computation offloading to the edge or the cloud is still the de facto solution, existing approaches are limited to intra-application operations or edge-/cloud-centric scheduling. Instead, we argue that operating-system-level coordination is needed on the mobile side to adequately support multi-application offloading. Specifically, both the local mobile system resources and the network bandwidth to reach the cloud need to be allocated intelligently among concurrent offloading jobs. In this paper, we build a system-level scheduler service, LinkShare, that wraps over the operating system scheduler to coordinate among multiple offloading requests. We further study the scheduling requirements and suitable metrics, and find that the most intuitive approaches of minimizing end-to-end processing time or earliest-deadline-first scheduling do not work well. Instead, LinkShare adopts earliest-deadline first with limited sharing (EDF-LS), which balances real-time requirements and fairness. Extensive evaluation of an Android implementation of LinkShare shows that adding this scheduler is essential, and that EDF-LS reduces deadline miss events by up to 30% compared to the baseline.
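The paper defines EDF-LS precisely; the sketch below is only a guessed approximation of "limited sharing": plain earliest-deadline-first, except that a job already holding more than a cap of recent service yields to others, which bounds starvation. SHARE_CAP and the service accounting are invented for illustration and are not LinkShare's actual policy parameters.

```python
# Hypothetical EDF-with-limited-sharing sketch: pick the earliest deadline
# among jobs still within their service share; fall back to plain EDF if all
# jobs are over the cap.
from dataclasses import dataclass

SHARE_CAP = 0.5  # max fraction of recent service one job may hold (invented)

@dataclass
class Job:
    name: str
    deadline: float      # absolute deadline in seconds
    served: float = 0.0  # service received in the current accounting window

def pick_next(jobs: list[Job]) -> Job:
    total = sum(j.served for j in jobs) or 1.0
    within_cap = [j for j in jobs if j.served / total <= SHARE_CAP]
    candidates = within_cap or jobs  # all capped -> plain EDF
    return min(candidates, key=lambda j: j.deadline)

jobs = [Job("face_detect", deadline=0.10, served=0.9),
        Job("speech_rec", deadline=0.25, served=0.1)]
print(pick_next(jobs).name)  # speech_rec: face_detect exceeded its share
```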
  5. Mobile and IoT scenarios increasingly involve interactive and computation-intensive contextual recognition. Existing optimizations typically resort to computation offloading or simplified on-device processing. Instead, we observe that the same application is often invoked on multiple devices in close proximity, and that these application instances often process similar contextual data that map to the same outcome. In this paper, we propose cross-device approximate computation reuse, which minimizes redundant computation by harnessing the “equivalence” between different input values and reusing previously computed outputs with high confidence. We devise adaptive locality-sensitive hashing (A-LSH) and homogenized k-nearest neighbors (H-kNN): the former achieves scalable, constant-time lookup, while the latter provides high-quality reuse and a tunable accuracy guarantee. We further incorporate approximate reuse as a service, called FoggyCache, in the computation offloading runtime. Extensive evaluation shows that, given a 95% accuracy target, FoggyCache consistently harnesses over 90% of reuse opportunities, reducing computation latency and energy consumption by a factor of 3 to 10.
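As a rough illustration of the reuse pipeline, the sketch below uses plain random-hyperplane LSH standing in for A-LSH (which additionally adapts to the data distribution) and a majority-homogeneity check standing in for H-kNN. The hash width, k, and the homogeneity threshold are arbitrary choices, not FoggyCache's tuned values.

```python
# Hypothetical approximate-reuse cache: LSH narrows the search to one bucket,
# then a homogeneity check over the k nearest neighbors decides whether a
# previously computed output can be reused with high confidence.
from collections import Counter, defaultdict
import numpy as np

rng = np.random.default_rng(0)
PLANES = rng.normal(size=(8, 64))  # random hyperplanes for a 64-d input space
K, HOMOGENEITY = 4, 0.75           # invented parameters

cache = defaultdict(list)  # bucket key -> [(input_vector, output)]

def bucket(x: np.ndarray) -> bytes:
    return (PLANES @ x > 0).tobytes()  # sign pattern serves as the bucket key

def lookup(x: np.ndarray):
    entries = cache[bucket(x)]
    if len(entries) < K:
        return None  # not enough evidence to reuse safely
    nearest = sorted(entries, key=lambda e: float(np.linalg.norm(e[0] - x)))[:K]
    label, votes = Counter(out for _, out in nearest).most_common(1)[0]
    # Reuse only when the neighborhood agrees on one output (the H-kNN idea).
    return label if votes / K >= HOMOGENEITY else None

def insert(x: np.ndarray, output) -> None:
    cache[bucket(x)].append((x, output))
```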