Title: Sprocket: A Serverless Video Processing Framework
Sprocket is a highly configurable, stage-based, scalable, serverless video processing framework that exploits intra-video parallelism to achieve low latency. Sprocket enables developers to program a series of operations over video content in a modular, extensible manner. Programmers implement custom operations, ranging from simple video transformations to more complex computer vision tasks, and compose them in a simple pipeline specification language to construct custom video processing pipelines. Sprocket then handles the underlying access, encoding and decoding, and processing of video and image content across operations in a highly parallel manner. In this paper we describe the design and implementation of the Sprocket system on the AWS Lambda serverless cloud infrastructure, and evaluate Sprocket under a variety of conditions to show that it delivers on its performance goals of high parallelism, low latency, and low cost (tens of seconds to process a 3,600-second video with 1,000-way parallelism for less than $3).
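The paper's pipeline specification language and Lambda runtime are not reproduced here. As a rough sketch of the programming model the abstract describes (chunk the video, run each chunk through a series of stages in parallel), here is a minimal Python analogy; all names (`Chunk`, `Pipeline`, the placeholder stages) are illustrative, and a thread pool stands in for a fleet of Lambda workers.

```python
# Illustrative sketch only: Sprocket's actual specification language and
# Lambda-based runtime differ; these names are hypothetical stand-ins.
from concurrent.futures import ThreadPoolExecutor  # stand-in for parallel Lambda workers
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Chunk:
    """One independently processable slice of the input video (e.g., a GOP)."""
    index: int
    frames: List[int]  # placeholder for decoded frame data

def decode(chunk: Chunk) -> Chunk:
    # In the real system each worker fetches and decodes its slice of the video.
    return chunk

def grayscale(chunk: Chunk) -> Chunk:
    # Placeholder per-chunk transformation.
    chunk.frames = [f for f in chunk.frames]
    return chunk

class Pipeline:
    def __init__(self, stages: List[Callable[[Chunk], Chunk]], parallelism: int):
        self.stages = stages
        self.parallelism = parallelism

    def run(self, chunks: List[Chunk]) -> List[Chunk]:
        # Each chunk flows through every stage; chunks are processed in
        # parallel, mimicking one serverless worker per chunk.
        def process(chunk: Chunk) -> Chunk:
            for stage in self.stages:
                chunk = stage(chunk)
            return chunk
        with ThreadPoolExecutor(max_workers=self.parallelism) as pool:
            return list(pool.map(process, chunks))

if __name__ == "__main__":
    chunks = [Chunk(i, list(range(30))) for i in range(1000)]  # 1000-way split
    out = Pipeline([decode, grayscale], parallelism=64).run(chunks)
    print(f"processed {len(out)} chunks")
```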
Award ID(s):
1763260 1629973 1553490 1564185
NSF-PAR ID:
10098946
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Proceedings of the ACM Symposium on Cloud Computing (SoCC ’18)
Page Range / eLocation ID:
263 to 274
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Classified as a complex big-data analytics problem, DNA short read alignment is a major sequential bottleneck in processing the massive amounts of data generated by next-generation sequencing platforms. With von Neumann computing architectures struggling to handle such a computationally expensive and memory-intensive task, Processing-in-Memory (PIM) platforms are attracting growing interest. In this paper, an energy-efficient and parallel PIM accelerator (AlignS) is proposed to execute DNA short read alignment based on an optimized and hardware-friendly alignment algorithm. We first develop the AlignS platform, which harnesses SOT-MRAM as computational memory and transforms it into a fundamental processing unit for short read alignment. Accordingly, we present a novel, customized, highly parallel read alignment algorithm that requires only simple and parallel in-memory operations (i.e., comparisons and additions), a style of matching sketched below. AlignS is then optimized through a new correlated data partitioning and mapping methodology that allows local storage and processing of DNA sequences to fully exploit the algorithm-level parallelism and to accelerate both exact and inexact matches. Device-to-architecture co-simulation results show that AlignS improves short read alignment throughput per Watt per mm^2 by ~12X compared to the ASIC accelerator. Compared to a recent FM-index-based ReRAM platform, AlignS achieves 1.6X higher throughput per Watt.
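To make the "comparisons and additions" primitive concrete, here is a minimal software sketch of sliding-window matching that uses only those two operations to find exact (k = 0) and inexact (k > 0) matches. It illustrates the style of algorithm, not AlignS itself, whose encoding, partitioning, and in-memory mapping are hardware-specific.

```python
# Generic comparison-and-addition matching sketch; not the AlignS algorithm.
def align(read: str, reference: str, max_mismatches: int = 0) -> list[int]:
    """Return start positions where `read` matches `reference` with at most
    `max_mismatches` substitutions (Hamming distance), using only elementwise
    comparisons and additions."""
    hits = []
    for start in range(len(reference) - len(read) + 1):
        mismatches = 0
        for r, g in zip(read, reference[start:start + len(read)]):
            mismatches += (r != g)           # comparison, then addition
            if mismatches > max_mismatches:  # early exit for this window
                break
        else:
            hits.append(start)
    return hits

print(align("ACGT", "TTACGTAACGA", max_mismatches=0))  # exact: [2]
print(align("ACGT", "TTACGTAACGA", max_mismatches=1))  # inexact: [2, 7]
```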
  2. Serverless computing platforms have gained popularity because they allow easy deployment of services in a highly scalable and cost-effective manner. By enabling just-in-time startup of container-based services, these platforms achieve good multiplexing and automatically respond to traffic growth, making them particularly desirable for edge cloud data centers where resources are scarce. Edge cloud data centers are also gaining attention because of their promise to provide responsive, low-latency shared computing and storage resources. Bringing serverless capabilities to edge cloud data centers must continue to meet the goals of low latency and reliability. The reliability guarantees provided by serverless computing, however, are weak, with node failures causing requests to be dropped or executed multiple times. Serverless computing thus provides only a best-effort infrastructure, leaving application developers responsible for implementing stronger reliability guarantees at a higher level. Current approaches for providing stronger semantics, such as "exactly once" guarantees, could be integrated into serverless platforms, but they come at a high cost in both latency and resource consumption; the sketch below illustrates this trade-off. As edge cloud services move toward applications such as autonomous vehicle control that require strong guarantees for both reliability and performance, these approaches may no longer be sufficient. In this paper we evaluate the latency, throughput, and resource costs of providing different reliability guarantees, with a focus on these emerging edge cloud platforms and applications.
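As a concrete illustration of why stronger guarantees cost latency and resources, the following hedged sketch pairs at-least-once retries with receiver-side deduplication state to obtain effectively exactly-once execution; the extra durable state and bookkeeping are exactly the kind of overhead such evaluations measure. The names are illustrative, not any platform's API.

```python
# Illustrative sketch: retries give at-least-once delivery, and dedup state on
# the receiver upgrades it to effectively exactly-once processing.
import uuid

class Receiver:
    def __init__(self):
        self.seen: set[str] = set()        # durable dedup state in a real system
        self.results: dict[str, str] = {}  # cached results for replay

    def handle(self, request_id: str, payload: str) -> str:
        if request_id in self.seen:          # duplicate caused by a retry
            return self.results[request_id]  # replay the result, don't re-execute
        self.seen.add(request_id)
        result = payload.upper()             # the "function" being invoked
        self.results[request_id] = result
        return result

def invoke_with_retries(receiver: Receiver, payload: str, attempts: int = 3) -> str:
    request_id = str(uuid.uuid4())  # one id shared across all retries
    result = None
    for _ in range(attempts):       # at-least-once: resend until satisfied
        result = receiver.handle(request_id, payload)
    return result

r = Receiver()
print(invoke_with_retries(r, "sensor reading"))  # executed once despite 3 sends
```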
  3. FPGAs are an excellent target for real-time video processing because they provide large amounts of low-level parallelism, low latency, and high bandwidth. However, creating real-time video processing systems on an FPGA is tedious and requires significant effort and low-level digital design skills. This paper presents a technique for creating complex real-time video processing pipelines relatively quickly and easily using partial reconfiguration (PR). A static FPGA system is created that provides a template with regions for a variety of partially reconfigurable video processing cores. A library of video filters has been created that can be inserted into the template regions. At run time, the user can select the topology of video cores and customize these cores to create complex and unique video pipelines without any understanding of low-level FPGA details; the sketch below models this flow in software. This paper demonstrates the technique with a library of 11 partially reconfigurable regions and 16 video processing cores operating on the Xilinx PYNQ system.
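The actual system downloads partial bitstreams into PR regions on PYNQ hardware. As a software analogy only (not the PYNQ or PR API), the sketch below models the same idea: a fixed template of slots, a library of cores, and run-time selection of which core occupies each slot. All names are hypothetical.

```python
# Software analogy of a PR template: slots that can be "reconfigured" at run
# time with filters from a core library, then chained into a pipeline.
from typing import Callable, Dict, List

Frame = List[int]  # placeholder for a video frame's pixel values

# Library of "video cores"; in hardware these would be partial bitstreams.
CORE_LIBRARY: Dict[str, Callable[[Frame], Frame]] = {
    "invert":    lambda f: [255 - p for p in f],
    "threshold": lambda f: [255 if p > 127 else 0 for p in f],
    "identity":  lambda f: f,
}

class VideoTemplate:
    """Static system with a fixed number of reconfigurable slots."""
    def __init__(self, num_slots: int):
        self.slots: List[Callable[[Frame], Frame]] = [CORE_LIBRARY["identity"]] * num_slots

    def load_core(self, slot: int, core_name: str) -> None:
        # Hardware analogue: download a partial bitstream into PR region `slot`.
        self.slots[slot] = CORE_LIBRARY[core_name]

    def process(self, frame: Frame) -> Frame:
        for core in self.slots:  # frames stream through the slots in order
            frame = core(frame)
        return frame

system = VideoTemplate(num_slots=3)
system.load_core(0, "invert")
system.load_core(1, "threshold")
print(system.process([10, 200, 130]))  # [255, 0, 0]
```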
  4. Processing large amounts of data, especially in learning algorithms, poses a challenge for current embedded computing systems. Hyperdimensional computing (HDC) is a brain-inspired computing paradigm that works with high-dimensional vectors called hypervectors. HDC replaces several complex learning computations with bitwise and simpler arithmetic operations, at the expense of an increased amount of data due to mapping the data into high-dimensional space. These hypervectors, more often than not, cannot be stored in memory, resulting in long data transfers from storage. In this article, we propose Store-n-Learn, an in-storage computing solution that performs HDC classification and clustering by implementing encoding, training, retraining, and inference across the flash hierarchy. To hide the latency of training and enable efficient computation, we introduce the concept of batching in HDC. We also present on-chip acceleration for HDC encoding in flash planes. This enables us to exploit the high parallelism provided by the flash hierarchy and encode multiple data points in parallel, in both batched and non-batched fashion. Store-n-Learn also implements a single top-level FPGA accelerator with novel implementations for HDC classification training, retraining, inference, and clustering on the encoded data. Our evaluation over 10 popular datasets shows that Store-n-Learn is on average 222× (543×) faster than CPU and 10.6× (7.3×) faster than the state-of-the-art in-storage computing solution, INSIDER, for HDC classification (clustering). The sketch below reviews the basic HDC operations these stages implement.
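For readers unfamiliar with HDC, the following generic NumPy sketch shows the encode (bind and bundle), train (bundle per-class encodings), and inference (similarity search) steps that such systems accelerate; it is a textbook-style illustration, not Store-n-Learn's in-storage implementation.

```python
# Generic HDC sketch: encode by bind-and-bundle, train class prototypes,
# classify by dot-product similarity. Not Store-n-Learn's implementation.
import numpy as np

D = 4096  # hypervector dimensionality
rng = np.random.default_rng(0)

def random_hv() -> np.ndarray:
    return rng.choice([-1, 1], size=D)  # random bipolar hypervector

def encode(sample: list[int], feature_hvs, level_hvs) -> np.ndarray:
    # Bind each feature's id with its value level (elementwise multiply),
    # then bundle the results (sum and binarize with sign).
    bound = [feature_hvs[i] * level_hvs[v] for i, v in enumerate(sample)]
    return np.sign(np.sum(bound, axis=0))

# Toy dataset: 2 classes, 3 features, feature values in {0, 1, 2}.
feature_hvs = [random_hv() for _ in range(3)]
level_hvs = [random_hv() for _ in range(3)]
train = {0: [[0, 0, 1], [0, 1, 1]], 1: [[2, 2, 1], [2, 1, 2]]}

# Training: bundle the encodings of each class's samples into one prototype.
prototypes = {c: np.sign(sum(encode(s, feature_hvs, level_hvs) for s in samples))
              for c, samples in train.items()}

def classify(sample: list[int]) -> int:
    hv = encode(sample, feature_hvs, level_hvs)
    # Inference: pick the class prototype with the highest similarity.
    return max(prototypes, key=lambda c: int(hv @ prototypes[c]))

print(classify([0, 1, 1]))  # expected: 0
print(classify([2, 2, 2]))  # expected: 1
```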