NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

GOVERNOR: Smoother Stream Processing Through Smarter Backpressure

https://doi.org/10.1109/ICAC.2017.31

Chen, Xin; Vigfusson, Ymir; Blough, Douglas M.; Zheng, Fang; Wu, Kun-Lung; Hu, Liting (July 2017, 14th IEEE International Conference on Autonomous Computing)

Big data systems have evolved beyond scalable storage and rudimentary processing to supporting complex data analytics in near real-time, such as Apache Spark Streaming [31], Comet [14], Incremental Hadoop [17], MapReduce Online [7], Apache Storm [28], StreamScope [19], and IBM Streams [1]. These systems are particularly challenging to build owing to two requirements: low latency and fault tolerance. Many of the above systems evolved from a batch processing design and are thus architected to break down a steady stream of input events into a series of micro-batches and then perform batch-like computations on each successive micro-batch as a micro-batch job. In terms of latency, the systems are expected to respond to each micro-batch in seconds with an output The constant operation further entails that the systems must be robust to hardware, software and network-level failures. To incorporate fault-tolerance, the common approach is to use checkpointing and rollback recovery, whereby a streaming application periodically saves its in-memory state to persistent storage.
more » « less
Full Text Available
Popularity Prediction of Facebook Videos for Higher Quality Streaming

Tang, Linpeng; Huang, Qi; Puntambekar, Amit; Vigfusson, Ymir; Lloyd, Wyatt; Li, Kai (July 2017, USENIX Annual Technical Conference)

Streaming video algorithms dynamically select between different versions of a video to deliver the highest quality version that can be viewed without buffering over the client’s connection. To improve the quality for viewers, the backing video service can generate more and/or better versions, but at a significant computational overhead. Processing all videos uploaded to Facebook in the most intensive way would require a prohibitively large cluster. Facebook’s video popularity distribution is highly skewed, however, with analysis on sampled videos showing 1% of them accounting for 83% of the total watch time by users. Thus, if we can predict the future popularity of videos, we can focus the intensive processing on those videos that improve the quality of the most watch time. To address this challenge, we designed CHESS, the first popularity prediction algorithm that is both scalable and accurate. CHESS is scalable because, unlike the state-ofthe- art approaches, it requires only constant space per video, enabling it to handle Facebook’s video workload. CHESS is accurate because it delivers superior predictions using a combination of historical access patterns with social signals in a unified online learning framework. We have built a video prediction service, CHESSVPS, using our new algorithm that can handle Facebook’s workload with only four machines. We find that re-encoding popular videos predicted by CHESSVPS enables a higher percentage of total user watch time to benefit from intensive encoding, with less overhead than a recent production heuristic, e.g., 80% of watch time with one-third as much overhead.
more » « less
Full Text Available
Mithril: mining sporadic associations for cache prefetching

https://doi.org/10.1145/3127479.3131210

Yang, Juncheng; Karimi, Reza; Sæmundsson, Trausti; Wildani, Avani; Vigfusson, Ymir (January 2017, ACM Symposium on Cloud Computing)

The growing pressure on cloud application scalability has accentuated storage performance as a critical bottleneck. Although cache replacement algorithms have been extensively studied, cache prefetching - reducing latency by retrieving items before they are actually requested - remains an underexplored area. Existing approaches to history-based prefetching, in particular, provide too few benefits for real systems for the resources they cost. We propose Mithril, a prefetching layer that efficiently exploits historical patterns in cache request associations. Mithril is inspired by sporadic association rule mining and only relies on the timestamps of requests. Through evaluation of 135 block-storage traces, we show that Mithril is effective, giving an average of a 55% hit ratio increase over LRU and Probability Graph, and a 36% hit ratio gain over Amp at reasonable cost. Finally, we demonstrate the improvement comes from Mithril being able to capture mid-frequency blocks.
more » « less
Full Text Available
Enabling Space Elasticity in Storage Systems

https://doi.org/10.1145/2928275.2928291

Sigurbjarnarson, Helgi; Ragnarsson, Petur O.; Yang, Juncheng; Vigfusson, Ymir; Balakrishnan, Mahesh (June 2016, 9th ACM International Systems and Storage Conference)

Storage systems are designed to never lose data. However, modern applications increasingly use local storage to improve performance by storing soft state such as cached, prefetched or precomputed results. Required is elastic storage, where cloud providers can alter the storage footprint of applications by removing and regenerating soft state based on resource availability and access patterns. We propose a new abstraction called a motif that enables storage elasticity by allowing applications to describe how soft state can be regenerated. Carillon is a system that uses motifs to dynamically change the storage space used by applications. Carillon is implemented as a runtime and a collection of shim layers that interpose between applications and specific storage APIs; we describe shims for a filesystem (Carillon-FS) and a key-value store (Carillon-KV). We show that Carillon-FS allows us to dynamically alter the storage footprint of a VM, while Carillon-KV enables a graph database that accelerates performance based on available storage space
more » « less
Full Text Available

Search for: All records