Title: MANTIS: time-shifted prefetching of YouTube videos to reduce peak-time cellular data usage
The load on wireless cellular networks is not uniformly distributed through the day and is significantly higher during peak periods. In this context, we present MANTIS, a time-shifted prefetching solution that prefetches content during off-peak periods of network connectivity. We specifically focus on YouTube, given that it represents a significant portion of overall wireless data usage. We make the following contributions: first, we collect and analyze a real-life dataset of YouTube watch history from 206 users, comprising over 1.8 million videos and spanning a 1-year period, and present insights on a typical user's viewing behavior; second, we develop an accurate prediction algorithm using a K-nearest neighbor classifier approach; third, we evaluate the prefetching algorithm on two different datasets and show that MANTIS is able to reduce the traffic during peak periods by 34%; and finally, we develop a proof-of-concept prototype for MANTIS and perform a user study.
Award ID(s):
1813242
PAR ID:
10173556
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
MMSys '20: Proceedings of the 11th ACM Multimedia Systems Conference
Page Range / eLocation ID:
112 to 125
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Internet traffic load is not uniformly distributed through the day; it is significantly higher during peak periods and comparatively idle during off-peak periods. In this context, we present CacheFlix, a time-shifted edge-caching solution that prefetches Netflix content during off-peak periods of network connectivity. We specifically focus on Netflix since it contributes the largest percentage of global Internet traffic by a single application. We analyze a real-world dataset of Netflix viewing activity that we collected from 1060 users, spanning a 1-year period and comprising over 2.2 million Netflix TV shows and documentary series; we restrict the scope of our study to Netflix series, which account for 65% of a typical user's Netflix load in terms of bytes fetched. We present insights on users' viewing behavior, and develop an accurate and efficient prediction algorithm using LSTM networks that caches episodes of Netflix series on storage-constrained edge nodes, based on the user's past viewing activity. We evaluate CacheFlix on the collected dataset over various cache eviction policies, and find that CacheFlix is able to shift 70% of Netflix series traffic to off-peak hours. 
  2. Coded caching is a technique for reducing congestion in communication networks by prefetching content during idle periods and exploiting multicasting opportunities during periods of heavy traffic. Most of the existing research in this area has focused on minimizing the worst case (i.e., peak) rate in a broadcast link with multiple identically distributed user requests. However, modern content delivery networks are investing very heavily in profiling their users and predicting their preferences. The minimal achievable rate of a coded caching scheme with heterogeneous user profiles is still unknown in general. This paper presents the first steps towards solving that problem by analyzing the case of two users with distinct but overlapping demand sets. Specifically, it provides a complete characterization of the uniform-average-rate capacity when the sets overlap in just one file and shows that such capacity can be achieved with selfish and uncoded prefetching. Then, it characterizes the same capacity under selfish and uncoded prefetching when the demand sets overlap in two or more files. The paper also provides explicit prefetching schemes that achieve those capacities. All our results allow for arbitrary (and not necessarily identical) users’ cache sizes and number of files in each demand set. 
  3. YouTube is the most popular video sharing platform, with more than 2 billion active users and 1 billion hours of video content watched daily. The dominance of YouTube has had a big impact on the performance of Internet protocols, algorithms, and systems. Understanding the interaction of users with YouTube is thus of much interest to the research community. In this context, we collect YouTube watch history data from 243 users spanning a 1.5-year period. The dataset comprises a total of 1.8 million videos. We use the dataset to analyze and present key insights about user-level usage behavior. We also show that our analysis can be used by researchers to tackle a myriad of problems in the general domains of networking and communication. We present baseline characteristics and also substantiated directions to solve a few representative problems related to local caching techniques, prefetching strategies, the performance of YouTube's recommendation engine, the variability of users' video preferences, and application-specific load provisioning. 
  4. As video traffic dominates the Internet, it is important for operators to detect video Quality of Experience (QoE) in order to ensure adequate support for video traffic. With wide deployment of end-to-end encryption, traditional deep packet inspection based traffic monitoring approaches are becoming ineffective. This poses a challenge for network operators to monitor user QoE and improve upon their experience. To resolve this issue, we develop and present a system for REal-time QUality of experience metric detection for Encrypted Traffic, Requet. Requet uses a detection algorithm we develop to identify video and audio chunks from the IP headers of encrypted traffic. Features extracted from the chunk statistics are used as input to a Machine Learning (ML) algorithm to predict QoE metrics, specifically, buffer warning (low buffer, high buffer), video state (buffer increase, buffer decay, steady, stall), and video resolution. We collect a large YouTube dataset consisting of diverse video assets delivered over various WiFi network conditions to evaluate the performance. We compare Requet with a baseline system based on previous work and show that Requet outperforms the baseline system in accuracy of predicting buffer low warning, video state, and video resolution by 1.12×, 1.53×, and 3.14×, respectively. 
  5. Alas, coordinated hate attacks, or raids, are becoming increasingly common online. In a nutshell, these are perpetrated by a group of aggressors who organize and coordinate operations on a platform (e.g., 4chan) to target victims on another community (e.g., YouTube). In this paper, we focus on attributing raids to their source community, paving the way for moderation approaches that take the context (and potentially the motivation) of an attack into consideration. We present TUBERAIDER, an attribution system achieving over 75% accuracy in detecting and attributing coordinated hate attacks on YouTube videos. We instantiate it using links to YouTube videos shared on 4chan's /pol/ board, r/The_Donald, and 16 Incels-related subreddits. We use a peak detector to identify a rise in the comment activity of a YouTube video, which signals that an attack may be occurring. We then train a machine learning classifier based on the community language (i.e., TF-IDF scores of relevant keywords) to perform the attribution. We test TUBERAIDER in the wild and present a few case studies of actual aggression attacks identified by it to showcase its effectiveness. 