Serverless computing is a new pay-per-use cloud service paradigm that automates resource scaling for stateless functions and can potentially facilitate bursty machine learning serving. Batching is critical for latency performance and cost-effectiveness of machine learning inference, but unfortunately it is not supported by existing serverless platforms due to their stateless design. Our experiments show that without batching, machine learning serving cannot reap the benefits of serverless computing. In this paper, we present BATCH, a framework for supporting efficient machine learning serving on serverless platforms. BATCH uses an optimizer to provide inference tail latency guarantees and cost optimization and to enable adaptive batching support. We prototype BATCH atop of AWS Lambda and popular machine learning inference systems. The evaluation verifies the accuracy of the analytic optimizer and demonstrates performance and cost advantages over the state-of-the-art method MArk and the state-of-the-practice tool SageMaker.
more »
« less
SandPiper: A Cost-Efficient Adaptive Framework for Online Recommender Systems
Online recommender systems have proven to have ubiquitous applications in various domains. To provide accurate recommendations in real time it is imperative to constantly train and deploy models with the latest data samples. This retraining involves adjusting the model weights by incorporating newly-arrived streaming data into the model to bridge the accuracy gap. To provision resources for the retraining, typically the compute is hosted on VMs, however, due to the dynamic nature of the data arrival patterns, stateless functions would be an ideal alternative over VMs, as they can instantaneously scale on demand. However, it is non-trivial to statically configure the stateless functions because the model retraining exhibits varying resource needs during different phases of retraining. Therefore, it is crucial to dynamically configure the functions to meet the resource requirements, while bridging the accuracy gap. In this paper, we propose Sandpiper, an adaptive framework that leverages stateless functions to deliver accurate predictions at low cost for online recommender systems. The three main ideas in Sandpiper are (i) we design a data-drift monitor that automatically triggers model retraining at required time intervals to bridge the accuracy gap due to incoming data drifts; (ii) we develop an online configuration model that selects the appropriate function configurations while maintaining the model serving accuracy within the latency and cost budget; and (iii) we propose a dynamic synchronization policy for stateless functions to speed up the distributed model retraining leading to cloud cost minimization. A prototype implementation on AWS shows that Sandpiper maintains the average accuracy above 90%, while 3.8× less expensive than the traditional VM-based schemes.
more »
« less
- Award ID(s):
- 2116962
- PAR ID:
- 10462581
- Date Published:
- Journal Name:
- 2022 IEEE International Conference on Big Data (Big Data)
- Page Range / eLocation ID:
- 423 to 430
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Data centers and clouds are increasingly offering low-cost computational resources in the form of transient virtual machines. Whenever demand for computational resources exceeds their availability, transient resources can reclaimed by preempting the transient VMs. Conventionally, these transient VMs are used by low-priority applications that can tolerate the disruption caused by preemptions. In this paper we propose an alternative approach for reclaiming resources, called resource deflation. Resource deflation allows applications to dynamically shrink (and expand) in response to resource pressure, instead of being preempted outright. Deflatable VMs allow applications to continue running even under resource pressure, and increase the utility of low-priority transient resources. Deflation uses a dynamic, multi-level cascading reclamation technique that allows applications, operating systems, and hypervisors to implement their own policies for handling resource pressure. For distributed data processing, machine learning, and deep neural network training, our multi-level approach reduces the performance degradation by up to 2x compared to existing preemption-based approaches. When deflatable VMs are deployed on a cluster, our policies allow up to 1.6x utilization without the risk of preemption.more » « less
-
Abstract Predictive Maintenance (PdM) emerges as a critical task of Industry 4.0, driving operational efficiency, minimizing downtime, and reducing maintenance costs. However, real-world industrial environments present unsolved challenges, especially in predicting simultaneous and correlated faults under evolving conditions. Traditional batch-based and deep learning approaches for simultaneous fault prediction often fall short due to their assumptions of static data distributions and high computational demands, making them unsuitable for dynamic, resource-constrained systems. In response, we propose OEMLHAT (Online Ensemble of Multi-Label Hoeffding Adaptive Trees), a novel model tailored for real-time, multi-label fault prediction in non-stationary industrial settings. OEMLHAT introduces a scalable online ensemble architecture that integrates online bagging, dynamic feature subspacing, and adaptive output weighting. This design allows it to efficiently handle concept drift, high-dimensional input spaces, and label sparsity, key bottlenecks in existing PdM solutions. Experimental results on three public multi-label PdM case studies demonstrate substantial improvements in predictive performance of OEMLHAT over previous batch-based and online proposals for multi-label classification, particularly with an average improvement in micro-averaged F1-score of 18.49% over the second most-accurate batch-based proposal and of 8.56% in the case of the second best online model. By addressing a critical gap in online multi-label learning for PdM, this work provides a robust and interpretable solution for next-generation industrial monitoring systems for fault detection, particularly for rare and concurrent failures.more » « less
-
Online dating platforms have gained widespread popularity as a means for individuals to seek potential romantic relationships. While recommender systems have been designed to improve the user experience in dating platforms by providing personalized recommendations, increasing concerns about fairness have encouraged the development of fairness-aware recommender systems from various perspectives (e.g., gender and race). However, sexual orientation, which plays a significant role in finding a satisfying relationship, is under-investigated. To fill this crucial gap, we propose a novel metric, Opposite Gender Interaction Ratio (OGIR), as a way to investigate potential unfairness for users with varying preferences towards the opposite gender. We empirically analyze a real online dating dataset and observe existing recommender algorithms could suffer from group unfairness according to OGIR. We further investigate the potential causes for such gaps in recommendation quality, which lead to the challenges of group quantity imbalance and group calibration imbalance. Ultimately, we propose a fair recommender system based on re-weighting and re-ranking strategies to respectively mitigate these associated imbalance challenges. Experimental results demonstrate both strategies improve fairness while their combination achieves the best performance towards maintaining model utility while improving fairness.more » « less
-
Multivariate spatial point process models can describe heterotopic data over space. However, highly multivariate intensities are computationally challenging due to the curse of dimensionality. To bridge this gap, we introduce a declustering based hidden variable model that leads to an efficient inference procedure via a variational autoencoder (VAE). We also prove that this model is a generalization of the VAE-based model for collaborative filtering. This leads to an interesting application of spatial point process models to recommender systems. Experimental results show the method’s utility on both synthetic data and real-world data sets.more » « less
An official website of the United States government

