NSF PAR Search | NSF Public Access Repository

Citadel: Protecting Data Privacy and Model Confidentiality for Collaborative Learning

https://doi.org/10.1145/3472883.3486998

Zhang, Chengliang; Xia, Junzhe; Yang, Baichen; Puyang, Huancheng; Wang, Wei; Chen, Ruichuan; Akkus, Istemi Ekin; Aditya, Paarijaat; Yan, Feng (November 2021, ACM Symposium on Cloud Computing 2021 (SoCC 2021))

Enabling Cost-Effective, SLO-Aware Machine Learning Inference Serving on Public Cloud

https://doi.org/10.1109/TCC.2020.3006751

Zhang, Chengliang; Yu, Minchen; Wang, Wei; Yan, Feng (July 2020, IEEE Transactions on Cloud Computing)

BatchCrypt: Efficient Homomorphic Encryption for Cross-Silo Federated Learning

Zhang, Chengliang; Li, Suyi; Xia, Junzhe; Wang, Wei; Yan, Feng; Liu, Yang (April 2020, Proceedings of the 2020 USENIX Annual Technical Conference (USENIX ATC 2020))

MArk: Exploiting Cloud Services for Cost-Effective, SLO-Aware Machine Learning Inference Serving

Zhang, Chengliang; Yu, Minchen; Wang, Wei; Yan, Feng (May 2019, Proceedings of the USENIX Conference)

Advances in Machine Learning (ML) have sparked a growing demand for ML-as-a-Service: developers train ML models and publish them in the cloud as online services to provide low-latency inference at scale. The key challenge of ML model serving is to meet the response-time Service-Level Objectives (SLOs) of inference workloads while minimizing the serving cost. In this paper, we tackle the dual challenge of SLO compliance and cost effectiveness with MArk (Model Ark), a general-purpose inference serving system built on Amazon Web Services (AWS). MArk employs three design choices tailor-made for inference workloads. First, MArk dynamically batches requests and opportunistically serves them using expensive hardware accelerators (e.g., GPUs) for an improved performance-cost ratio. Second, instead of relying on feedback-control scaling or over-provisioning to serve dynamic workloads, which can be too slow or too expensive for inference serving, MArk employs predictive autoscaling to hide the provisioning latency at low cost. Third, given the stateless nature of inference serving, MArk exploits flexible yet costly serverless instances to cover the occasional load spikes that are hard to predict. We evaluated the performance of MArk using several state-of-the-art ML models trained in popular frameworks including TensorFlow, MXNet, and Keras. Compared with the premier industrial ML serving platform SageMaker, MArk reduces the serving cost by up to 7.8× while achieving even better latency performance.
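
The interplay between predictive autoscaling and serverless overflow described in the abstract can be illustrated with a minimal sketch. This is not MArk's actual algorithm: the function name `plan_capacity`, the `Plan` type, and the fixed per-instance throughput are all assumptions made for illustration, and the load forecast is taken as a given input rather than computed.

```python
import math
from dataclasses import dataclass


@dataclass
class Plan:
    instances_to_launch: int  # instances to start now so they are ready when load arrives
    serverless_rps: float     # current load (requests/s) to route to serverless as overflow


def plan_capacity(current_rps, predicted_rps, running_instances, rps_per_instance):
    """Hypothetical planner: provision ahead for the forecast, spill today's excess.

    current_rps       -- load observed right now
    predicted_rps     -- forecast load one provisioning delay into the future
    running_instances -- instances already serving traffic
    rps_per_instance  -- assumed sustainable throughput of one instance within the SLO
    """
    if rps_per_instance <= 0:
        raise ValueError("rps_per_instance must be positive")

    # Predictive step: launch enough instances now to cover the forecast load.
    needed = math.ceil(predicted_rps / rps_per_instance)
    instances_to_launch = max(0, needed - running_instances)

    # Overflow step: anything the running instances cannot absorb right now
    # goes to serverless functions until the new instances come online.
    capacity_now = running_instances * rps_per_instance
    serverless_rps = max(0.0, current_rps - capacity_now)

    return Plan(instances_to_launch, serverless_rps)


if __name__ == "__main__":
    print(plan_capacity(current_rps=120, predicted_rps=250,
                        running_instances=2, rps_per_instance=50))
```

In this sketch, serverless capacity is used only for load that exceeds what is already provisioned, which mirrors the abstract's framing of serverless instances as a flexible but costly fallback for spikes that the forecast misses.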
Stay Fresh: Speculative Synchronization for Fast Distributed Machine Learning

https://doi.org/10.1109/ICDCS.2018.00020

Zhang, Chengliang; Tian, Huangshi; Wang, Wei; Yan, Feng (July 2018, 38th IEEE International Conference on Distributed Computing Systems (ICDCS 2018))
