NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Epidemic Spread Modeling for COVID-19 Using Cross-Fertilization of Mobility Data

https://doi.org/10.1109/TBDATA.2023.3248650

Schmedding, Anna; Pinciroli, Riccardo; Yang, Lishan; Smirni, Evgenia (October 2023, IEEE Transactions on Big Data)

Full Text Available
Lifespan and Failures of SSDs and HDDs: Similarities, Differences, and Prediction Models

https://doi.org/10.1109/TDSC.2021.3131571

Pinciroli, Riccardo; Yang, Lishan; Alter, Jacob; Smirni, Evgenia (January 2023, IEEE Transactions on Dependable and Secure Computing)

Data center downtime typically centers around IT equipment failure, and storage devices are the most frequently failing components in data centers. We present a comparative study of hard disk drives (HDDs) and solid state drives (SSDs), which constitute the typical storage in data centers. Using six years of field data from 100,000 HDDs of different models from the same manufacturer (from the Backblaze dataset) and six years of field data from 30,000 SSDs of three models from a Google data center, we characterize the workload conditions that lead to failures. We show that their root failure causes differ from common expectations and remain difficult to discern. For HDDs, we observe that young and old drives do not present many differences in their failures; instead, failures may be distinguished by discriminating drives based on the time spent on head positioning. For SSDs, we observe high levels of infant mortality and characterize the differences between infant and non-infant failures. We develop several machine learning failure prediction models that are surprisingly accurate, achieving high recall and low false positive rates. These models go beyond simple prediction: they help us untangle the complex interaction of workload characteristics that lead to failures and identify failure root causes from monitored symptoms.
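The abstract judges the failure predictors by recall and false positive rate. As a rough illustration of what those two numbers measure (not the paper's evaluation code), the sketch below computes both from per-drive labels. The function name, the 0/1 encoding (1 = "drive failed"), and the toy labels are assumptions made up for this example.

```python
def recall_and_fpr(y_true, y_pred):
    """Return (recall, false positive rate) for binary failure labels.

    Hypothetical helper: 1 means "drive failed" (or "predicted to fail"), 0 means healthy.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # failures caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # failures missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy drives flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy drives left alone
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return recall, fpr


# Toy labels for eight drives, not data from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1, 0, 0]
recall, fpr = recall_and_fpr(y_true, y_pred)
print(recall, fpr)
```

With these toy labels, two of three failures are caught (recall 2/3) and one of five healthy drives is flagged (FPR 0.2). "High recall and low false positive rate" means pushing the first number toward 1 while keeping the second near 0.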
Full Text Available
GeoSpread: an Epidemic Spread Modeling Tool for COVID-19 Using Mobility Data

https://doi.org/10.1145/3524458.3547257

Schmedding, Anna; Yang, Lishan; Pinciroli, Riccardo; Smirni, Evgenia (September 2022, GoodIT 2022: ACM International Conference on Information Technology for Social Good, Limassol, Cyprus, September 7 - 9, 2022)
Mourlas, Costas; Pacheco, Diego; Prandi, Catia (Ed.)
We present an individual-centric agent-based model and a flexible tool, GeoSpread, for studying and predicting the spread of viruses and diseases in urban settings. Using COVID-19 data collected by the Korean Center for Disease Control & Prevention (KCDC), we analyze patient and route data of infected people from January 20, 2020, to May 31, 2020, and discover how infection clusters develop as a function of time. This analysis offers a statistical characterization of population mobility and is used to parameterize GeoSpread to capture the spread of the disease. We validate GeoSpread's simulation predictions against ground truth and evaluate different what-if countermeasure scenarios to illustrate the usefulness and flexibility of the tool for epidemic modeling.
Full Text Available
Optimizing inference serving on serverless platforms

https://doi.org/10.14778/3547305.3547313

Ali, Ahsan; Pinciroli, Riccardo; Yan, Feng; Smirni, Evgenia (June 2022, Proceedings of the VLDB Endowment)

Serverless computing is gaining popularity for machine learning (ML) serving workloads due to its autonomous resource scaling, ease of use, and pay-per-use cost model. Existing serverless platforms work well for image-based ML inference, where requests are homogeneous in service demands. However, recent advances in natural language processing could not fully benefit from existing serverless platforms because their requests are intrinsically heterogeneous. Batching requests for processing can significantly increase ML serving efficiency while reducing monetary cost, thanks to the pay-per-use pricing model adopted by serverless platforms. Yet, batching heterogeneous ML requests adds computation overhead, as small requests need to be "padded" to the same size as the large requests within the same batch. Reaching effective batching decisions (i.e., which requests should be batched together and why) is non-trivial: the padding overhead coupled with serverless auto-scaling forms a complex optimization problem. To address this, we develop Multi-Buffer Serving (MBS), a framework that optimizes the batching of heterogeneous ML inference serving requests to minimize their monetary cost while meeting their service level objectives (SLOs). The core of MBS is a performance and cost estimator driven by analytical models and supercharged by a Bayesian optimizer. MBS is prototyped and evaluated on AWS using bursty workloads. Experimental results show that MBS preserves SLOs while outperforming the state of the art by up to 8× in cost savings, reducing padding overhead by up to 37×, and requiring 3× fewer serverless function invocations.
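The padding overhead described above can be made concrete with a small calculation: if every request in a batch is padded to the longest one, the wasted work is the sum of the gaps. The sketch below is only an illustration of that cost and of why grouping similar-sized requests into separate buffers helps. It is not MBS's estimator or Bayesian optimizer. The request lengths, the single length boundary of 64, and the helper names (`padding_overhead`, `bucket_by_length`) are all hypothetical.

```python
import bisect


def padding_overhead(batch):
    """Padding units added when every request is padded to the batch's longest request."""
    if not batch:
        return 0
    longest = max(batch)
    return sum(longest - n for n in batch)


def total_padding(batches):
    return sum(padding_overhead(b) for b in batches)


def bucket_by_length(lengths, boundaries):
    """Hypothetical multi-buffer split: send each request to the first buffer whose
    upper bound (from the sorted `boundaries`) is >= its length; the last buffer
    takes everything longer than the largest boundary."""
    buffers = [[] for _ in range(len(boundaries) + 1)]
    for n in lengths:
        buffers[bisect.bisect_left(boundaries, n)].append(n)
    return [b for b in buffers if b]  # drop empty buffers


# Hypothetical request sizes (e.g., tokens per NLP request).
requests = [8, 120, 10, 115, 12, 118]

single_batch = total_padding([requests])                        # mix short and long requests
two_buffers = total_padding(bucket_by_length(requests, [64]))   # short vs. long buffer
print(single_batch, two_buffers)
```

For these made-up sizes, one mixed batch wastes 337 padding units while two length-based buffers waste 13. The trade-off that a framework like MBS must weigh is that more buffers mean smaller batches and potentially more function invocations and waiting time against the SLO.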
Full Text Available
CEDULE+: Resource Management for Burstable Cloud Instances Using Predictive Analytics

https://doi.org/10.1109/TNSM.2020.3039942

Pinciroli, Riccardo; Ali, Ahsan; Yan, Feng; Smirni, Evgenia (November 2020, IEEE Transactions on Network and Service Management)
Nearly all principal cloud providers now offer burstable instances. The main attraction of this type of instance is that it can boost its performance for a limited time to cope with workload variations. Although burstable instances are widely adopted, it is not clear how to manage them efficiently to avoid wasting resources. In this paper, we use predictive data analytics to optimize the management of burstable instances. We design CEDULE+, a data-driven framework that enables efficient resource management for burstable cloud instances by analyzing the system workload and latency data. CEDULE+ selects the most profitable instance type to process incoming requests and controls CPU, I/O, and network usage to minimize resource waste without violating Service Level Objectives (SLOs). CEDULE+ uses lightweight profiling and quantile regression to build a data-driven prediction model that estimates system performance for all combinations of instance type, resource type, and system workload. CEDULE+ is evaluated on Amazon EC2, and its efficiency and high accuracy are assessed through real-case scenarios. CEDULE+ predicts application latency with errors below 10%, extends the maximum performance period of a burstable instance by up to 2.4 times, and decreases deployment costs by more than 50%.
Full Text Available
It's not a Sprint, it's a Marathon: Stretching Multi-resource Burstable Performance in Public Clouds

https://doi.org/10.1145/3366626.3368130

Ali, Ahsan; Pinciroli, Riccardo; Yan, Feng; Smirni, Evgenia (December 2019, Proceedings of the 20th International Middleware Conference (Middleware 2019))

During the past few years, all leading cloud providers have introduced burstable instances that can sprint their performance for a limited period to address sudden workload variations. Despite the availability of burstable instances, there is no clear understanding of how to minimize resource waste by regulating their burst capacity to the workload requirements. This is especially true for non-CPU-intensive applications. In this paper, we investigate how to limit network and I/O usage to optimize the efficiency of the bursting process. We also study which resource should be controlled to benefit both cloud providers and end users. We design MRburst (Multi-Resource burstable performance scheduler) to automatically limit multiple resources (i.e., network, I/O, and CPU) and make the application comply with a user-defined service level objective (SLO) while minimizing wasted resources. MRburst is evaluated on Amazon EC2 using two multi-resource applications: an FTP server and a Ceph system. Experimental results show that MRburst outperforms state-of-the-art approaches by allowing instances to speed up their performance for periods up to 2.4 times longer while meeting the SLO.
Full Text Available
CEDULE: A Scheduling Framework for Burstable Performance in Cloud Computing

https://doi.org/10.1109/ICAC.2018.00024

Ali, Ahsan; Pinciroli, Riccardo; Yan, Feng; Smirni, Evgenia (September 2018, 2018 IEEE International Conference on Autonomic Computing (ICAC))

Full Text Available
