Saving energy for latency-critical applications such as
web search is challenging because of their strict tail-latency
constraints. State-of-the-art power management frameworks use
Dynamic Voltage and Frequency Scaling (DVFS) and sleep-state
techniques to slow down request processing so that each search
finishes just in time. However, accurately predicting the compute
demand of a request can be difficult. In this paper, we present
Gemini, a novel power management framework for latency-critical
search engines. Gemini has two unique features to capture the
variation in service time across queries. First, at light loads
without request queuing, a two-step DVFS scheme manages CPU
power. It selects the initial CPU frequency based on a
query-specific service time prediction and then boosts the
frequency at the right time to catch up with the deadline. The
boosting time, in turn, is determined by an estimate of the error
in the prediction of that query's service time. At high loads, where there is
request queuing, only the request currently being executed and
the critical request in the queue use two-step DVFS; all
requests in between run at a single shared frequency to reduce
frequency-transition overhead. Second, we develop two separate
neural network models: one predicts the service time and the
other predicts the error of that prediction. Combining these two
predictors significantly improves both the power savings and the
tail latency of our two-step DVFS. Gemini is implemented
on the Solr search engine. Evaluations on three representative
query traces show that Gemini saves 41% of CPU power and
outperforms other state-of-the-art techniques.
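The two-step DVFS idea can be illustrated with a small planner. This is a minimal sketch under our own assumptions, not Gemini's actual algorithm: service time is assumed to scale as 1/frequency, the predicted service time `t_pred` and predicted error `err_pred` are taken as given inputs (standing in for the two neural network models), and the names `plan_two_step` and `TwoStepPlan` are hypothetical. The boost time is chosen as the latest moment at which switching to the maximum frequency still lets the pessimistic workload `t_pred + err_pred` finish by the deadline.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TwoStepPlan:
    initial_freq: float          # GHz, frequency applied when the query starts
    boost_time: Optional[float]  # ms after start to switch to max frequency; None = no boost


def plan_two_step(t_pred: float, err_pred: float, deadline: float,
                  freqs: Sequence[float]) -> TwoStepPlan:
    """Hypothetical two-step DVFS planner (illustrative, not Gemini's code).

    t_pred:   predicted service time (ms) at the maximum frequency
    err_pred: predicted non-negative error of t_pred (ms)
    deadline: latency budget (ms) measured from the query's start
    freqs:    available CPU frequency levels (GHz)
    Assumes CPU-bound work whose service time scales as 1/frequency.
    """
    f_max = max(freqs)
    # Step 1: slowest level that meets the deadline for the *predicted* work.
    feasible = [f for f in sorted(freqs) if t_pred * f_max / f <= deadline]
    f1 = feasible[0] if feasible else f_max
    if f1 == f_max:
        return TwoStepPlan(f_max, None)  # no headroom: run at the top level throughout
    # Step 2: latest boost time T_b such that the pessimistic work (t_pred + err_pred)
    # still meets the deadline when run at f1 until T_b and at f_max afterwards:
    #   T_b + (t_pred + err_pred - T_b * f1 / f_max) <= deadline
    # which rearranges to T_b <= slack / (1 - f1 / f_max).
    slack = deadline - (t_pred + err_pred)
    if slack <= 0:
        return TwoStepPlan(f_max, None)  # a boost cannot absorb the error; run fast
    return TwoStepPlan(f1, slack / (1.0 - f1 / f_max))
```

For example, `plan_two_step(20.0, 15.0, 50.0, [1.2, 1.6, 2.0, 2.4])` starts the query at 1.2 GHz and schedules a boost to 2.4 GHz 30 ms after it begins: 30 ms at half speed covers 15 ms of max-frequency work, and the remaining 20 ms of pessimistic work finishes exactly at the 50 ms deadline. If the query completes before the boost time, no transition is needed at all.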
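The high-load policy can be sketched in a similar spirit. In this hypothetical `plan_queue` function we assume, as our own simplification rather than the paper's definition, that the critical request is the queued request whose deadline demands the highest average frequency given all predicted work queued ahead of it, and that the requests between the current and critical ones share the lowest frequency level meeting that demand. The current and critical requests would each be handed to a per-request two-step planner, which is not shown here.

```python
from typing import Sequence, Tuple


def plan_queue(pred_times: Sequence[float], deadlines: Sequence[float],
               freqs: Sequence[float]) -> Tuple[int, float]:
    """Hypothetical high-load planner (illustrative, not Gemini's code).

    pred_times[i]: predicted service time (ms at max frequency) of the i-th request
                   in FIFO order; index 0 is the request currently executing.
    deadlines[i]:  time remaining (ms) until request i's deadline.
    Returns (critical_index, shared_freq_ghz).
    """
    if not pred_times or len(pred_times) != len(deadlines):
        raise ValueError("need one deadline per queued request")
    f_max = max(freqs)
    cumulative = 0.0
    critical, worst_ratio = 0, 0.0
    for i, (t, d) in enumerate(zip(pred_times, deadlines)):
        cumulative += t  # all work that must finish before request i completes
        # Fraction of f_max needed, on average, for request i to meet its deadline.
        ratio = cumulative / d if d > 0 else float("inf")
        if ratio > worst_ratio:
            critical, worst_ratio = i, ratio
    # Requests between the current and critical ones share the lowest level meeting
    # the critical demand; fall back to f_max when even the top level is insufficient.
    needed = worst_ratio * f_max
    shared = next((f for f in sorted(freqs) if f >= needed), f_max)
    return critical, shared
```

For instance, `plan_queue([10.0, 10.0, 10.0], [40.0, 25.0, 50.0], [1.2, 1.6, 2.0, 2.4])` returns `(1, 2.0)`: the second request needs 20 ms of work within 25 ms, i.e. 80% of 2.4 GHz, so 2.0 GHz is the shared level. Because the critical request has the largest ratio, running every request up to it at that single frequency also meets the earlier deadlines (under the 1/frequency assumption), which is what keeps the number of frequency transitions low.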