Log-Structured Merge trees (LSM trees) are increasingly used as the storage engines behind several data systems, frequently deployed in the cloud. Similar to other database architectures, LSM trees consider information about the expected workload (e.g., reads vs. writes, point vs. range queries) to optimize their performance via tuning. However, operating in a shared infrastructure like the cloud comes with workload uncertainty due to the fast-evolving nature of modern applications. Systems with static tuning discount the variability of such hybrid workloads and hence provide an inconsistent and overall suboptimal performance. To address this problem, we introduce Endure - a new paradigm for tuning LSM trees in the presence of workload uncertainty. Specifically, we focus on the impact of the choice of compaction policies, size ratio, and memory allocation on the overall performance. Endure considers a robust formulation of the throughput maximization problem and recommends a tuning that maximizes the worst-case throughput over the neighborhood of each expected workload. Additionally, an uncertainty tuning parameter controls the size of this neighborhood, thereby allowing the output tunings to be conservative or optimistic. Through both model-based and extensive experimental evaluations of Endure in the state-of-the-art LSM-based storage engine, RocksDB, we show that the robust tuning methodology consistently outperforms classical tuning strategies. The robust tunings output by Endure lead up to a 5X improvement in throughput in the presence of uncertainty. On the flip side, Endure tunings have negligible performance loss when the observed workload exactly matches the expected one.
more »
« less
Towards flexibility and robustness of LSM trees
Abstract Log-structured merge trees (LSM trees) are increasingly used as part of the storage engine behind several data systems, and are frequently deployed in the cloud. As the number of applications relying on LSM-based storage backends increases, the problem of performance tuning of LSM trees receives increasing attention. We consider bothnominaltunings—where workload and execution environment are accurately known a priori—androbusttunings—which consideruncertaintyin the workload knowledge. This type of workload uncertainty is common in modern applications, notably in shared infrastructure environments like the public cloud. To address this problem, we introduceEndure, a new paradigm for tuning LSM trees in the presence of workload uncertainty. Specifically, we focus on the impact of the choice of compaction policy, size ratio, and memory allocation on the overall performance.Endureconsiders a robust formulation of the throughput maximization problem and recommends a tuning that offers near-optimal throughput when the executed workload is not the same, instead in aneighborhoodof the expected workload. Additionally, we explore the robustness of flexible LSM designs by proposing a new unified design called K-LSM that encompasses existing designs. We deploy our robust tuning system,Endure, on a state-of-the-art key-value store, RocksDB, and demonstrate throughput improvements of up to 5$$\times $$ in the presence of uncertainty. Our results indicate that the tunings obtained byEndureare more robust than tunings obtained under our expanded LSM design space. This indicates that robustness may not be inherent to a design, instead, it is an outcome of a tuning process that explicitly accounts for uncertainty.
more »
« less
- Award ID(s):
- 2144547
- PAR ID:
- 10485417
- Publisher / Repository:
- Springer Science + Business Media
- Date Published:
- Journal Name:
- The VLDB Journal
- Volume:
- 33
- Issue:
- 4
- ISSN:
- 1066-8888
- Format(s):
- Medium: X Size: p. 1105-1128
- Size(s):
- p. 1105-1128
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Abstract We consider the problem of covering multiple submodular constraints. Given a finite ground setN, a weight function$$w: N \rightarrow \mathbb {R}_+$$ ,rmonotone submodular functions$$f_1,f_2,\ldots ,f_r$$ overNand requirements$$k_1,k_2,\ldots ,k_r$$ the goal is to find a minimum weight subset$$S \subseteq N$$ such that$$f_i(S) \ge k_i$$ for$$1 \le i \le r$$ . We refer to this problem asMulti-Submod-Coverand it was recently considered by Har-Peled and Jones (Few cuts meet many point sets. CoRR.arxiv:abs1808.03260Har-Peled and Jones 2018) who were motivated by an application in geometry. Even with$$r=1$$ Multi-Submod-Covergeneralizes the well-known Submodular Set Cover problem (Submod-SC), and it can also be easily reduced toSubmod-SC. A simple greedy algorithm gives an$$O(\log (kr))$$ approximation where$$k = \sum _i k_i$$ and this ratio cannot be improved in the general case. In this paper, motivated by several concrete applications, we consider two ways to improve upon the approximation given by the greedy algorithm. First, we give a bicriteria approximation algorithm forMulti-Submod-Coverthat covers each constraint to within a factor of$$(1-1/e-\varepsilon )$$ while incurring an approximation of$$O(\frac{1}{\epsilon }\log r)$$ in the cost. Second, we consider the special case when each$$f_i$$ is a obtained from a truncated coverage function and obtain an algorithm that generalizes previous work on partial set cover (Partial-SC), covering integer programs (CIPs) and multiple vertex cover constraints Bera et al. (Theoret Comput Sci 555:2–8 Bera et al. 2014). Both these algorithms are based on mathematical programming relaxations that avoid the limitations of the greedy algorithm. We demonstrate the implications of our algorithms and related ideas to several applications ranging from geometric covering problems to clustering with outliers. Our work highlights the utility of the high-level model and the lens of submodularity in addressing this class of covering problems.more » « less
-
A<sc>bstract</sc> A measurement of theCP-violating parameters in$$ {B}_s^0\boldsymbol{\to}{D}_s^{\mp }{K}^{\pm} $$ decays is reported, based on the analysis of proton-proton collision data collected by the LHCb experiment corresponding to an integrated luminosity of 6 fb−1at a centre-of-mass energy of 13 TeV. The measured parameters are obtained with a decay-time dependent analysis yieldingCf= 0.791 ± 0.061 ± 0.022,$$ {A}_f^{\Delta \Gamma} $$ = −0.051 ± 0.134 ± 0.058,$$ {A}_{\overline{f}}^{\Delta \Gamma} $$ = −0.303 ± 0.125 ± 0.055,Sf= −0.571 ± 0.084 ± 0.023 and$$ {S}_{\overline{f}} $$ = −0.503 ± 0.084 ± 0.025, where the first uncertainty is statistical and the second systematic. This corresponds to CP violation in the interference between mixing and decay of about 8.6σ. Together with the value of the$$ {B}_s^0 $$ mixing phase −2βs, these parameters are used to obtain a measurement of the CKM angleγequal to (74 ± 12)° modulo 180°, where the uncertainty contains both statistical and systematic contributions. This result is combined with the previous LHCb measurement in this channel using 3 fb−1resulting in a determination of$$ \gamma ={\left({81}_{-11}^{+12}\right)}^{\circ } $$ .more » « less
-
Abstract Rooted binarygalledtrees generalize rooted binary trees to allow a restricted class of cycles, known asgalls. We build upon the Wedderburn-Etherington enumeration of rooted binary unlabeled trees withnleaves to enumerate rooted binary unlabeled galled trees withnleaves, also enumerating rooted binary unlabeled galled trees withnleaves andggalls,$$0 \leqslant g \leqslant \lfloor \frac{n-1}{2} \rfloor $$ . The enumerations rely on a recursive decomposition that considers subtrees descended from the nodes of a gall, adopting a restriction on galls that amounts to considering only the rooted binarynormalunlabeled galled trees in our enumeration. We write an implicit expression for the generating function encoding the numbers of trees for alln. We show that the number of rooted binary unlabeled galled trees grows with$$0.0779(4.8230^n)n^{-\frac{3}{2}}$$ , exceeding the growth$$0.3188(2.4833^n)n^{-\frac{3}{2}}$$ of the number of rooted binary unlabeled trees without galls. However, the growth of the number of galled trees with only one gall has the same exponential order 2.4833 as the number with no galls, exceeding it only in the subexponential term,$$0.3910n^{\frac{1}{2}}$$ compared to$$0.3188n^{-\frac{3}{2}}$$ . For a fixed number of leavesn, the number of gallsgthat produces the largest number of rooted binary unlabeled galled trees lies intermediate between the minimum of$$g=0$$ and the maximum of$$g=\lfloor \frac{n-1}{2} \rfloor $$ . We discuss implications in mathematical phylogenetics.more » « less
-
A<sc>bstract</sc> We report measurements of thee+e−→$$ B\overline{B} $$ ,$$ B{\overline{B}}^{\ast } $$ , and$$ {B}^{\ast }{\overline{B}}^{\ast } $$ cross sections at four energies, 10653, 10701, 10746 and 10805 MeV, using data collected by the Belle II experiment. We reconstruct oneBmeson in a large number of hadronic final states and use its momentum to identify the production process. In the first 2 – 5 MeV above$$ {B}^{\ast }{\overline{B}}^{\ast } $$ threshold, thee+e−→$$ {B}^{\ast }{\overline{B}}^{\ast } $$ cross section increases rapidly. This may indicate the presence of a pole close to the threshold.more » « less
An official website of the United States government
