Title: DoubleFaceAD: A New Datastore Driver Architecture to Optimize Fanout Query Performance
The broad adoption of fanout queries on distributed datastores has made asynchronous event-driven datastore drivers a natural choice due to reduced multithreading overhead. However, through extensive experiments using the latest datastore drivers (e.g., MongoDB, HBase, DynamoDB) and the YCSB benchmark, we show that an asynchronous datastore driver can cause unexpected performance degradation, especially in fanout-query scenarios. For example, the default MongoDB asynchronous driver adopts the latest Java asynchronous I/O library, which uses a hidden on-demand JVM-level thread pool to process fanout query responses, causing a surprising multithreading overhead when the query response size is large. A second instance is that the traditional wisdom of modular design of an application server and its embedded asynchronous datastore driver can cause an imbalanced workload between the two components due to lack of coordination, incurring frequent unnecessary system calls. To address the revealed problems, we introduce DoubleFaceAD, a new asynchronous datastore driver architecture that integrates the management of both upstream and downstream workload traffic through a few shared reactor threads, with fanout-query-aware priority-based scheduling to reduce the overall query waiting time. Our experimental results on two representative application scenarios (YCSB and DBLP) show that DoubleFaceAD outperforms all other types of datastore drivers by up to 34% in throughput and up to 1.9× in 99th percentile response time.
Award ID(s):
2000681
NSF-PAR ID:
10212863
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Proceedings of the 21st International Middleware Conference
Page Range / eLocation ID:
430 to 444
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Mission-critical, real-time, continuous stream processing applications that interact with the real world have stringent latency requirements. For example, e-commerce websites like Amazon improve their marketing strategy by performing real-time advertising based on customers' behavior, and a long latency tail can cause significant revenue loss. Recent work [39] showed a positive correlation between the long latency tail and variance in the execution time of synchronous invocation chains (critical paths) in microservices benchmarks. This paper shows that asynchronous, very short but intense resource demands (called millibottlenecks) outside of critical paths can also cause a significant long latency tail. Using a traffic analysis stream processing application benchmark, we evaluated the impact of asynchronous workload bursts generated by a multi-layer data structure called LSM-tree (log-structured merge-tree) for continuous checkpointing. Outside of the critical path, LSM-tree relies on maintenance operations (e.g., flushing/compaction during a checkpoint) to reorganize the LSM-tree in memory and on disk to keep data access latency short. Although asynchronous, such recurrent maintenance operations can cause frequent millibottlenecks, particularly when they overlap, a problem we call ShadowSync. For scheduling and statistical reasons, a significant long latency tail can arise from ShadowSync caused by asynchronous recurrent operations. Our experimental results show that with typical settings of benchmark components such as RocksDB, ShadowSync can prolong request message latency by up to 2 seconds. We show that effective mitigation methods can alleviate both scheduled and statistical ShadowSync, reducing the long latency tail to less than 20% of the original at the 99.9th percentile.
  2. Distributed key-value stores today require frequent key-value shard migration between nodes to react to dynamic workload changes for load balancing, data locality, and service elasticity. In this paper, we propose NetMigrate, a live migration approach for in-memory key-value stores based on programmable network data planes. NetMigrate migrates shards between nodes with zero service interruption and minimal performance impact. During migration, the switch data plane monitors the migration process in a fine-grained manner and directs client queries to the right server in real time, eliminating the overhead of pulling data between nodes. We implement a NetMigrate prototype on a testbed consisting of a programmable switch and several commodity servers running Redis and evaluate it under YCSB workloads. Our experiments demonstrate that, compared with state-of-the-art migration approaches, NetMigrate improves query throughput by 6.5% to 416% and maintains low access latency during migration.
  3. Erasure coding is now one of the most significant techniques in cloud storage systems, providing both fast parallel I/O processing and high fault tolerance for massive data accesses. In these systems, triple disk failure tolerant arrays (3DFTs) are a typical configuration, supported by several classic erasure codes such as Reed-Solomon (RS) codes, Local Reconstruction Codes (LRC), and Minimum Storage Regeneration (MSR) codes. In an online recovery process, the foreground application workloads and the background recovery workloads are handled simultaneously, which requires a comprehensive understanding of the characteristics of both workload types. Although several techniques have been proposed to accelerate the I/O requests of online recovery processes, they are typically one-sided because the two workloads are not considered together to achieve cost-effective performance. To address this problem, we propose Erasure Codes Fusion (EC-Fusion), an efficient hybrid erasure coding framework for cloud storage systems. EC-Fusion is a combination of RS and MSR codes, which dynamically selects the appropriate code based on its properties. On one hand, for write-intensive application workloads or recovery workloads with a low risk of data loss, EC-Fusion uses the RS code to decrease both computational overhead and storage cost. On the other hand, for read-intensive workloads or workloads with frequent reconstruction, the MSR code is a proper choice. Therefore, better overall application and recovery performance can be achieved in a cost-effective fashion. To demonstrate the effectiveness of EC-Fusion, several experiments are conducted in Hadoop systems. The results show that, compared with traditional hybrid erasure coding techniques, EC-Fusion improves application response time by up to 1.77× and reduces reconstruction time by up to 69.10%.
  4. High-elevation mountain watersheds are undergoing rapid warming and declining snow fractions worldwide, causing earlier and quicker snowmelt. Understanding how this hydrologic shift affects subsurface flow paths, biogeochemical reactions, and solute export has been challenging due to the entanglement of hydrological and biogeochemical processes. Coal Creek, a high-elevation catchment (2,700 to 3,700 m, 53 km²) in Colorado, is experiencing a higher rate of warming than surrounding low-lying areas. This warming corresponds with dynamic and increased responses from biogenic solutes and dissolved organic carbon (DOC), whereas the behavior of geogenic solutes and dissolved inorganic carbon (DIC) has remained relatively unchanged. DOC has experienced the largest concentration increase (>3×), with annual average flow-weighted concentrations positively correlated with average annual temperature. This suggests temperature is the main driver of increasing DOC levels. Although DOC and DIC responses to warming are influenced by many drivers, the relative contribution of each remains unknown. DOC and DIC were analyzed to incorporate both carbon component products of soil respiration (DOC and CO2) and to represent high solute concentrations transported by shallow (DOC) versus deep (DIC) subsurface flow. The contrasting behavior of these carbon solutes indicates climate change and warming are driving changes in organic matter decomposition and soil respiration. Modeling results from the process-based model HBV-BioRT show that increased temperatures cause earlier snowmelt and streamflow generation and lower peak discharge. As streamflow generation occurs earlier, so do DOC flushing and DIC dilution events. Additionally, post-snowmelt periods show greater DOC production and concentrations under warming scenarios. DOC is then flushed out by earlier snowmelt partitioned through the shallow soil zone.
Most process-based studies lack a watershed-scale understanding of carbon transformation and flow path alterations. This work demonstrates complex hydrologic and biogeochemical coupling at the watershed scale to illustrate how water flow paths and chemistry are responding to a changing climate in high-elevation mountain watersheds.
  5. Objective: This study develops a computational model to predict drivers' response time and understand the underlying cognitive mechanism for freeway exiting takeovers in conditionally automated vehicles (AVs). Background: Previous research has modeled drivers' takeover response time in emergency scenarios that demand a quick response. However, existing models may not be applicable to scheduled, non-time-critical takeovers, as drivers take longer to resume control when there is no time pressure. A model of driver response time in non-time-critical takeovers is lacking. Method: A computational cognitive model of driver takeover response time is developed based on the Queuing Network-Model Human Processor (QN-MHP) architecture. The model quantifies gaze redirection in response to a takeover request (ToR), task prioritization, driver situation awareness, and driver trust to address the complexities of drivers' takeover strategies when a sufficient time budget exists. Results: Experimental data from a preliminary driving simulator study were used to validate the model. The model accounted for 97% of the experimental takeover response time for freeway exiting. Conclusion: The current model can successfully predict drivers' response time for scheduled, non-time-critical freeway exiting takeovers in conditionally automated vehicles. Application: This model can be applied to human-machine interface design with respect to ToR lead time for enhancing safe freeway exiting takeovers in conditionally automated vehicles. It also provides a foundation for future modeling work towards an integrated driver model of freeway exiting takeover performance.