Side-channel attacks exploit the implementation of algorithms to bypass security mechanisms and leak restricted data. A timing attack observes differences in runtime in response to varying inputs in order to learn restricted information. Most prior work has focused on applying timing attacks in cryptanalysis; other approaches sought to learn about database content by measuring the time of an operation (e.g., an index update or query caching). Our goal is to evaluate the practical risk of using a non-privileged user account to learn about data that access control hides from that account. As with other side-channel attacks, this attack exploits how queries are inherently executed in a database system: internally, the database engine processes the entire table even if the user has access to only some of its rows. We present a preliminary investigation of what a regular user can learn about “hidden” data by observing the execution time of their queries over an indexed column in a table. We perform our experiments in a cache-controlled environment (i.e., clearing the database cache between runs) to measure an upper bound on data leakage and privacy risk. Our experiments show that, in a real system, it is difficult to reliably learn about restricted data due to natural operating system (OS) runtime fluctuations and OS-level caching. However, when the access control mechanism itself is relatively costly, a user can not only detect the presence of hidden data but also closely approximate the number of rows that access control hides.
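The abstract does not include code, but the measurement loop it describes can be sketched. Below is a minimal, illustrative timing probe assuming a hypothetical table `t` with an indexed column `indexed_col`; it uses SQLite from the Python standard library, and reopening the connection each run is only a crude stand-in for the paper's setup, which clears the database cache on a full DBMS between runs.

```python
import sqlite3
import statistics
import time

DB_PATH = "demo.db"   # hypothetical database file
PROBE_QUERY = "SELECT COUNT(*) FROM t WHERE indexed_col BETWEEN ? AND ?"

# One-time setup so the sketch is self-contained.
con = sqlite3.connect(DB_PATH)
con.execute("CREATE TABLE IF NOT EXISTS t (indexed_col INTEGER)")
con.execute("CREATE INDEX IF NOT EXISTS idx ON t(indexed_col)")
if con.execute("SELECT COUNT(*) FROM t").fetchone()[0] == 0:
    con.executemany("INSERT INTO t VALUES (?)",
                    [(i % 1000,) for i in range(100_000)])
con.commit()
con.close()

def time_query(lo, hi, runs=30):
    """Wall-clock times for one probe range over `runs` repetitions."""
    times = []
    for _ in range(runs):
        # Reopening the connection each run is only a rough stand-in for
        # the paper's cache-clearing setup on a real database server.
        con = sqlite3.connect(DB_PATH)
        start = time.perf_counter()
        con.execute(PROBE_QUERY, (lo, hi)).fetchone()
        times.append(time.perf_counter() - start)
        con.close()
    return times

# If the median runtimes of two ranges differ beyond normal OS jitter,
# the ranges likely cover different numbers of rows, including rows an
# access-control layer would hide from the user in the paper's setting.
print(statistics.median(time_query(0, 100)),
      statistics.median(time_query(500, 600)))
```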
A Community Cache with Complete Information
Kariz is a new architecture for caching data from data lakes that are accessed, potentially concurrently, by multiple analytic platforms. It integrates rich information from analytics platforms with global knowledge about demand and resource availability to enable sophisticated cache-management and prefetching strategies that, for example, combine historical runtime information with job dependency graphs (DAGs), information about cache state, and sharing across compute clusters. Our prototype supports multiple analytic frameworks (Pig/Hadoop and Spark), and we show that the required changes are modest. We have implemented three algorithms in Kariz for optimizing the caching of individual queries (one from the literature and two novel to our platform) and three policies for optimizing across queries from, potentially, multiple different clusters. With an algorithm that fully exploits the rich information available from Kariz, we demonstrate major speedups (as much as 3×) for TPC-H and TPC-DS.
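Kariz's actual caching and prefetching algorithms are described in the paper; the sketch below only illustrates the kind of DAG-aware, history-driven planning the abstract refers to. All structures here (`Job`, the estimated-start and estimated-saving fields, the greedy scoring rule) are hypothetical, not Kariz's implementation.

```python
# Illustrative sketch (not Kariz's actual algorithm): rank datasets for
# prefetching by how much runtime history says caching them saves per
# byte, preferring inputs of jobs that will run soonest.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    inputs: list            # dataset names this job reads
    est_start: float        # seconds from now, from historical runtimes
    est_saving: float       # estimated runtime saved if inputs are cached

def plan_prefetch(jobs, dataset_size, cache_free):
    """Greedy prefetch plan: highest saving-per-byte first, sooner jobs
    breaking ties, subject to remaining cache capacity."""
    scored = []
    for job in jobs:
        for ds in job.inputs:
            density = job.est_saving / max(dataset_size[ds], 1)
            scored.append((density, -job.est_start, ds))
    plan, used, cached = [], 0, set()
    for density, _, ds in sorted(scored, reverse=True):
        if ds in cached:
            continue
        if used + dataset_size[ds] <= cache_free:
            plan.append(ds)
            used += dataset_size[ds]
            cached.add(ds)
    return plan

jobs = [Job("j1", ["lineitem"], est_start=5, est_saving=40),
        Job("j2", ["orders", "lineitem"], est_start=20, est_saving=25)]
sizes = {"lineitem": 6_000, "orders": 2_000}
print(plan_prefetch(jobs, sizes, cache_free=7_000))  # -> ['orders']
```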
- Award ID(s): 1910327
- PAR ID: 10313697
- Date Published:
- Journal Name: 19th USENIX Conference on File and Storage Technologies (FAST 21)
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Data movement is a common performance bottleneck, and its chief remedy is caching. Traditional cache management is transparent to the workload: the data kept in cache are determined by recency information alone, while program information, i.e., future data reuses, is not communicated to the cache. This has changed in a new cache design named Lease Cache. Program control is passed to the lease cache by a compiler technique called Compiler Assigned Reference Lease (CARL). This technique collects the reuse-interval distribution for each reference and uses it to compute and assign a lease value to each reference (a hedged sketch of this lease-assignment idea appears after this list). In this article, we prove that CARL is optimal under certain statistical assumptions. Based on this optimality, we prove miss-curve convexity, which is useful for optimizing shared cache, and sub-partitioning monotonicity, which simplifies lease compilation. We evaluate the potential using scientific kernels from PolyBench and show that compiler insertion of up to 34 leases in program code achieves similar or better cache utilization (in variable-size cache) than the optimal fixed-size caching policy, which has been unattainable with automatic caching but is now within the potential of cache programming for all tested programs and most cache sizes.
- Optimizing edge caching is crucial for the advancement of next-generation (nextG) wireless networks, ensuring high-speed and low-latency services for mobile users. Existing data-driven optimization approaches often lack awareness of the distribution of random data variables and focus solely on optimizing cache hit rates, neglecting potential reliability concerns such as base-station overload and unbalanced caches. This oversight can result in system crashes and degraded user experience. To bridge this gap, we introduce a novel digital-twin-assisted optimization framework, called D-REC, which integrates reinforcement learning (RL) with diverse intervention modules to ensure reliable caching in nextG wireless networks. We first develop a joint vertical and horizontal twinning approach to efficiently create network digital twins, which are then employed by D-REC as RL optimizers and safeguards, providing ample datasets for training and predictive evaluation of our cache replacement policy. By incorporating reliability modules into a constrained Markov decision process, D-REC can adaptively adjust actions, rewards, and states to comply with advantageous constraints, minimizing the risk of network failures (the constrained-reward idea is sketched after this list). Theoretical analysis demonstrates comparable convergence rates between D-REC and vanilla data-driven methods without compromising caching performance. Extensive experiments validate that D-REC outperforms conventional approaches in cache hit rate and load balancing while effectively enforcing predetermined reliability intervention modules.
- Cache systems are widely used to speed up data retrieval. Modern HPC, data analytics, and AI/ML workloads generate vast, multi-dimensional datasets, and those data are accessed via complex queries. However, the probability of requesting exactly the same data across different queries is low, leading to limited performance improvement when a traditional key-value cache is applied. In this paper, we present Mosaic-Cache, a proactive and general caching framework that lets applications efficiently reuse partially overlapping data through novel overlap-aware cache interfaces for fast content-level reuse (an overlap-aware lookup is sketched after this list). The core components include a metadata manager leveraging customizable indexing for fast overlap lookups, an adaptive fetch planner for dynamic cache-to-storage decisions, and an async merger that reduces cache fragmentation and redundancy. Evaluations on real-world HPC datasets show that Mosaic-Cache improves overall performance by up to 4.1× over a traditional key-value cache while adding minimal overhead in worst-case scenarios.
- Content caching is vital for enhancing web server efficiency and reducing network congestion, particularly in platforms that predict user actions. Despite the many studies conducted to improve cache replacement strategies, there remains room for improvement. This paper introduces STRCacheML, a machine learning (ML) assisted content caching policy. STRCacheML leverages available attributes within a platform to make intelligent cache replacement decisions offline. We tested various machine learning and deep learning algorithms to adopt the one with the highest accuracy and integrated that algorithm into our cache replacement policy. The selected ML algorithm was employed to estimate the likelihood of cached objects being requested again, an essential factor in cache eviction scenarios (an eviction sketch appears after this list). The IMDb dataset, comprising numerous videos with corresponding attributes, was used to conduct our experiments. The experimental section highlights our model's efficacy, presenting results compared to established approaches based on raw cache hits and cache hit rates.
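For the lease-cache entry above, a minimal sketch of the lease-assignment idea: given one reference's observed reuse-interval distribution, pick the lease that covers the most reuses per unit of expected cache occupancy. This is an illustration under simplified assumptions, not the paper's CARL algorithm.

```python
# Hedged sketch of lease assignment (not the paper's CARL): choose the
# lease maximizing covered reuses per unit of expected cache occupancy.
from collections import Counter

def assign_lease(reuse_intervals):
    """reuse_intervals: observed gaps (in accesses) between successive
    uses of one reference."""
    hist = Counter(reuse_intervals)
    best_lease, best_profit = 0, 0.0
    for lease in sorted(hist):
        hits = sum(c for gap, c in hist.items() if gap <= lease)
        # A cached line occupies space for min(gap, lease) accesses.
        cost = sum(c * min(gap, lease) for gap, c in hist.items())
        profit = hits / cost if cost else 0.0
        if profit > best_profit:
            best_lease, best_profit = lease, profit
    return best_lease

# Mostly short reuse gaps with one outlier: a short lease wins.
print(assign_lease([2, 2, 3, 50, 2, 3]))  # -> 3
```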
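For the D-REC entry, a hedged sketch of one way reliability constraints can shape an RL reward so that cache hits are credited only while the constraint (e.g., base-station load) holds. The threshold, penalty, and load signal are assumptions for illustration, not D-REC's actual modules.

```python
# Hedged sketch (not D-REC's code): penalize reward when a reliability
# constraint such as base-station load is violated.
def shaped_reward(hit: bool, bs_load: float, load_limit: float = 0.9,
                  penalty: float = 5.0) -> float:
    r = 1.0 if hit else 0.0
    if bs_load > load_limit:          # reliability constraint violated
        r -= penalty * (bs_load - load_limit)
    return r

print(shaped_reward(hit=True, bs_load=0.95))  # hit, but overloaded
```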
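For the Mosaic-Cache entry, an illustrative overlap-aware lookup over one-dimensional ranges: a query is served from overlapping cached pieces, and only the residual gaps go to storage. Mosaic-Cache itself handles multi-dimensional data with customizable indexing; this sketch is deliberately simpler and hypothetical.

```python
# Hedged sketch of an overlap-aware lookup (not Mosaic-Cache's API):
# cached entries are ranges; a query is answered from overlapping cached
# pieces plus residual gaps fetched from storage.
def lookup(cache, qlo, qhi):
    """Return (cached overlaps, gaps to fetch) for query [qlo, qhi)."""
    hits, gaps, cursor = [], [], qlo
    for lo, hi in sorted(cache):            # cache: set of (lo, hi) ranges
        if hi <= qlo or lo >= qhi:
            continue                        # no overlap with the query
        if lo > cursor:
            gaps.append((cursor, lo))       # uncovered span before this piece
        hits.append((max(lo, qlo), min(hi, qhi)))
        cursor = max(cursor, hi)
    if cursor < qhi:
        gaps.append((cursor, qhi))
    return hits, gaps

cache = {(0, 10), (15, 25)}
print(lookup(cache, 5, 20))   # reuses [5,10) and [15,20); fetches [10,15)
```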
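For the STRCacheML entry, a minimal sketch of ML-assisted eviction: when the cache is full, evict the object the trained model scores as least likely to be re-requested. The `model` (a scikit-learn-style classifier with `predict_proba`) and the feature layout are assumptions, not the paper's implementation.

```python
# Hedged sketch of ML-assisted eviction (not STRCacheML itself).
def evict(cache, model):
    """cache: dict object_id -> feature vector (e.g., IMDb attributes).
    model: trained binary classifier; class 1 = 'requested again'."""
    scores = {oid: model.predict_proba([feats])[0][1]   # P(re-request)
              for oid, feats in cache.items()}
    victim = min(scores, key=scores.get)    # least likely to be reused
    del cache[victim]
    return victim
```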