Search for: All records

Award ID contains: 1900706

« Prev Next »

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Biosignal Authentication Considered Harmful Today

Krish, V; Paoletti, N; Kazemi, M; Smolka, S A; Rahmati, A (August 2024, USENIX)

User authentication systems based on cardiovascular biosignals have gained prominence in recent years, as these signals are presumed to be difficult to forge. We challenge this assumption by showing that an observer who has access to one type of cardiac data --- such as a user's pulse waveform, readily obtainable from video and commercial smartwatches --- can design a spoofing attack strong enough to fool authentication systems based on other cardiovascular biosignals. We present BioForge, an approach that leverages a cycle-consistent generative adversarial network to synthesize realistic physiological signals for a given user without relying on simultaneously collected supervision data. We evaluate BioForge on multiple open-access datasets and an array of verification systems, many of which can be fooled over 50% of the time in 10 or fewer attempts. Notably, we are able to fool systems that rely not just on heart rate and peak locations but also on the morphology of the waveforms. We additionally showcase how BioForge can be used to spoof authentication systems from biosignal data extracted from video clips of a target user. Our work demonstrates that authentication systems should not rely on the secrecy of cardiovascular biosignals.
more » « less
Full Text Available
Balancing Costs and Durability for Serverless Data

Merenstein, Alex; Wang, Xinran; Tarasov, Vasily; Agarwal, Prajjawal; Guthridge, Scott; Thakkar, Kapil; Wu, Katherine; Anwar, Ali; Zadok, Erez (June 2024, IEEE)

Durability features such as replication or erasure coding serve an important role in storage systems, enabling users to store data without fear of loss due to device failures. However, these durability features come with a cost, in terms of storage, network traffic, and computational overheads. For most data, loss is a catastrophic event and so these overheads are acceptable. However, some data tolerates low durability and does not need the high level of durability that most storage systems provide. Identifying the proper level of durability for a piece of data is difficult, especially since it is often not clear how to determine the cost of loss. For some data used in serverless applications, however, this cost is relatively straightforward to calculate: serverless functions are often required to be idempotent, meaning that the data produced by them can be re-created by re-running the function. The cost of losing a piece of data then is merely the cost of re-running the function that originally created the data. In this paper, we explore the tradeoff between the cost of storing data durably and the cost to re-create data. We focus on serverless data because its ability to be recreated makes it possible to assign a cost to its loss. We develop a mathematical model that relates compute costs, storage costs, and application-specific parameters to calculate the cost-optimal placement of data. We also develop an execution framework capable of handling lost data transparently, enabling applications to use lower-durability storage with no additional burden on the developer. Next, we show how different factors such as failure rate and compute costs affect the placement decision. We find that thanks to the relatively short lifetime of serverless data, the probability of data loss even on low-durability storage is fairly low. Finally, we use the model to place data for several applications, including a video-transcoding application and an image-assembly application. We show that our model can predict execution costs within 7% of actual execution costs, and can reduce storage costs by up to 3x while never exceeding baseline costs.
more » « less
Full Text Available
Accelerating multi-tier storage cache simulations using knee detection

https://doi.org/10.1016/j.peva.2024.102410

Estro, Tyler; Antunes, Mário; Bhandari, Pranav; Gandhi, Anshul; Kuenning, Geoff; Liu, Yifei; Waldspurger, Carl; Wildani, Avani; Zadok, Erez (May 2024, Performance Evaluation)

Storage cache hierarchies include diverse topologies, assorted parameters and policies, and devices with varied performance characteristics. Simulation enables efficient exploration of their configuration space while avoiding expensive physical experiments. Miss Ratio Curves (MRCs) efficiently characterize the performance of a cache over a range of cache sizes, revealing ‘‘key points’’ for cache simulation, such as knees in the curve that immediately follow sharp cliffs. Unfortunately, there are no automated techniques for efficiently finding key points in MRCs, and the cross-application of existing knee-detection algorithms yields inaccurate results. We present a multi-stage framework that identifies key points in any MRC, for both stack- based (e.g., LRU) and more sophisticated eviction algorithms (e.g., ARC). Our approach quickly locates candidates using efficient hash-based sampling, curve simplification, knee detection, and novel post-processing filters. We introduce Z-Method, a new multi-knee detection algorithm that employs statistical outlier detection to choose promising points robustly and efficiently. We evaluated our framework against seven other knee-detection algorithms, identifying key points in multi-tier MRCs with both ARC and LRU policies for 106 diverse real-world workloads. Compared to naïve approaches, our framework reduced the total number of points needed to accurately identify the best two-tier cache hierarchies by an average factor of approximately 5.5x for ARC and 7.7x for LRU. We also show how our framework can be used to seed the initial population for evolutionary algorithms. We ran 32,616 experiments requiring over three million cache simulations, on 151 samples, from three datasets, using a diverse set of population initialization techniques, evolutionary algorithms, knee-detection algorithms, cache replacement algorithms, and stopping criteria. Our results showed an overall acceleration rate of 34% across all configurations.
more » « less
Full Text Available
Persistent Memory Research in the Post-Optane Era

https://doi.org/10.1145/3609308.3625268

Desnoyers, Peter; Adams, Ian; Estro, Tyler; Gandhi, Anshul; Kuenning, Geoff; Mesnier, Mike; Waldspurger, Carl; Wildani, Avani; Zadok, Erez (October 2023, ACM)

After over a decade of researcher anticipation for the arrival of persistent memory (PMem), the first shipments of 3D XPoint-based Intel Optane Memory in 2019 were quickly followed by its cancellation in 2022. Was this another case of an idea quickly fading from future to past tense, relegating work in this area to the graveyard of failed technologies? The recently introduced Compute Express Link (CXL) may offer a path forward, with its persistent memory profile offering a universal PMem attachment point. Yet new technologies for memory-speed persistence seem years off, and may never become competitive with evolving DRAM and flash speeds. Without persistent memory itself, is future PMem research doomed? We offer two arguments for why reports of the death of PMem research are greatly exaggerated. First, the bulk of persistent-memory research has not in fact addressed memory persistence, but rather in-memory crash consistency, which was never an issue in prior systems where CPUs could not observe post-crash memory states. CXL memory pooling allows multiple hosts to share a single memory, all in different failure domains, raising crash-consistency issues even with volatile memory. Second, we believe CXL necessitates a ``disaggregation'' of PMem research. Most work to date assumed a single technology and set of features, \ie speed, byte addressability, and CPU load/store access. With an open interface allowing new topologies and diverse PMem technologies, we argue for the need to examine these features individually and in combination. While one form of PMem may have been canceled, we argue that the research problems it raised not only remain relevant but have expanded in a CXL-based future.
more » « less
Full Text Available
Guiding Simulations of Multi-Tier Storage Caches Using Knee Detection

https://doi.org/10.1109/MASCOTS59514.2023.10387545

Estro, Tyler; Antunes, Mário; Bhandari, Pranav; Gandhi, Anshul; Kuenning, Geoff; Liu, Yifei; Waldspurger, Carl; Wildani, Avani; Zadok, Erez (October 2023, IEEE)

Simulating storage cache hierarchies enables effi- cient exploration of their configuration space, including diverse topologies, parameters and policies, and devices with varied performance characteristics, while avoiding expensive physical experiments. Miss Ratio Curves (MRCs) efficiently characterize the performance of a cache over a range of cache sizes. These useful tools reveal “key points” for cache simulation, such as knees in the curve that immediately follow sharp cliffs. Unfortunately, there are no automated techniques for efficiently finding key points in MRCs, and the cross-application of existing knee-detection algorithms yields inaccurate results. We present a multi-stage framework that identifies key points in any MRC, for both stack-based (e.g., LRU) and more sophisticated eviction algorithms (e.g., ARC). Our approach quickly locates candidates using efficient hash-based sampling, curve simplification, knee detection, and novel post-processing filters. We introduce Z-Method, a new multi-knee detection algorithm that employs statistical outlier detection to choose promising points robustly and efficiently. We evaluate our framework against seven other knee-detection algorithms, using both ARC and LRU MRCs from 106 diverse real-world workloads, and apply it to identify key points in multi-tier MRCs. Compared to naïve approaches, our framework reduces the total number of points needed to accurately identify the best two-tier cache hierarchies by an average factor of approximately 5.5x for ARC and 7.7x for LRU.
more » « less
Full Text Available
Input and Output Coverage Needed in File System Testing

https://doi.org/10.1145/3599691.3603405

Liu, Yifei; Ahuja, Gautam; Kuenning, Geoff; Smolka, Scott; Zadok, Erez (July 2023, The 15th ACM Workshop on Hot Topics in Storage and File Systems (HotStorage '23))

File systems need testing to discover bugs and to help ensure reliability. Many file system testing tools are evaluated based on their code coverage. We analyzed recently reported bugs in Ext4 and BtrFS and found a weak correlation between code coverage and test effectiveness: many bugs are missed because they depend on specific inputs, even though the code was covered by a test suite. Our position is that coverage of system call inputs and outputs is critically important for testing file systems. We thus suggest input and output coverage as criteria for file system testing, and show how they can improve the effectiveness of testing. We built a prototype called IOcov to evaluate the input and output coverage of file system testing tools. IOcov identified many untested cases (specific inputs and outputs or ranges thereof) for both CrashMonkey and xfstests. Additionally, we discuss a method and associated metrics to identify over- and under-testing using IOcov.
more » « less
Full Text Available
F3: Serving Files Efficiently in Serverless Computing

https://doi.org/10.1145/3579370.3594771

Merenstein, Alex; Tarasov, Vasily; Anwar, Ali; Guthridge, Scott; Zadok, Erez (June 2023, The 16th ACM International Systems and Storage Conference (SYSTOR '23))

Serverless platforms offer on-demand computation and represent a significant shift from previous platforms that typically required resources to be pre-allocated (e.g., virtual machines). As serverless platforms have evolved, they have become suitable for a much wider range of applications than their original use cases. However, storage access remains a pain point that holds serverless back from becoming a completely generic computation platform. Existing storage for serverless typically uses an object interface. Although object APIs are simple to use, they lack the richness, versatility, and performance of file based APIs. Additionally, there is a large body of existing applications that relies on file-based interfaces. The lack of file based storage options prevents these applications from being ported to serverless environments. In this paper, we present F3, a file system that offers features to improve file access in serverless platforms: (1) efficient handling of ephemeral data, by placing ephemeral and non-ephemeral data on storage that exists at a different points along the durability-performance tradeoff continuum, (2) locality-aware data scheduling, and (3) efficient reading while writing. We modified OpenWhisk to support attaching file-based storage and to leverage F3's features using hints. Our prototype evaluation of F3 shows improved performance of up to 1.5--6.5x compared to existing storage systems.
more » « less
Full Text Available
Improving Storage Systems Using Machine Learning

https://doi.org/10.1145/3568429

Akgun, Ibrahim Umit; Aydin, Ali Selman; Burford, Andrew; McNeill, Michael; Arkhangelskiy, Michael; Zadok, Erez (February 2023, ACM Transactions on Storage)

Operating systems include many heuristic algorithms designed to improve overall storage performance and throughput. Because such heuristics cannot work well for all conditions and workloads, system designers resorted to exposing numerous tunable parameters to users—thus burdening users with continually optimizing their own storage systems and applications. Storage systems are usually responsible for most latency in I/O-heavy applications, so even a small latency improvement can be significant. Machine learning (ML) techniques promise to learn patterns, generalize from them, and enable optimal solutions that adapt to changing workloads. We propose that ML solutions become a first-class component in OSs and replace manual heuristics to optimize storage systems dynamically. In this article, we describe our proposed ML architecture, called KML. We developed a prototype KML architecture and applied it to two case studies: optimizing readahead and NFS read-size values. Our experiments show that KML consumes less than 4 KB of dynamic kernel memory, has a CPU overhead smaller than 0.2%, and yet can learn patterns and improve I/O throughput by as much as 2.3× and 15× for two case studies—even for complex, never-seen-before, concurrently running mixed workloads on different storage devices.
more » « less
Full Text Available
NAS-Navigator: Visual Steering for Explainable One-Shot Deep Neural Network Synthesis

https://doi.org/10.1109/TVCG.2022.3209361

Tyagi, Anjul; Xie, Cong; Mueller, Klaus (January 2023, IEEE Transactions on Visualization and Computer Graphics)

The success of DL can be attributed to hours of parameter and architecture tuning by human experts. Neural Architecture Search (NAS) techniques aim to solve this problem by automating the search procedure for DNN architectures making it possible for non-experts to work with DNNs. Specifically, One-shot NAS techniques have recently gained popularity as they are known to reduce the search time for NAS techniques. One-Shot NAS works by training a large template network through parameter sharing which includes all the candidate NNs. This is followed by applying a procedure to rank its components through evaluating the possible candidate architectures chosen randomly. However, as these search models become increasingly powerful and diverse, they become harder to understand. Consequently, even though the search results work well, it is hard to identify search biases and control the search progression, hence a need for explainability and human-in-the-loop (HIL) One-Shot NAS. To alleviate these problems, we present NAS-Navigator, a visual analytics (VA) system aiming to solve three problems with One-Shot NAS; explainability, HIL design, and performance improvements compared to existing state-of-the-art (SOTA) techniques. NAS-Navigator gives full control of NAS back in the hands of the users while still keeping the perks of automated search, thus assisting non-expert users. Analysts can use their domain knowledge aided by cues from the interface to guide the search. Evaluation results confirm the performance of our improved One-Shot NAS algorithm is comparable to other SOTA techniques. While adding Visual Analytics (VA) using NAS-Navigator shows further improvements in search time and performance. We designed our interface in collaboration with several deep learning researchers and evaluated NAS-Navigator through a control experiment and expert interviews.
more » « less
Full Text Available
Predicting Network Buffer Capacity for BBR Fairness

Akgun, Ibrahim "umit"; Vargas, Santiago; Arkhangelskiy, Michael; Burford, Andrew; McNeill, Michael; Balasubramanian, Aruna; Gandhi, Anshul; Zadok, Erez (December 2022, 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Workshop on ML for Systems)

BBR is a newer TCP congestion control algorithm with promising features, but it can often be unfair to existing loss-based congestion-control algorithms. This is because BBR's sending rate is dictated by static parameters that do not adapt well to dynamic and diverse network conditions. In this work, we introduce BBR-ML, an in-kernel ML-based tuning system for BBR, designed to improve fairness when in competition with loss-based congestion control. To build BBR-ML, we discretized the network condition search space and trained a model on 2,500 different network conditions. We then modified BBR to run an in-kernel model to predict network buffer sizes, and then use this prediction for optimal parameter settings. Our preliminary evaluation results show that BBR-ML can improve fairness when in competition with Cubic by up to 30% in some cases.
more » « less
Full Text Available

« Prev Next »