

Search for: All records

Award ID contains: 2106263

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. Archival systems are often tasked with storing highly valuable data that may be targeted by malicious actors. When the lifetime of the secret data is on the order of decades to centuries, the threat of improved cryptanalysis casts doubt on the long-term security of cryptographic techniques, which rely on hardness assumptions that are difficult to prove over archival time scales. This threat makes the design of secure archival systems exceptionally difficult. Some archival systems turn a blind eye to this issue, hoping that current cryptographic techniques will not be broken; others use techniques, such as secret sharing, that are impractical at scale. This position paper sheds light on the core challenges behind building practically viable secure long-term archives; we identify promising research avenues toward this goal.
    Free, publicly-accessible full text available July 8, 2025
  2. Durability features such as replication or erasure coding serve an important role in storage systems, enabling users to store data without fear of loss due to device failures. However, these durability features come with a cost, in terms of storage, network traffic, and computational overheads. For most data, loss is a catastrophic event and so these overheads are acceptable. However, some data tolerates low durability and does not need the high level of durability that most storage systems provide. Identifying the proper level of durability for a piece of data is difficult, especially since it is often not clear how to determine the cost of loss. For some data used in serverless applications, however, this cost is relatively straightforward to calculate: serverless functions are often required to be idempotent, meaning that the data produced by them can be re-created by re-running the function. The cost of losing a piece of data then is merely the cost of re-running the function that originally created the data. In this paper, we explore the tradeoff between the cost of storing data durably and the cost to re-create data. We focus on serverless data because its ability to be recreated makes it possible to assign a cost to its loss. We develop a mathematical model that relates compute costs, storage costs, and application-specific parameters to calculate the cost-optimal placement of data. We also develop an execution framework capable of handling lost data transparently, enabling applications to use lower-durability storage with no additional burden on the developer. Next, we show how different factors such as failure rate and compute costs affect the placement decision. We find that thanks to the relatively short lifetime of serverless data, the probability of data loss even on low-durability storage is fairly low. Finally, we use the model to place data for several applications, including a video-transcoding application and an image-assembly application. We show that our model can predict execution costs within 7% of actual execution costs, and can reduce storage costs by up to 3x while never exceeding baseline costs. 
    Free, publicly-accessible full text available June 6, 2025
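    The sketch below illustrates, in Python, the kind of cost comparison this model formalizes: weigh the price of durable storage against cheaper, lower-durability storage plus the expected cost of re-running the idempotent function that produced the data. The prices, loss probabilities, and parameter names here are assumptions for illustration, not the paper's actual model or values.

    # Illustrative sketch (not the paper's model): compare the expected cost of
    # durable storage against low-durability storage plus the expected cost of
    # re-running the idempotent serverless function if the data is lost.

    def expected_cost(price_gb_month, size_gb, lifetime_months,
                      loss_probability, recompute_cost):
        """Expected total cost of one placement option."""
        storage_cost = price_gb_month * size_gb * lifetime_months
        return storage_cost + loss_probability * recompute_cost

    def choose_placement(size_gb, lifetime_months, recompute_cost,
                         durable=(0.023, 1e-9),          # (price/GB-month, P(loss)); assumed values
                         low_durability=(0.010, 1e-3)):  # cheaper but likelier to lose data
        options = {"durable": durable, "low-durability": low_durability}
        costs = {name: expected_cost(price, size_gb, lifetime_months, p_loss, recompute_cost)
                 for name, (price, p_loss) in options.items()}
        return min(costs, key=costs.get), costs

    # Short-lived intermediate data with a cheap re-run usually lands on the
    # low-durability tier under this toy model.
    placement, costs = choose_placement(size_gb=5, lifetime_months=0.01,
                                        recompute_cost=0.0002)
    print(placement, costs)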
  3. The current techniques and tools for collecting, aggregating, and reporting verifiable sustainability data are vulnerable to cyberattacks and misuse, requiring new security and privacy-preserving solutions. This article outlines security challenges and research directions for addressing these requirements. 
  4. File systems need testing to discover bugs and to help ensure reliability. Many file system testing tools are evaluated based on their code coverage. We analyzed recently reported bugs in Ext4 and BtrFS and found a weak correlation between code coverage and test effectiveness: many bugs are missed because they depend on specific inputs, even though the code was covered by a test suite. Our position is that coverage of system call inputs and outputs is critically important for testing file systems. We thus suggest input and output coverage as criteria for file system testing, and show how they can improve the effectiveness of testing. We built a prototype called IOcov to evaluate the input and output coverage of file system testing tools. IOcov identified many untested cases (specific inputs and outputs or ranges thereof) for both CrashMonkey and xfstests. Additionally, we discuss a method and associated metrics to identify over- and under-testing using IOcov. 
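    A minimal sketch of the input-coverage idea (not the IOcov tool itself): have the test harness record which open(2) flag combinations the suite actually exercised, then report the combinations it never tried. The tracked flags and the two-flag limit are arbitrary choices for illustration.

    # Toy illustration of system-call *input* coverage (not IOcov): record which
    # open(2) flag combinations a test suite exercised and report the rest.
    import os
    from itertools import combinations

    TRACKED_FLAGS = {"O_WRONLY": os.O_WRONLY, "O_CREAT": os.O_CREAT,
                     "O_TRUNC": os.O_TRUNC, "O_APPEND": os.O_APPEND}

    seen = set()

    def record_open(flags):
        """Record the tracked flag bits present in one open() call made by a test."""
        seen.add(frozenset(name for name, bit in TRACKED_FLAGS.items() if flags & bit))

    def untested_combinations(max_size=2):
        """Flag combinations (up to max_size flags) no recorded test exercised."""
        missing = []
        for k in range(1, max_size + 1):
            for combo in combinations(TRACKED_FLAGS, k):
                if frozenset(combo) not in seen:
                    missing.append(sorted(combo))
        return missing

    # Example: a suite that only ever creates files for writing leaves many
    # input combinations untested even if it covers the relevant code paths.
    record_open(os.O_WRONLY | os.O_CREAT)
    print(untested_combinations())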
  5. Serverless platforms offer on-demand computation and represent a significant shift from previous platforms that typically required resources to be pre-allocated (e.g., virtual machines). As serverless platforms have evolved, they have become suitable for a much wider range of applications than their original use cases. However, storage access remains a pain point that holds serverless back from becoming a completely generic computation platform. Existing storage for serverless typically uses an object interface. Although object APIs are simple to use, they lack the richness, versatility, and performance of file-based APIs. Additionally, there is a large body of existing applications that relies on file-based interfaces. The lack of file-based storage options prevents these applications from being ported to serverless environments. In this paper, we present F3, a file system that offers features to improve file access in serverless platforms: (1) efficient handling of ephemeral data, by placing ephemeral and non-ephemeral data on storage that exists at different points along the durability-performance tradeoff continuum, (2) locality-aware data scheduling, and (3) efficient reading while writing. We modified OpenWhisk to support attaching file-based storage and to leverage F3's features using hints. Our prototype evaluation of F3 shows performance improvements of 1.5x to 6.5x compared to existing storage systems.
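    The following toy placement policy illustrates the first of these features: routing application-hinted ephemeral data to fast, low-durability storage and everything else to durable storage. The backend names, latency numbers, and hint dictionary are hypothetical; F3's real interface and policies are described in the paper.

    # Toy placement policy illustrating the ephemeral/non-ephemeral split
    # (not F3's actual API).
    from dataclasses import dataclass

    @dataclass
    class Backend:
        name: str
        durable: bool
        relative_latency: float  # lower is faster (illustrative numbers)

    FAST_LOCAL = Backend("local-nvme", durable=False, relative_latency=1.0)
    DURABLE_OBJ = Backend("object-store", durable=True, relative_latency=10.0)

    def place(path: str, hints: dict) -> Backend:
        """Pick a backend based on an 'ephemeral' hint from the application."""
        if hints.get("ephemeral", False):
            return FAST_LOCAL   # intermediate data: re-creatable, prefer speed
        return DURABLE_OBJ      # final outputs: must survive failures

    print(place("/job42/intermediate/chunk-007", {"ephemeral": True}).name)
    print(place("/job42/output/video.mp4", {"ephemeral": False}).name)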
  6. Modern data privacy regulations such as GDPR, CCPA, and CDPA stipulate that data pertaining to a user must be deleted without undue delay upon the user's request. Existing systems are not designed to comply with these regulations and can leave traces of deleted data for indeterminate periods of time, often as long as months. We developed Lethe to address these problems by providing fine-grained secure deletion on any system and any storage medium, provided that Lethe has access to a fixed, small amount of securely-deletable storage. Lethe achieves this using keyed hash forests (KHFs), extensions of keyed hash trees (KHTs), structured to serve as efficient representations of encryption key hierarchies. By using a KHF as a regulator for data access, Lethe provides its secure deletion not by removing the KHF, but by adding a new KHF that only grants access to still-valid data. Access to the previous KHF is lost, and the data it regulated is securely deleted, by securely deleting the single key that protected the previous KHF.
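    A minimal sketch of the key-hierarchy idea behind keyed hash trees, which Lethe extends into keyed hash forests: each node's key is derived from its parent's key with a keyed hash, so securely deleting a single root key renders every key below it, and the data those keys protected, unrecoverable. This illustrates the concept only, not Lethe's KHF implementation.

    # Keyed hash tree as an encryption-key hierarchy (concept sketch only).
    import hmac, hashlib

    def child_key(parent_key: bytes, index: int) -> bytes:
        """Derive the key of child `index` from its parent's key."""
        return hmac.new(parent_key, index.to_bytes(4, "big"), hashlib.sha256).digest()

    def leaf_key(root_key: bytes, path: list) -> bytes:
        """Walk root -> leaf, deriving keys along the way."""
        key = root_key
        for idx in path:
            key = child_key(key, idx)
        return key   # would encrypt/decrypt one block of data

    root = b"\x00" * 32                  # held in securely-deletable storage
    k_block = leaf_key(root, [3, 1, 4])  # key protecting one data block
    # Securely deleting `root` makes every derived key, and thus the data it
    # protected, unrecoverable -- no per-block overwriting required.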
  7. Boldyreva, A.; Kolesnikov, V. (Eds.)
    A private set membership (PSM) protocol allows a “receiver” to learn whether its input x is contained in a large database DB held by a “sender”. In this work, we define and construct credible private set membership (C-PSM) protocols: in addition to the conventional notions of privacy, C-PSM provides a soundness guarantee that it is hard for a sender (that does not know x) to convince the receiver that x ∈ DB. Furthermore, the communication complexity must be logarithmic in the size of DB. We provide 2-round (i.e., round-optimal) C-PSM constructions based on standard assumptions: We present a black-box construction in the plain model based on DDH or LWE. Next, we consider protocols that support predicates f beyond string equality, i.e., the receiver can learn if there exists w ∈ DB such that f(x,w) = 1. We present two results with transparent setups: (1) A black-box protocol, based on DDH or LWE, for the class of NC1 functions f which are efficiently searchable. (2) An LWE-based construction for all bounded-depth circuits. The only non-black-box use of cryptography in this construction is through the bootstrapping procedure in fully homomorphic encryption. As an application, our protocols can be used to build enhanced round-optimal leaked password notification services, where unlike existing solutions, a dubious sender cannot fool a receiver into changing its password. https://doi.org/10.1007/978-3-031-31371-4_6
  8. Operating systems include many heuristic algorithms designed to improve overall storage performance and throughput. Because such heuristics cannot work well for all conditions and workloads, system designers resorted to exposing numerous tunable parameters to users—thus burdening users with continually optimizing their own storage systems and applications. Storage systems are usually responsible for most latency in I/O-heavy applications, so even a small latency improvement can be significant. Machine learning (ML) techniques promise to learn patterns, generalize from them, and enable optimal solutions that adapt to changing workloads. We propose that ML solutions become a first-class component in OSs and replace manual heuristics to optimize storage systems dynamically. In this article, we describe our proposed ML architecture, called KML. We developed a prototype KML architecture and applied it to two case studies: optimizing readahead and NFS read-size values. Our experiments show that KML consumes less than 4 KB of dynamic kernel memory, has a CPU overhead smaller than 0.2%, and yet can learn patterns and improve I/O throughput by as much as 2.3× and 15× for two case studies—even for complex, never-seen-before, concurrently running mixed workloads on different storage devices. 
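    As a rough illustration of the kind of decision KML learns to make, the toy tuner below maps an observed workload feature (sequentiality of recent reads) to a readahead size. The thresholds, window size, and readahead values are invented for illustration; KML itself learns such mappings with machine-learning models inside the kernel rather than fixed rules.

    # Toy readahead tuner: adapts to the observed workload instead of relying on
    # a single hand-tuned heuristic (illustrative only, not the KML model).
    from collections import deque

    class ReadaheadTuner:
        def __init__(self, window=256):
            self.offsets = deque(maxlen=window)   # recent read offsets (in pages)

        def observe(self, offset_pages: int):
            self.offsets.append(offset_pages)

        def sequentiality(self) -> float:
            """Fraction of reads that immediately follow the previous one."""
            pairs = zip(self.offsets, list(self.offsets)[1:])
            hits = sum(1 for a, b in pairs if b == a + 1)
            return hits / max(len(self.offsets) - 1, 1)

        def readahead_kb(self) -> int:
            """Larger readahead for sequential workloads, smaller for random ones."""
            s = self.sequentiality()
            if s > 0.8:
                return 1024
            if s > 0.4:
                return 256
            return 32

    tuner = ReadaheadTuner()
    for off in range(1000):          # purely sequential workload
        tuner.observe(off)
    print(tuner.readahead_kb())      # -> 1024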