

Search for: All records

Award ID contains: 2106263

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. Archival systems are often tasked with storing highly valuable data that may be targeted by malicious actors. When the lifetime of the secret data is on the order of decades to centuries, the threat of improved cryptanalysis casts doubt on the long-term security of cryptographic techniques, which rely on hardness assumptions that are difficult to prove over archival time scales. This threat makes the design of secure archival systems exceptionally difficult. Some archival systems turn a blind eye to this issue, hoping that current cryptographic techniques will not be broken; others use techniques, such as secret sharing, that are impractical at scale. This position paper sheds light on the core challenges behind building practically viable secure long-term archives; we identify promising research avenues toward this goal.
    Free, publicly-accessible full text available July 8, 2025
  2. Durability features such as replication or erasure coding serve an important role in storage systems, enabling users to store data without fear of loss due to device failures. However, these durability features come with a cost, in terms of storage, network traffic, and computational overheads. For most data, loss is a catastrophic event and so these overheads are acceptable. However, some data tolerates low durability and does not need the high level of durability that most storage systems provide. Identifying the proper level of durability for a piece of data is difficult, especially since it is often not clear how to determine the cost of loss. For some data used in serverless applications, however, this cost is relatively straightforward to calculate: serverless functions are often required to be idempotent, meaning that the data produced by them can be re-created by re-running the function. The cost of losing a piece of data then is merely the cost of re-running the function that originally created the data. In this paper, we explore the tradeoff between the cost of storing data durably and the cost to re-create data. We focus on serverless data because its ability to be recreated makes it possible to assign a cost to its loss. We develop a mathematical model that relates compute costs, storage costs, and application-specific parameters to calculate the cost-optimal placement of data. We also develop an execution framework capable of handling lost data transparently, enabling applications to use lower-durability storage with no additional burden on the developer. Next, we show how different factors such as failure rate and compute costs affect the placement decision. We find that thanks to the relatively short lifetime of serverless data, the probability of data loss even on low-durability storage is fairly low. Finally, we use the model to place data for several applications, including a video-transcoding application and an image-assembly application. We show that our model can predict execution costs within 7% of actual execution costs, and can reduce storage costs by up to 3x while never exceeding baseline costs. 
    Free, publicly-accessible full text available June 6, 2025
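    The sketch below illustrates, in Python, the kind of cost comparison this model formalizes: weigh the price of durable storage against cheaper, lower-durability storage plus the expected cost of re-running the idempotent function that produced the data. The prices, loss probabilities, and parameter names here are assumptions for illustration, not the paper's actual model or values.

    # Illustrative sketch (not the paper's model): compare the expected cost of
    # durable storage against low-durability storage plus the expected cost of
    # re-running the idempotent serverless function if the data is lost.

    def expected_cost(price_gb_month, size_gb, lifetime_months,
                      loss_probability, recompute_cost):
        """Expected total cost of one placement option."""
        storage_cost = price_gb_month * size_gb * lifetime_months
        return storage_cost + loss_probability * recompute_cost

    def choose_placement(size_gb, lifetime_months, recompute_cost,
                         durable=(0.023, 1e-9),          # (price/GB-month, P(loss)); assumed values
                         low_durability=(0.010, 1e-3)):  # cheaper but likelier to lose data
        options = {"durable": durable, "low-durability": low_durability}
        costs = {name: expected_cost(price, size_gb, lifetime_months, p_loss, recompute_cost)
                 for name, (price, p_loss) in options.items()}
        return min(costs, key=costs.get), costs

    # Short-lived intermediate data with a cheap re-run usually lands on the
    # low-durability tier under this toy model.
    placement, costs = choose_placement(size_gb=5, lifetime_months=0.01,
                                        recompute_cost=0.0002)
    print(placement, costs)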
  3. The current techniques and tools for collecting, aggregating, and reporting verifiable sustainability data are vulnerable to cyberattacks and misuse, requiring new security and privacy-preserving solutions. This article outlines security challenges and research directions for addressing these requirements. 
  4. File systems need testing to discover bugs and to help ensure reliability. Many file system testing tools are evaluated based on their code coverage. We analyzed recently reported bugs in Ext4 and BtrFS and found a weak correlation between code coverage and test effectiveness: many bugs are missed because they depend on specific inputs, even though the code was covered by a test suite. Our position is that coverage of system call inputs and outputs is critically important for testing file systems. We thus suggest input and output coverage as criteria for file system testing, and show how they can improve the effectiveness of testing. We built a prototype called IOcov to evaluate the input and output coverage of file system testing tools. IOcov identified many untested cases (specific inputs and outputs or ranges thereof) for both CrashMonkey and xfstests. Additionally, we discuss a method and associated metrics to identify over- and under-testing using IOcov. 
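    A minimal sketch of the input-coverage idea (not the IOcov tool itself): have the test harness record which open(2) flag combinations the suite actually exercised, then report the combinations it never tried. The tracked flags and the two-flag limit are arbitrary choices for illustration.

    # Toy illustration of system-call *input* coverage (not IOcov): record which
    # open(2) flag combinations a test suite exercised and report the rest.
    import os
    from itertools import combinations

    TRACKED_FLAGS = {"O_WRONLY": os.O_WRONLY, "O_CREAT": os.O_CREAT,
                     "O_TRUNC": os.O_TRUNC, "O_APPEND": os.O_APPEND}

    seen = set()

    def record_open(flags):
        """Record the tracked flag bits present in one open() call made by a test."""
        seen.add(frozenset(name for name, bit in TRACKED_FLAGS.items() if flags & bit))

    def untested_combinations(max_size=2):
        """Flag combinations (up to max_size flags) no recorded test exercised."""
        missing = []
        for k in range(1, max_size + 1):
            for combo in combinations(TRACKED_FLAGS, k):
                if frozenset(combo) not in seen:
                    missing.append(sorted(combo))
        return missing

    # Example: a suite that only ever creates files for writing leaves many
    # input combinations untested even if it covers the relevant code paths.
    record_open(os.O_WRONLY | os.O_CREAT)
    print(untested_combinations())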
  5. Serverless platforms offer on-demand computation and represent a significant shift from previous platforms that typically required resources to be pre-allocated (e.g., virtual machines). As serverless platforms have evolved, they have become suitable for a much wider range of applications than their original use cases. However, storage access remains a pain point that holds serverless back from becoming a completely generic computation platform. Existing storage for serverless typically uses an object interface. Although object APIs are simple to use, they lack the richness, versatility, and performance of file-based APIs. Additionally, there is a large body of existing applications that relies on file-based interfaces. The lack of file-based storage options prevents these applications from being ported to serverless environments. In this paper, we present F3, a file system that offers features to improve file access in serverless platforms: (1) efficient handling of ephemeral data, by placing ephemeral and non-ephemeral data on storage that exists at different points along the durability-performance tradeoff continuum, (2) locality-aware data scheduling, and (3) efficient reading while writing. We modified OpenWhisk to support attaching file-based storage and to leverage F3's features using hints. Our prototype evaluation of F3 shows performance improvements of 1.5x to 6.5x compared to existing storage systems.
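    The following toy placement policy illustrates the first of these features: routing application-hinted ephemeral data to fast, low-durability storage and everything else to durable storage. The backend names, latency numbers, and hint dictionary are hypothetical; F3's real interface and policies are described in the paper.

    # Toy placement policy illustrating the ephemeral/non-ephemeral split
    # (not F3's actual API).
    from dataclasses import dataclass

    @dataclass
    class Backend:
        name: str
        durable: bool
        relative_latency: float  # lower is faster (illustrative numbers)

    FAST_LOCAL = Backend("local-nvme", durable=False, relative_latency=1.0)
    DURABLE_OBJ = Backend("object-store", durable=True, relative_latency=10.0)

    def place(path: str, hints: dict) -> Backend:
        """Pick a backend based on an 'ephemeral' hint from the application."""
        if hints.get("ephemeral", False):
            return FAST_LOCAL   # intermediate data: re-creatable, prefer speed
        return DURABLE_OBJ      # final outputs: must survive failures

    print(place("/job42/intermediate/chunk-007", {"ephemeral": True}).name)
    print(place("/job42/output/video.mp4", {"ephemeral": False}).name)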
  6. Modern data privacy regulations such as GDPR, CCPA, and CDPA stipulate that data pertaining to a user must be deleted without undue delay upon the user's request. Existing systems are not designed to comply with these regulations and can leave traces of deleted data for indeterminate periods of time, often as long as months. We developed Lethe to address these problems by providing fine-grained secure deletion on any system and any storage medium, provided that Lethe has access to a fixed, small amount of securely-deletable storage. Lethe achieves this using keyed hash forests (KHFs), extensions of keyed hash trees (KHTs), structured to serve as efficient representations of encryption key hierarchies. By using a KHF as a regulator for data access, Lethe provides its secure deletion not by removing the KHF, but by adding a new KHF that only grants access to still-valid data. Access to the previous KHF is lost, and the data it regulated is securely deleted, by securely deleting the single key that protected the previous KHF.
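    A minimal sketch of the key-hierarchy idea behind keyed hash trees, which Lethe extends into keyed hash forests: each node's key is derived from its parent's key with a keyed hash, so securely deleting a single root key renders every key below it, and the data those keys protected, unrecoverable. This illustrates the concept only, not Lethe's KHF implementation.

    # Keyed hash tree as an encryption-key hierarchy (concept sketch only).
    import hmac, hashlib

    def child_key(parent_key: bytes, index: int) -> bytes:
        """Derive the key of child `index` from its parent's key."""
        return hmac.new(parent_key, index.to_bytes(4, "big"), hashlib.sha256).digest()

    def leaf_key(root_key: bytes, path: list) -> bytes:
        """Walk root -> leaf, deriving keys along the way."""
        key = root_key
        for idx in path:
            key = child_key(key, idx)
        return key   # would encrypt/decrypt one block of data

    root = b"\x00" * 32                  # held in securely-deletable storage
    k_block = leaf_key(root, [3, 1, 4])  # key protecting one data block
    # Securely deleting `root` makes every derived key, and thus the data it
    # protected, unrecoverable -- no per-block overwriting required.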
  7. Boldyreva, A.; Kolesnikov, V. (Eds.)
    A private set membership (PSM) protocol allows a “receiver” to learn whether its input x is contained in a large database DB held by a “sender”. In this work, we define and construct credible private set membership (C-PSM) protocols: in addition to the conventional notions of privacy, C-PSM provides a soundness guarantee that it is hard for a sender (that does not know x) to convince the receiver that x ∈ DB. Furthermore, the communication complexity must be logarithmic in the size of DB. We provide 2-round (i.e., round-optimal) C-PSM constructions based on standard assumptions: We present a black-box construction in the plain model based on DDH or LWE. Next, we consider protocols that support predicates f beyond string equality, i.e., the receiver can learn if there exists w ∈ DB such that f(x,w) = 1. We present two results with transparent setups: (1) A black-box protocol, based on DDH or LWE, for the class of NC1 functions f which are efficiently searchable. (2) An LWE-based construction for all bounded-depth circuits. The only non-black-box use of cryptography in this construction is through the bootstrapping procedure in fully homomorphic encryption. As an application, our protocols can be used to build enhanced round-optimal leaked password notification services, where unlike existing solutions, a dubious sender cannot fool a receiver into changing its password. https://doi.org/10.1007/978-3-031-31371-4_6
  8. Operating systems include many heuristic algorithms designed to improve overall storage performance and throughput. Because such heuristics cannot work well for all conditions and workloads, system designers resorted to exposing numerous tunable parameters to users—thus burdening users with continually optimizing their own storage systems and applications. Storage systems are usually responsible for most latency in I/O-heavy applications, so even a small latency improvement can be significant. Machine learning (ML) techniques promise to learn patterns, generalize from them, and enable optimal solutions that adapt to changing workloads. We propose that ML solutions become a first-class component in OSs and replace manual heuristics to optimize storage systems dynamically. In this article, we describe our proposed ML architecture, called KML. We developed a prototype KML architecture and applied it to two case studies: optimizing readahead and NFS read-size values. Our experiments show that KML consumes less than 4 KB of dynamic kernel memory, has a CPU overhead smaller than 0.2%, and yet can learn patterns and improve I/O throughput by as much as 2.3× and 15× for two case studies—even for complex, never-seen-before, concurrently running mixed workloads on different storage devices. 
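    As a rough illustration of the kind of decision KML learns to make, the toy tuner below maps an observed workload feature (sequentiality of recent reads) to a readahead size. The thresholds, window size, and readahead values are invented for illustration; KML itself learns such mappings with machine-learning models inside the kernel rather than fixed rules.

    # Toy readahead tuner: adapts to the observed workload instead of relying on
    # a single hand-tuned heuristic (illustrative only, not the KML model).
    from collections import deque

    class ReadaheadTuner:
        def __init__(self, window=256):
            self.offsets = deque(maxlen=window)   # recent read offsets (in pages)

        def observe(self, offset_pages: int):
            self.offsets.append(offset_pages)

        def sequentiality(self) -> float:
            """Fraction of reads that immediately follow the previous one."""
            pairs = zip(self.offsets, list(self.offsets)[1:])
            hits = sum(1 for a, b in pairs if b == a + 1)
            return hits / max(len(self.offsets) - 1, 1)

        def readahead_kb(self) -> int:
            """Larger readahead for sequential workloads, smaller for random ones."""
            s = self.sequentiality()
            if s > 0.8:
                return 1024
            if s > 0.4:
                return 256
            return 32

    tuner = ReadaheadTuner()
    for off in range(1000):          # purely sequential workload
        tuner.observe(off)
    print(tuner.readahead_kb())      # -> 1024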