We present FusionFS, a direct-access firmware-level in-storage filesystem that exploits the near-storage computational capability for fast I/O and data processing, consequently reducing I/O bottlenecks. In FusionFS, we introduce a new abstraction, CISCOps, that combines multiple I/O and data processing operations into one fused operation and offloaded for near-storage processing. By offloading, CISCOps significantly reduces dominant I/O overheads such as system calls, data movement, communication, and other software overheads. Further, to enhance the use of CISCOps, we introduce MicroTx for fine-grained crash consistency and fast (automatic) recovery of I/O and data processing operations. We also explore scheduling techniques to ensure fair and efficient use of in-storage compute and memory resources across tenants. Evaluation of FusionFS against the state-of-the-art user-level, kernel-level, and firmware-level file systems using microbenchmarks, macrobenchmarks, and real-world applications shows up to 6.12X, 5.09X and 2.07X performance gains, and 2.65X faster recovery for applications.
more »
« less
Facebook’s Tectonic Filesystem: Efficiency from Exascale
Tectonic is Facebook’s exabyte-scale distributed filesystem. Tectonic consolidates large tenants that previously used service-specific systems into general multitenant filesystem instances that achieve performance comparable to the specialized systems. The exabyte-scale consolidated instances enable better resource utilization, simpler services, and less operational complexity than our previous approach. This paper describes Tectonic’s design, explaining how it achieves scalability, supports multitenancy, and allows tenants to specialize operations to optimize for diverse workloads. The paper also presents insights from designing, deploying, and operating Tectonic.
more »
« less
- Award ID(s):
- 1910390
- PAR ID:
- 10293836
- Date Published:
- Journal Name:
- USENIX Conference on File and Storage Technologies
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
A scalable storage system is an integral requirement for supporting large-scale cloud computing jobs. The raw space on storage systems is made usable with the help of a software layer which is typically called a filesystem (e.g., Google's Cloud Filestore). In this paper, we present the design and implementation of an open-source and free cloud-based filesystem named as "Greyfish" that can be installed on the Virtual Machines (VMs) hosted on different cloud computing systems, such as Jetstream and Chameleon. Greyfish helps in: (1) storing files and directories for different user-accounts in a shared space on the cloud, (2) managing file-access permissions, and (3) purging files when needed. It is currently being used in the implementation of the Gateway-In-A-Box (GIB) project. A simplified version of Greyfish, known as Reef, is already in production in the BOINC@TACC project. Science gateway developers will find Greyfish useful for creating local filesystems that can be mounted in containers. By doing so, they can independently do quick installations of self-contained software solutions in development and test environments while mounting the filesystems on large-scale storage platforms in the production environments only.more » « less
-
null (Ed.)Parallel filesystems (PFSs) are one of the most critical high-availability components of High Performance Computing (HPC) systems. Most HPC workloads are dependent on the availability of a POSIX compliant parallel filesystem that provides a globally consistent view of data to all compute nodes of a HPC system. Because of this central role, failure or performance degradation events in the PFS can impact every user of a HPC resource. There is typically insufficient information available to users and even many HPC staff to identify the causes of these PFS events, impeding the implementation of timely and targeted remedies to PFS issues. The relevant information is distributed across PFS servers; however, access to these servers is highly restricted due to the sensitive role they play in the operations of a HPC system. Additionally, the information is challenging to aggregate and interpret, relegating diagnosis and treatment of PFS issues to a select few experts with privileged system access. To democratize this information, we are developing an open-source and user-facing Parallel FileSystem TRacing and Analysis SErvice (PFSTRASE) that analyzes the requisite data to establish causal relationships between PFS activity and events detrimental to stability and performance. We are implementing the service for the open-source Lustre filesystem, which is the most commonly used PFS at large-scale HPC sites. Server loads for specific PFS I/O operations (IOPs) will be measured and aggregated by the service to automatically estimate an effective load generated by every client, job, and user. The infrastructure provides a realtime, user accessible text-based interface and a publicly accessible web interface displaying both real-time and historical data. To democratize this information, we are developing an open-source and user-facing Parallel FileSystem TRacing and Analysis SErvice (PFSTRASE) that analyzes the requisite data to establish causal relationships between PFS activity and events detrimental to stability and performance. We are implementing the service for the open-source Lustre filesystem, which is the most commonly used PFS at large-scale HPC sites. Server loads for specific PFS I/O operations (IOPs) will be measured and aggregated by the service to automatically estimate an effective load generated by every client, job, and user. The infrastructure provides a realtime, user accessible text-based interface and a publicly accessible web interface displaying both real-time and historical data.more » « less
-
Inspired by earlier findings that undetected errors were increasing on the Internet, we built a measurement system to detect errors that the TCP checksum fails to catch. We created a client–server framework in which servers sent known files to clients, and the received data was compared to the originals to identify undetected network-introduced errors. The system was deployed on several public testbeds. Over nine months, we transferred 26 petabytes of data. Scaling the system to capture many errors proved difficult. This paper describes the deployment challenges and presents interim results showing that prior error reports may come from two different sources: errors that bypass TCP and file system failures. The results also suggest that the system must collect data at the exabyte scale rather than the petabyte scale expected by earlier studies.more » « less
-
null (Ed.)This study draws on 71 indepth, semistructured interviews with landlords and property managers in Philadelphia, Pennsylvania. We find that the perceived burdens associated with evictions often make evictions less desirable for small-scale landlords than finding ways to work with tenants to keep them in their homes, including developing payment plans to help tenants catch up on back rent, adjusting rental rates, accepting services in lieu of rent, and aiding in referrals to housing and social service programs. Some landlords employ a technique of paying tenants to vacate, a practice referred to as cash for keys, which is an informal, off-the-books eviction. Our findings suggest that off-the-books evictions are far more prevalent than has been measured in official eviction data; therefore, the prevalence of residential displacement is more severe than previously documented.more » « less
An official website of the United States government

