NSF PAR Search | NSF Public Access Repository

R2E2: low-latency path tracing of terabyte-scale scenes using thousands of cloud CPUs

https://doi.org/10.1145/3528223.3530171

Fouladi, Sadjad; Shacklett, Brennan; Poms, Fait; Arora, Arjun; Ozdemir, Alex; Raghavan, Deepti; Hanrahan, Pat; Fatahalian, Kayvon; Winstein, Keith (July 2022, ACM Transactions on Graphics)

In this paper we explore the viability of path tracing massive scenes using a "supercomputer" constructed on-the-fly from thousands of small, serverless cloud computing nodes. We present R2E2 (Really Elastic Ray Engine), a scene decomposition-based parallel renderer that rapidly acquires thousands of cloud CPU cores, loads scene geometry from a pre-built scene BVH into the aggregate memory of these nodes in parallel, and performs full path traced global illumination using an inter-node messaging service designed for communicating ray data. To balance ray tracing work across many nodes, R2E2 adopts a service-oriented design that statically replicates geometry and texture data from frequently traversed scene regions onto multiple nodes based on estimates of load, and dynamically assigns ray tracing work to lightly loaded nodes holding the required data. We port pbrt's ray-scene intersection components to the R2E2 architecture, and demonstrate that scenes with up to a terabyte of geometry and texture data (where as little as 1/250th of the scene can fit on any one node) can be path traced at 4K resolution in tens of seconds using thousands of tiny serverless nodes on the AWS Lambda platform.
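The dynamic assignment step described in the abstract can be illustrated with a minimal Python sketch. It is not R2E2's implementation: the function `assign_rays`, the region and node names, and the use of a simple queued-ray count as the load measure are all assumptions made for illustration.

```python
from collections import defaultdict

def assign_rays(ray_regions, replicas):
    """Assign each ray to the least-loaded node that holds its scene region.

    ray_regions: list of region ids, one per ray (hypothetical representation).
    replicas: dict mapping region id -> list of node ids holding a copy of it.
    Returns a list of node ids, one per ray.
    """
    load = defaultdict(int)  # rays queued per node so far
    assignment = []
    for region in ray_regions:
        # Only nodes holding the region are candidates; ties go to the first listed.
        node = min(replicas[region], key=lambda n: load[n])
        load[node] += 1
        assignment.append(node)
    return assignment

# A frequently traversed region is replicated on three nodes, a rarely hit one on a single node.
replicas = {"hot": ["n0", "n1", "n2"], "cold": ["n3"]}
rays = ["hot"] * 6 + ["cold"] * 2
print(assign_rays(rays, replicas))
```

In this toy setup the six rays that hit the replicated region are spread evenly over its three holders, while rays for the unreplicated region can only go to the one node that has it.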
Full Text Available
POSH: A Data-Aware Shell

Raghavan, Deepti; Fouladi, Sadjad; Levis, Philip; Zaharia, Matei (July 2020, USENIX ATC 2020)
We present POSH, a framework that accelerates shell applications with I/O-heavy components, such as data analytics with command-line utilities. Remote storage such as networked filesystems can severely limit the performance of these applications: data makes a round trip over the network for relatively little computation at the client. Reducing the data movement by moving the code to the data can improve performance. POSH automatically optimizes unmodified I/O-intensive shell applications running over remote storage by offloading the I/O-intensive portions to proxy servers closer to the data. A proxy can run directly on a storage server, or on a machine closer to the storage layer than the client. POSH intercepts shell pipelines and uses metadata called annotations to decide where to run each command within the pipeline. We address three principal challenges that arise: an annotation language that allows POSH to understand which files a command will access, a scheduling algorithm that places commands to minimize data movement, and a system runtime to execute a distributed schedule but retain local semantics. We benchmark POSH on real shell pipelines such as image processing, network security analysis, log analysis, distributed system debugging, and git. We find that POSH provides speedups ranging from 1.6× to 15× compared to NFS, without requiring any modifications to the applications.
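To make the idea of placing a command near its data concrete, here is a small Python sketch of a greedy placement rule. It is an assumption-laden illustration, not POSH's scheduling algorithm or annotation language: `place_command`, the machine names, and the file-location and file-size tables are hypothetical.

```python
def place_command(input_files, file_location, file_size, client="client"):
    """Pick the machine that already holds the most input bytes for a command.

    input_files: paths the command reads (in POSH this would come from annotations).
    file_location: dict path -> machine name; unknown paths are treated as client-local.
    file_size: dict path -> size in bytes.
    """
    bytes_at = {}
    for path in input_files:
        machine = file_location.get(path, client)
        bytes_at[machine] = bytes_at.get(machine, 0) + file_size[path]
    if not bytes_at:
        return client  # nothing to read, so run locally
    return max(bytes_at, key=bytes_at.get)

location = {"/mnt/logs/a.log": "proxy1", "/mnt/logs/b.log": "proxy1", "/home/u/filter.txt": "client"}
size = {"/mnt/logs/a.log": 5_000_000, "/mnt/logs/b.log": 7_000_000, "/home/u/filter.txt": 2_000}
print(place_command(list(size), location, size))
```

Running the command on `proxy1` here means only the small filter file crosses the network, rather than both log files.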
Full Text Available
Parallelization Techniques for Verifying Neural Networks

https://doi.org/10.34727/2020/isbn.978-3-85448-042-6_20

Wu, Haoze; Ozdemir, Alex; Zeljic, Aleksandar; Julian, Kyle; Irfan, Ahmed; Gopinath, Divya; Fouladi, Sadjad; Katz, Guy; Pasareanu, Corina; Barrett, Clark (September 2020, Proceedings of the 20th International Conference on Formal Methods In Computer-Aided Design (FMCAD '20))
Ivrii, Alexander; Strichman, Ofer (Ed.)
Inspired by recent successes of parallel techniques for solving Boolean satisfiability, we investigate a set of strategies and heuristics to leverage parallelism and improve the scalability of neural network verification. We present a general description of the Split-and-Conquer partitioning algorithm, implemented within the Marabou framework, and discuss its parameters and heuristic choices. In particular, we explore two novel partitioning strategies that partition the input space or the phases of the neuron activations, respectively. We introduce a branching heuristic and a direction heuristic that are based on the notion of polarity. We also introduce a highly parallelizable pre-processing algorithm for simplifying neural network verification problems. An extensive experimental evaluation shows the benefit of these techniques on both existing and new benchmarks. A preliminary experiment ultra-scaling our algorithm using a large distributed cloud-based platform also shows promising results.
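The input-space partitioning strategy can be sketched on a toy problem. The Python below is not Marabou's API or the paper's algorithm in full: `check_region` is a hypothetical stand-in for a solver call, the property (that f(x) = x(1 - x) stays below a bound on [0, 1]) replaces a neural network query, and solver timeouts are omitted.

```python
from collections import deque

def check_region(lo, hi, bound):
    """Toy stand-in for a solver call on f(x) = x * (1 - x), with 0 <= lo <= hi <= 1.

    Returns "violated" if the midpoint is a counterexample, "holds" if a coarse
    interval bound proves f <= bound on the region, and "unknown" otherwise.
    """
    mid = (lo + hi) / 2
    if mid * (1 - mid) > bound:
        return "violated"
    if hi * (1 - lo) <= bound:  # x <= hi and 1 - x <= 1 - lo, both non-negative
        return "holds"
    return "unknown"

def split_and_conquer(lo, hi, bound, max_regions=10_000):
    """Bisect undecided regions until every one is proved or a counterexample appears."""
    queue = deque([(lo, hi)])
    processed = 0
    while queue:
        a, b = queue.popleft()
        processed += 1
        if processed > max_regions:
            return "unknown"  # give up instead of splitting forever
        result = check_region(a, b, bound)
        if result == "violated":
            return "violated"
        if result == "unknown":
            mid = (a + b) / 2
            queue.extend([(a, mid), (mid, b)])  # the two halves are independent subproblems
    return "holds"

print(split_and_conquer(0.0, 1.0, 0.3))
print(split_and_conquer(0.0, 1.0, 0.2))
```

Because the sub-regions in the queue do not depend on one another, they could be handed to separate workers, which is the source of parallelism the abstract refers to.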
Full Text Available
Learning in situ: a randomized experiment in video streaming

Yan, Francis Y.; Ayers, Hudson; Zhu, Chenzhi; Fouladi, Sadjad; Hong, James; Zhang, Keyi; Levis, Philip; Winstein, Keith (February 2020, 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI '20))

We describe the results of a randomized controlled trial of video-streaming algorithms for bitrate selection and network prediction. Over the last year, we have streamed 38.6 years of video to 63,508 users across the Internet. Sessions are randomized in blinded fashion among algorithms. We found that in this real-world setting, it is difficult for sophisticated or machine-learned control schemes to outperform a "simple" scheme (buffer-based control), notwithstanding good performance in network emulators or simulators. We performed a statistical analysis and found that the heavy-tailed nature of network and user behavior, as well as the challenges of emulating diverse Internet paths during training, present obstacles for learned algorithms in this setting. We then developed an ABR algorithm that robustly outperformed other schemes, by leveraging data from its deployment and limiting the scope of machine learning only to making predictions that can be checked soon after. The system uses supervised learning in situ, with data from the real deployment environment, to train a probabilistic predictor of upcoming chunk transmission times. This module then informs a classical control policy (model predictive control). To support further investigation, we are publishing an archive of data and results each week, and will open our ongoing study to the community. We welcome other researchers to use this platform to develop and validate new algorithms for bitrate selection, network prediction, and congestion control.
Full Text Available
From Laptop to Lambda: Outsourcing Everyday Jobs to Thousands of Transient Functional Containers

Fouladi, Sadjad; Romero, Francisco; Iter, Dan; Li, Qian; Chatterjee, Shuvo; Kozyrakis, Christos; Zaharia, Matei; Winstein, Keith (July 2019, 2019 USENIX Annual Technical Conference (USENIX ATC 19))

We present gg, a framework and a set of command-line tools that helps people execute everyday applications—e.g., software compilation, unit tests, video encoding, or object recognition—using thousands of parallel threads on a cloud-functions service to achieve near-interactive completion time. In the future, instead of running these tasks on a laptop, or keeping a warm cluster running in the cloud, users might push a button that spawns 10,000 parallel cloud functions to execute a large job in a few seconds from start. gg is designed to make this practical and easy. With gg, applications express a job as a composition of lightweight OS containers that are individually transient (lifetimes of 1–60 seconds) and functional (each container is hermetically sealed and deterministic). gg takes care of instantiating these containers on cloud functions, loading dependencies, minimizing data movement, moving data between containers, and dealing with failure and stragglers. We ported several latency-sensitive applications to run on gg and evaluated its performance. In the best case, a distributed compiler built on gg outperformed a conventional tool (icecc) by 2–5×, without requiring a warm cluster running continuously. In the worst case, gg was within 20% of the hand-tuned performance of an existing tool for video encoding (ExCamera).
Full Text Available
Secure serverless computing using dynamic information flow control

https://doi.org/10.1145/3276488

Alpernas, Kalev; Flanagan, Cormac; Fouladi, Sadjad; Ryzhyk, Leonid; Sagiv, Mooly; Schmitz, Thomas; Winstein, Keith (October 2018, Proceedings of the ACM on Programming Languages)

Full Text Available