NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Saturn: An Optimized Data System for Multi-Large-Model Deep Learning Workloads

https://doi.org/10.14778/3636218.3636227

Nagrecha, Kabir; Kumar, Arun (December 2023, Proceedings of the VLDB Endowment)

Large models such as GPT-3 and ChatGPT have transformed deep learning (DL), powering applications that have captured the public's imagination. Such models must be trained on multiple GPUs due to their size and computational load, driving the development of a bevy of model parallelism techniques and tools. Navigating suchparallelismchoices, however, is a new burden for DL users such as data scientists, domain scientists, etc., who may lack the necessary systems knowhow. The need formodel selection, which leads to many models to train due to hyper-parameter tuning or layer-wise finetuning, compounds the situation with two more burdens:resource apportioningandscheduling.In this work, we unify these three burdens by formalizing them as a joint problem that we call SPASE: Select a Parallelism, Allocate resources, and Schedule. We propose a new information system architecture to tackle the SPASE problem holistically, exploiting the performance opportunities presented by joint optimization. We devise an extensible template for existing parallelism schemes and combine it with an automated empirical profiler for runtime estimation. We then formulate SPASE as an MILP. We find that direct use of an MILP-solver is significantly more effective than several baseline heuristics. We optimize the system runtime further with an introspective scheduling approach. We implement all these techniques into a new data system we call Saturn. Experiments with benchmark DL workloads show that Saturn achieves 39-49% lower model selection runtimes than current DL practice.
more » « less
Full Text Available
Incremental and Approximate Computations for Accelerating Deep CNN Inference

https://doi.org/10.1145/3397461

Nakandala, Supun; Nagrecha, Kabir; Kumar, Arun; Papakonstantinou, Yannis (December 2020, ACM Transactions on Database Systems)
null (Ed.)
Deep learning now offers state-of-the-art accuracy for many prediction tasks. A form of deep learning called deep convolutional neural networks (CNNs) are especially popular on image, video, and time series data. Due to its high computational cost, CNN inference is often a bottleneck in analytics tasks on such data. Thus, a lot of work in the computer architecture, systems, and compilers communities study how to make CNN inference faster. In this work, we show that by elevating the abstraction level and re-imagining CNN inference as queries , we can bring to bear database-style query optimization techniques to improve CNN inference efficiency. We focus on tasks that perform CNN inference repeatedly on inputs that are only slightly different . We identify two popular CNN tasks with this behavior: occlusion-based explanations (OBE) and object recognition in videos (ORV). OBE is a popular method for “explaining” CNN predictions. It outputs a heatmap over the input to show which regions (e.g., image pixels) mattered most for a given prediction. It leads to many re-inference requests on locally modified inputs. ORV uses CNNs to identify and track objects across video frames. It also leads to many re-inference requests. We cast such tasks in a unified manner as a novel instance of the incremental view maintenance problem and create a comprehensive algebraic framework for incremental CNN inference that reduces computational costs. We produce materialized views of features produced inside a CNN and connect them with a novel multi-query optimization scheme for CNN re-inference. Finally, we also devise novel OBE-specific and ORV-specific approximate inference optimizations exploiting their semantics. We prototype our ideas in Python to create a tool called Krypton that supports both CPUs and GPUs. Experiments with real data and CNNs show that Krypton reduces runtimes by up to 5× (respectively, 35×) to produce exact (respectively, high-quality approximate) results without raising resource requirements.
more » « less
Full Text Available
Cerebro: A Layered Data Platform for Scalable Deep Learning

Kumar, Arun; Nakandala, Supun; Zhang, Yuhao; Li, Side; Gemawat, Advitya; Nagrecha, Kabir (January 2021, 11th Annual Conference on Innovative Data Systems Research (CIDR '21))
null (Ed.)
Full Text Available

Search for: All records