NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Generating Cross-model Analytics Workloads Using LLMs

https://doi.org/10.1145/3627673.3679932

Zheng, Xiuwen; Kumar, Arun; Gupta, Amarnath (October 2024, ACM)

Full Text Available
Saturn: An Optimized Data System for Multi-Large-Model Deep Learning Workloads

https://doi.org/10.14778/3636218.3636227

Nagrecha, Kabir; Kumar, Arun (December 2023, Proceedings of the VLDB Endowment)

Large models such as GPT-3 and ChatGPT have transformed deep learning (DL), powering applications that have captured the public's imagination. Such models must be trained on multiple GPUs due to their size and computational load, driving the development of a bevy of model parallelism techniques and tools. Navigating suchparallelismchoices, however, is a new burden for DL users such as data scientists, domain scientists, etc., who may lack the necessary systems knowhow. The need formodel selection, which leads to many models to train due to hyper-parameter tuning or layer-wise finetuning, compounds the situation with two more burdens:resource apportioningandscheduling.In this work, we unify these three burdens by formalizing them as a joint problem that we call SPASE: Select a Parallelism, Allocate resources, and Schedule. We propose a new information system architecture to tackle the SPASE problem holistically, exploiting the performance opportunities presented by joint optimization. We devise an extensible template for existing parallelism schemes and combine it with an automated empirical profiler for runtime estimation. We then formulate SPASE as an MILP. We find that direct use of an MILP-solver is significantly more effective than several baseline heuristics. We optimize the system runtime further with an introspective scheduling approach. We implement all these techniques into a new data system we call Saturn. Experiments with benchmark DL workloads show that Saturn achieves 39-49% lower model selection runtimes than current DL practice.
more » « less
Full Text Available
Lotan: Bridging the Gap between GNNs and Scalable Graph Analytics Engines

https://doi.org/10.14778/3611479.3611483

Zhang, Yuhao; Kumar, Arun (July 2023, Proceedings of the VLDB Endowment)

Recent advances in Graph Neural Networks (GNNs) have changed the landscape of modern graph analytics. The complexity of GNN training and the scalability challenges have also sparked interest from the systems community, with efforts to build systems that provide higher efficiency and schemes to reduce costs. However, we observe that many such systems basically reinvent the wheel of much work done in the database world on scalable graph analytics engines. Further, they often tightly couple the scalability treatments of graph data processing with that of GNN training, resulting in entangled complex problems and systems that often do not scale well on one of those axes. In this paper, we ask a fundamental question: How far can we push existing systems for scalable graph analytics and deep learning (DL) instead of building custom GNN systems? Are compromises inevitable on scalability and/or runtimes? We propose Lotan, the first scalable and optimized data system for full-batch GNN training withdecoupled scalingthat bridges the hitherto siloed worlds of graph analytics systems and DL systems. Lotan offers a series of technical innovations, including re-imagining GNN training as query plan-like dataflows, execution plan rewriting, optimized data movement between systems, a GNN-centric graph partitioning scheme, and the first known GNN model batching scheme. We prototyped Lotan on top of GraphX and PyTorch. An empirical evaluation using several real-world benchmark GNN workloads reveals a promising nuanced picture: Lotan significantly surpasses the scalability of state-of-the-art custom GNN systems, while often matching or being only slightly behind on time-to-accuracy metrics in some cases. We also show the impact of our system optimizations. Overall, our work shows that the GNN world can indeed benefit from building on top of scalable graph analytics engines. Lotan's new level of scalability can also empower new ML-oriented research on ever-larger graphs and GNNs.
more » « less
Full Text Available
CHAP-child: an open source method for estimating sit-to-stand transitions and sedentary bout patterns from hip accelerometers among children

https://doi.org/10.1186/s12966-022-01349-2

Carlson, Jordan A.; Ridgers, Nicola D.; Nakandala, Supun; Zablocki, Rong; Tuz-Zahra, Fatima; Bellettiere, John; Hibbing, Paul R.; Steel, Chelsea; Jankowska, Marta M.; Rosenberg, Dori E.; et al (December 2022, International Journal of Behavioral Nutrition and Physical Activity)

Abstract Background Hip-worn accelerometer cut-points have poor validity for assessing children’s sedentary time, which may partly explain the equivocal health associations shown in prior research. Improved processing/classification methods for these monitors would enrich the evidence base and inform the development of more effective public health guidelines. The present study aimed to develop and evaluate a novel computational method (CHAP-child) for classifying sedentary time from hip-worn accelerometer data. Methods Participants were 278, 8–11-year-olds recruited from nine primary schools in Melbourne, Australia with differing socioeconomic status. Participants concurrently wore a thigh-worn activPAL (ground truth) and hip-worn ActiGraph (test measure) during up to 4 seasonal assessment periods, each lasting up to 8 days. activPAL data were used to train and evaluate the CHAP-child deep learning model to classify each 10-s epoch of raw ActiGraph acceleration data as sitting or non-sitting, creating comparable information from the two monitors. CHAP-child was evaluated alongside the current practice 100 counts per minute (cpm) method for hip-worn ActiGraph monitors. Performance was tested for each 10-s epoch and for participant-season level sedentary time and bout variables (e.g., mean bout duration). Results Across participant-seasons, CHAP-child correctly classified each epoch as sitting or non-sitting relative to activPAL, with mean balanced accuracy of 87.6% (SD = 5.3%). Sit-to-stand transitions were correctly classified with mean sensitivity of 76.3% (SD = 8.3). For most participant-season level variables, CHAP-child estimates were within ± 11% (mean absolute percent error [MAPE]) of activPAL, and correlations between CHAP-child and activPAL were generally very large (> 0.80). For the current practice 100 cpm method, most MAPEs were greater than ± 30% and most correlations were small or moderate (≤ 0.60) relative to activPAL. Conclusions There was strong support for the concurrent validity of the CHAP-child classification method, which allows researchers to derive activPAL-equivalent measures of sedentary time, sit-to-stand transitions, and sedentary bout patterns from hip-worn triaxial ActiGraph data. Applying CHAP-child to existing datasets may provide greater insights into the potential impacts and influences of sedentary time in children.
more » « less
Full Text Available
CHAP-Adult: A Reliable and Valid Algorithm to Classify Sitting and Measure Sitting Patterns Using Data From Hip-Worn Accelerometers in Adults Aged 35+

John Bellettiere; Supun Nakandala; Fatima Tuz-Zahra; Elisabeth A.H. Winkler; Paul R. Hibbing; Genevieve N. Healy; David W. Dunstan; Neville Owen; Mikael Anne Greenwood-Hickman; Dori E. Rosenberg; et al (September 2022, Journal for the measurement of physical behaviour)

Hip-worn accelerometers are commonly used, but data processed using the 100 counts per minute cut point do not accurately measure sitting patterns. We developed and validated a model to accurately classify sitting and sitting patterns using hip-worn accelerometer data from a wide age range of older adults. Methods: Deep learning models were trained with 30-Hz triaxial hip-worn accelerometer data as inputs and activPAL sitting/nonsitting events as ground truth. Data from 981 adults aged 35–99 years from cohorts in two continents were used to train the model, which we call CHAP-Adult (Convolutional Neural Network Hip Accelerometer Posture-Adult). Validation was conducted among 419 randomly selected adults not included in model training. Results: Mean errors (activPAL − CHAP-Adult) and 95% limits of agreement were: sedentary time −10.5 (−63.0, 42.0) min/day, breaks in sedentary time 1.9 (−9.2, 12.9) breaks/day, mean bout duration −0.6 (−4.0, 2.7) min, usual bout duration −1.4 (−8.3, 5.4) min, alpha .00 (−.04, .04), and time in ≥30-min bouts −15.1 (−84.3, 54.1) min/day. Respective mean (and absolute) percent errors were: −2.0% (4.0%), −4.7% (12.2%), 4.1% (11.6%), −4.4% (9.6%), 0.0% (1.4%), and 5.4% (9.6%). Pearson’s correlations were: .96, .92, .86, .92, .78, and .96. Error was generally consistent across age, gender, and body mass index groups with the largest deviations observed for those with body mass index ≥30 kg/m2. Conclusions: Overall, these strong validation results indicate CHAP-Adult represents a significant advancement in the ambulatory measurement of sitting and sitting patterns using hip-worn accelerometers. Pending external validation, it could be widely applied to data from around the world to extend understanding of the epidemiology and health consequences of sitting.
more » « less
Full Text Available
Nautilus: An Optimized System for Deep Transfer Learning over Evolving Training Datasets

https://doi.org/10.1145/3514221.3517846

Nakandala, Supun; Kumar, Arun (June 2022, Proceedings of the 2022 International Conference on Management of Data)

Full Text Available
Intermittent human-in-the-loop model selection using cerebro: a demonstration

https://doi.org/10.14778/3476311.3476320

Li, Liangde; Nakandala, Supun; Kumar, Arun (July 2021, Proceedings of the VLDB Endowment)

Deep learning (DL) is revolutionizing many fields. However, there is a major bottleneck for the wide adoption of DL: the pain of model selection , which requires exploring a large config space of model architecture and training hyper-parameters before picking the best model. The two existing popular paradigms for exploring this config space pose a false dichotomy. AutoML-based model selection explores configs with high-throughput but uses human intuition minimally. Alternatively, interactive human-in-the-loop model selection completely relies on human intuition to explore the config space but often has very low throughput. To mitigate the above drawbacks, we propose a new paradigm for model selection that we call intermittent human-in-the-loop model selection . In this demonstration, we will showcase our approach using five real-world DL model selection workloads. A short video of our demonstration can be found here: https://youtu.be/K3THQy5McXc.
more » « less
Full Text Available
Towards an optimized GROUP by abstraction for large-scale machine learning

https://doi.org/10.14778/3476249.3476284

Li, Side; Kumar, Arun (July 2021, Proceedings of the VLDB Endowment)

Many applications that use large-scale machine learning (ML) increasingly prefer different models for subgroups (e.g., countries) to improve accuracy, fairness, or other desiderata. We call this emerging popular practice learning over groups , analogizing to GROUP BY in SQL, albeit for ML training instead of SQL aggregates. From the systems standpoint, this practice compounds the already data-intensive workload of ML model selection (e.g., hyperparameter tuning). Often, thousands of models may need to be trained, necessitating high-throughput parallel execution. Alas, most ML systems today focus on training one model at a time or at best, parallelizing hyperparameter tuning. This status quo leads to resource wastage, low throughput, and high runtimes. In this work, we take the first step towards enabling and optimizing learning over groups from the data systems standpoint for three popular classes of ML: linear models, neural networks, and gradient-boosted decision trees. Analytically and empirically, we compare standard approaches to execute this workload today: task-parallelism and data-parallelism. We find neither is universally dominant. We put forth a novel hybrid approach we call grouped learning that avoids redundancy in communications and I/O using a novel form of parallel gradient descent we call Gradient Accumulation Parallelism (GAP). We prototype our ideas into a system we call Kingpin built on top of existing ML tools and the flexible massively-parallel runtime Ray. An extensive empirical evaluation on large ML benchmark datasets shows that Kingpin matches or is 4x to 14x faster than state-of-the-art ML systems, including Ray's native execution and PyTorch DDP.
more » « less
Full Text Available
Distributed deep learning on data systems: a comparative analysis of approaches

https://doi.org/10.14778/3467861.3467867

Zhang, Yuhao; McQuillan, Frank; Jayaram, Nandish; Kak, Nikhil; Khanna, Ekta; Kislal, Orhan; Valdano, Domino; Kumar, Arun (June 2021, Proceedings of the VLDB Endowment)

Deep learning (DL) is growing in popularity for many data analytics applications, including among enterprises. Large business-critical datasets in such settings typically reside in RDBMSs or other data systems. The DB community has long aimed to bring machine learning (ML) to DBMS-resident data. Given past lessons from in-DBMS ML and recent advances in scalable DL systems, DBMS and cloud vendors are increasingly interested in adding more DL support for DB-resident data. Recently, a new parallel DL model selection execution approach called Model Hopper Parallelism (MOP) was proposed. In this paper, we characterize the particular suitability of MOP for DL on data systems, but to bring MOP-based DL to DB-resident data, we show that there is no single "best" approach, and an interesting tradeoff space of approaches exists. We explain four canonical approaches and build prototypes upon Greenplum Database, compare them analytically on multiple criteria (e.g., runtime efficiency and ease of governance) and compare them empirically with large-scale DL workloads. Our experiments and analyses show that it is non-trivial to meet all practical desiderata well and there is a Pareto frontier; for instance, some approaches are 3x-6x faster but fare worse on governance and portability. Our results and insights can help DBMS and cloud vendors design better DL support for DB users. All of our source code, data, and other artifacts are available at https://github.com/makemebitter/cerebro-ds.
more » « less
Full Text Available
Cerebro: A Layered Data Platform for Scalable Deep Learning

Kumar, Arun; Nakandala, Supun; Zhang, Yuhao; Li, Side; Gemawat, Advitya; Nagrecha, Kabir (January 2021, 11th Annual Conference on Innovative Data Systems Research (CIDR '21))
null (Ed.)
Full Text Available

« Prev Next »

Search for: All records