Creators/Authors contains: "Kumar, Arun"

  1. Large models such as GPT-3 and ChatGPT have transformed deep learning (DL), powering applications that have captured the public's imagination. Such models must be trained on multiple GPUs due to their size and computational load, driving the development of a bevy of model parallelism techniques and tools. Navigating such parallelism choices, however, is a new burden for DL users such as data scientists, domain scientists, etc., who may lack the necessary systems know-how. The need for model selection, which leads to many models to train due to hyper-parameter tuning or layer-wise finetuning, compounds the situation with two more burdens: resource apportioning and scheduling. In this work, we unify these three burdens by formalizing them as a joint problem that we call SPASE: Select a Parallelism, Allocate resources, and Schedule. We propose a new information system architecture to tackle the SPASE problem holistically, exploiting the performance opportunities presented by joint optimization. We devise an extensible template for existing parallelism schemes and combine it with an automated empirical profiler for runtime estimation. We then formulate SPASE as an MILP. We find that direct use of an MILP-solver is significantly more effective than several baseline heuristics. We optimize the system runtime further with an introspective scheduling approach. We implement all these techniques into a new data system we call Saturn. Experiments with benchmark DL workloads show that Saturn achieves 39-49% lower model selection runtimes than current DL practice.
    Free, publicly-accessible full text available December 1, 2024
  2. Recent advances in Graph Neural Networks (GNNs) have changed the landscape of modern graph analytics. The complexity of GNN training and the scalability challenges have also sparked interest from the systems community, with efforts to build systems that provide higher efficiency and schemes to reduce costs. However, we observe that many such systems basically reinvent the wheel of much work done in the database world on scalable graph analytics engines. Further, they often tightly couple the scalability treatments of graph data processing with that of GNN training, resulting in entangled complex problems and systems that often do not scale well on one of those axes.

    In this paper, we ask a fundamental question: How far can we push existing systems for scalable graph analytics and deep learning (DL) instead of building custom GNN systems? Are compromises inevitable on scalability and/or runtimes? We propose Lotan, the first scalable and optimized data system for full-batch GNN training with decoupled scaling that bridges the hitherto siloed worlds of graph analytics systems and DL systems. Lotan offers a series of technical innovations, including re-imagining GNN training as query plan-like dataflows, execution plan rewriting, optimized data movement between systems, a GNN-centric graph partitioning scheme, and the first known GNN model batching scheme. We prototyped Lotan on top of GraphX and PyTorch. An empirical evaluation using several real-world benchmark GNN workloads reveals a promising nuanced picture: Lotan significantly surpasses the scalability of state-of-the-art custom GNN systems, while often matching or being only slightly behind on time-to-accuracy metrics in some cases. We also show the impact of our system optimizations. Overall, our work shows that the GNN world can indeed benefit from building on top of scalable graph analytics engines. Lotan's new level of scalability can also empower new ML-oriented research on ever-larger graphs and GNNs.

  3. Editors: Bartow-Gillies, E ; Blunden, J. ; Boyer, T. Chapter Editors: (Ed.)
    Free, publicly-accessible full text available September 1, 2024
  4. Abstract Background: Hip-worn accelerometer cut-points have poor validity for assessing children’s sedentary time, which may partly explain the equivocal health associations shown in prior research. Improved processing/classification methods for these monitors would enrich the evidence base and inform the development of more effective public health guidelines. The present study aimed to develop and evaluate a novel computational method (CHAP-child) for classifying sedentary time from hip-worn accelerometer data. Methods: Participants were 278, 8–11-year-olds recruited from nine primary schools in Melbourne, Australia with differing socioeconomic status. Participants concurrently wore a thigh-worn activPAL (ground truth) and hip-worn ActiGraph (test measure) during up to 4 seasonal assessment periods, each lasting up to 8 days. activPAL data were used to train and evaluate the CHAP-child deep learning model to classify each 10-s epoch of raw ActiGraph acceleration data as sitting or non-sitting, creating comparable information from the two monitors. CHAP-child was evaluated alongside the current practice 100 counts per minute (cpm) method for hip-worn ActiGraph monitors. Performance was tested for each 10-s epoch and for participant-season level sedentary time and bout variables (e.g., mean bout duration). Results: Across participant-seasons, CHAP-child correctly classified each epoch as sitting or non-sitting relative to activPAL, with mean balanced accuracy of 87.6% (SD = 5.3%). Sit-to-stand transitions were correctly classified with mean sensitivity of 76.3% (SD = 8.3). For most participant-season level variables, CHAP-child estimates were within ± 11% (mean absolute percent error [MAPE]) of activPAL, and correlations between CHAP-child and activPAL were generally very large (> 0.80). For the current practice 100 cpm method, most MAPEs were greater than ± 30% and most correlations were small or moderate (≤ 0.60) relative to activPAL. Conclusions: There was strong support for the concurrent validity of the CHAP-child classification method, which allows researchers to derive activPAL-equivalent measures of sedentary time, sit-to-stand transitions, and sedentary bout patterns from hip-worn triaxial ActiGraph data. Applying CHAP-child to existing datasets may provide greater insights into the potential impacts and influences of sedentary time in children.
  5. Many applications that use large-scale machine learning (ML) increasingly prefer different models for subgroups (e.g., countries) to improve accuracy, fairness, or other desiderata. We call this emerging popular practice learning over groups, analogizing to GROUP BY in SQL, albeit for ML training instead of SQL aggregates. From the systems standpoint, this practice compounds the already data-intensive workload of ML model selection (e.g., hyperparameter tuning). Often, thousands of models may need to be trained, necessitating high-throughput parallel execution. Alas, most ML systems today focus on training one model at a time or at best, parallelizing hyperparameter tuning. This status quo leads to resource wastage, low throughput, and high runtimes. In this work, we take the first step towards enabling and optimizing learning over groups from the data systems standpoint for three popular classes of ML: linear models, neural networks, and gradient-boosted decision trees. Analytically and empirically, we compare standard approaches to execute this workload today: task-parallelism and data-parallelism. We find neither is universally dominant. We put forth a novel hybrid approach we call grouped learning that avoids redundancy in communications and I/O using a novel form of parallel gradient descent we call Gradient Accumulation Parallelism (GAP). We prototype our ideas into a system we call Kingpin built on top of existing ML tools and the flexible massively-parallel runtime Ray. An extensive empirical evaluation on large ML benchmark datasets shows that Kingpin matches or is 4x to 14x faster than state-of-the-art ML systems, including Ray's native execution and PyTorch DDP.
  6. Deep learning (DL) is revolutionizing many fields. However, there is a major bottleneck for the wide adoption of DL: the pain of model selection, which requires exploring a large config space of model architecture and training hyper-parameters before picking the best model. The two existing popular paradigms for exploring this config space pose a false dichotomy. AutoML-based model selection explores configs with high-throughput but uses human intuition minimally. Alternatively, interactive human-in-the-loop model selection completely relies on human intuition to explore the config space but often has very low throughput. To mitigate the above drawbacks, we propose a new paradigm for model selection that we call intermittent human-in-the-loop model selection. In this demonstration, we will showcase our approach using five real-world DL model selection workloads. A short video of our demonstration can be found here: 
  7. Background: Hip-worn accelerometers are commonly used, but data processed using the 100 counts per minute cut point do not accurately measure sitting patterns. We developed and validated a model to accurately classify sitting and sitting patterns using hip-worn accelerometer data from a wide age range of older adults. Methods: Deep learning models were trained with 30-Hz triaxial hip-worn accelerometer data as inputs and activPAL sitting/nonsitting events as ground truth. Data from 981 adults aged 35–99 years from cohorts in two continents were used to train the model, which we call CHAP-Adult (Convolutional Neural Network Hip Accelerometer Posture-Adult). Validation was conducted among 419 randomly selected adults not included in model training. Results: Mean errors (activPAL − CHAP-Adult) and 95% limits of agreement were: sedentary time −10.5 (−63.0, 42.0) min/day, breaks in sedentary time 1.9 (−9.2, 12.9) breaks/day, mean bout duration −0.6 (−4.0, 2.7) min, usual bout duration −1.4 (−8.3, 5.4) min, alpha .00 (−.04, .04), and time in ≥30-min bouts −15.1 (−84.3, 54.1) min/day. Respective mean (and absolute) percent errors were: −2.0% (4.0%), −4.7% (12.2%), 4.1% (11.6%), −4.4% (9.6%), 0.0% (1.4%), and 5.4% (9.6%). Pearson’s correlations were: .96, .92, .86, .92, .78, and .96. Error was generally consistent across age, gender, and body mass index groups with the largest deviations observed for those with body mass index ≥30 kg/m². Conclusions: Overall, these strong validation results indicate CHAP-Adult represents a significant advancement in the ambulatory measurement of sitting and sitting patterns using hip-worn accelerometers. Pending external validation, it could be widely applied to data from around the world to extend understanding of the epidemiology and health consequences of sitting.
  8. Deep learning (DL) is growing in popularity for many data analytics applications, including among enterprises. Large business-critical datasets in such settings typically reside in RDBMSs or other data systems. The DB community has long aimed to bring machine learning (ML) to DBMS-resident data. Given past lessons from in-DBMS ML and recent advances in scalable DL systems, DBMS and cloud vendors are increasingly interested in adding more DL support for DB-resident data. Recently, a new parallel DL model selection execution approach called Model Hopper Parallelism (MOP) was proposed. In this paper, we characterize the particular suitability of MOP for DL on data systems, but to bring MOP-based DL to DB-resident data, we show that there is no single "best" approach, and an interesting tradeoff space of approaches exists. We explain four canonical approaches and build prototypes upon Greenplum Database, compare them analytically on multiple criteria (e.g., runtime efficiency and ease of governance) and compare them empirically with large-scale DL workloads. Our experiments and analyses show that it is non-trivial to meet all practical desiderata well and there is a Pareto frontier; for instance, some approaches are 3x-6x faster but fare worse on governance and portability. Our results and insights can help DBMS and cloud vendors design better DL support for DB users. All of our source code, data, and other artifacts are available at 