NSF PAR Search | NSF Public Access Repository


Slapo: A Schedule Language for Progressive Optimization of Large Deep Learning Model Training

https://doi.org/10.1145/3620665.3640399

Chen, Hongzheng; Yu, Cody Hao; Zheng, Shuai; Zhang, Zhen; Zhang, Zhiru; Wang, Yida (April 2024, International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'2024))

Full Text Available
Serving Deep Learning Models from Relational Databases

https://doi.org/10.48786/edbt.2024.61

Zhou, Lixi; Lin, Qi; Chowdhury, Kanchan; Masood, Saif; Eichenberger, Alexandre; Min, Hong; Sim, Alexander; Wang, Jie; Wang, Yida; Wu, Kesheng; et al. (January 2024, OpenProceedings.org)

Serving deep learning (DL) models on relational data has become a critical requirement across diverse commercial and scientific domains, sparking growing interest recently. In this visionary paper, we embark on a comprehensive exploration of representative architectures to address the requirement. We highlight three pivotal paradigms: The state-of-the-art DL-centric architecture offloads DL computations to dedicated DL frameworks. The potential UDF-centric architecture encapsulates one or more tensor computations into User Defined Functions (UDFs) within the relational database management system (RDBMS). The potential relation-centric architecture aims to represent a large-scale tensor computation through relational operators. While each of these architectures demonstrates promise in specific use scenarios, we identify urgent requirements for seamless integration of these architectures and the middle ground in-between these architectures. We delve into the gaps that impede the integration and explore innovative strategies to close them. We present a pathway to establish a novel RDBMS for enabling a broad class of data-intensive DL inference applications.
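To make the relation-centric idea concrete, the following is a minimal sketch (not taken from the paper) of one tensor computation, a dense matrix multiplication, expressed with a relational join and aggregate. It assumes SQLite through Python's standard `sqlite3` module; the table layout `(row, column, value)` and the function name `matmul_relational` are hypothetical choices made for illustration only.

```python
import sqlite3


def matmul_relational(a, b):
    """Multiply two dense matrices (lists of rows) with a SQL join + aggregate.

    Illustrative only: each matrix is stored as (row, column, value) tuples,
    a hypothetical layout chosen for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (i INTEGER, k INTEGER, v REAL)")
    conn.execute("CREATE TABLE b (k INTEGER, j INTEGER, v REAL)")
    conn.executemany(
        "INSERT INTO a VALUES (?, ?, ?)",
        [(i, k, v) for i, row in enumerate(a) for k, v in enumerate(row)],
    )
    conn.executemany(
        "INSERT INTO b VALUES (?, ?, ?)",
        [(k, j, v) for k, row in enumerate(b) for j, v in enumerate(row)],
    )
    # The join pairs elements that share the inner index k;
    # the GROUP BY + SUM performs the reduction over k.
    rows = conn.execute(
        "SELECT a.i, b.j, SUM(a.v * b.v) "
        "FROM a JOIN b ON a.k = b.k "
        "GROUP BY a.i, b.j ORDER BY a.i, b.j"
    ).fetchall()
    conn.close()

    out = [[0.0] * len(b[0]) for _ in a]
    for i, j, v in rows:
        out[i][j] = v
    return out


if __name__ == "__main__":
    print(matmul_relational([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

The join plays the role of matching the shared index and the aggregate plays the role of the reduction; a real system would also need to handle blocking, sparsity, and data types, which this sketch ignores.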
UNIT: Unifying Tensorized Instruction Compilation

https://doi.org/10.1109/CGO51591.2021.9370330

Weng, Jian; Jain, Animesh; Wang, Jie; Wang, Leyuan; Wang, Yida; Nowatzki, Tony (February 2021, 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO))
Because of the increasing demand for intensive computation in deep neural networks, researchers have developed both hardware and software mechanisms to reduce the compute and memory burden. A widely adopted approach is to use mixed precision data types. However, it is hard to benefit from mixed precision without hardware specialization because of the overhead of data casting. Recently, hardware vendors offer tensorized instructions specialized for mixed-precision tensor operations, such as Intel VNNI, Nvidia Tensor Core, and ARM DOT. These instructions involve a new computing idiom, which reduces multiple low precision elements into one high precision element. The lack of compilation techniques for this emerging idiom makes it hard to utilize these instructions. In practice, one approach is to use vendor-provided libraries for computationally-intensive kernels, but this is inflexible and prevents further optimizations. Another approach is to manually write hardware intrinsics, which is error-prone and difficult for programmers. Some prior works tried to address this problem by creating compilers for each instruction. This requires excessive efforts when it comes to many tensorized instructions. In this work, we develop a compiler framework, UNIT, to unify the compilation for tensorized instructions. The key to this approach is a unified semantics abstraction which makes the integration of new instructions easy, and the reuse of the analysis and transformations possible. Tensorized instructions from different platforms can be compiled via UNIT with moderate effort for favorable performance. Given a tensorized instruction and a tensor operation, UNIT automatically detects the applicability of the instruction, transforms the loop organization of the operation, and rewrites the loop body to take advantage of the tensorized instruction. According to our evaluation, UNIT is able to target various mainstream hardware platforms. 
The generated end-to-end inference model achieves 1.3x speedup over Intel oneDNN on an x86 CPU, 1.75x speedup over Nvidia cuDNN on an Nvidia GPU, and 1.13x speedup over a carefully tuned TVM solution for ARM DOT on an ARM CPU.
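The computing idiom described in the abstract, reducing several low-precision elements into one higher-precision element, can be sketched in plain Python as follows. This is an illustration under simplifying assumptions, not UNIT's implementation and not the exact semantics of any vendor instruction: the group width of four, the signed 8-bit input range, and the names `dot4_accumulate` and `dot_int8` are all hypothetical, and real instructions differ in signedness, lane counts, and overflow behavior.

```python
def dot4_accumulate(acc, a, b):
    """Emulate one lane of a dot-product-style tensorized instruction.

    Four low-precision (8-bit range) pairs are multiplied and reduced into
    a single wider accumulator. Hypothetical helper for illustration.
    """
    assert len(a) == len(b) == 4
    for x in (*a, *b):
        assert -128 <= x <= 127, "inputs must fit in a signed 8-bit value"
    return acc + sum(x * y for x, y in zip(a, b))


def dot_int8(a, b):
    """Dot product of two 8-bit vectors, tiled into groups of four.

    The loop is reorganized so that each iteration maps onto one call of
    the emulated instruction, mirroring the loop transformation and
    body rewriting that the abstract describes at a high level.
    """
    assert len(a) == len(b) and len(a) % 4 == 0
    acc = 0
    for start in range(0, len(a), 4):
        acc = dot4_accumulate(acc, a[start:start + 4], b[start:start + 4])
    return acc


if __name__ == "__main__":
    print(dot_int8([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8))
```

The wider accumulator is what makes the reduction safe: four products of 8-bit values can exceed the 8-bit range by a large margin, so the result has to be held at higher precision.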
Full Text Available
Is Network the Bottleneck of Distributed Training?

https://doi.org/10.1145/3405671.3405810

Zhang, Zhen; Chang, Chaokun; Lin, Haibin; Wang, Yida; Arora, Raman; Jin, Xin (August 2020, ACM SIGCOMM Workshop on Network Meets AI & ML (NetAI))

Full Text Available
BrainIAK: The Brain Imaging Analysis Kit

https://doi.org/10.52294/31bb5b68-2184-411b-8c00-a1dacb61e1da

Kumar, Manoj; Anderson, Michael J.; Antony, James W.; Baldassano, Christopher; Brooks, Paula P.; Cai, Ming Bo; Chen, Po-Hsuan Cameron; Ellis, Cameron T.; Henselman-Petrusek, Gregory; Huberdeau, David; et al. (January 2021, Aperture Neuro)

Functional magnetic resonance imaging (fMRI) offers a rich source of data for studying the neural basis of cognition. Here, we describe the Brain Imaging Analysis Kit (BrainIAK), an open-source, free Python package that provides computationally optimized solutions to key problems in advanced fMRI analysis. A variety of techniques are presently included in BrainIAK: intersubject correlation (ISC) and intersubject functional connectivity (ISFC), functional alignment via the shared response model (SRM), full correlation matrix analysis (FCMA), a Bayesian version of representational similarity analysis (BRSA), event segmentation using hidden Markov models, topographic factor analysis (TFA), inverted encoding models (IEMs), an fMRI data simulator that uses noise characteristics from real data (fmrisim), and some emerging methods. These techniques have been optimized to leverage the efficiencies of high-performance compute (HPC) clusters, and the same code can be seamlessly transferred from a laptop to a cluster. For each of the aforementioned techniques, we describe the data analysis problem that the technique is meant to solve and how it solves that problem; we also include an example Jupyter notebook for each technique and an annotated bibliography of papers that have used and/or described that technique. In addition to the sections describing various analysis techniques in BrainIAK, we have included sections describing the future applications of BrainIAK to real-time fMRI, tutorials that we have developed and shared online to facilitate learning the techniques in BrainIAK, computational innovations in BrainIAK, and how to contribute to BrainIAK. We hope that this manuscript helps readers to understand how BrainIAK might be useful in their research.
Full Text Available
