skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: MagmaDNN: Accelerated Deep Learning Using MAGMA
MagmaDNN [17] is a deep learning framework driven using the highly optimized MAGMA dense linear algebra package. The library offers comparable performance to other popular frameworks, such as TensorFlow, PyTorch, and Theano. C++ is used to implement the framework providing fast memory operations, direct cuda access, and compile time errors. Common neural network layers such as Fully Connected, Convolutional, Pooling, Flatten, and Dropout are included. Hyperparameter tuning is performed with a parallel grid search engine. MagmaDNN uses several techniques to accelerate network training. For instance, convolutions are performed using the Winograd algorithm and FFTs. Other techniques include MagmaDNNs custom memory manager, which is used to reduce expensive memory transfers, and accelerated training by distributing batches across GPU nodes. This paper provides an overview of the MagmaDNN framework and how it leverages the MAGMA library to attain speed increases. This paper also addresses how deep networks are accelerated by training in parallel and further challenges with parallelization.  more » « less
Award ID(s):
1659502
PAR ID:
10143136
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
PEARC19
Page Range / eLocation ID:
1 to 6
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In this paper, we present work towards the development of a new data analytics and machine learning (ML) framework, called MagmaDNN. Our main goal is to provide scalable, high-performance data analytics and ML solutions for scientific applications running on current and upcoming heterogeneous many-core GPU-accelerated architectures. To this end, since many of the functionalities needed are based on standard linear algebra (LA) routines, we designed MagmaDNN to derive its performance power from the MAGMA library. The close integration provides the fundamental (scalable high-performance) LA routines available in MAGMA as a backend to MagmaDNN. We present some design issues for performance and scalability that are specific to ML using Deep Neural Networks (DNN), as well as the MagmaDNN designs towards overcoming them. In particular, MagmaDNN uses well established HPC techniques from the area of dense LA, including task-based parallelization, DAG representations, scheduling, mixed-precision algorithms, asynchronous solvers, and autotuned hyperparameter optimization. We illustrate these techniques and their incorporation and use to outperform other frameworks, currently available. 
    more » « less
  2. In this paper, we present work towards the development of a new data analytics and machine learning (ML) framework, called MagmaDNN. Our main goal is to provide scalable, high-performance data analytics and ML solutions for scientific applications running on current and upcoming heterogeneous many-core GPU-accelerated architectures. To this end, since many of the functionalities needed are based on standard linear algebra (LA) routines, we designed MagmaDNN to derive its performance power from the MAGMA library. The close integration provides the fundamental (scalable high-performance) LA routines available in MAGMA as a backend to MagmaDNN. We present some design issues for performance and scalability that are specific to ML using Deep Neural Networks (DNN), as well as the MagmaDNN designs towards overcoming them. In particular, MagmaDNN uses well established HPC techniques from the area of dense LA, including task-based parallelization, DAG representations, scheduling, mixed-precision algorithms, asynchronous solvers, and autotuned hyperparameter optimization. We illustrate these techniques and their incorporation and use to outperform other frameworks, currently available. 
    more » « less
  3. Processing in-memory (PIM) promises to unleash unprecedented computing capabilities for high-data-rate applications. Computation using PIM is performed by breaking down computationally expensive operations into in-memory kernels that can be efficiently executed using non-volatile memory. Logic styles such as MAGIC require that each output memory cell is prepared for evaluation before executing the functional logic operation. State-of-the-art synthesis algorithms perform the preparation immediately after memory cells have expired. Unfortunately, this results in columns of cells being prepared greedily, instead of leveraging efficient parallel data preparation instructions. In this paper, we propose the PREP framework that maximizes the opportunities for parallel column preparation using execution sequence optimization. The key idea of the framework is to postpone data preparation instructions until there are no available prepared cells. Next, the accumulated memory cells are prepared in parallel to release the memory for functional evaluations. The framework is capable of exploring a frontier of area-performance solutions. The PREP framework is evaluated using 15 benchmarks from the SuiteSparse library. Compared with state-of-the-art synthesis tools, energy consumption and latency are respectively reduced by 27% and 25%, with no additional cost in crossbar memory. 
    more » « less
  4. null (Ed.)
    Training Deep Neural Networks (DNNs) is resource-intensive and time-consuming. While prior research has explored many different ways of reducing DNN training time, the impact of input data pipeline , i.e., fetching raw data items from storage and performing data pre-processing in memory, has been relatively unexplored. This paper makes the following contributions: (1) We present the first comprehensive analysis of how the input data pipeline affects the training time of widely-used computer vision and audio Deep Neural Networks (DNNs), that typically involve complex data pre-processing. We analyze nine different models across three tasks and four datasets while varying factors such as the amount of memory, number of CPU threads, storage device, GPU generation etc on servers that are a part of a large production cluster at Microsoft. We find that in many cases, DNN training time is dominated by data stall time : time spent waiting for data to be fetched and pre-processed. (2) We build a tool, DS-Analyzer to precisely measure data stalls using a differential technique, and perform predictive what-if analysis on data stalls. (3) Finally, based on the insights from our analysis, we design and implement three simple but effective techniques in a data-loading library, CoorDL, to mitigate data stalls. Our experiments on a range of DNN tasks, models, datasets, and hardware configs show that when PyTorch uses CoorDL instead of the state-of-the-art DALI data loading library, DNN training time is reduced significantly (by as much as 5X on a single server). 
    more » « less
  5. Unmanned Aerial Networks (UAVs) are prone to several cyber-attacks, including Global Positioning Spoofing attacks. For this purpose, numerous studies have been conducted to detect, classify, and mitigate these attacks, using Artificial Intelligence techniques; however, most of these studies provided techniques with low detection, high misdetection, and high bias rates. To fill this gap, in this paper, we propose three supervised deep learning techniques, namely Deep Neural Network, U Neural Network, and Long Short Term Memory. These models are evaluated in terms of Accuracy, Detection Rate, Misdetection Rate, False Alarm Rate, Training Time per Sample, Prediction Time, and Memory Size. The simulation results indicated that the U Neural Network outperforms other models with an accuracy of 98.80%, a probability of detection of 98.85%, a misdetection of 1.15%, a false alarm of 1.8%, a training time per sample of 0.22 seconds, a prediction time of 0.2 seconds, and a memory size of 199.87 MiB. In addition, these results depicted that the Long Short-Term Memory model provides the lowest performance among other models for detecting these attacks on UAVs. 
    more » « less