Title: MLPerf Tiny Benchmark
Advancements in ultra-low-power tiny machine learning (TinyML) systems promise to unlock an entirely new class of smart applications. However, continued progress is limited by the lack of a widely accepted and easily reproducible benchmark for these systems. To meet this need, we present MLPerf Tiny, the first industry-standard benchmark suite for ultra-low-power tiny machine learning systems. The benchmark suite is the collaborative effort of more than 50 organizations from industry and academia and reflects the needs of the community. MLPerf Tiny measures the accuracy, latency, and energy of machine learning inference to properly evaluate the tradeoffs between systems. Additionally, MLPerf Tiny implements a modular design that enables benchmark submitters to show the benefits of their product, regardless of where it falls on the ML deployment stack, in a fair and reproducible manner. The suite features four benchmarks: keyword spotting, visual wake words, image classification, and anomaly detection.
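To make the latency and energy metrics concrete, here is a minimal measurement-harness sketch in Python. This is not the official MLPerf Tiny harness; `run_inference`, the run counts, and the average-power figure are illustrative placeholders.

```python
import time
import statistics

def run_inference(sample):
    # Hypothetical stand-in for one on-device inference call, e.g. invoking
    # a TensorFlow Lite Micro interpreter on a single preprocessed input.
    time.sleep(0.001)

def benchmark(samples, warmup=10, runs=100, avg_power_mw=30.0):
    # Discard warm-up runs so caches and allocators settle first.
    for s in samples[:warmup]:
        run_inference(s)
    latencies_ms = []
    for i in range(runs):
        start = time.perf_counter()
        run_inference(samples[i % len(samples)])
        latencies_ms.append((time.perf_counter() - start) * 1e3)
    median_ms = statistics.median(latencies_ms)
    # First-order energy model: energy per inference = average power x latency.
    energy_uj = avg_power_mw * median_ms  # mW x ms = microjoules
    return median_ms, energy_uj

median_ms, energy_uj = benchmark(samples=[None] * 16)
print(f"median latency: {median_ms:.3f} ms, est. energy/inference: {energy_uj:.1f} uJ")
```

Reporting both latency and a power-derived energy figure is what lets the suite expose the latency/energy tradeoff between systems rather than a single speed number.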
Award ID(s):
1904444
NSF-PAR ID:
10300102
Author(s) / Creator(s):
Date Published:
Journal Name:
arXiv.org
ISSN:
2331-8422
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Scientific communities are increasingly adopting machine learning and deep learning models in their applications to accelerate scientific insights. High-performance computing systems are pushing the frontiers of performance with a rich diversity of hardware resources and massive scale-out capabilities. There is a critical need to understand fair and effective benchmarking of machine learning applications that are representative of real-world scientific use cases. MLPerf™ is a community-driven standard to benchmark machine learning workloads, focusing on end-to-end performance metrics. In this paper, we introduce MLPerf HPC, a benchmark suite of large-scale scientific machine learning training applications driven by the MLCommons™ Association. We present the results from the first submission round, which includes a diverse set of some of the world's largest HPC systems. We develop a systematic framework for their joint analysis and compare them in terms of data staging, algorithmic convergence, and compute performance. As a result, we gain a quantitative understanding of optimizations on different subsystems, such as staging and on-node loading of data, compute-unit utilization, and communication scheduling, that enable overall >10× (end-to-end) performance improvements through system scaling. Notably, our analysis shows a scale-dependent interplay between the dataset size, a system's memory hierarchy, and training convergence that underlines the importance of near-compute storage. To overcome the data-parallel scalability challenge at large batch sizes, we discuss specific learning techniques and hybrid data-and-model parallelism that are effective on large systems. We conclude by characterizing each benchmark with respect to low-level memory, I/O, and network behaviour to parameterize extended roofline performance models in future rounds.
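The roofline models mentioned in the closing sentence reduce to one rule: attainable throughput is capped by either peak compute or memory bandwidth times arithmetic intensity. A minimal sketch with placeholder machine numbers (not figures from any MLPerf HPC submission):

```python
# Classic roofline: attainable FLOP/s = min(peak_flops, bandwidth * intensity),
# where intensity = FLOPs performed per byte moved. The machine numbers below
# are illustrative placeholders only.
PEAK_FLOPS = 10e12   # 10 TFLOP/s peak compute
BANDWIDTH = 900e9    # 900 GB/s memory bandwidth

def attainable(intensity_flops_per_byte):
    return min(PEAK_FLOPS, BANDWIDTH * intensity_flops_per_byte)

for intensity in (0.5, 2.0, 11.1, 50.0):
    gflops = attainable(intensity) / 1e9
    bound = "memory-bound" if BANDWIDTH * intensity < PEAK_FLOPS else "compute-bound"
    print(f"intensity {intensity:5.1f} FLOP/byte -> {gflops:8.1f} GFLOP/s ({bound})")
```

Workloads left of the ridge point (PEAK_FLOPS / BANDWIDTH, about 11 FLOP/byte here) are limited by data movement, which is why the paper's memory, I/O, and network characterization feeds directly into these models.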
  2. Transfer learning, where the goal is to transfer a well-trained deep learning model from a primary source task to a new task, is a crucial learning scheme for on-device machine learning, since IoT/edge devices collect and then process massive amounts of data in our daily life. However, the tiny memory constraint of IoT/edge devices means that such on-device learning requires an ultra-small training memory footprint, bringing new challenges for memory-efficient learning. Many existing works address this problem by reducing the number of trainable parameters. However, this does not directly translate into memory savings, since the major bottleneck is the activations, not the parameters. To develop memory-efficient on-device transfer learning, in this work we are the first to approach transfer learning from the new perspective of intermediate feature reprogramming of a pre-trained model (i.e., the backbone). To perform this lightweight and memory-efficient reprogramming, we propose to train a tiny Reprogramming Network (Rep-Net) directly from the new task's input data while freezing the backbone model. The proposed Rep-Net interchanges features with the backbone model through activation connectors at regular intervals, so that the backbone and Rep-Net features mutually benefit each other. Through extensive experiments, we validate each design choice of the proposed Rep-Net in achieving highly memory-efficient on-device reprogramming. Our experiments establish the superior performance (i.e., low training memory and high accuracy) of Rep-Net compared to SOTA on-device transfer learning schemes across multiple benchmarks.
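A rough PyTorch-style sketch of the general pattern described above, a frozen backbone plus a small trainable branch that mixes features back in, with the caveat that the layer sizes, the single connector point, and the module names are illustrative guesses, not the paper's actual Rep-Net architecture:

```python
import torch
import torch.nn as nn

class RepNetSketch(nn.Module):
    # Illustrative only: a frozen backbone split into two stages, plus a tiny
    # trainable branch whose output is mixed back in at one point.
    def __init__(self, stage1, stage2, feat_dim, num_classes):
        super().__init__()
        self.stage1, self.stage2 = stage1, stage2
        # Freeze the pre-trained backbone: no gradients and no optimizer state
        # for its weights, which is where much of the training memory would go.
        for p in list(self.stage1.parameters()) + list(self.stage2.parameters()):
            p.requires_grad_(False)
        self.rep = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU())  # tiny trainable branch
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        h = self.stage1(x)
        h = h + self.rep(h)   # "activation connector": exchange features additively
        h = self.stage2(h)
        return self.head(h)

# Toy usage with linear stand-ins for the backbone stages.
model = RepNetSketch(nn.Linear(32, 64), nn.Linear(64, 64), feat_dim=64, num_classes=10)
trainable = [p for p in model.parameters() if p.requires_grad]
opt = torch.optim.SGD(trainable, lr=0.01)
loss = model(torch.randn(8, 32)).sum()
loss.backward()
opt.step()
```

Only the small `rep` and `head` modules receive gradients and optimizer state, which is the memory-saving lever the abstract describes.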
  3. While the global healthcare market for wearable devices has been growing significantly in recent years and is predicted to reach $60 billion by 2028, many important healthcare applications such as seizure monitoring and drowsiness detection have not been deployed due to limited battery lifetime, slow response rates, and inadequate biosignal quality. This study proposes PROS, an efficient pattern-driven compressive sensing framework for low-power biopotential-based wearables. PROS eliminates the conventional trade-off between signal quality, response time, and power consumption by introducing tiny pattern recognition primitives and a pattern-driven compressive sensing technique that exploits the sparsity of biosignals. Specifically, we (i) develop tiny machine learning models to eliminate irrelevant biosignal patterns, (ii) efficiently perform compressive sampling of relevant biosignals with appropriate sparse wavelet domains, and (iii) optimize hardware and OS operations to push processing efficiency. PROS also provides an abstraction layer, so the application only needs to care about detected relevant biosignal patterns without knowing the optimizations underneath. We have implemented and evaluated PROS on two open biosignal datasets with 120 subjects and six biosignal patterns. The experimental results on unknown subjects for a practical use case such as epileptic seizure monitoring are very encouraging. PROS can reduce the streaming data rate by 24× while maintaining high signal fidelity. It boosts the power efficiency of the wearable device by more than 1200% and enables the device to react to critical events immediately. The memory and runtime overheads of PROS are minimal, at a few KB and tens of milliseconds per biosignal pattern, respectively. PROS is currently adopted in research projects at multiple universities and hospitals.
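A minimal NumPy sketch of the compressive-sampling step in isolation. The signal, the toy frequency-sparse basis, and the 8× compression factor are illustrative stand-ins for PROS's tuned wavelet-domain setup:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 256, 32                 # 256 raw samples -> 32 measurements (8x fewer)

# A signal that is sparse in the frequency domain, much as biosignals are
# sparse in a well-chosen wavelet basis.
t = np.arange(n) / n
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)

# Compressive sampling: y = Phi @ x with a random Gaussian measurement matrix.
# Only the m measurements (plus the seed that regenerates Phi) need to be
# streamed off-device, which is where the data-rate reduction comes from.
Phi = rng.normal(size=(m, n)) / np.sqrt(m)
y = Phi @ x

print(f"raw samples: {n}, transmitted measurements: {m} ({n // m}x rate reduction)")
# Recovery on the receiver side would solve a sparse problem such as
#   min ||c||_1  s.t.  Phi @ (Psi @ c) = y,
# where Psi is the sparsifying (e.g. wavelet) basis; omitted here.
```

The pattern-recognition primitives in PROS act before this step, so that only relevant biosignal segments are compressively sampled and transmitted at all.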
  4. This paper presents a design approach for the modeling and simulation of ultra-low-power (ULP) analog computing machine learning (ML) circuits for seizure detection using EEG signals in wearable health monitoring applications. We describe a new analog system modeling and simulation technique that associates the power consumption, noise, linearity, and other critical performance parameters of analog circuits with the classification accuracy of a given ML network, which makes it possible to realize a power- and performance-optimized analog ML hardware implementation based on diverse application-specific needs. We carried out circuit simulations to obtain non-idealities, which are then mathematically modeled for an accurate mapping. We have modeled noise, nonlinearity, resolution, and process variations such that the model can accurately obtain the classification accuracy of the analog-computing-based seizure detection system. Noise is modeled as input-referred white noise that can be added directly at the input. Device process and temperature variations are modeled as random fluctuations in circuit parameters such as gain and cut-off frequency. Nonlinearity is mathematically modeled as a power series. The combined system-level model is then simulated for classification accuracy assessments. The design approach helps to optimize power and area during the development of tailored analog circuits for ML networks, with the ability to trade power against performance goals while still ensuring the required classification accuracy. The simulation technique also makes it possible to determine target specifications for each circuit block in the analog computing hardware. This is achieved by developing the ML hardware model and investigating the effect of circuit non-idealities on classification accuracy. Simulation of an analog computing EEG seizure detection block shows a classification accuracy of 91%. The proposed modeling approach will significantly reduce the design time and complexity of large analog computing systems. Two feature extraction approaches are also compared for an analog computing architecture.
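A minimal NumPy sketch of the modeling idea: inject the stated non-idealities (input-referred white noise, power-series nonlinearity, random gain variation, finite resolution) into a toy classifier's inputs and re-measure accuracy. All coefficients below are placeholders, not fitted circuit values:

```python
import numpy as np

rng = np.random.default_rng(1)

def apply_nonidealities(x, noise_std=0.02, a3=-0.05, gain_sigma=0.03, bits=8):
    """Apply a chain of modeled analog non-idealities to feature matrix x."""
    x = x + rng.normal(0.0, noise_std, x.shape)        # input-referred white noise
    x = x + a3 * x**3                                  # 3rd-order power-series nonlinearity
    x = x * rng.normal(1.0, gain_sigma, x.shape[-1])   # per-channel gain spread (process variation)
    levels = 2**bits                                   # finite resolution: uniform quantizer
    x = np.round(np.clip(x, -1, 1) * (levels / 2)) / (levels / 2)
    return x

def accuracy(weights, feats, labels):
    return np.mean((feats @ weights > 0).astype(int) == labels)

# Toy linear "detector" on synthetic features, evaluated ideal vs. degraded.
feats = rng.normal(0, 0.3, (1000, 16))
weights = rng.normal(0, 1, 16)
labels = (feats @ weights > 0).astype(int)
print("ideal accuracy:   ", accuracy(weights, feats, labels))
print("degraded accuracy:", accuracy(weights, apply_nonidealities(feats), labels))
```

Sweeping parameters such as `noise_std` or `bits` until accuracy drops below the application's requirement is the mechanism by which per-block target specifications can be derived.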
  5. The pork industry is an essential part of the global food system, providing a significant source of protein for people around the world. A major factor restraining productivity and compromising animal wellbeing in the pork industry is disease outbreaks in pigs throughout the production process: widespread outbreaks can lead to losses as high as 10% of the U.S. pig population in extreme years. In this study, we present a machine learning model to predict, on a daily basis, the emergence of infection in swine production systems throughout the production process, a potential precursor to outbreaks whose detection is vital for disease prevention and mitigation. We determine the features that provide the most value in predicting infection, which include nearby farm density, historical test rates, piglet inventory, feed consumption during the gestation period, and wind speed and direction. We use these features to produce a generalizable machine learning model; evaluate the model's ability to predict outbreaks both seven and 30 days in advance, allowing for early warning of disease infection; and evaluate the model on two swine production systems with different volumes of data, analyzing the effects of data availability and data granularity. Our results demonstrate good ability to predict infection in both systems, with a balanced accuracy (the average prediction accuracy on positive and negative samples) of 85.3% on any disease in the first system, and balanced accuracies of 58.5%, 58.7%, 72.8%, and 74.8% on porcine reproductive and respiratory syndrome, porcine epidemic diarrhea virus, influenza A virus, and Mycoplasma hyopneumoniae in the second system, respectively, using the six most important predictors in all cases. These models provide daily infection probabilities that veterinarians and other stakeholders can use as a benchmark to support preventive and control strategies on farms in a more timely manner.
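Since all the reported figures are balanced accuracies, a short sketch pins the metric down; with rare infection days, it avoids rewarding the trivial all-negative predictor the way plain accuracy does:

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall: (TPR + TNR) / 2 in the binary case."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tpr = np.mean(y_pred[y_true == 1] == 1)   # recall on positive (infection) days
    tnr = np.mean(y_pred[y_true == 0] == 0)   # recall on negative days
    return (tpr + tnr) / 2

# With 95% negative days, predicting "no infection" everywhere scores 95%
# plain accuracy but only 50% balanced accuracy.
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros(100, dtype=int)
print(np.mean(y_true == y_pred), balanced_accuracy(y_true, y_pred))
```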

     