

Title: Machine Learning Augmentation in Micro-Sensor Assemblies
Abstract

Size and power constraints in small electronic systems such as wearable devices limit their potential. Significant energy is lost in current computational schemes to processes such as analog-to-digital conversion and wireless communication for cloud computing. Edge computing, in which information is processed near the data source, has been shown to significantly enhance the performance of computational systems and reduce their power consumption. In this work, we push computation directly into the sensory node by presenting the use of an array of electrostatic microelectromechanical system (MEMS) sensors to perform colocalized sensing and computing. The MEMS network is operated around the pull-in regime to access the instability jump and the hysteresis available in this regime. Within this regime, the MEMS network is capable of emulating the response of the continuous-time recurrent neural network (CTRNN) computational scheme. The network is shown to successfully classify a quasi-static input acceleration waveform as a square or triangle signal in the absence of digital processors. Our results show that MEMS may be a viable solution for implementing edge computing without the need for digital electronics or microprocessors. Moreover, our results can serve as a basis for the development of new types of specialized MEMS sensors (e.g., gesture-recognition sensors).
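For readers unfamiliar with the CTRNN scheme that the MEMS network emulates, a minimal sketch of the standard CTRNN node dynamics under forward-Euler integration is shown below. The weights, biases, time constants, and toy input here are illustrative placeholders, not the parameters mapped onto the MEMS devices in the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ctrnn_step(y, W, theta, tau, I, dt=1e-3):
    """One forward-Euler step of the standard CTRNN dynamics:
    tau_i * dy_i/dt = -y_i + sum_j W_ij * sigma(y_j + theta_j) + I_i
    """
    dydt = (-y + W @ sigmoid(y + theta) + I) / tau
    return y + dt * dydt

# Illustrative 3-node network (random weights, not the identified MEMS values).
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 3))
theta = np.zeros(3)
tau = np.full(3, 5e-3)

y = np.zeros(3)
for t in np.arange(0.0, 0.1, 1e-3):
    accel = 1.0 if (t % 0.04) < 0.02 else -1.0   # toy square-wave acceleration on node 0
    I = np.array([accel, 0.0, 0.0])
    y = ctrnn_step(y, W, theta, tau, I)
```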

 
Award ID(s):
1935641
NSF-PAR ID:
10279969
Author(s) / Creator(s):
Date Published:
Journal Name:
14th International Conference on Micro- and Nanosystems (MNS)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    The goal of this paper is to provide a novel computing approach that can be used to reduce the power consumption, size, and cost of wearable electronics. To achieve this goal, the use of microelectromechanical systems (MEMS) sensors for simultaneous sensing and computing is introduced. Specifically, by enabling sensing and computing locally at the MEMS sensor node and utilizing the usually unwanted pull-in/pull-out hysteresis, we may eliminate the need for cloud computing and reduce the use of analog-to-digital converters, sampling circuits, and digital processors. As a proof of concept, we show that a simulation model of a network of three commercially available MEMS accelerometers can classify a train of square and triangular acceleration signals inherently using pull-in and release hysteresis. Furthermore, we develop and fabricate a network with finger arrays of parallel-plate actuators to facilitate coupling between MEMS devices in the network using actuating assemblies and biasing assemblies, thus bypassing the previously reported coupling challenge in MEMS neural networks.
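The pull-in/release behavior exploited above acts like a one-bit memory element. The toy model below captures only that switching-with-memory idea (the thresholds are arbitrary illustrative values, not a model of the coupled electromechanical devices in the paper):

```python
def hysteretic_node(v_bias, state, v_pull_in=2.0, v_release=1.2):
    """Toy pull-in/release hysteresis: the plate snaps down ('1') once the
    effective bias exceeds the pull-in threshold and only releases ('0')
    after the bias drops below the lower release threshold."""
    if state == 0 and v_bias >= v_pull_in:
        return 1
    if state == 1 and v_bias <= v_release:
        return 0
    return state

# Sweeping the bias up and down shows the memory effect: the up-sweep and
# down-sweep switch at different levels.
state = 0
for v in [0.0, 1.0, 2.1, 1.5, 1.0, 2.1]:
    state = hysteretic_node(v, state)
    print(v, state)
```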
  2. This paper presents an energy-efficient classification framework that performs human activity recognition (HAR). Typically, HAR classification tasks require a computational platform that includes a processor and memory along with sensors and their interfaces, all of which consume significant power. The presented framework employs a microelectromechanical systems (MEMS)-based continuous-time recurrent neural network (CTRNN) to perform HAR tasks very efficiently. In a real physical implementation, we show that the MEMS-CTRNN nodes can perform computing while consuming power on the nanowatt scale, compared to the microwatt scale of state-of-the-art hardware. We also confirm that this large power reduction does not come at the expense of reduced performance by evaluating its accuracy in classifying the highly cited human activity recognition dataset (HAPT). Our simulation results show that the HAR framework, which consists of a training module and a network of MEMS-based CTRNN nodes, provides HAR classification accuracy for the HAPT that is comparable to traditional CTRNN and other recurrent neural network (RNN) implementations. For example, we show that the MEMS-based CTRNN model's average accuracy for the worst-case scenario of not using pre-processing techniques, such as quantization, to classify 5 different activities is 77.94%, compared to 78.48% using the traditional CTRNN.
  3. Obeid, I.; Selesnik, I.; Picone, J. (Eds.)
    The Neuronix high-performance computing cluster allows us to conduct extensive machine learning experiments on big data [1]. This heterogeneous cluster uses innovative scheduling technology, Slurm [2], which manages a network of CPUs and graphics processing units (GPUs). The GPU farm consists of a variety of processors ranging from low-end consumer-grade devices such as the Nvidia GTX 970 to higher-end devices such as the GeForce RTX 2080. These GPUs are essential to our research since they allow extremely compute-intensive deep learning tasks to be executed on massive data resources such as the TUH EEG Corpus [2]. We use TensorFlow [3] as the core machine learning library for our deep learning systems, and routinely employ multiple GPUs to accelerate the training process.

    Reproducible results are essential to machine learning research. Reproducibility in this context means the ability to replicate an existing experiment – performance metrics such as error rates should be identical and floating-point calculations should match closely. Three examples of ways we typically expect an experiment to be replicable are: (1) The same job run on the same processor should produce the same results each time it is run. (2) A job run on a CPU and GPU should produce identical results. (3) A job should produce comparable results if the data is presented in a different order. System optimization requires an ability to directly compare error rates for algorithms evaluated under comparable operating conditions. However, it is a difficult task to exactly reproduce the results for large, complex deep learning systems that often require more than a trillion calculations per experiment [5]. This is a fairly well-known issue and one we will explore in this poster.

    Researchers must be able to replicate results on a specific data set to establish the integrity of an implementation. They can then use that implementation as a baseline for comparison purposes. A lack of reproducibility makes it very difficult to debug algorithms and validate changes to the system. Equally important, since many results in deep learning research are dependent on the order in which the system is exposed to the data, the specific processors used, and even the order in which those processors are accessed, it becomes a challenging problem to compare two algorithms since each system must be individually optimized for a specific data set or processor. This is extremely time-consuming for algorithm research in which a single run often taxes a computing environment to its limits. Well-known techniques such as cross-validation [5,6] can be used to mitigate these effects, but this is also computationally expensive.

    These issues are further compounded by the fact that most deep learning algorithms are susceptible to the way computational noise propagates through the system. GPUs are particularly notorious for this because, in a clustered environment, it becomes more difficult to control which processors are used at various points in time. Another equally frustrating issue is that upgrades to the deep learning package, such as the transition from TensorFlow v1.9 to v1.13, can also result in large fluctuations in error rates when re-running the same experiment. Since TensorFlow is constantly updating functions to support GPU use, maintaining a historical archive of experimental results that can be used to calibrate algorithm research is quite a challenge. This makes it very difficult to optimize the system or select the best configurations.
The overall impact of all of the issues described above is significant, as error rates can fluctuate by as much as 25% due to these types of computational issues. Cross-validation is one technique used to mitigate this, but it is expensive since multiple runs over the data are required, which further taxes a computing infrastructure already running at maximum capacity. GPUs are preferred when training a large network since these systems train at least two orders of magnitude faster than CPUs [7]. Large-scale experiments are simply not feasible without using GPUs. However, there is a tradeoff to gain this performance. Since all our GPUs use the NVIDIA CUDA® Deep Neural Network library (cuDNN) [8], a GPU-accelerated library of primitives for deep neural networks, it adds an element of randomness into the experiment. When a GPU is used to train a network in TensorFlow, it automatically searches for a cuDNN implementation. NVIDIA’s cuDNN implementation provides algorithms that increase performance and help the model train quicker, but they are non-deterministic algorithms [9,10]. Since our networks have many complex layers, there is no easy way to avoid this randomness. Instead of comparing each epoch, we compare the average performance of the experiment because it gives us a sense of how our model is performing per experiment and whether the changes we make are effective.

In this poster, we will discuss a variety of issues related to reproducibility and introduce ways we mitigate these effects. For example, TensorFlow uses a random number generator (RNG) which is not seeded by default. TensorFlow determines the initialization point and how certain functions execute using the RNG. The solution for this is seeding all the necessary components before training the model. This forces TensorFlow to use the same initialization point and sets how certain layers work (e.g., dropout layers). However, seeding all the RNGs will not guarantee a controlled experiment. Other variables can affect the outcome of the experiment, such as training using GPUs, allowing multi-threading on CPUs, using certain layers, etc.

To mitigate our problems with reproducibility, we first make sure that the data is processed in the same order during training. Therefore, we save the data from the last experiment and make sure the newer experiment follows the same order. If we allow the data to be shuffled, it can affect the performance due to how the model was exposed to the data. We also specify the float data type to be 32-bit since Python defaults to 64-bit. We try to avoid using 64-bit precision because the numbers produced by a GPU can vary significantly depending on the GPU architecture [11-13]. Controlling precision somewhat reduces differences due to computational noise even though technically it increases the amount of computational noise. We are currently developing more advanced techniques for preserving the efficiency of our training process while also maintaining the ability to reproduce models. In our poster presentation we will demonstrate these issues using some novel visualization tools, present several examples of the extent to which these issues influence research results on electroencephalography (EEG) and digital pathology experiments, and introduce new ways to manage such computational issues.
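As a concrete illustration of the seeding step described above, a minimal TensorFlow 2.x-style sketch follows. The API names differ slightly from the v1.9/v1.13 releases discussed in the poster (e.g., `tf.set_random_seed` in 1.x), and, as the poster notes, seeding alone does not remove cuDNN nondeterminism.

```python
import os
import random

import numpy as np
import tensorflow as tf

SEED = 1337

# Fix the hash seed before anything else touches Python's hashing.
os.environ["PYTHONHASHSEED"] = str(SEED)

# Seed every RNG the training pipeline can draw from.
random.seed(SEED)          # Python's built-in RNG
np.random.seed(SEED)       # NumPy (initializers, shuffling helpers)
tf.random.set_seed(SEED)   # TensorFlow op-level seed (tf.set_random_seed in 1.x)

# Newer TensorFlow releases can also request deterministic kernels; this
# option does not exist in the 1.x versions discussed in the poster.
if hasattr(tf.config.experimental, "enable_op_determinism"):
    tf.config.experimental.enable_op_determinism()
```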
  4. Abstract

    With the rapid development of the Internet of Things (IoT), network security challenges are becoming increasingly complex, and the scale of intrusion attacks against networks is gradually increasing. Researchers have therefore proposed intrusion detection systems and continue to design more effective systems to defend against attacks. One issue to consider is how to use limited computing power to process complex network data efficiently. In this paper, we take the AWID dataset as an example, propose an efficient data processing method to mitigate the interference caused by redundant data, and design a lightweight deep-learning-based model to analyze and predict the data category. We achieve an overall accuracy of 99.77% and an accuracy of 97.95% for attacks on the AWID dataset, with a detection rate of 99.98% for the injection attack. Our model has low computational overhead and a fast response time after training, making it feasible to deploy on IoT edge nodes with limited computational power.
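The abstract does not specify the model architecture; purely as a hedged illustration of what a "lightweight" classifier for tabular intrusion-detection features might look like, a small Keras sketch is shown below. The feature count, class count, and layer sizes are hypothetical placeholders, not the paper's design.

```python
import tensorflow as tf

# Hypothetical shapes: placeholders, not the reduced AWID feature set or
# class list used in the paper.
NUM_FEATURES = 32
NUM_CLASSES = 4   # e.g., normal plus three attack categories

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(NUM_FEATURES,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()   # only a few thousand parameters, i.e., edge-friendly
```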

     
  5. Abstract

    Solving linear systems, often accomplished by iterative algorithms, is a ubiquitous task in science and engineering. To accommodate the dynamic range and precision requirements, these iterative solvers are carried out on floating-point processing units, which are not efficient at handling large-scale matrix multiplications and inversions. Low-precision, fixed-point digital or analog processors consume only a fraction of the energy per operation compared with their floating-point counterparts, yet their current usage excludes iterative solvers because of the cumulative computational errors arising from fixed-point arithmetic. In this work, we show that for a simple iterative algorithm, such as Richardson iteration, a fixed-point processor can provide the same convergence rate and achieve solutions beyond its native precision when combined with residual iteration. These results indicate that power-efficient computing platforms consisting of analog computing devices can be used to solve a broad range of problems without compromising speed or precision.
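A minimal numerical sketch of the idea follows, in plain NumPy, with the fixed-point hardware crudely emulated by rounding to a fixed step size. The matrix, step size omega, bit width, and iteration counts are illustrative choices, not values from the paper: the inner Richardson loop runs entirely at the emulated low precision, while an outer residual iteration rescales the residual and accumulates corrections in float64, pushing the solution beyond the solver's native precision.

```python
import numpy as np

def quantize(v, frac_bits=8):
    """Crudely emulate a fixed-point unit by rounding to 2**-frac_bits steps."""
    scale = 2.0 ** frac_bits
    return np.round(v * scale) / scale

def richardson_fixed_point(A, b, omega, n_iter=50, frac_bits=8):
    """Richardson iteration x <- x + omega*(b - A x), with every intermediate
    result pushed through the fixed-point quantizer."""
    x = np.zeros_like(b)
    for _ in range(n_iter):
        r = quantize(b - quantize(A @ x, frac_bits), frac_bits)
        x = quantize(x + omega * r, frac_bits)
    return x

def residual_refinement(A, b, omega, n_outer=5, **kw):
    """Outer residual iteration: the low-precision solver only ever sees a
    rescaled residual, while the solution accumulates in float64."""
    x = np.zeros_like(b)
    for _ in range(n_outer):
        r = b - A @ x                    # residual in full precision
        s = np.max(np.abs(r))
        if s == 0.0:
            break
        d = richardson_fixed_point(A, r / s, omega, **kw)   # fixed-point correction
        x = x + s * d                    # accumulate in full precision
    return x

# Toy well-conditioned SPD system (omega chosen ad hoc as 1/||A||_2).
rng = np.random.default_rng(1)
M = rng.normal(size=(20, 20))
A = M @ M.T + 20 * np.eye(20)
b = rng.normal(size=20)
x = residual_refinement(A, b, omega=1.0 / np.linalg.norm(A, 2))
print("relative residual:", np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```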