In this paper, a multilayer perceptron (MLP)-type artificial neural network model with a back-propagation training algorithm is utilized to model the bubble growth and bubble dynamics parameters in nucleate boiling with a non-uniform electric field. The influences of the electric field on different parameters that describe bubble’s behaviors including bubble waiting time, bubble departure frequency, bubble growth time, and bubble departure diameter are considered. This study models single bubble dynamic behaviors of R113 created on a heater in an inconsistent electric field by utilizing a MLP neural network optimized by four different swarm-based optimization algorithms, namely: Salp Swarm Algorithm (SSA), Grey Wolf Optimizer (GWO), Artificial Bee Colony (ABC) algorithm, and Particle Swarm Optimization (PSO). For evaluating the model effectiveness, the MSE value (Mean-Square Error) of the artificial neural network model with various optimization algorithms is measured and compared. The results suggest that the optimal networks in the two-hidden layer and three-hidden layer models for the bubble departure diameter improve MSE by 33.85% and 35.27%, respectively, when compared with the best response in the one-hidden layer model. Additionally, for bubble growth time, the networks with two hidden layers and three hidden layers have the 44.51% and 45.85% reduction in error, when compared with the network with one hidden layer, respectively. For the departure frequency, the error reduction in the two-layer and three-layer networks is 46.85% and 62.32%, respectively. For bubble waiting time, the best networks in the two hidden-layer and three hidden-layer models improve MSE by 52.44% and 62.27% compared with the best 1HL model response, respectively. Also, the two algorithms of SSA and GWO are able to compete well (comparable MSE) with the PSO and ABC algorithms.
more »
« less
Accelerating Neural Network Training using Arbitrary Precision Approximating Matrix Multiplication Algorithms
Matrix multiplication is one of the bottleneck computations for training the weights within deep neural networks. To speed up the training phase, we propose to use faster algorithms for matrix multiplication known as Arbitrary Precision Approximating (APA) algorithms. APA algorithms perform asymptotically fewer arithmetic operations than the classical algorithm, but they compute an approximate result with an error that can be made arbitrarily small in exact arithmetic. Practical APA algorithms provide significant reduction in computation time and still provide enough accuracy for many applications like neural network training. We demonstrate that APA algorithms can be efficiently implemented and parallelized for multicore CPUs to obtain up to 28% and 21% speedups over the fastest implementation of the classical algorithm using one core and 12 cores, respectively. Furthermore, using these algorithms to train a Multi-Layer Perceptron (MLP) network yields no significant reduction in the training or testing error. Our performance results on a large MLP network show overall sequential and multithreaded performance improvements of up to 25% and 13%, respectively. We also demonstrate up to 15% improvement when training the fully connected layers of the VGG-19 image classification network.
more »
« less
- Award ID(s):
- 1942892
- PAR ID:
- 10318138
- Date Published:
- Journal Name:
- 2021 International Workshop on Embedded Multicore Systems
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Significance: The performance of traditional approaches to decoding movement intent from electromyograms (EMGs) and other biological signals commonly degrade over time. Furthermore, conventional algorithms for training neural network-based decoders may not perform well outside the domain of the state transitions observed during training. The work presented in this paper mitigates both these problems, resulting in an approach that has the potential to substantially he quality of live of people with limb loss. Objective: This paper presents and evaluates the performance of four decoding methods for volitional movement intent from intramuscular EMG signals. Methods: The decoders are trained using dataset aggregation (DAgger) algorithm, in which the training data set is augmented during each training iteration based on the decoded estimates from previous iterations. Four competing decoding methods: polynomial Kalman filters (KFs), multilayer perceptron (MLP) networks, convolution neural networks (CNN), and Long-Short Term Memory (LSTM) networks, were developed. The performance of the four decoding methods was evaluated using EMG data sets recorded from two human volunteers with transradial amputation. Short-term analyses, in which the training and cross-validation data came from the same data set, and long-term analyses training and testing were done in different data sets, were performed. Results: Short-term analyses of the decoders demonstrated that CNN and MLP decoders performed significantly better than KF and LSTM decoders, showing an improvement of up to 60% in the normalized mean-square decoding error in cross-validation tests. Long-term analysis indicated that the CNN, MLP and LSTM decoders performed significantly better than KF-based decoder at most analyzed cases of temporal separations (0 to 150 days) between the acquisition of the training and testing data sets. Conclusion: The short-term and long-term performance of MLP and CNN-based decoders trained with DAgger, demonstrated their potential to provide more accurate and naturalistic control of prosthetic hands than alternate approaches.more » « less
-
The emergence of novel hardware accelerators has powered the tremendous growth of machine learning in recent years. These accelerators deliver incomparable performance gains in processing high-volume matrix operators, particularly matrix multiplication, a core component of neural network training and inference. In this work, we explored opportunities of accelerating database systems using NVIDIA’s Tensor Core Units (TCUs). We present TCUDB, a TCU-accelerated query engine processing a set of query operators including natural joins and group-by aggregates as matrix operators within TCUs. Matrix multiplication was considered inefficient in the past; however, this strategy has remained largely unexplored in conventional GPU-based databases, which primarily rely on vector or scalar processing. We demonstrate the significant performance gain of TCUDB in a range of real-world applications including entity matching, graph query processing, and matrix-based data analytics. TCUDB achieves up to 288× speedup compared to a baseline GPU-based query engine.more » « less
-
We study the problem of estimating the covariance matrix of a high-dimensional distribution when a small constant fraction of the samples can be arbitrarily corrupted. Recent work gave the first polynomial time algorithms for this problem with near-optimal error guarantees for several natural structured distributions. Our main contribution is to develop faster algorithms for this problem whose running time nearly matches that of computing the empirical covariance. Given N = Ω(d^2/\eps^2) samples from a d-dimensional Gaussian distribution, an \eps-fraction of which may be arbitrarily corrupted, our algorithm runs in time O(d^{3.26}/ poly(\eps)) and approximates the unknown covariance matrix to optimal error up to a logarithmic factor. Previous robust algorithms with comparable error guarantees all have runtimes Ω(d^{2ω}) when \eps = Ω(1), where ω is the exponent of matrix multiplication. We also provide evidence that improving the running time of our algorithm may require new algorithmic techniques.more » « less
-
We consider a sparse matrix-matrix multiplication (SpGEMM) setting where one matrix is square and the other is tall and skinny. This special variant, TS-SpGEMM, has important applications in multi-source breadth-first search, influence maximization, sparse graph embedding, and algebraic multigrid solvers. Unfortunately, popular distributed algorithms like sparse SUMMA deliver suboptimal performance for TS-SpGEMM. To address this limitation, we develop a novel distributed-memory algorithm tailored for TS SpGEMM. Our approach employs customized 1D partitioning for all matrices involved and leverages sparsity-aware tiling for efficient data transfers. In addition, it minimizes communication overhead by incorporating both local and remote computations. On average, our TSSpGEMM algorithm attains 5x performance gains over 2D and 3D SUMMA. Furthermore, we use our algorithm to implement multi-source breadth-first search and sparse graph embedding algorithms and demonstrate their scalability up to 512 Nodes (or 65,536 cores) on NERSC Perlmutter.more » « less
An official website of the United States government

