Title: Gradient-Enhanced Multifidelity Neural Networks for High-Dimensional Function Approximation
Abstract: In this work, a novel multifidelity machine learning (ML) algorithm, the gradient-enhanced multifidelity neural networks (GEMFNN) algorithm, is proposed. It is a multifidelity extension of the gradient-enhanced neural networks (GENN) algorithm, using both function and gradient information available at multiple levels of fidelity to make function approximations; its construction is similar to that of the multifidelity neural networks (MFNN) algorithm. The proposed algorithm is tested on three analytical functions of one, two, and 20 variables. Its performance is compared to that of neural networks (NN), GENN, and MFNN in terms of the number of samples required to reach a global accuracy of 0.99 in the coefficient of determination (R²). GEMFNN required 18, 120, and 600 high-fidelity samples for the one-, two-, and 20-variable cases, respectively, to meet the target accuracy. NN performed best on the one-variable case, requiring only ten samples, while GENN worked best on the two-variable case, requiring 120 samples. GEMFNN worked best on the 20-variable case, requiring nearly eight times fewer samples than its nearest competitor, GENN; for this case, NN and MFNN did not reach the target global accuracy even after using 10,000 high-fidelity samples. This work demonstrates the benefits of using gradient as well as multifidelity information in NN for high-dimensional problems.
Award ID(s): 1846862
PAR ID: 10343893
Author(s) / Creator(s): ;
Date Published:
Journal Name: ASME 2021 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference
Format(s): Medium: X
Sponsoring Org: National Science Foundation
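To make the construction in the abstract concrete, the sketch below pairs an MFNN-style composition (a correction network that takes the input together with the low-fidelity prediction) with a gradient-enhanced loss penalizing both function and autograd-gradient mismatch. This is a minimal PyTorch sketch under assumed names and layer sizes (lf_net, hf_net, w_grad) and a toy high-fidelity function, not the authors' implementation; the low-fidelity training stage is elided.

```python
import torch

def gradient_enhanced_loss(net, x, y, dy_dx, w_grad=1.0):
    # MSE on function values plus a weighted MSE on autograd gradients.
    x = x.detach().requires_grad_(True)
    y_pred = net(x).squeeze(-1)
    g_pred = torch.autograd.grad(y_pred.sum(), x, create_graph=True)[0]
    return torch.mean((y_pred - y) ** 2) + w_grad * torch.mean((g_pred - dy_dx) ** 2)

d = 2
lf_net = torch.nn.Sequential(torch.nn.Linear(d, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1))
hf_net = torch.nn.Sequential(torch.nn.Linear(d + 1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))

def gemfnn(x):
    # MFNN-style composition: the correction net sees x and the LF prediction.
    return hf_net(torch.cat([x, lf_net(x)], dim=-1))

# Stage 1 (elided here): minimize the same loss for lf_net on many low-fidelity samples.
# Stage 2: fit the correction on a handful of high-fidelity samples and gradients.
x_hf = torch.rand(18, d)
y_hf = torch.sin(4.0 * x_hf).sum(dim=-1)   # toy high-fidelity function
dy_hf = 4.0 * torch.cos(4.0 * x_hf)        # ...and its exact gradient
opt = torch.optim.Adam(hf_net.parameters(), lr=1e-3)
for _ in range(1000):
    opt.zero_grad()
    gradient_enhanced_loss(gemfnn, x_hf, y_hf, dy_hf, w_grad=0.5).backward()
    opt.step()
```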
More Like this
  1. The objective of this work is to reduce the cost of performing model-based sensitivity analysis for ultrasonic nondestructive testing systems by replacing the accurate physics-based model with machine learning (ML) algorithms and quickly computing Sobol' indices. The ML algorithms considered in this work are neural networks (NNs), convolutional NNs (CNNs), and deep Gaussian processes (DGPs). The performance of these algorithms is measured by the root-mean-squared error on a fixed number of testing points and by the number of high-fidelity samples required to reach a target accuracy. The algorithms are compared on three ultrasonic testing benchmark cases with three uncertainty parameters each: a spherical-void defect under a focused transducer, a spherical-void defect under a planar transducer, and a spherical-inclusion defect under a focused transducer. The results show that NNs required 35, 100, and 35 samples for the three cases, respectively; CNNs required 35, 100, and 56; and DGPs required 84, 84, and 56.
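Since the surrogate exists precisely so that Sobol' indices become cheap to estimate, a hedged sketch of that final step may help. The pick-freeze Monte Carlo estimator below (Saltelli-style) is a standard formula, not the authors' code; `surrogate` stands for any trained, vectorized model, and the toy function is only for illustration.

```python
import numpy as np

def first_order_sobol(surrogate, d, n=100_000, seed=0):
    # Pick-freeze Monte Carlo estimator (Saltelli-style) of first-order indices.
    rng = np.random.default_rng(seed)
    A, B = rng.random((n, d)), rng.random((n, d))  # two independent sample matrices
    fA, fB = surrogate(A), surrogate(B)
    var = np.var(np.concatenate([fA, fB]))
    S = np.empty(d)
    for i in range(d):
        ABi = A.copy()
        ABi[:, i] = B[:, i]                        # "freeze" input i from B into A
        S[i] = np.mean(fB * (surrogate(ABi) - fA)) / var
    return S

# Toy stand-in for a trained surrogate on [0, 1]^3 (inputs mapped to [-pi, pi]).
f = lambda X: np.sin(np.pi * (2 * X[:, 0] - 1)) + 7 * np.sin(np.pi * (2 * X[:, 1] - 1)) ** 2
print(first_order_sobol(f, d=3))  # the third input should get an index near zero
```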
  2. We study the problem of learning hierarchical polynomials over the standard Gaussian distribution with three-layer neural networks. We specifically consider target functions of the form h = g ∘ p, where p : R^d → R is a degree-k polynomial and g : R → R is a degree-q polynomial. This function class generalizes the single-index model, which corresponds to k = 1, and is a natural class of functions possessing an underlying hierarchical structure. Our main result shows that for a large subclass of degree-k polynomials p, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target up to vanishing test error in Õ(d^k) samples and polynomial time. This is a strict improvement over kernel methods, which require Θ̃(d^{kq}) samples, as well as over existing guarantees for two-layer networks, which require the target function to be low-rank. Our result also generalizes prior works on three-layer neural networks, which were restricted to the case of p being a quadratic. When p is indeed a quadratic, we achieve the information-theoretically optimal sample complexity Õ(d^2), an improvement over prior work (Nichani et al., 2023) requiring a sample size of Θ̃(d^4). Our proof proceeds by showing that during the initial stage of training the network performs feature learning to recover the feature p with Õ(d^k) samples. This work demonstrates the ability of three-layer neural networks to learn complex features and, as a result, learn a broad class of hierarchical functions.
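One plausible reading of the layerwise scheme, as a toy experiment: take gradient steps on the first layer with the outer layers held at initialization, then fit the outer layers on the learned features. Everything here (the target g(p(x)), widths, step counts, ReLU activations) is an illustrative assumption, not the paper's construction or analysis.

```python
import torch

torch.manual_seed(0)
d, n = 16, 4096
X = torch.randn(n, d)
p = (X[:, :4] ** 2).sum(dim=1) / 4.0 - 1.0  # degree-2 feature polynomial p
y = p ** 3 + p                              # outer degree-3 polynomial g

layer1 = torch.nn.Linear(d, 128)
layer2 = torch.nn.Linear(128, 128)
head = torch.nn.Linear(128, 1)

def forward(x):
    return head(torch.relu(layer2(torch.relu(layer1(x))))).squeeze(-1)

# Stage 1: gradient steps on the first layer only (feature learning for p).
opt1 = torch.optim.SGD(layer1.parameters(), lr=1e-2)
for _ in range(300):
    opt1.zero_grad()
    torch.mean((forward(X) - y) ** 2).backward()
    opt1.step()

# Stage 2: fit the remaining layers on top of the learned first-layer features.
opt2 = torch.optim.Adam(list(layer2.parameters()) + list(head.parameters()), lr=1e-3)
for _ in range(500):
    opt2.zero_grad()
    torch.mean((forward(X) - y) ** 2).backward()
    opt2.step()
```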
  3. We analyze the regression accuracy of convolutional neural networks assembled from encoders, decoders, and skip connections and trained with multifidelity data. These networks benefit from a significant reduction in the number of trainable parameters with respect to an equivalent fully connected network. The architectures are also versatile with respect to input and output dimensionality; for example, encoder-decoder, decoder-encoder, or decoder-encoder-decoder arrangements are well suited to learning mappings between inputs and outputs of any dimensionality. We demonstrate the accuracy produced by such architectures when trained on a few high-fidelity and many low-fidelity data generated from models ranging from one-dimensional functions to Poisson equation solvers in two dimensions. Finally, we discuss a number of implementation choices that improve the reliability of the uncertainty estimates generated by a DropBlock regularizer, and compare uncertainty estimates among low-, high-, and multifidelity approaches.
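A minimal sketch of this architecture family and of dropout-based uncertainty estimation, with Monte Carlo Dropout2d standing in for DropBlock (which is not in core PyTorch); the layer sizes and drop rate are assumptions, not the paper's models.

```python
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    def __init__(self, p_drop=0.2):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Dropout2d(p_drop),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.Dropout2d(p_drop),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.dec(self.enc(x))

def predict_with_uncertainty(model, x, n_samples=50):
    # Keep dropout active at inference and aggregate stochastic passes.
    model.train()
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(0), preds.std(0)  # predictive mean and spread

model = EncoderDecoder()
mean, std = predict_with_uncertainty(model, torch.randn(4, 1, 32, 32))
```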
  4. Neural architecture search (NAS) is a promising technique for designing efficient, high-performance deep neural networks (DNNs). As the performance requirements of ML applications grow, hardware accelerators have begun playing a central role in DNN design, which makes NAS even more complicated and time-consuming for most real applications. This paper proposes FLASH, a very fast NAS methodology that co-optimizes DNN accuracy and performance on a real hardware platform. As the main theoretical contribution, we first propose the NN-Degree, an analytical metric that quantifies the topological characteristics of DNNs with skip connections (e.g., DenseNets, ResNets, Wide-ResNets, and MobileNets). The NN-Degree enables training-free NAS within one second and an accuracy predictor built by training as few as 25 samples drawn from a search space of more than 63 billion configurations. Second, by performing inference on the target hardware, we fine-tune and validate our analytical models to estimate the latency, area, and energy consumption of various DNN architectures while executing standard ML datasets. Third, we construct a hierarchical algorithm based on simplicial homology global optimization (SHGO) to optimize the model-architecture co-design process while accounting for the area, latency, and energy consumption of the target hardware. We demonstrate that, compared to state-of-the-art NAS approaches, the proposed hierarchical SHGO-based algorithm achieves more than a four-orders-of-magnitude speedup (its execution time is about 0.1 seconds). Finally, our experimental evaluations show that FLASH transfers easily to different hardware architectures, enabling NAS on a Raspberry Pi 3B processor in less than 3 seconds.
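As a toy illustration of the SHGO-based co-design step (not FLASH itself), the snippet below runs SciPy's shgo over two hypothetical architecture knobs, with assumed surrogate models for accuracy, latency, and energy; all three formulas are made-up placeholders.

```python
import numpy as np
from scipy.optimize import shgo

def co_design_objective(z):
    width, depth = z
    accuracy = 1.0 - np.exp(-0.05 * width * depth)     # assumed accuracy surrogate
    latency = 0.002 * width * depth                    # assumed linear latency model
    energy = 0.001 * width ** 2                        # assumed energy model
    return -(accuracy - 0.5 * latency - 0.3 * energy)  # maximize the trade-off

bounds = [(8, 256), (2, 32)]  # width and depth ranges for the search
result = shgo(co_design_objective, bounds)
print(result.x, -result.fun)  # best (width, depth) and its objective value
```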
  5. In this article, we present a low-energy inference method for convolutional neural networks in image classification applications. The lower energy consumption is achieved by using a highly pruned (lower-energy) network whenever that network can provide a correct output. More specifically, the proposed inference method makes use of two pruned neural networks (NNs), a mildly and an aggressively pruned network, both designed offline, while a third NN uses the input data to select the appropriate pruned network online. For its feature extraction, the third network employs the same convolutional layers as the aggressively pruned NN, reducing the overhead of the online management. The proposed method induces some accuracy loss, but for a given level of accuracy its energy gain is considerably larger than that of employing either pruning level alone. The method is independent of both the pruning method and the network architecture. Its efficacy is assessed on the Eyeriss hardware accelerator platform for several state-of-the-art NN architectures; our studies show that it can provide, on average, a 70% energy reduction compared to the original NN at the cost of about 3% accuracy loss on the CIFAR-10 dataset.
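A runnable sketch of the selection mechanism described in this entry, with tiny stand-in networks: the selector head reuses the aggressively pruned network's features, so its online overhead is small. Widths, the selector head, and the routing threshold are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SmallNet(nn.Module):
    # Stand-in for a pruned CNN split into shared features and a classifier head.
    def __init__(self, width):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, width, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.classify = nn.Linear(width, 10)

    def forward(self, x):
        return self.classify(self.features(x).flatten(1))

aggressive, mild = SmallNet(8), SmallNet(32)  # widths mimic the two pruning levels
selector = nn.Linear(8, 1)                    # reuses aggressive features: tiny overhead

def infer(x, threshold=0.5):
    feats = aggressive.features(x).flatten(1)
    cheap_ok = torch.sigmoid(selector(feats)) > threshold
    cheap_out = aggressive.classify(feats)
    # For brevity the mild network runs on the whole batch; a real deployment
    # would invoke it only for the inputs the selector routes to it.
    return torch.where(cheap_ok, cheap_out, mild(x))

logits = infer(torch.randn(4, 3, 32, 32))
```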