Abstract Modern machine learning (ML) and deep learning (DL) techniques using high-dimensional data representations have helped accelerate the materials discovery process by efficiently detecting hidden patterns in existing datasets and linking input representations to output properties for a better understanding of the scientific phenomenon. While a deep neural network comprised of fully connected layers has been widely used for materials property prediction, simply creating a deeper model with a large number of layers often faces with vanishing gradient problem, causing a degradation in the performance, thereby limiting usage. In this paper, we study and propose architectural principles to address the question of improving the performance of model training and inference under fixed parametric constraints. Here, we present a general deep-learning framework based on branched residual learning (BRNet) with fully connected layers that can work with any numerical vector-based representation as input to build accurate models to predict materials properties. We perform model training for materials properties using numerical vectors representing different composition-based attributes of the respective materials and compare the performance of the proposed models against traditional ML and existing DL architectures. We find that the proposed models are significantly more accurate than the ML/DL models for all data sizes by using different composition-based attributes as input. Further, branched learning requires fewer parameters and results in faster model training due to better convergence during the training phase than existing neural networks, thereby efficiently building accurate models for predicting materials properties.
more »
« less
Stable tensor neural networks for efficient deep learning
Learning from complex, multidimensional data has become central to computational mathematics, and among the most successful high-dimensional function approximators are deep neural networks (DNNs). Training DNNs is posed as an optimization problem to learn network weights or parameters that well-approximate a mapping from input to target data. Multiway data or tensors arise naturally in myriad ways in deep learning, in particular as input data and as high-dimensional weights and features extracted by the network, with the latter often being a bottleneck in terms of speed and memory. In this work, we leverage tensor representations and processing to efficiently parameterize DNNs when learning from high-dimensional data. We propose tensor neural networks (t-NNs), a natural extension of traditional fully-connected networks, that can be trained efficiently in a reduced, yet more powerful parameter space. Our t-NNs are built upon matrix-mimetic tensor-tensor products, which retain algebraic properties of matrix multiplication while capturing high-dimensional correlations. Mimeticity enables t-NNs to inherit desirable properties of modern DNN architectures. We exemplify this by extending recent work on stable neural networks, which interpret DNNs as discretizations of differential equations, to our multidimensional framework. We provide empirical evidence of the parametric advantages of t-NNs on dimensionality reduction using autoencoders and classification using fully-connected and stable variants on benchmark imaging datasets MNIST and CIFAR-10.
more »
« less
- Award ID(s):
- 2309751
- PAR ID:
- 10533088
- Editor(s):
- Zhang, Yanqing
- Publisher / Repository:
- Frontiers
- Date Published:
- Journal Name:
- Frontiers in Big Data
- Volume:
- 7
- ISSN:
- 2624-909X
- Subject(s) / Keyword(s):
- tensor algebra, deep learning, machine learning, image classification, inverse problems
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
We propose a unified optimization framework that combines neural networks with dictionary learning to model complex interactions between resting state functional MRI and behavioral data. The dictionary learning objective decomposes patient correlation matrices into a collection of shared basis networks and subject-specific loadings. These subject-specific features are simultaneously input into a neural network that predicts multidimensional clinical information. Our novel optimization framework combines the gradient information from the neural network with that of a conventional matrix factorization objective. This procedure collectively estimates the basis networks, subject loadings, and neural network weights most informative of clinical severity. We evaluate our combined model on a multi-score prediction task using 52 patients diagnosed with Autism Spectrum Disorder (ASD). Our integrated framework outperforms state-of-the-art methods in a ten-fold cross validated setting to predict three different measures of clinical severity.more » « less
-
We propose a unified optimization framework that combines neural networks with dictionary learning to model complex interactions between resting state functional MRI and behavioral data. The dictionary learning objective decomposes patient correlation matrices into a collection of shared basis networks and subject-specific loadings. These subject-specific features are simultaneously input into a neural network that predicts multidimensional clinical information. Our novel optimization framework combines the gradient information from the neural network with that of a conventional matrix factorization objective. This procedure collectively estimates the basis networks, subject loadings, and neural network weights most informative of clinical severity. We evaluate our combined model on a multi-score prediction task using 52 patients diagnosed with Autism Spectrum Disorder (ASD). Our integrated framework outperforms state-of-the-art methods in a ten-fold cross validated setting to predict three different measures of clinical severity.more » « less
-
We study regularized deep neural networks (DNNs) and introduce a convex analytic framework to characterize the structure of the hidden layers. We show that a set of optimal hidden layer weights for a norm regularized DNN training problem can be explicitly found as the extreme points of a convex set. For the special case of deep linear networks, we prove that each optimal weight matrix aligns with the previous layers via duality. More importantly, we apply the same characterization to deep ReLU networks with whitened data and prove the same weight alignment holds. As a corollary, we also prove that norm regularized deep ReLU networks yield spline interpolation for one-dimensional datasets which was previously known only for two-layer networks. Furthermore, we provide closed-form solutions for the optimal layer weights when data is rank-one or whitened. The same analysis also applies to architectures with batch normalization even for arbitrary data. Therefore, we obtain a complete explanation for a recent empirical observation termed Neural Collapse where class means collapse to the vertices of a simplex equiangular tight frame.more » « less
-
null (Ed.)Security of modern Deep Neural Networks (DNNs) is under severe scrutiny as the deployment of these models become widespread in many intelligence-based applications. Most recently, DNNs are attacked through Trojan which can effectively infect the model during the training phase and get activated only through specific input patterns (i.e, trigger) during inference. In this work, for the first time, we propose a novel Targeted Bit Trojan(TBT) method, which can insert a targeted neural Trojan into a DNN through bit-flip attack. Our algorithm efficiently generates a trigger specifically designed to locate certain vulnerable bits of DNN weights stored in main memory (i.e., DRAM). The objective is that once the attacker flips these vulnerable bits, the network still operates with normal inference accuracy with benign input. However, when the attacker activates the trigger by embedding it with any input, the network is forced to classify all inputs to a certain target class. We demonstrate that flipping only several vulnerable bits identified by our method, using available bit-flip techniques (i.e, row-hammer), can transform a fully functional DNN model into a Trojan-infected model. We perform extensive experiments of CIFAR-10, SVHN and ImageNet datasets on both VGG-16 and Resnet-18 architectures. Our proposed TBT could classify 92 of test images to a target class with as little as 84 bit-flips out of 88 million weight bits on Resnet-18 for CIFAR10 dataset.more » « less
An official website of the United States government

