NSF PAR Search | NSF Public Access Repository


Global convergence and geometric characterization of slow to fast weight evolution in neural network training for classifying linearly non-separable data

https://doi.org/10.3934/ipi.2020077

Long, Ziang; Yin, Penghang; Xin, Jack (January 2021, Inverse Problems & Imaging)
Full Text Available
Structured Sparsity of Convolutional Neural Networks via Nonconvex Sparse Group Regularization

Bui, Kevin; Park, Fredrick (January 2021, Frontiers in Applied Mathematics and Statistics)

Convolutional neural networks (CNNs) have been highly successful in recent years, achieving superior accuracy in imaging applications such as classification, object detection, and segmentation. However, a highly accurate CNN model requires millions of parameters to be trained and used. Even a slight increase in performance requires significantly more parameters, because it means adding layers and/or increasing the number of filters per layer. Many of these weight parameters turn out to be redundant, so the original dense model can be replaced by a compressed version obtained by imposing inter- and intra-group sparsity on the layer weights during training. In this paper, we propose a nonconvex family of sparse group lasso that blends nonconvex regularization (e.g., transformed L1, L1 - L2, and L0), which induces sparsity on the individual weights, with L2,1 regularization on the output channels of a layer. We apply variable splitting to the proposed regularization to develop an algorithm that consists of two steps per iteration: gradient descent and thresholding. Numerical experiments on various CNN architectures showcase the effectiveness of the nonconvex family of sparse group lasso in network sparsification, with test accuracy on par with the current state of the art.
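The thresholding half of a "gradient descent, then thresholding" iteration can be sketched as follows. This is a minimal illustration, not the paper's operators: it uses the convex L1 penalty as a stand-in for the nonconvex ones (transformed L1, L1 - L2, L0), and the function names, the toy `channels` values, and both threshold parameters are assumptions made for the example.

```python
import math

def soft_threshold(x, t):
    """Proximal map of t*|x|: shrink x toward zero by t (convex L1 stand-in)."""
    return math.copysign(max(abs(x) - t, 0.0), x)

def group_shrink(group, t):
    """Proximal map of t*||g||_2: shrink the whole group, or zero it out entirely."""
    norm = math.sqrt(sum(v * v for v in group))
    if norm <= t:
        return [0.0] * len(group)
    scale = 1.0 - t / norm
    return [scale * v for v in group]

def sparse_group_threshold(channels, t_weight, t_group):
    """Elementwise shrinkage, then group shrinkage with one group per output channel."""
    return [
        group_shrink([soft_threshold(v, t_weight) for v in ch], t_group)
        for ch in channels
    ]

# Hypothetical weights: two output channels with three weights each.
channels = [[0.05, -0.02, 0.01], [1.0, -0.03, 0.8]]
pruned = sparse_group_threshold(channels, t_weight=0.04, t_group=0.1)
```

In this toy input the first channel is removed as a whole (inter-group sparsity), while the second channel survives with one of its weights set to zero (intra-group sparsity).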
Full Text Available
Enhanced diffusivity in perturbed senile reinforced random walk models

https://doi.org/10.3233/ASY-201611

Dinh, Thu; Xin, Jack (October 2020, Asymptotic Analysis)

We consider the diffusivity of random walks with transition probabilities depending on the number of consecutive traversals of the last traversed edge, the so-called senile reinforced random walk (SeRW). In one dimension, the walk is known to be sub-diffusive with the identity reinforcement function. We perturb the model by introducing a small probability δ of escaping the last traversed edge at each step. The perturbed SeRW model is diffusive for any δ > 0, with enhanced diffusivity (≫ O(δ^2)) in the small-δ regime. We further study stochastically perturbed SeRW models in which the last-edge escape probability has the form δ ξ_n, with the ξ_n being independent random variables. Enhanced diffusivity in such models is logarithmically close to the so-called residual diffusivity (positive in the zero-δ limit), with diffusivity between O(1/|log δ|) and O(1/log|log δ|). Finally, we generalize our results to higher dimensions, where the unperturbed model is already diffusive. The enhanced diffusivity can be as much as O(1/log^2 δ).
Full Text Available
Nonconvex Regularization for Network Slimming: Compressing CNNs Even More

Bui, Kevin; Park, Fredrick (October 2020, The 15th International Symposium on Visual Computing)

In the last decade, convolutional neural networks (CNNs) have evolved to become the dominant models for various computer vision tasks, but they cannot be deployed on low-memory devices due to their high memory requirements and computational cost. One popular, straightforward approach to compressing CNNs is network slimming, which imposes an L1 penalty on the channel-associated scaling factors in the batch normalization layers during training. In this way, channels with low scaling factors are identified as insignificant and are pruned from the models. In this paper, we propose replacing the L1 penalty with the Lp and transformed L1 (TL1) penalties, since these nonconvex penalties outperformed L1 in yielding sparser satisfactory solutions in various compressed sensing problems. In our numerical experiments, we demonstrate network slimming with Lp and TL1 penalties on VGGNet and DenseNet trained on CIFAR-10/100. The results demonstrate that the nonconvex penalties compress CNNs better than L1. In addition, TL1 preserves the model accuracy after channel pruning, while L1/2 and L3/4 yield compressed models with accuracies similar to L1 after retraining.
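As a rough sketch of the quantities involved, the penalties can be evaluated on a list of batch normalization scaling factors, and channels with small factors can then be marked for pruning. The TL1 form used here, (a + 1)|x| / (a + |x|) with a > 0, is the standard definition of the transformed L1 penalty. The names `gammas`, `channels_to_keep`, and the fixed cutoff are assumptions for illustration; the abstract does not say how the pruning threshold is chosen.

```python
def tl1(x, a=1.0):
    """Transformed L1 penalty (a + 1)|x| / (a + |x|), for a > 0."""
    ax = abs(x)
    return (a + 1.0) * ax / (a + ax)

def lp(x, p=0.5):
    """Lp penalty |x|^p with 0 < p < 1 (for example p = 1/2 or 3/4)."""
    return abs(x) ** p

def penalty(scales, fn):
    """Sum a scalar penalty over all channel scaling factors."""
    return sum(fn(s) for s in scales)

def channels_to_keep(scales, threshold):
    """Indices of channels whose scaling factor magnitude exceeds the cutoff."""
    return [i for i, s in enumerate(scales) if abs(s) > threshold]

# Hypothetical scaling factors for a five-channel layer.
gammas = [0.9, 0.001, -0.4, 0.0, 0.02]
keep = channels_to_keep(gammas, threshold=0.05)  # hypothetical fixed cutoff
tl1_value = penalty(gammas, tl1)
l_half_value = penalty(gammas, lp)
```

During training the chosen penalty would be added to the loss with a weighting coefficient; only the penalty evaluation and the pruning mask are shown here.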
Full Text Available
A Recurrent Neural Network and Differential Equation Based Spatiotemporal Infectious Disease Model with Application to COVID-19

https://doi.org/10.1101/2020.07.20.20158568

Li, Zhijian; Zheng, Yunling; Xin, Jack; Zhou, Guofa (October 2020, 12th International Conference on Knowledge Discovery and Information Retrieval)

The outbreaks of Coronavirus Disease 2019 (COVID-19) have impacted the world significantly. Modeling the trend of infection and real-time forecasting of cases can help decision making and control of the disease spread. However, data-driven methods such as recurrent neural networks (RNNs) can perform poorly due to the limited number of daily samples. In this work, we develop an integrated spatiotemporal model based on the epidemic differential equations (SIR) and an RNN. The former, after simplification and discretization, is a compact model of the temporal infection trend of a region, while the latter models the effect of the nearest neighboring regions and captures latent spatial information. We trained and tested our model on COVID-19 data in Italy, and show that it outperforms existing temporal models (fully connected NN, SIR, ARIMA) in 1-day, 3-day, and 1-week ahead forecasting, especially in the regime of limited training data.
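For readers unfamiliar with the SIR component, a forward-Euler discretization of the classical SIR equations looks roughly like the sketch below. It shows the textbook model only, not the simplified form or the RNN coupling used in the paper; the parameter values, population size, and function names are assumptions chosen for illustration.

```python
def sir_step(s, i, r, beta, gamma, dt=1.0):
    """One forward-Euler step of dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I."""
    n = s + i + r
    new_infections = beta * s * i / n * dt
    new_recoveries = gamma * i * dt
    return s - new_infections, i + new_infections - new_recoveries, r + new_recoveries

def simulate(s0, i0, r0, beta, gamma, days):
    """Iterate the daily update and return the full (S, I, R) trajectory."""
    trajectory = [(s0, i0, r0)]
    for _ in range(days):
        trajectory.append(sir_step(*trajectory[-1], beta, gamma))
    return trajectory

# Hypothetical region: 10,000 people, 10 initially infected.
traj = simulate(9990.0, 10.0, 0.0, beta=0.3, gamma=0.1, days=30)
```

The update moves people between compartments without creating or removing any, so S + I + R stays constant up to rounding error, which is a quick sanity check on any discretization of this kind.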
Full Text Available
Convergence of a Relaxed Variable Splitting Method for Learning Sparse Neural Networks via L1, L0, and transformed-L1 Penalties

Dinh, Thu; Xin, Jack (September 2020, Intelligent Systems Conference (IntelliSys))

Sparsification of neural networks is one of the effective complexity reduction methods to improve efficiency and generalizability. We consider the problem of learning a one-hidden-layer convolutional neural network with ReLU activation function via gradient descent under sparsity-promoting penalties. It is known that when the input data is Gaussian distributed, no-overlap networks (without penalties) in regression problems with ground truth can be learned in polynomial time with high probability. We propose a relaxed variable splitting method integrating thresholding and gradient descent to overcome the non-smoothness in the loss function. The sparsity in network weight is realized during the optimization (training) process. We prove that under L1, L0, and transformed-L1 penalties, no-overlap networks can be learned with high probability, and the iterative weights converge to a global limit which is a transformation of the true weight under a novel thresholding operation. Numerical experiments confirm theoretical findings, and compare the accuracy and sparsity trade-off among the penalties.
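The alternation between thresholding and gradient descent can be illustrated on a toy problem. The sketch below assumes the relaxed objective f(w) + λ‖u‖_1 + (β/2)‖w − u‖², with a simple quadratic f standing in for the network loss. The target vector `c`, the step size, and all other constants are hypothetical, and the weight normalization and network structure of the actual method are omitted.

```python
import math

def soft_threshold(x, t):
    """Minimizer over u of t*|u| + 0.5*(x - u)^2."""
    return math.copysign(max(abs(x) - t, 0.0), x)

def rvsm_toy(c, lam=0.1, beta=1.0, eta=0.1, iters=500):
    """Relaxed variable splitting on the toy loss f(w) = 0.5 * sum((w_k - c_k)^2)."""
    w = [0.0] * len(c)
    for _ in range(iters):
        # u-step: exact minimization in u, which is thresholding of w.
        u = [soft_threshold(wk, lam / beta) for wk in w]
        # w-step: one gradient step on f(w) + (beta / 2) * ||w - u||^2.
        w = [wk - eta * ((wk - ck) + beta * (wk - uk))
             for wk, ck, uk in zip(w, c, u)]
    u = [soft_threshold(wk, lam / beta) for wk in w]
    return w, u

# Hypothetical target with one small entry that should be zeroed out in u.
w, u = rvsm_toy([2.0, 0.05, -1.0])
```

The dense iterate `w` stays differentiable-friendly, while the auxiliary variable `u` carries the sparsity: here the small middle entry of `u` is exactly zero even though the corresponding entry of `w` is not.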
Full Text Available
Two-Grid Based Adaptive Proper Orthogonal Decomposition Method for Time Dependent Partial Differential Equations

https://doi.org/10.1007/s10915-020-01288-9

Dai, Xiaoying; Kuang, Xiong; Xin, Jack; Zhou, Aihui (August 2020, Journal of Scientific Computing)

In this article, we propose a two-grid based adaptive proper orthogonal decomposition (POD) method to solve time-dependent partial differential equations. Based on the error obtained on the coarse grid, we propose an error indicator for the numerical solution obtained on the fine grid. Our new method is cheap and easy to implement. We apply our new method to the solution of time-dependent advection–diffusion equations with the Kolmogorov flow and the ABC flow. The numerical results show that our method is more efficient than the existing POD methods.
Full Text Available
AutoShuffleNet: Learning Permutation Matrices via an Exact Lipschitz Continuous Penalty in Deep Convolutional Neural Networks

https://doi.org/10.1145/3394486.3403103

Lyu, Jiancheng; Zhang, Shuai; Qi, Yingyong; Xin, Jack (August 2020, 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining)

ShuffleNet is a state-of-the-art lightweight convolutional neural network architecture. Its basic operations include group, channel-wise convolution and channel shuffling. However, channel shuffling is manually designed on empirical grounds. Mathematically, shuffling is a multiplication by a permutation matrix. In this paper, we propose to automate channel shuffling by learning permutation matrices in network training. We introduce an exact Lipschitz continuous non-convex penalty so that it can be incorporated in stochastic gradient descent to approximate permutations at high precision. Exact permutations are obtained by simple rounding at the end of training and are used in inference. The resulting network, referred to as AutoShuffleNet, achieved improved classification accuracies on data from CIFAR-10, CIFAR-100 and ImageNet while preserving the inference costs of ShuffleNet. In addition, we found experimentally that the standard convex relaxation of permutation matrices into stochastic matrices leads to poor performance. We prove theoretically the exactness (error bounds) in recovering permutation matrices when our penalty function is zero (very small). We present examples of permutation optimization through graph matching and two-layer neural network models where the loss functions are calculated in closed analytical form. In the examples, convex relaxation failed to capture permutations whereas our penalty succeeded.
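The abstract does not spell out the penalty. One penalty with the stated properties, assumed here purely for illustration, sums the L1 norm minus the L2 norm over every row and every column of the matrix. On a doubly stochastic matrix it is Lipschitz continuous, non-convex, and equal to zero exactly when the matrix is a permutation. The function names and test matrices below are hypothetical.

```python
import math

def l1_minus_l2(vec):
    """||v||_1 - ||v||_2, which is zero exactly when v has at most one nonzero entry."""
    return sum(abs(v) for v in vec) - math.sqrt(sum(v * v for v in vec))

def permutation_penalty(m):
    """Assumed penalty: sum of L1 - L2 over all rows and all columns of m."""
    rows = sum(l1_minus_l2(row) for row in m)
    cols = sum(l1_minus_l2(col) for col in zip(*m))
    return rows + cols

def round_to_permutation(m):
    """Simple rounding: put a 1 at each row's largest entry.

    This yields a permutation only when the row maxima fall in distinct columns,
    which is expected when the penalty is already close to zero.
    """
    rounded = []
    for row in m:
        j = max(range(len(row)), key=row.__getitem__)
        rounded.append([1.0 if k == j else 0.0 for k in range(len(row))])
    return rounded

perm = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
uniform = [[1.0 / 3.0] * 3 for _ in range(3)]       # doubly stochastic, far from a permutation
near = [[0.05, 0.9, 0.05], [0.9, 0.05, 0.05], [0.05, 0.05, 0.9]]
recovered = round_to_permutation(near)
```

The uniform matrix is a valid point of the convex relaxation but carries a large penalty, while the near-permutation matrix has a small penalty and rounds to an exact permutation.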
Full Text Available
A Recurrent Neural Network and Differential Equation based Spatiotemporal Infectious Disease Model with Application to COVID-19

https://doi.org/10.5220/0010130000930103

Li, Zhijian; Zheng, Yunling; Xin, Jack; Zhou, Guofa (July 2020, 12th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management)

The outbreaks of Coronavirus Disease 2019 (COVID-19) have impacted the world significantly. Modeling the trend of infection and real-time forecasting of cases can help decision making and control of the disease spread. However, data-driven methods such as recurrent neural networks (RNNs) can perform poorly due to the limited number of daily samples. In this work, we develop an integrated spatiotemporal model based on the epidemic differential equations (SIR) and an RNN. The former, after simplification and discretization, is a compact model of the temporal infection trend of a region, while the latter models the effect of the nearest neighboring regions and captures latent spatial information. We trained and tested our model on COVID-19 data in Italy, and show that it outperforms existing temporal models (fully connected NN, SIR, ARIMA) in 1-day, 3-day, and 1-week ahead forecasting, especially in the regime of limited training data.
Full Text Available
Convergence of a Relaxed Variable Splitting Coarse Gradient Descent Method for Learning Sparse Weight Binarized Activation Neural Network

https://doi.org/10.3389/fams.2020.00013

Dinh, Thu; Xin, Jack (May 2020, Frontiers in Applied Mathematics and Statistics)
Full Text Available
