Machine learning influences numerous aspects of modern society, empowers new technologies, from Alphago to ChatGPT, and increasingly materializes in consumer products such as smartphones and self-driving cars. Despite the vital role and broad applications of artificial neural networks, we lack systematic approaches, such as network science, to understand their underlying mechanism. The difficulty is rooted in many possible model configurations, each with different hyper-parameters and weighted architectures determined by noisy data. We bridge the gap by developing a mathematical framework that maps the neural network’s performance to the network characters of the line graph governed by the edge dynamics of stochastic gradient descent differential equations. This framework enables us to derive a neural capacitance metric to universally capture a model’s generalization capability on a downstream task and predict model performance using only early training results. The numerical results on 17 pre-trained ImageNet models across five benchmark datasets and one NAS benchmark indicate that our neural capacitance metric is a powerful indicator for model selection based only on early training results and is more efficient than state-of-the-art methods.
more » « less- NSF-PAR ID:
- 10521889
- Publisher / Repository:
- Nature Publishing Group
- Date Published:
- Journal Name:
- Nature Communications
- Volume:
- 15
- Issue:
- 1
- ISSN:
- 2041-1723
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
This paper explores three different model components to improve predictive performance over the ViEWS benchmark: a class of neural networks that account for spatial and temporal dependencies; the use of CAMEO-coded event data; and the continuous rank probability score (CRPS), which is a proper scoring metric. We forecast changes in state based violence across Africa at the grid-month level. The results show that spatio-temporal graph convolutional neural network models offer consistent improvements over the benchmark. The CAMEO-coded event data sometimes improve performance, but sometimes decrease performance. Finally, the choice of performance metric, whether it be the mean squared error or a proper metric such as the CRPS, has an impact on model selection. Each of these components–algorithms, measures, and metrics–can improve our forecasts and understanding of violence.more » « less
-
Abstract Motivation State-of-the-art biomedical named entity recognition (BioNER) systems often require handcrafted features specific to each entity type, such as genes, chemicals and diseases. Although recent studies explored using neural network models for BioNER to free experts from manual feature engineering, the performance remains limited by the available training data for each entity type.
Results We propose a multi-task learning framework for BioNER to collectively use the training data of different types of entities and improve the performance on each of them. In experiments on 15 benchmark BioNER datasets, our multi-task model achieves substantially better performance compared with state-of-the-art BioNER systems and baseline neural sequence labeling models. Further analysis shows that the large performance gains come from sharing character- and word-level information among relevant biomedical entities across differently labeled corpora.
Availability and implementation Our source code is available at https://github.com/yuzhimanhua/lm-lstm-crf.
Supplementary information Supplementary data are available at Bioinformatics online.
-
Ayahiko Niimi, Future University-Hakodate (Ed.)Traditional Network Intrusion Detection Systems (NIDS) encounter difficulties due to the exponential growth of network traffic data and modern attacks' requirements. This paper presents a novel network intrusion classification framework using transfer learning from the VGG-16 pre-trained model. The framework extracts feature leveraging pre-trained weights trained on the ImageNet dataset in the initial step, and finally, applies a deep neural network to the extracted features for intrusion classification. We applied the presented framework on NSL-KDD, a benchmark dataset for network intrusion, to evaluate the proposed framework's performance. We also implemented other pre-trained models such as VGG19, MobileNet, ResNet-50, and Inception V3 to evaluate and compare performance. This paper also displays both binary classification (normal vs. attack) and multi-class classification (classifying types of attacks) for network intrusion detection. The experimental results show that feature extraction using VGG-16 outperforms other pre-trained models producing better accuracy, precision, recall, and false alarm rates.more » « less
-
State-of-the-art subspace clustering methods are based on the self-expressive model, which represents each data point as a linear combination of other data points. However, such methods are designed for a finite sample dataset and lack the ability to generalize to out-of-sample data. Moreover, since the number of self-expressive coefficients grows quadratically with the number of data points, their ability to handle large-scale datasets is often limited. In this paper, we propose a novel framework for subspace clustering, termed Self-Expressive Network (SENet), which employs a properly designed neural network to learn a self-expressive representation of the data. We show that our SENet can not only learn the self-expressive coefficients with desired properties on the training data, but also handle out-of-sample data. Besides, we show that SENet can also be leveraged to perform subspace clustering on large-scale datasets. Extensive experiments conducted on synthetic data and real world benchmark data validate the effectiveness of the proposed method. In particular, SENet yields highly competitive performance on MNIST, Fashion MNIST and Extended MNIST and state-of-the-art performance on CIFAR-10.more » « less
-
Abstract Generative adversarial networks (GANs) have been impactful on many problems and applications but suffer from unstable training. The Wasserstein GAN (WGAN) leverages the Wasserstein distance to avoid the caveats in the minmax two-player training of GANs but has other defects such as mode collapse and lack of metric to detect the convergence. We introduce a novel inferential Wasserstein GAN (iWGAN) model, which is a principled framework to fuse autoencoders and WGANs. The iWGAN model jointly learns an encoder network and a generator network motivated by the iterative primal-dual optimization process. The encoder network maps the observed samples to the latent space and the generator network maps the samples from the latent space to the data space. We establish the generalization error bound of the iWGAN to theoretically justify its performance. We further provide a rigorous probabilistic interpretation of our model under the framework of maximum likelihood estimation. The iWGAN, with a clear stopping criteria, has many advantages over other autoencoder GANs. The empirical experiments show that the iWGAN greatly mitigates the symptom of mode collapse, speeds up the convergence, and is able to provide a measurement of quality check for each individual sample. We illustrate the ability of the iWGAN by obtaining competitive and stable performances for benchmark datasets.