- Award ID(s):
- NSF-PAR ID:
- Date Published:
- Journal Name:
- 2019 IEEE International Symposium on Information Theory (ISIT)
- Page Range / eLocation ID:
- 161 to 165
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
null (Ed.)Neural Normalized MinSum (N-NMS) decoding delivers better frame error rate (FER) performance on linear block codes than conventional normalized MinSum (NMS) by assigning dynamic multiplicative weights to each check-to-variable message in each iteration. Previous N-NMS efforts have primarily investigated short-length block codes (N < 1000), because the number of N-NMS parameters to be trained is proportional to the number of edges in the parity check matrix and the number of iterations, which imposes am impractical memory requirement when Pytorch or Tensorflow is used for training. This paper provides efficient approaches to training parameters of N-NMS that support N-NMS for longer block lengths. Specifically, this paper introduces a family of neural 2-dimensional normalized (N-2D-NMS) decoders with with various reduced parameter sets and shows how performance varies with the parameter set selected. The N-2D-NMS decoders share weights with respect to check node and/or variable node degree. Simulation results justify this approach, showing that the trained weights of N-NMS have a strong correlation to the check node degree, variable node degree, and iteration number. Further simulation results on the (3096,1032) protograph-based raptor-like (PBRL) code show that N-2D-NMS decoder can achieve the same FER as N-NMS with significantly fewer parameters required. The N-2D-NMS decoder for a (16200,7200) DVBS-2 standard LDPC code shows a lower error floor than belief propagation. Finally, a hybrid decoding structure combining a feedforward structure with a recurrent structure is proposed in this paper. The hybrid structure shows similar decoding performance to full feedforward structure, but requires significantly fewer parameters.more » « less
null (Ed.)Training competitive deep video models is an order of magnitude slower than training their counterpart image models. Slow training causes long research cycles, which hinders progress in video understanding research. Following standard practice for training image models, video model training has used a fixed mini-batch shape: a specific number of clips, frames, and spatial size. However, what is the optimal shape? High resolution models perform well, but train slowly. Low resolution models train faster, but are less accurate. Inspired by multigrid methods in numerical optimization, we propose to use variable mini-batch shapes with different spatial-temporal resolutions that are varied according to a schedule. The different shapes arise from resampling the training data on multiple sampling grids. Training is accelerated by scaling up the mini-batch size and learning rate when shrinking the other dimensions. We empirically demonstrate a general and robust grid schedule that yields a significant out-of-the-box training speedup without a loss in accuracy for different models (I3D, non-local, SlowFast), datasets (Kinetics, Something-Something, Charades), and training settings (with and without pre-training, 128 GPUs or 1 GPU). As an illustrative example, the proposed multigrid method trains a ResNet-50 SlowFast network 4.5 x faster (wall-clock time, same hardware) while also improving accuracy (+ 0.8% absolute) on Kinetics-400 compared to baseline training. Code is available online.more » « less
Artificial Neural Networks (ANNs) play a key role in many machine learning (ML) applications but poses arduous challenges in terms of storage and computation of network parameters. Memristive crossbar arrays (MCAs) are capable of both computation and storage, making them promising for in-memory computing enabled neural network accelerators. At the same time, the presence of a significant amount of zero weights in ANNs has motivated research in a variety of parameter reduction techniques. However, for crossbar based architectures, the study of efficient methods to take advantage of network sparsity is still in the early stage. This paper presents CSrram, an efficient ex-situ training framework for hybrid CMOS-memristive neuromorphic circuits. CSrram includes a pre-defined block diagonal clustered (BDC) sparsity algorithm to significantly reduce area and power consumption. The proposed framework is verified on a wide range of datasets including MNIST handwritten recognition, fashion MNIST, breast cancer prediction (BCW), IRIS, and mobile health monitoring. Compared to state of the art fully connected memristive neuromorphic circuits, our CSrram with only 25% density of weights in the first junction, provides a power and area efficiency of 1.5x and 2.6x (averaged over five datasets), respectively, without any significant test accuracy loss.more » « less
To improve the performance of neural networks for parameter estimation in quantitative MRI, in particular when the noise propagation varies throughout the space of biophysical parameters.
Theory and Methods
A theoretically well‐founded loss function is proposed that normalizes the squared error of each estimate with respective Cramér–Rao bound (CRB)—a theoretical lower bound for the variance of an unbiased estimator. This avoids a dominance of hard‐to‐estimate parameters and areas in parameter space, which are often of little interest. The normalization with corresponding CRB balances the large errors of fundamentally more noisy estimates and the small errors of fundamentally less noisy estimates, allowing the network to better learn to estimate the latter. Further, proposed loss function provides an absolute evaluation metric for performance: A network has an average loss of 1 if it is a maximally efficient unbiased estimator, which can be considered the ideal performance. The performance gain with proposed loss function is demonstrated at the example of an eight‐parameter magnetization transfer model that is fitted to phantom and in vivo data.
Networks trained with proposed loss function perform close to optimal, that is, their loss converges to approximately 1, and their performance is superior to networks trained with the standard mean‐squared error (MSE). The proposed loss function reduces the bias of the estimates compared to the MSE loss, and improves the match of the noise variance to the CRB. This performance gain translates to in vivo maps that align better with the literature.
Normalizing the squared error with the CRB during the training of neural networks improves their performance in estimating biophysical parameters.
Estimating the quality of transmission (QoT) of a lightpath before its establishment is a critical procedure for efficient design and management of optical networks. Recently, supervised machine learning (ML) techniques for QoT estimation have been proposed as an effective alternative to well-established, yet approximated, analytic models that often require the introduction of conservative margins to compensate for model inaccuracies and uncertainties. Unfortunately, to ensure high estimation accuracy, the training set (i.e., the set of historical field data, or “samples,” required to train these supervised ML algorithms) must be very large, while in real network deployments, the number of monitored/monitorable lightpaths is limited by several practical considerations. This is especially true for lightpaths with an above-threshold bit error rate (BER) (i.e., malfunctioning or wrongly dimensioned lightpaths), which are infrequently observed during network operation. Samples with above-threshold BERs can be acquired by deploying probe lightpaths, but at the cost of increased operational expenditures and wastage of spectral resources. In this paper, we propose to use
active learningto reduce the number of probes needed for ML-based QoT estimation. We build an estimation model based on Gaussian processes, which allows iterative identification of those QoT instances that minimize estimation uncertainty. Numerical results using synthetically generated datasets show that, by using the proposed active learning approach, we can achieve the same performance of standard offline supervised ML methods, but with a remarkable reduction (at least 5% and up to 75%) in the number of training samples.