Title: Convex Neural Autoregressive Models: Towards Tractable, Expressive, and Theoretically-Backed Models for Sequential Forecasting and Generation
Three features are crucial for sequential forecasting and generation models: tractability, expressiveness, and theoretical backing. While neural autoregressive models are relatively tractable and offer powerful predictive and generative capabilities, they often have complex optimization landscapes, and their theoretical properties are not well understood. To address these issues, we present convex formulations of autoregressive models with one hidden layer. Specifically, we prove an exact equivalence between these models and constrained, regularized logistic regression by using semi-infinite duality to embed the data matrix into a higher-dimensional space and introducing inequality constraints. To make this formulation tractable, we approximate the constraints using a hinge loss or drop them altogether. Furthermore, we demonstrate faster training and competitive performance of these implementations compared to their neural network counterparts on a variety of datasets. Consequently, we introduce techniques to derive tractable, expressive, and theoretically interpretable models that are nearly equivalent to neural autoregressive models.
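The recipe in the abstract (embed the data in a higher-dimensional space, fit a regularized logistic regression, and either penalize constraint violations with a hinge loss or drop the constraints) can be illustrated with a small NumPy sketch. This is not the paper's implementation. It assumes the embedding is built from ReLU-style activation masks of randomly sampled gate vectors, uses a squared L2 penalty as a stand-in regularizer, and runs plain (sub)gradient descent. The names `make_lagged`, `embed`, `objective`, `fit`, `gates`, `lam`, and `rho` are all hypothetical.

```python
import numpy as np

def make_lagged(series, p):
    """Autoregressive design: each row holds the p previous values, target is the next."""
    X = np.stack([series[t - p:t] for t in range(p, len(series))])
    y = series[p:]
    return X, y

def embed(X, gates):
    """Assumed embedding: concatenate masked copies D_i X, D_i = diag(1[X g_i >= 0])."""
    masks = (X @ gates >= 0).astype(X.dtype)                  # shape (n, P)
    Z = (masks[:, :, None] * X[:, None, :]).reshape(X.shape[0], -1)
    return Z, masks                                           # Z has shape (n, P * d)

def objective(w, Z, y, lam):
    """Mean logistic loss for labels in {-1, +1} plus a squared L2 penalty."""
    return np.mean(np.logaddexp(0.0, -y * (Z @ w))) + 0.5 * lam * (w @ w)

def fit(X, y, gates, lam=1e-3, rho=0.0, lr=0.1, steps=500):
    """(Sub)gradient descent; rho = 0 drops the constraints, rho > 0 adds a hinge penalty."""
    n, d = X.shape
    P = gates.shape[1]
    Z, masks = embed(X, gates)
    signs = 2.0 * masks - 1.0                                 # diagonal entries of 2 D_i - I
    w = np.zeros(P * d)
    for _ in range(steps):
        margin = y * (Z @ w)
        grad = -(Z.T @ (y / (1.0 + np.exp(margin)))) / n + lam * w
        if rho > 0:
            W = w.reshape(P, d)
            slack = signs * (X @ W.T)                         # constraint i asks slack[:, i] >= 0
            violated = (slack < 0).astype(float)
            # Subgradient of the mean hinge penalty max(0, -slack), summed over blocks.
            grad += rho * (-((violated * signs).T @ X) / n).reshape(-1)
        w -= lr * grad
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    series = np.tile([1.0, 1.0, -1.0], 40)                    # toy periodic +/-1 sequence
    X, y = make_lagged(series, p=2)
    gates = rng.standard_normal((2, 8))                       # 8 sampled gate vectors
    w = fit(X, y, gates, rho=0.1)
    Z, _ = embed(X, gates)
    print("training accuracy:", np.mean(np.sign(Z @ w) == y))
```

With `rho=0.0` the problem is a smooth convex logistic regression in the embedded features, which corresponds to the "drop them altogether" variant. A positive `rho` trades some of that simplicity for softly enforcing the sign constraints.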
Award ID(s):
2037304
NSF-PAR ID:
10290937
Journal Name:
ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Page Range or eLocation-ID:
3890 to 3894
Sponsoring Org:
National Science Foundation
More Like this
  1. The standard approach to fitting an autoregressive spike train model is to maximize the likelihood for one-step prediction. This maximum likelihood estimation (MLE) often leads to models that perform poorly when generating samples recursively for more than one time step. Moreover, the generated spike trains can fail to capture important features of the data and even show diverging firing rates. To alleviate this, we propose to directly minimize the divergence between neurally recorded and model-generated spike trains using spike train kernels. We develop a method that stochastically optimizes the maximum mean discrepancy induced by the kernel. Experiments performed on both real and synthetic neural data validate the proposed approach, showing that it leads to well-behaved models. Using different combinations of spike train kernels, we show that we can control the trade-off between different features, which is critical for dealing with model mismatch.
  2. Identifying the directed connectivity that underlies networked activity between different cortical areas is critical for understanding the neural mechanisms behind sensory processing. Granger causality (GC) is widely used for this purpose in functional magnetic resonance imaging analysis, but there the temporal resolution is low, making it difficult to capture the millisecond-scale interactions underlying sensory processing. Magnetoencephalography (MEG) has millisecond resolution, but only provides low-dimensional sensor-level linear mixtures of neural sources, which makes GC inference challenging. Conventional methods proceed in two stages: first, cortical sources are estimated from MEG using a source localization technique; second, GC inference is performed among the estimated sources. However, the spatiotemporal biases in estimating sources propagate into the subsequent GC analysis stage and may result in both false alarms and missed true GC links. Here, we introduce the Network Localized Granger Causality (NLGC) inference paradigm, which models the source dynamics as latent sparse multivariate autoregressive processes, estimates their parameters directly from the MEG measurements in a manner integrated with source localization, and employs the resulting parameter estimates to produce a precise statistical characterization of the detected GC links. We offer several theoretical and algorithmic innovations within NLGC and further examine its utility via comprehensive simulations and application to MEG data from an auditory task involving tone processing from both younger and older participants. Our simulation studies reveal that NLGC is markedly robust with respect to model mismatch, network size, and low signal-to-noise ratio, whereas the conventional two-stage methods result in high false alarms and mis-detections. We also demonstrate the advantages of NLGC in revealing the cortical network-level characterization of neural activity during tone processing and resting state by delineating task- and age-related connectivity changes.
  3. Pierre Alquier (Ed.)
    A systematic approach to finding a variational approximation in an otherwise intractable non-conjugate model is to exploit the general principle of convex duality by minorizing the marginal likelihood in a way that renders the problem tractable. While such approaches are popular in the context of variational inference in non-conjugate Bayesian models, theoretical guarantees on statistical optimality and algorithmic convergence are lacking. Focusing on logistic regression models, we provide mild conditions on the data-generating process to derive non-asymptotic upper bounds on the risk incurred by the variational optima. We demonstrate that these assumptions can be completely relaxed if one considers a slight variation of the algorithm that raises the likelihood to a fractional power. Next, we utilize the theory of dynamical systems to provide convergence guarantees for such algorithms in logistic and multinomial logit regression. In particular, we establish local asymptotic stability of the algorithm without any assumptions on the data-generating process. We explore a special case involving a semi-orthogonal design under which global convergence is obtained. The theory is further illustrated using several numerical studies.
  4. This work proposes an Adaptive Fuzzy Prediction (AFP) method for the attenuation time series in Commercial Microwave Links (CMLs). Time-series forecasting models regularly rely on the assumption that the entire dataset follows the same Data Generating Process (DGP). However, the signals in wireless microwave links are severely affected by the varying weather conditions in the channel. Consequently, the attenuation time series might change its characteristics significantly at different periods. We suggest an adaptive framework that makes better use of the training data by grouping sequences with related temporal patterns, accounting for the non-stationary nature of the signals. The focus of this work is twofold. The first aim is to explore the integration of static CML data as exogenous variables in the attenuation time series models so that they can adapt to diverse link characteristics. This extension allows attenuation datasets obtained from additional CMLs to be included in the training process, dramatically increasing the available training data. The second aim is to develop an adaptive framework for short-term attenuation forecasting by employing an unsupervised fuzzy clustering procedure and supervised learning models. We empirically analyzed our framework for model-driven and data-driven approaches with Recurrent Neural Network (RNN) and Autoregressive Integrated Moving Average (ARIMA) variations. We evaluate the proposed extensions on real-world measurements collected from 4G backhaul networks, considering dataset availability and the accuracy of 60-second prediction. We show that our framework can significantly improve conventional models’ accuracy and that incorporating data from various CMLs is essential to the AFP framework. The proposed methods have been shown to enhance the forecasting model’s performance by 30–40%, depending on the specific model and the data availability.