Title: Regularized Estimation of High-dimensional Factor-Augmented Vector Autoregressive (FAVAR) Models
A factor-augmented vector autoregressive (FAVAR) model is defined by a VAR equation that captures lead-lag correlations amongst a set of observed variables X and latent factors F, and a calibration equation that relates another set of observed variables Y with F and X. The latter equation is used to estimate the factors that are subsequently used in estimating the parameters of the VAR system. The FAVAR model has become popular in applied economic research, since it can summarize a large number of variables of interest as a few factors through the calibration equation and subsequently examine their influence on core variables of primary interest through the VAR equation. However, there is increasing need for examining lead-lag relationships between a large number of time series, while incorporating information from another high-dimensional set of variables. Hence, in this paper we investigate the FAVAR model under high-dimensional scaling. We introduce an appropriate identification constraint for the model parameters, which when incorporated into the formulated optimization problem yields estimates with good statistical properties. Further, we address a number of technical challenges introduced by the fact that estimates of the VAR system model parameters are based on estimated rather than directly observed quantities. The performance of the proposed estimators is evaluated on synthetic data. Further, the model is applied to commodity prices and reveals interesting and interpretable relationships between the prices and the factors extracted from a set of global macroeconomic indicators.
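The two equations described in the abstract can be sketched as follows. This is an illustration only: the record gives no notation, so the lag order of one and the symbols A, \Lambda, \Gamma, w_t, and e_t are assumptions made here, not the paper's stated formulation.

```latex
% VAR equation: lead-lag dynamics of the latent factors F_t and observed X_t
% (lag order 1 and the transition matrix A are assumed for illustration)
\begin{pmatrix} F_t \\ X_t \end{pmatrix}
  = A \begin{pmatrix} F_{t-1} \\ X_{t-1} \end{pmatrix} + w_t

% Calibration equation: the second observed set Y_t related to F_t and X_t
% (\Lambda = factor loadings, \Gamma = coefficients on X_t, both assumed names)
Y_t = \Lambda F_t + \Gamma X_t + e_t
```

Because F_t is latent, the product \Lambda F_t is unchanged when \Lambda is replaced by \Lambda Q and F_t by Q^{-1} F_t for any invertible Q, so the parameters are not determined without a restriction. This is the role of the identification constraint mentioned in the abstract.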
Award ID(s):
1821220
PAR ID:
10178862
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Journal of machine learning research
Volume:
21
Issue:
117
ISSN:
1532-4435
Page Range / eLocation ID:
1-51
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  2. Larochelle, H.; Ranzato, M.; Hadsell, R.; Balcan, M.F.; Lin, H. (Ed.)
    High-dimensional neural recordings across multiple brain regions can be used to establish functional connectivity with good spatial and temporal resolution. We designed and implemented a novel method, Latent Dynamic Factor Analysis of High-dimensional time series (LDFA-H), which combines (a) a new approach to estimating the covariance structure among high-dimensional time series (for the observed variables) and (b) a new extension of probabilistic CCA to dynamic time series (for the latent variables). Our interest is in the cross-correlations among the latent variables which, in neural recordings, may capture the flow of information from one brain region to another. Simulations show that LDFA-H outperforms existing methods in the sense that it captures target factors even when within-region correlation due to noise dominates cross-region correlation. We applied our method to local field potential (LFP) recordings from 192 electrodes in Prefrontal Cortex (PFC) and visual area V4 during a memory-guided saccade task. The results capture time-varying lead-lag dependencies between PFC and V4, and display the associated spatial distribution of the signals. 
  3. Abstract. The West Antarctic Peninsula (WAP) is a rapidly warming region, with substantial ecological and biogeochemical responses to the observed change and variability for the past decades, revealed by multi-decadal observations from the Palmer Antarctica Long-Term Ecological Research (LTER) program. The wealth of these long-term observations provides an important resource for ecosystem modeling, but there has been a lack of focus on the development of numerical models that simulate time-evolving plankton dynamics over the austral growth season along the coastal WAP. Here, we introduce a one-dimensional variational data assimilation planktonic ecosystem model (i.e., the WAP-1D-VAR v1.0 model) equipped with a model parameter optimization scheme. We first demonstrate the modified and newly added model schemes to the pre-existing food web and biogeochemical components of the other ecosystem models that WAP-1D-VAR model was adapted from, including diagnostic sea-ice forcing and trophic interactions specific to the WAP region. We then present the results from model experiments where we assimilate 11 different data types from an example Palmer LTER growth season (October 2002–March 2003) directly related to corresponding model state variables and flows between these variables. The iterative data assimilation procedure reduces the misfits between observations and model results by 58 %, compared to before optimization, via an optimized set of 12 parameters out of a total of 72 free parameters. The optimized model results capture key WAP ecological features, such as blooms during seasonal sea-ice retreat, the lack of macronutrient limitation, and modeled variables and flows comparable to other studies in the WAP region, as well as several important ecosystem metrics. One exception is that the model slightly underestimates particle export flux, for which we discuss potential underlying reasons. 
The data assimilation scheme of the WAP-1D-VAR model enables the available observational data to constrain previously poorly understood processes, including the partitioning of primary production by different phytoplankton groups, the optimal chlorophyll-to-carbon ratio of the WAP phytoplankton community, and the partitioning of dissolved organic carbon pools with different lability. The WAP-1D-VAR model can be successfully employed to link the snapshots collected by the available data sets together to explain and understand the observed dynamics along the coastal WAP. 
  4. Abstract. Spatially distributed hydrological models are commonly employed to optimize the locations of engineering control measures across a watershed. Yet, parameter screening exercises that aim to reduce the dimensionality of the calibration search space are typically completed only for gauged locations, like the watershed outlet, and use screening metrics that are relevant to calibration instead of explicitly describing the engineering decision objectives. Identifying parameters that describe physical processes in ungauged locations that affect decision objectives should lead to a better understanding of control measure effectiveness. This paper provides guidance on evaluating model parameter uncertainty at the spatial scales and flow magnitudes of interest for such decision-making problems. We use global sensitivity analysis to screen parameters for model calibration, and to subsequently evaluate the appropriateness of using multipliers to adjust the values of spatially distributed parameters to further reduce dimensionality. We evaluate six sensitivity metrics, four of which align with decision objectives and two of which consider model residual error that would be considered in spatial optimizations of engineering designs. We compare the resulting parameter selection for the basin outlet and each hillslope. We also compare basin outlet results for four calibration-relevant metrics. These methods were applied to a RHESSys ecohydrological model of an exurban forested watershed near Baltimore, MD, USA. 
Results show that (1) the set of parameters selected by calibration-relevant metrics does not include parameters that control decision-relevant high and low streamflows, (2) evaluating sensitivity metrics at the basin outlet misses many parameters that control streamflows in hillslopes, and (3) for some multipliers, calibrating all parameters in the set being adjusted may be preferable to using the multiplier if parameter sensitivities are significantly different, while for others, calibrating a subset of the parameters may be preferable if they are not all influential. Thus, we recommend that parameter screening exercises use decision-relevant metrics that are evaluated at the spatial scales appropriate to decision making. While including more parameters in calibration will exacerbate equifinality, the resulting parametric uncertainty should be important to consider in discovering control measures that are robust to it. 
  5. Abstract We describe a stochastic, dynamical system capable of inference and learning in a probabilistic latent variable model. The most challenging problem in such models—sampling the posterior distribution over latent variables—is proposed to be solved by harnessing natural sources of stochasticity inherent in electronic and neural systems. We demonstrate this idea for a sparse coding model by deriving a continuous-time equation for inferring its latent variables via Langevin dynamics. The model parameters are learned by simultaneously evolving according to another continuous-time equation, thus bypassing the need for digital accumulators or a global clock. Moreover, we show that Langevin dynamics lead to an efficient procedure for sampling from the posterior distribution in the L0 sparse regime, where latent variables are encouraged to be set to zero as opposed to having a small L1 norm. This allows the model to properly incorporate the notion of sparsity rather than having to resort to a relaxed version of sparsity to make optimization tractable. Simulations of the proposed dynamical system on both synthetic and natural image data sets demonstrate that the model is capable of probabilistically correct inference, enabling learning of the dictionary as well as parameters of the prior. 